A Spatially Aware Machine Learning Method for Locating Electric Vehicle Charging Stations

Huang, Yanyan; Ren, Hangyi; Jia, Xudong; Yu, Xianyu; Xie, Dong; Zou, You; Chen, Daoyuan; Yang, Yi

doi:10.3390/wevj16080445


Yanyan Huang ¹, Hangyi Ren ¹,*, Xudong Jia ²,*, Xianyu Yu ¹, Dong Xie ³, You Zou ¹, Daoyuan Chen ¹ and Yi Yang ¹

¹ School of Civil Engineering, Architecture and Environment, Hubei University of Technology, Wuhan 430068, China

² School of Engineering, Science, Technology, Central Connecticut State University, New Britain, CT 06050, USA

³ School of Urban Design, Wuhan University, Wuhan 430072, China

* Authors to whom correspondence should be addressed.

World Electr. Veh. J. 2025, 16(8), 445; https://doi.org/10.3390/wevj16080445

Submission received: 24 June 2025 / Revised: 31 July 2025 / Accepted: 4 August 2025 / Published: 6 August 2025

(This article belongs to the Special Issue Fast-Charging Station for Electric Vehicles: Challenges and Issues)


Abstract

The rapid adoption of electric vehicles (EVs) has driven a strong need for optimizing locations of electric vehicle charging stations (EVCSs). Previous methods for locating EVCSs rely on statistical and optimization models, but these methods have limitations in capturing complex nonlinear relationships and spatial dependencies among factors influencing EVCS locations. To address this research gap and better understand the spatial impacts of urban activities on EVCS placement, this study presents a spatially aware machine learning (SAML) method that combines a multi-layer perceptron (MLP) model with a spatial loss function to optimize EVCS sites. Additionally, the method uses the Shapley additive explanation (SHAP) technique to investigate nonlinear relationships embedded in EVCS placement. Using the city of Wuhan as a case study, the SAML method reveals that parking site (PS), road density (RD), population density (PD), and commercial residence (CR) areas are key factors in determining optimal EVCS sites. The SAML model partitions the study area into grid cells and classifies them into no EVCS demand (0 EVCS), low EVCS demand (from 1 to 3 EVCSs), and high EVCS demand (4+ EVCSs) classes. The model performs well in predicting EVCS demand. Findings from ablation tests also indicate that the inclusion of spatial correlations in the model’s loss function significantly enhances the model’s performance. Additionally, results from case studies validate that the model is effective in predicting EVCSs in other metropolitan cities.

Keywords:

electric vehicle charging station (EVCS); site selection; multi-layer perceptron (MLP); spatial loss function; Shapley additive explanation (SHAP); urban planning

1. Introduction

Accurate prediction of the spatial patterns of public EVCS demand is essential for effective EVCS planning. Two primary approaches are used to model and predict charging demand. Microscopic approaches employ simulation software to model EV battery usage and driver charging behavior, generating detailed charging request distributions [1,2,3,4]. However, these simulation-based methods are challenging for large urban environments due to high computational complexity and resource demands. In recent years, data-driven methods supported by mobility data have emerged to predict public EVCS demand. Moreover, statistical approaches utilize geographic features, such as points of interest (POIs), traffic flow, and population density, to infer public EV charging demand [5,6,7,8]. These studies typically divide study areas into sub-areas (or grid cells) to assess EVCS demand [9,10]. They are limited in capturing complex nonlinear relationships and spatial dependencies among factors influencing EVCS locations.

To address the above limitations, we propose a spatially aware machine learning method (SAML) for estimating public EVCS demand and determining optimal sites for public EVCSs from an urban planning perspective. This method divides an area of study into grid cells, trains the relationships between urban activities and EVCS placement, and predicts EVCS demand for each cell. This method, complemented by the Shapley additive explanation (SHAP) technique, facilitates a systematic analysis of public EVCS demand across grid cells in a subject region. The contributions of our study are as follows:

(1) A novel model derived from the SAML method is developed to explore spatial distributions of EVCS demands at a given region. This model, different from statistical models, synthesizes urban activity data to develop an intelligent decision support system for the optimal placement of public EVCSs. Using the SHAP technique, the model reveals that parking site (PS), road density (RD), population density (PD), and commercial residence (CR) areas are key factors in determining optimal EVCS sites, providing insightful guidance for future public EVCS deployment.

(2) The SAML model incorporates spatial correlations of urban activity data for EVCS placement. It considers the mutual geographic distances of urban activities in the loss function, improving EVCS predictions in large urban areas. Our ablation experiments show that incorporating spatial relationships into the loss function improves the predicted performance of the model. Additionally, results from case studies validate that the model is effective in predicting EVCSs in other metropolitan cities.

The article is organized as follows: Section 2 reviews the methods and models used for EVCS placement. Section 3 describes the SAML method. Section 4 evaluates the model’s performance using prediction results, model assessment, ablation studies, and SHAP analysis. Section 5 and Section 6 present the discussion and conclusion, respectively.

2. Literature Review

Predicting optimal EVCS demand and location is a critical issue that has gained significant attention in recent years thanks to the growing popularity of EVs. A literature review was conducted in this study to capture the latest developments and insights related to optimizing EVCS demands and placements. After consolidating previous research studies, we concentrated our review on EV charging demand estimation, EVCS location strategy and infrastructure planning, algorithmic solutions, and artificial intelligence (AI)-based optimization strategies. We grouped EVCS studies by their relations to the SAML method and describe them as follows.

2.1. EV Charging Demand Estimation

Grid-based methods have drawn considerable attention from researchers working on the estimation of EV charging demand. These methods divide study areas into uniform grid cells and estimate EV charging demand at consistent spatial scales [11]. Machine learning and deep learning algorithms are employed to predict EVCSs across multiple spatial scales. For example, Roy and Law partitioned Orange County, California, at a 3 km spatial scale (with 94.9% accuracy) to estimate EVCS demand [12]. Their results indicate that 11.04% of predicted EVCS placements in Orange County lie within a high spatial inequity zone, suggesting that populations with the lowest accessibility may require the greatest investment in EVCS placements. In total, 69.52% of the study area experiences moderate accessibility issues and the remaining 19.11% faces the least accessibility issues related to EV charging stations. However, this study did not consider spatial correlations between grid cells when formulating the EV charging demand problem.

In grid-based models, the data selected to represent activities within grid cells is crucial for the prediction of EV charging demand. Vazifeh et al. [10] used cellular data to capture individual patterns of EV movements in Boston, Massachusetts, and to estimate EVCS demand. The authors divided the city of Boston into 1 km grid cells and developed an optimization model to minimize both the number of charging stations and travel distances using their collected data. Similarly, Wagner et al. [9] divided their study area into 250 m grid cells and identified parking spots within each grid cell based on GPS data. They prioritized EV charging sites in areas with many trip destinations. However, these studies primarily focused on EVs, charging facilities, and their direct impacts on charging demand. As a result, additional data types, such as points of interest (POIs), were not incorporated in their studies.

Dong et al. [13] explored London’s charging station deployment using extensive POI and EV charging data at a 1 km grid scale, revealing strong correlations between EV charging demand and workplace population density, traffic flow, the transportation network, and the retail and commercial POI categories. Wang et al. developed a heterogeneous spatiotemporal graph convolutional network that learns spatial correlations by constructing geographic and demand graphs. Their model groups regions using graph embeddings and incorporates POI data to improve demand prediction at multiple spatial scales [14]. In another work, Liu et al. introduced a five-dimensional index (population, transportation, infrastructure, land use, and economy) built from multi-source big data (including extensive POI and mobility datasets) to project spatial charging demand. This approach produces a visual model of charging station demand and helps identify areas with under-served EV charging needs [15]. Ribeiro et al. also demonstrated the importance of spatial resolution by halving the grid cell size (from 1 km² to 0.5 km²). They achieved higher granularity and significantly reduced the average distance between EV users and charging points [16]. Yang et al. proposed a transportation–power coupled network model that integrates route choice and Monte Carlo simulation to predict the spatiotemporal distributions of charging loads. Their results revealed distinct charging load patterns across different functional areas and time periods (e.g., higher demand in commercial zones during peak hours), providing a robust tool for planning EV charging infrastructure [17]. Yi et al. [7] also divided their study area into 1 km grid cells and transformed them into directed graphs, proposing a Modified Geographic PageRank (MGPR) model based on origin-destination (OD) traffic flows and POI characteristics. The model was validated with EV charging data from 109 public stations in the Salt Lake metropolitan area. Wagner et al. [5] also proposed a linear regression model for Amsterdam using 250 m grid cells, inferring public EV charging demand through grid-level POI counts. In their study, the model’s R² value was 0.16 (a weak fit), indicating that relying solely on POI counts to predict EV charging demand may limit predictive power. Because the relationships between EV charging demand and POI counts are nonlinear, a linear model fits them poorly, which results in the low R² value.

2.2. EVCS Location Optimization

One of the challenges in determining public EVCS locations is to achieve objectives, such as maximizing EVCS service coverage, minimizing cost, or balancing EVCS accessibility, utilization, and other criteria under various constraints. Many EVCS location optimization models used two-step approaches to achieve pre-defined objectives by determining optimal charging station locations [18]. These models required EVCS demands to be estimated first, followed by location optimization. They include flow-capturing location models (FCLMs) [19] and models for solving maximum covering location problems (MCLP) [20]. It is noted that Dong et al. applied MCLP to maximize an EV charging demand coverage by allocating a fixed number of charging stations [13].

A more sophisticated approach was developed by Zhou et al. to formulate EVCS location optimization as a multi-objective optimization problem [21]. Additionally, Vazifeh et al. [10] considered EVCS location optimization as a set-covering problem with dual objectives. Building on this, Kınay et al. [22] developed a full coverage modeling framework with novel objective functions to optimize charging station locations and origin-destination (OD) routes and to minimize en-route EV charging per trip.

In summary, previous studies have utilized grid-based approaches and various data sources to predict EV charging demand; however, issues such as model assumptions, variable selection, and insufficient consideration of factors related to urban activities remain. Additionally, traditional two-step EV charging methods suffer from error propagation. Errors in demand prediction accumulate through the optimization process, potentially leading to suboptimal EVCS placements. Furthermore, multi-objective optimization problems can become computationally intractable because of their numerous complex constraints [21].

To address the above issues, we develop a spatially aware machine learning (SAML) model. By incorporating multidimensional data (including socio-economic factors and EV charging behavior), we combine EVCS demand estimation and location optimization in a single step. Unlike previous EVCS methods that consider macroeconomic factors, traffic network parameters, and EV driver convenience, the SAML model partitions a region of study into grid cells and directly predicts EV charging demand within each grid cell. EVCS sites can then be placed at the grid cells with high EV charging demand.

3. Methodology

The spatially aware machine learning (SAML) method is shown in Figure 1. It consists of four steps: First, a grid-based geographic information system (GIS) is developed using ArcGIS Pro, geodatabases for public EVCSs are constructed, and multi-source data, such as POIs, are used to represent urban activities. Second, a multi-layer perceptron (MLP) model is established using PyTorch 1.12 in Python 3.8 to identify potential EVCS hotspots. A novel loss function that integrates spatial correlations among urban activities in grid cells enhances the SAML model’s accuracy and suitability for urban environments. Third, the Shapley additive explanation (SHAP) technique is employed using the SHAP 0.41 library to analyze the impact of urban activities on EVCS placement. Fourth, the predicted EVCS hotspots are validated through case studies. All charts in this article were created using ArcGIS Pro 2.0 and Python’s Matplotlib library (version 3.4).

3.1. Research Area

The research area for this study is Wuhan (114°18′19′′ E, 30°34′34′′ N), a major metropolitan city in China. The total area and the population of Wuhan are 8569 km² and 11.08 million, respectively, with an urbanization rate of 80.29% as of 2018. Wuhan has 13 districts, of which 7 are downtown districts and 6 are suburban districts. Wuhan had 336,000 EVs in 2023. As the number of EVs in Wuhan grows, more EVCSs will be needed to meet the demand of EV users. In response, the Wuhan metropolitan government has developed and implemented EV and EVCS policies and procedures to promote and manage the deployment of EVCSs. Among the districts in Wuhan, Hongshan District (510.89 km²), one of the largest downtown districts with a mix of educational institutions, commercial areas, and residential areas, is selected in our study for validating the SAML model.

3.2. A Grid-Based System

In this study, the city of Wuhan is partitioned spatially into 11,099 grid cells using a GIS-based grid system (see Figure 2). It should be noted that determining the size of the grid unit is an empirical process. If the unit size is set too large, it will be difficult to precisely determine the optimal location of the charging station. On the contrary, a smaller grid unit size may not capture hot spots with high charging demand. Also, it may require more intensive computations. A detailed sensitivity analysis exploring this trade-off between grid resolution and model performance is presented in Section 5.1. Each grid cell is defined in this study as a service area of 900 m × 900 m, consistent with the EVCS service radius defined by the city of Wuhan. We used the density of existing public EVCSs as a proxy label for “revealed demand”. This data was compiled from major public charging network maps (e.g., TELD, Star Charge) and the Wuhan municipal open data portal, current as of late 2023. Each grid cell was then assigned a “no demand”, “low demand”, or “high demand” label based on the number of charging poles it contained, thereby creating a ground-truth label for our model. For the grid cells with EVCS facilities (as indicated by light and dark squares in Figure 2), our focus was to identify the most important urban activities or factors influencing EVCSs in these grid cells. For the grid cells without EVCS facilities, our intention was to determine their potential EVCS demand. The SAML model further quantifies the predictive relationships between EVCS demand and these influencing factors.
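The labeling rule can be illustrated with a minimal sketch. It assumes the class thresholds stated in the abstract (0, 1 to 3, and 4 or more EVCSs per cell); the function name `demand_class` and the example counts are hypothetical and not taken from the study’s data.

```python
def demand_class(n_evcs: int) -> int:
    """Map a grid cell's EVCS count to one of the three demand classes."""
    if n_evcs < 0:
        raise ValueError("EVCS count cannot be negative")
    if n_evcs == 0:
        return 0  # no demand
    if n_evcs <= 3:
        return 1  # low demand (1 to 3 EVCSs)
    return 2      # high demand (4 or more EVCSs)


# Hypothetical EVCS counts for five grid cells
counts = [0, 2, 7, 3, 4]
labels = [demand_class(c) for c in counts]
print(labels)  # [0, 1, 2, 1, 2]
```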

3.3. Data Collection and Pre-Processing

A total of 15 independent variables, including socio-economic, land use, traffic characteristics, and EV charging behaviors relative to each grid cell, are used to describe urban activities in the city of Wuhan (see Table 1). Also, one dependent variable is used to document the EVCS facilities deployed within each grid cell. Of these variables, 11 independent variables (institution (I), scenic spot (SS), medical care (MC), leisure/entertainment (LE), government agency (GA), food service (FS), company (C), commercial residence (CR), shopping center (SC), parking site (PS), and traffic node (TN)) were considered as point of interest (POI) variables. They provide significant information about urban dynamics and help the SAML model place EVCSs. This study linked the POI attributes to grid cells using geographic information embedded in POIs (see Figure 3). Also, this study explores the impacts of urban activities on EVCS demand in grid cells. It is worth noting that a total of 185,036 valid POI data entries were prepared for the 11 independent variables. By geo-overlaying these POIs onto the grid cells, our GIS-based grid system was formed.

Four additional attributes (that is, road density (RD), population density (PD), nighttime light density (NLD), and annual power use (APU)) were also collected and embedded into the GIS-based grid system. Road density (RD), along with parking site (PS) and traffic node (TN), was used to represent traffic characteristics that impact EVCSs in grid cells. Drawing on previous studies assessing traffic characteristics [23,24,25], this study calculated the road density $r_{d_i}$ in a grid cell as follows:

$$r_{d_i} = \frac{\sum_{j} L_{ij}}{A_i} \tag{1}$$

where $L_{ij}$ is the length of road $j$ in grid cell $i$ and $A_i$ denotes the area of grid cell $i$ ($0.81\ \mathrm{km}^2$). Road network data was extracted from OpenStreetMap (OSM) using the Python OSMnx library. The data includes all road types classified by OSM tags (motorway, primary, secondary, tertiary, and residential roads). The length of each road was calculated using GIS technology (see Figure 4a). PS and TN were treated previously as POIs in Table 1.
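As a small worked example of Equation (1), the sketch below divides the summed road length inside one cell by the cell area. It assumes the road segments have already been clipped to the cell; the segment lengths are hypothetical, and the clipping itself (done with GIS tools in the study) is not shown.

```python
CELL_SIDE_KM = 0.9
CELL_AREA_KM2 = CELL_SIDE_KM ** 2  # 0.81 km^2, as in the text


def road_density(segment_lengths_km, cell_area_km2=CELL_AREA_KM2):
    """Road density of one grid cell: total clipped road length / cell area."""
    if cell_area_km2 <= 0:
        raise ValueError("cell area must be positive")
    return sum(segment_lengths_km) / cell_area_km2


# Hypothetical lengths (km) of three road segments clipped to one cell
lengths = [0.9, 0.45, 0.27]
rd = road_density(lengths)
print(round(rd, 3))  # 2.0 (km of road per km^2)
```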

Moreover, population density (PD), nighttime light density (NLD), and annual power use (APU) were used as the socio-economic indicators for EVCSs in grid cells. PD, defined by the number of residents in a grid cell, was obtained from China’s Seventh National Population Census. GIS geo-operations were conducted to calculate the population density within a grid cell using the following equation:

$$\rho_g = \left(\frac{a_{sg}}{A_s}\right) \times \left(\frac{P_s}{A_g}\right) \tag{2}$$

where $\rho_g$ represents the PD of grid cell $g$, $a_{sg}$ is the area of community $s$ that lies within grid cell $g$, $A_s$ is the total area of community $s$, $P_s$ is the population of community $s$, and $A_g$ is the area of grid cell $g$. The population density (PD) is represented visually in Figure 4b.
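Equation (2) can be read as an areal-weighting step: the share of a community’s area that falls inside a cell is applied to the community’s population, and the result is divided by the cell area. The sketch below follows that reading, which is an interpretation of the equation’s subscripts rather than a confirmed description of the authors’ GIS workflow; all numbers are hypothetical.

```python
def population_density(overlap_area_km2, community_area_km2,
                       community_population, cell_area_km2=0.81):
    """Areal-weighted population density of one grid cell, per Equation (2)."""
    if community_area_km2 <= 0 or cell_area_km2 <= 0:
        raise ValueError("areas must be positive")
    area_share = overlap_area_km2 / community_area_km2   # a_sg / A_s
    return area_share * community_population / cell_area_km2  # * P_s / A_g


# Hypothetical community: 0.5 km^2 and 8100 residents, 0.2 km^2 inside the cell
pd_cell = population_density(0.2, 0.5, 8100)
print(round(pd_cell))  # 4000 residents per km^2
```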

Nighttime light density (NLD) data was obtained from the Visible Infrared Imaging Radiometer Suite (VIIRS), which resides on the Suomi National Polar-orbiting Partnership (Suomi NPP) Spacecraft. The cloud-free data, available at a 500 m resolution, was resampled to match our 900 m grid cells using bilinear interpolation in ArcGIS Pro. The annual power use (APU) data was derived from Chen et al.’s open dataset [26], which provides annual electricity consumption estimates for Chinese cities at a 1 km resolution. To ensure spatial consistency with our analysis, these values were aggregated to the 900 m grid cells using area-weighted averaging. These two data sources objectively reflect the city’s socio-economic activities [26,27,28,29], thereby indirectly measuring urban activities and EVCS demand (see Figure 4c,d).

It is worth noting that the data in support of the SAML model comes from online open sources. It covers a large area, which makes a metropolitan-level assessment of EVCS demand feasible. Additionally, the SAML method is transferable to other metropolitan cities. The open-source data can be retrieved repeatedly, which enables the SAML model to support longitudinal planning of EVCSs. All spatial data processing was performed using ArcGIS Pro to ensure accuracy and consistency across datasets.

3.4. The SAML Model

Grid cells are classified in this study into three groups or classes: 0 (No Demand), 1 (Low Demand with 1–3 EVCS facilities), and 2 (High Demand with 4+ EVCS facilities). The SAML model treats the prediction of EVCSs in grid cells as a multi-class classification problem.

3.4.1. Problem Definition

The prediction of EVCSs is defined as finding a mapping function:

$$f : \mathbb{R}^{D} \to \{0, 1, 2\} \tag{3}$$

where $\mathbb{R}^{D}$ represents the space formed by the $D = 15$ variables described in Table 1. Given our training dataset $(X, Y) = \{(X_1, y_1), (X_2, y_2), \dots, (X_N, y_N)\}$, $X_i \in \mathbb{R}^{D}$ and $y_i \in \{0, 1, 2\}$. $X$ is the set of feature vectors of all grid cells ($i \in \{1, 2, \dots, N\}$), with each vector denoted as $X_i = \{X_{i1}, X_{i2}, X_{i3}, \dots, X_{i15}\}$. $Y$ is the corresponding set of EVCS demand labels, with $y_i$ indicating the EVCS demand of grid cell $i$.

It is worth noting that $f$ represents an AI learning process. Our goal was to find a function $f$ that minimizes the classification error, i.e., that correctly classifies the EVCS demand of as many grid cells as possible.

3.4.2. Partitioning of the Datasets

Within the grid-based GIS, there are three classes of grid cells: no EVCS demand (0 EVCS facilities), low EVCS demand (1–3 EVCS facilities), and high EVCS demand (4+ EVCS facilities). Of the 11,099 grid cells, 10,054, 780, and 265 are classified as no EVCS demand, low EVCS demand, and high EVCS demand, respectively. To construct a high-quality training dataset, we employed a stratified random sampling method to select labeled cells for training the SAML model. Specifically, 1045, 600, and 200 samples were drawn from the three classes, representing sampling ratios of approximately 10% (no EVCS demand), 77% (low EVCS demand), and 75% (high EVCS demand), respectively. This disproportionate sampling strategy was based on two considerations. First, the no EVCS demand class, while abundant, exhibits relatively low variability, so a reduced sampling ratio maintains representativeness while minimizing computational overhead. Second, higher sampling ratios for the low and high EVCS demand classes capture the critical features and spatial patterns of EVCSs in these classes. Finally, the resulting dataset of 1845 samples was divided into a training set (80%) and a testing set (20%) to evaluate the model’s performance.
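The sampling and split described above can be sketched as follows. The class sizes and sample sizes come from the text; the cell identifiers, the random seed, and the use of a plain (non-stratified) random 80/20 split are assumptions made for illustration, since the text does not specify how the split was drawn.

```python
import random

CLASS_SIZES = {0: 10054, 1: 780, 2: 265}   # grid cells per demand class
SAMPLE_SIZES = {0: 1045, 1: 600, 2: 200}   # cells drawn per class


def stratified_sample(cells_by_class, sample_sizes, rng):
    """Draw a fixed number of cells from each class without replacement."""
    sample = []
    for label, cells in cells_by_class.items():
        chosen = rng.sample(cells, sample_sizes[label])
        sample.extend((cell, label) for cell in chosen)
    return sample


rng = random.Random(42)  # arbitrary seed, for reproducibility only

# Hypothetical cell ids: consecutive integers, grouped by class
cells_by_class, next_id = {}, 0
for label, size in CLASS_SIZES.items():
    cells_by_class[label] = list(range(next_id, next_id + size))
    next_id += size

sample = stratified_sample(cells_by_class, SAMPLE_SIZES, rng)
rng.shuffle(sample)
n_train = len(sample) * 4 // 5              # 80% for training
train, test = sample[:n_train], sample[n_train:]
print(len(sample), len(train), len(test))   # 1845 1476 369
```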

3.4.3. Normalization of Variables

To mitigate the scale variations across different variables and enhance the model’s sensitivity to local patterns, we normalized the 15 variables. The normalized value $X'_{ij}$ for variable $i$ ($i = 1, 2, \dots, 15$) in grid cell $j$ is computed as

$$X'_{ij} = \frac{X_{ij} - X_{\min j}}{X_{\max j} - X_{\min j}} \tag{4}$$

where $X_{ij}$ represents the original value of variable $i$ in grid cell $j$, and $X_{\min j}$ and $X_{\max j}$ denote the minimum and maximum values among all 15 variables within grid cell $j$. $X'_{ij}$ is the normalized value bounded within [0, 1]. Note that, unlike in Section 3.4.1, the first index here denotes the variable and the second denotes the grid cell.
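A minimal min-max scaling sketch is given below. The helper `min_max` is hypothetical and simply rescales whatever list of values it receives; following Equation (4) as written, it would be applied to the values within one grid cell, whereas applying it to one variable across all cells gives the more common per-feature scaling. The guard for a constant input is an added assumption, since the text does not say how that case is handled.

```python
def min_max(values):
    """Rescale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant input maps to zeros to avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical raw values
raw = [2.0, 6.0, 10.0, 4.0]
scaled = min_max(raw)
print(scaled)  # [0.0, 0.5, 1.0, 0.25]
```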

3.4.4. Multi-Layer Perceptron (MLP)

The foundation of our SAML model is a Multi-Layer Perceptron (MLP), a class of feedforward artificial neural network renowned for its ability to approximate complex, nonlinear functions [30]. As illustrated conceptually in Figure 5, our MLP architecture is composed of an input layer, two hidden layers, and an output layer. The input layer is designed to receive the 15 geospatial and socio-economic features for each grid cell; its number of nodes is therefore equal to the dimensionality of the input feature vector [31].

The geospatial and socio-economic features are then processed sequentially through the hidden layers. These layers contain neurons where the core computations occur, and all neurons between layers are connected through weighted connections [32,33]. Our model employs two hidden layers with 64 and 128 neurons, respectively. Each neuron applies a nonlinear transformation to its weighted inputs using a Rectified Linear Unit (ReLU) as its activation function [33]. The mathematical formalization of the forward pass is given by

$$H_1 = g\left(W^{(1)} X + b^{(1)}\right) \tag{5}$$

$$H_2 = g\left(W^{(2)} H_1 + b^{(2)}\right) \tag{6}$$

where $X$ is the input vector, $H_1$ and $H_2$ are the outputs of the hidden layers, $W$ and $b$ are the learnable weights and biases, and $g(\cdot)$ represents the ReLU function.

Finally, the output layer, consisting of 3 neurons, receives the output from the final hidden layer and applies a Softmax activation function, $\sigma(\cdot)$, to produce a probability distribution $P$ across the three demand classes:

$$P = \sigma\left(W^{(out)} H_2 + b^{(out)}\right) \tag{7}$$

where $P \in \mathbb{R}^{3}$ represents the probability distribution over the three EVCS demand categories.
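The forward pass in Equations (5) to (7) can be traced with a small standard-library sketch using the layer sizes given in the text (15, 64, 128, 3). The random weight initialization and the random input vector are assumptions for illustration; the study itself used PyTorch, and this is not its implementation.

```python
import math
import random


def linear(weights, bias, x):
    """Compute W x + b, with W stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(z):
    return [max(0.0, v) for v in z]


def softmax(z):
    m = max(z)                      # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_out, n_in, rng):
    """Small uniform random weights and zero biases (an assumed scheme)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, params):
    (w1, b1), (w2, b2), (w_out, b_out) = params
    h1 = relu(linear(w1, b1, x))                # Equation (5)
    h2 = relu(linear(w2, b2, h1))               # Equation (6)
    return softmax(linear(w_out, b_out, h2))    # Equation (7)


rng = random.Random(0)
params = [init_layer(64, 15, rng), init_layer(128, 64, rng),
          init_layer(3, 128, rng)]
x = [rng.random() for _ in range(15)]           # one hypothetical grid cell
p = forward(x, params)
print(len(p), round(sum(p), 6))  # 3 1.0
```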

3.4.5. The Combined Loss Function

While effective for many tasks, training a standard MLP with only a conventional loss function, such as Categorical Cross-Entropy ($L_{CCE}$), reveals a critical flaw when applied to geographic data: spatial blindness. The standard $L_{CCE}$ loss is defined as

$$L_{CCE} = -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{3} y_{i,k} \cdot \log(p_{i,k}) \tag{8}$$

where $N$ is the number of training samples, $y_{i,k}$ is the one-hot encoded true label, and $p_{i,k}$ is the predicted probability for class $k$. This loss function evaluates each grid cell independently, ignoring its surrounding context. This contradicts the fundamental principle of spatial autocorrelation, often summarized by Tobler’s First Law of Geography: “Everything is related to everything else, but near things are more related than distant things” [34]. A “spatially blind” model is prone to producing a “salt-and-pepper” effect, that is, a prediction map with noisy, isolated predictions that are unrealistic for real-world urban planning [32].
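Because the labels are one-hot, the inner sum in Equation (8) reduces to the negative log-probability assigned to the true class. The sketch below computes the loss that way; the probabilities and labels are hypothetical, and the small `eps` clamp is an added safeguard against `log(0)` rather than part of the definition.

```python
import math


def categorical_cross_entropy(probs, labels, eps=1e-12):
    """Mean negative log-probability of the true class over all samples."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += -math.log(max(p[y], eps))
    return total / len(probs)


# Hypothetical predictions for three grid cells and their true classes
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.2, 0.3, 0.5]]
labels = [0, 1, 2]
loss = categorical_cross_entropy(probs, labels)
print(round(loss, 4))  # 0.4243
```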

To overcome this limitation, we introduce a spatial regularization term ($L_{spatial}$) into the model’s objective function. This term is inspired by spatial regularization techniques in computer vision [35,36] and principles of spatial statistics [37,38]. The total loss ($L_{total}$) is then defined as the weighted sum of these two components:

$$L_{total} = L_{CCE} + \lambda \cdot L_{spatial} \tag{9}$$

where λ controls the trade-off between classification accuracy and spatial smoothness. Using grid search and cross-validation, we determined λ = 0.12 to be optimal for our dataset.

The novel spatial loss term is formally defined as

$$L_{spatial} = \frac{1}{|E|} \sum_{(i,j) \in E} w_{ij} \cdot \left\| p_i - p_j \right\|_2^2 \tag{10}$$

Here, $E$ is the set of neighboring grid cell pairs (8-connectivity) and $\left\| p_i - p_j \right\|_2^2$ measures the squared Euclidean distance between the predicted probability vectors of adjacent cells.

The spatial weight $w_{ij}$, defined below, ensures that closer cells exert a stronger influence:

$$w_{ij} = \exp\left(-\frac{d_{ij}}{\sigma}\right) \tag{11}$$

where $d_{ij}$ is the Euclidean distance between cell centroids (in kilometers) and $\sigma = 1.0$ km is a distance decay parameter.
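Equations (10) and (11) can be illustrated on a tiny 2 × 2 grid. The sketch assumes that E contains each unordered neighboring pair once and that cell centroids are 0.9 km apart along rows and columns; the prediction vectors are hypothetical. A spatially uniform prediction map yields zero spatial loss, while a single outlying cell yields a positive penalty.

```python
import math

CELL_KM = 0.9    # grid cell side length from the text
SIGMA_KM = 1.0   # distance decay parameter from the text


def neighbor_pairs(rows, cols):
    """Unordered pairs of 8-connected cells on a rows x cols grid."""
    pairs = []
    for r in range(rows):
        for c in range(cols):
            # Four "forward" offsets visit each unordered pair exactly once
            for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    pairs.append(((r, c), (r2, c2)))
    return pairs


def spatial_weight(a, b):
    """Equation (11): exponential decay with centroid distance in km."""
    d = CELL_KM * math.hypot(a[0] - b[0], a[1] - b[1])
    return math.exp(-d / SIGMA_KM)


def spatial_loss(probs, pairs):
    """Equation (10): weighted mean squared difference between neighbors."""
    total = 0.0
    for a, b in pairs:
        sq = sum((pa - pb) ** 2 for pa, pb in zip(probs[a], probs[b]))
        total += spatial_weight(a, b) * sq
    return total / len(pairs)


pairs = neighbor_pairs(2, 2)   # 4 edge-adjacent pairs + 2 diagonal pairs
smooth = {(r, c): [0.8, 0.15, 0.05] for r in range(2) for c in range(2)}
noisy = dict(smooth)
noisy[(0, 0)] = [0.05, 0.15, 0.8]   # one cell disagrees with its neighbors

print(len(pairs), spatial_loss(smooth, pairs))  # 6 0.0
print(spatial_loss(noisy, pairs) > 0.0)         # True
```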

During back-propagation, the spatial loss term contributes to the gradient with respect to the prediction $p_i$ as follows:

$$\frac{\partial L_{spatial}}{\partial p_i} = \frac{2}{|E|} \sum_{j \in N(i)} w_{ij} \cdot (p_i - p_j) \tag{12}$$

where $N(i)$ is the set of neighbors of cell $i$. This gradient effectively “pulls” a cell’s prediction vector towards the average of its neighbors’ predictions, weighted by spatial proximity. This mechanism enforces spatial smoothness in the model’s output, creating a more realistic and coherent demand surface while still being driven by the classification accuracy objective of the $L_{CCE}$ term.
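A quick numerical check of Equation (12) is sketched below: the analytic gradient is compared with central finite differences of the spatial loss. It assumes E lists each unordered pair once (which is what makes the factor 2/|E| come out as stated) and treats the prediction vectors as free variables; the three cells, their weights, and their predictions are hypothetical.

```python
def spatial_loss(p, weighted_pairs):
    """Equation (10) with precomputed weights: pairs are (i, j, w_ij)."""
    total = 0.0
    for i, j, w in weighted_pairs:
        total += w * sum((a - b) ** 2 for a, b in zip(p[i], p[j]))
    return total / len(weighted_pairs)


def spatial_grad(p, weighted_pairs, i):
    """Analytic gradient with respect to p[i], following Equation (12)."""
    grad = [0.0] * len(p[i])
    for a, b, w in weighted_pairs:
        if i in (a, b):
            j = b if a == i else a
            for k in range(len(grad)):
                grad[k] += 2.0 * w * (p[i][k] - p[j][k])
    return [g / len(weighted_pairs) for g in grad]


def numeric_grad(p, weighted_pairs, i, h=1e-6):
    """Central finite-difference estimate of the same gradient."""
    grad = []
    for k in range(len(p[i])):
        up = [row[:] for row in p]
        down = [row[:] for row in p]
        up[i][k] += h
        down[i][k] -= h
        diff = spatial_loss(up, weighted_pairs) - spatial_loss(down, weighted_pairs)
        grad.append(diff / (2 * h))
    return grad


# Three hypothetical cells, all mutual neighbors, with assumed weights
p = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.1, 0.8]]
weighted_pairs = [(0, 1, 0.4), (1, 2, 0.4), (0, 2, 0.28)]

analytic = spatial_grad(p, weighted_pairs, 1)
numeric = numeric_grad(p, weighted_pairs, 1)
max_diff = max(abs(a - n) for a, n in zip(analytic, numeric))
print(max_diff < 1e-6)  # True
```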

3.4.6. Model Training and Implementation

The SAML model is trained through an iterative optimization process designed to find the parameters (θ) that best map the input features to the demand labels by minimizing the total loss function defined in Section 3.4.5. This process, formally detailed in Algorithm 1, follows four key steps for each batch of data.

Algorithm 1 Procedure for the SAML Model Training

// Input: Training features X_train, labels Y_train, grid indices I_train
// Adjacency information A
// Hyperparameters: η (learning rate), B (batch size),
// N_epochs (epochs), λ (spatial weight), σ (decay)
// Output: Trained SAML model with optimized parameters θ
Initialize MLP parameters θ = {W, b}
Initialize Adam optimizer with learning rate η
for epoch = 1 to N_epochs do
    Shuffle {X_train, Y_train, I_train}
    Divide data into batches of size B
    for each batch {X_batch, Y_batch, I_batch} do
        // Forward Pass
        P_pred ← MLPForward(X_batch, θ)
        // Calculate Loss Components
        L_cce ← ComputeCrossEntropyLoss(P_pred, Y_batch)
        L_spatial ← ComputeSpatialLoss(P_pred, I_batch, A, σ)
        // Compute Total Loss
        L_total ← L_cce + λ · L_spatial
        // Backward Pass
        g ← ∇_θ L_total // via backpropagation
        // Update Parameters
        θ ← AdamUpdate(θ, g, η)
    end for
end for
return θ

Step 1: Forward Pass: A batch of input features is passed through the MLP network, as defined by the equations in Section 3.4.4. This generates a probability prediction vector, P, for each grid cell in the batch. Each vector contains the model’s confidence that the corresponding grid cell belongs to the “no demand,” “low demand,” or “high demand” class.

Step 2: Loss Calculation: The model’s predictions, $P$, are then compared to the true labels by calculating the combined loss function, $L_{total}$. This value quantifies the model’s total error, which is composed of two distinct parts:

(a) The Categorical Cross-Entropy Loss ($L_{CCE}$), which measures the point-wise classification inaccuracy.

(b) The Spatial Loss ($L_{spatial}$), which measures the model’s spatial inconsistency by penalizing predictions that differ significantly from their neighbors.

Step 3: Backward Pass: The gradient of the total loss ($\nabla_{\theta} L_{total}$) is then computed with respect to every weight and bias parameter in the network via the back-propagation algorithm. Because the loss function is a composite, this gradient carries error information from both the classification objective and the spatial coherence objective.

Step 4: Parameter Update: The Adam optimizer uses the computed gradient to adjust all of the model’s parameters with adaptive, per-parameter step sizes. This update moves the parameters in a direction that reduces the total loss, simultaneously encouraging the model to become more accurate and more spatially coherent.

This four-step cycle is repeated for multiple epochs until the model’s loss on a validation set converges. The specific hyperparameters used for this process (e.g., learning rate, batch size, λ) were determined through 5-fold cross-validation and are detailed in Table 2. The entire framework was implemented in Python using the PyTorch library.
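The four-step cycle can be traced end to end with a deliberately simplified, standard-library stand-in. It is not the authors’ implementation: a single softmax layer replaces the two-hidden-layer MLP, plain full-batch gradient descent replaces Adam and mini-batches, and the 3 × 3 grid, features, labels, learning rate, and step count are all hypothetical. Only λ = 0.12, σ = 1.0 km, and the 0.9 km cell size are taken from the text.

```python
import math

CELL_KM, SIGMA_KM, LAMBDA = 0.9, 1.0, 0.12


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def predict(W, b, x):
    return softmax([sum(w * v for w, v in zip(row, x)) + bk
                    for row, bk in zip(W, b)])


def total_loss(W, b, X, y, pairs, lam):
    P = [predict(W, b, x) for x in X]
    cce = -sum(math.log(P[i][y[i]]) for i in range(len(X))) / len(X)
    spatial = sum(w * sum((u - v) ** 2 for u, v in zip(P[i], P[j]))
                  for i, j, w in pairs) / len(pairs)
    return cce + lam * spatial


def train_step(W, b, X, y, pairs, lam, lr):
    n, k = len(X), len(b)
    P = [predict(W, b, x) for x in X]                  # Step 1: forward pass
    g_p = [[0.0] * k for _ in range(n)]                # spatial gradient wrt p
    for i, j, w in pairs:
        for c in range(k):
            diff = 2.0 * lam * w * (P[i][c] - P[j][c]) / len(pairs)
            g_p[i][c] += diff
            g_p[j][c] -= diff
    for i in range(n):                                 # Steps 3 and 4
        dot = sum(P[i][c] * g_p[i][c] for c in range(k))
        for c in range(k):
            # Cross-entropy part plus spatial part mapped through the
            # softmax Jacobian: p_c * (g_c - sum_m p_m g_m)
            g_z = (P[i][c] - (1.0 if c == y[i] else 0.0)) / n \
                + P[i][c] * (g_p[i][c] - dot)
            b[c] -= lr * g_z
            for d in range(len(X[i])):
                W[c][d] -= lr * g_z * X[i][d]


side = 3
cells = [(r, c) for r in range(side) for c in range(side)]
index = {cell: i for i, cell in enumerate(cells)}
pairs = []
for r, c in cells:                                     # 8-connectivity
    for dr, dc in ((0, 1), (1, -1), (1, 0), (1, 1)):
        nb = (r + dr, c + dc)
        if nb in index:
            w = math.exp(-CELL_KM * math.hypot(dr, dc) / SIGMA_KM)
            pairs.append((index[(r, c)], index[nb], w))

X = [[c / 2, r / 2] for r, c in cells]   # two hypothetical features per cell
y = [c for r, c in cells]                # hypothetical labels: class = column
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]

loss_start = total_loss(W, b, X, y, pairs, LAMBDA)     # Step 2: total loss
for _ in range(300):
    train_step(W, b, X, y, pairs, LAMBDA, lr=0.1)
loss_end = total_loss(W, b, X, y, pairs, LAMBDA)
print(loss_end < loss_start)  # True
```

Swapping in the full MLP and an adaptive optimizer changes how the gradient is obtained and applied, but the structure of each iteration (forward pass, combined loss, gradient, update) stays the same.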

3.5. Performance Evaluation

To assess the performance of the SAML model in predicting the EVCS demand at grid cells, four metrics (recall, precision, F1-score, and accuracy) were employed and defined as follows:

\begin{aligned} \mathrm{Recall}_{i} &= \frac{TP_{i}}{\mathrm{Relevant}_{i}} \\ \mathrm{Precision}_{i} &= \frac{TP_{i}}{\mathrm{Retrieved}_{i}} \\ F_{1}\text{-score}_{i} &= 2 \times \frac{\mathrm{Recall}_{i} \times \mathrm{Precision}_{i}}{\mathrm{Recall}_{i} + \mathrm{Precision}_{i}} \\ \mathrm{Accuracy} &= \frac{\text{Correct predictions}}{\text{All predictions}} \end{aligned}

(13)

where TP_{i} is the number of True Positive cases (or the correct predictions of EVCSs) for class i (i = 0, 1, 2, where 0 represents no demand for EVCS, 1 represents low demand for EVCS, and 2 represents high demand for EVCS). Relevant_{i} is the total number of actual cases for class i, that is, all grid cells whose true label is class i. Retrieved_{i} is the number of cases that the SAML model assigns to class i, whether the assignment is correct or a misclassification. Accuracy measures the proportion of all predictions that are correct across all classes, reflecting the SAML model’s overall ability to correctly classify grid cells into the no demand, low demand, or high demand classes for EVCS. In addition to the above indicators, ROC curves were also drawn for the three classes to further measure the performance of the SAML model.
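As an illustration of Equation (13), the sketch below derives the per-class metrics and the overall accuracy from a confusion matrix. The function name `per_class_metrics` and the 3 × 3 matrix are hypothetical and are not taken from the study's results.

```python
def per_class_metrics(confusion):
    """confusion[t][p] = number of cells with true class t predicted as class p."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    metrics = {}
    for i in range(n):
        tp = confusion[i][i]
        relevant = sum(confusion[i])                        # actual cases of class i
        retrieved = sum(confusion[t][i] for t in range(n))  # cases predicted as class i
        recall = tp / relevant if relevant else 0.0
        precision = tp / retrieved if retrieved else 0.0
        denom = recall + precision
        f1 = 2 * recall * precision / denom if denom else 0.0
        metrics[i] = {"recall": recall, "precision": precision, "f1": f1}
    return metrics, correct / total

# Hypothetical confusion matrix for classes 0 (no), 1 (low), and 2 (high demand).
toy = [[90, 8, 2],
       [10, 30, 5],
       [1, 4, 10]]
metrics, accuracy = per_class_metrics(toy)
```

Rows hold the true class and columns the predicted class, so Relevant_{i} is a row sum and Retrieved_{i} is a column sum.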

3.6. Shapley Additive Explanations (SHAP)

The SAML model, like any other MLP model, is a black box: its layered, nonlinear structure does not reveal how the independent variables influence the predictions of EVCSs. To overcome this challenge, this study employs the Shapley additive explanations (SHAP) technique to elucidate the specific contributions of the 15 attributes (or variables) to the EVCS demand. The SHAP approach, rooted in coalitional game theory, calculates Shapley values and provides a detailed and interpretable analysis of feature or variable importance [39,40]. This technique quantifies the marginal contribution of each attribute, ensuring that their cumulative impact aligns with the SAML model’s predictions.

Given a prediction model f and an input sample x, the SHAP value ϕ_{i} for each feature i can be defined as

\phi_{i} = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|! \, (|N| - |S| - 1)!}{|N|!} \left[ f_{x}(S \cup \{i\}) - f_{x}(S) \right]

(14)

where N is the set of all variables, N = {1, 2, 3, …, 15}; S is a subset of N that does not include variable i; |S| is the number of elements in set S; and f_{x}(S) is the prediction of model f when only the variables in subset S are present in the sample. f_{x}(S \cup \{i\}) represents the model prediction after adding variable i to subset S. The difference [f_{x}(S \cup \{i\}) - f_{x}(S)] quantifies the marginal contribution of variable i when added to the specific variable combination S.
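Equation (14) can be evaluated exactly for a small number of variables by enumerating every subset S. The sketch below does so for a hypothetical three-variable value function, `toy_value`, which stands in for f_{x}(S) and is not the SAML model. Exact enumeration grows exponentially with the number of variables, so practical SHAP implementations generally rely on approximations.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset S that excludes i."""
    features = range(n_features)
    phi = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            # Weight |S|! (|N| - |S| - 1)! / |N|! from Equation (14).
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi.append(total)
    return phi

def toy_value(subset):
    """Hypothetical f_x(S): additive effects plus one pairwise interaction."""
    base = {0: 2.0, 1: 1.0, 2: 0.5}
    value = sum(base[j] for j in subset)
    if 0 in subset and 1 in subset:
        value += 1.0  # interaction shared by variables 0 and 1
    return value

phi = shapley_values(toy_value, 3)
```

In this toy case the interaction term is split equally between variables 0 and 1, and the values in `phi` sum to the difference between the prediction with all variables present and the prediction with none, which is the additivity property that SHAP relies on.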

4. Results

After the SAML model was trained, it was applied to the remaining 9254 grid cells to predict the potential EVCS demand of each grid cell. Below is an analysis of the EVCS predictions across the city of Wuhan.

4.1. Prediction Result

Using the SAML model and GIS technology, we grouped the grid cells whose predicted demand exceeds their current demand into three transition categories: ΔDemand₀₂ (from no EVCS demand to high EVCS demand), ΔDemand₁₂ (from low EVCS demand to high EVCS demand), and ΔDemand₀₁ (from no EVCS demand to low EVCS demand).

ΔDemand₀₂: A total of 20 grid cells currently with no EVCS demand are predicted to have high EVCS demand (see Figure 6). Based on urban activity patterns, these areas show high potential for EVCS deployment. It is noted that these grid cells are predominantly concentrated within the Third Ring Road of Wuhan. They are characterized by intense urban activities, high population density, and excellent accessibility to urban services. Currently, these grid cells do not have any EVCS facilities.

ΔDemand₁₂: A total of 41 grid cells with low EVCS demand are predicted to have high EVCS demand (see Figure 7). These grid cells, also mainly distributed within the Third Ring Road of Wuhan, already have 1–3 EVCSs installed and are projected to evolve into high-demand zones with 4+ EVCSs. As EV ownership surges, significant upgrades or expansions will likely be required to keep pace with growing charging demand.

ΔDemand₀₁: A total of 843 grid cells with no EVCS demand are predicted to have low EVCS demand (see Figure 8). These grid cells extend outward from the central areas of the city, indicating that the demand for deploying EVCSs is spreading from the central areas to the suburban areas of the city. By strategically placing EVCSs in these grid cells, the city of Wuhan can meet growing EVCS demand without over-committing resources, providing a scalable solution for the gradual integration of EVs into the city’s urban transportation network. Currently, these grid cells do not have any EVCS facilities.
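The three transition categories above can be derived by comparing each cell's current class with its predicted class. The sketch below is a minimal illustration. The class thresholds follow the definitions in the text (0, 1–3, and 4+ EVCSs), while the function names and the six toy cells are hypothetical.

```python
from collections import Counter

def demand_class(n_evcs):
    """Map an EVCS count to class 0 (none), 1 (1-3), or 2 (4+)."""
    if n_evcs == 0:
        return 0
    return 1 if n_evcs <= 3 else 2

def transition_counts(current_counts, predicted_classes):
    """Count cells whose predicted class exceeds their current class."""
    counts = Counter()
    for n_evcs, predicted in zip(current_counts, predicted_classes):
        current = demand_class(n_evcs)
        if predicted > current:
            counts[(current, predicted)] += 1
    return counts

current_counts = [0, 0, 2, 5, 0, 1]     # hypothetical existing EVCSs per cell
predicted_classes = [2, 1, 2, 2, 0, 1]  # hypothetical model output per cell
counts = transition_counts(current_counts, predicted_classes)
```

Here the keys `(0, 2)`, `(1, 2)`, and `(0, 1)` correspond to ΔDemand₀₂, ΔDemand₁₂, and ΔDemand₀₁, respectively.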

4.2. Model Experiment and Accuracy

The receiver operating characteristic (ROC) curves and the area under the curve (AUC) were used to assess the effectiveness of the SAML model. In the ROC curves, the True Positive Rate (TPR) is plotted on the y-axis and the False Positive Rate (FPR) on the x-axis. The ideal point lies at the upper-left corner, where FPR is zero and TPR is one. Although this ideal is rarely achieved, a curve that rises steeply toward that corner indicates a high TPR at a low FPR, which is desirable. The AUC summarizes each curve as a single value, with a maximum of 1 representing perfect classification, and a higher AUC generally indicates superior model performance. The AUC values for the three classes of EVCSs (no EVCS demand, low EVCS demand, and high EVCS demand) are 0.96, 0.85, and 0.91, respectively (see Figure 9). AUC values between 0.9 and 1 indicate excellent prediction accuracy, while values between 0.8 and 0.9 reflect very good performance [41].
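For a three-class problem, a per-class AUC is commonly obtained by treating one class as positive and the other two as negative. That one-vs-rest scheme is assumed here, since the text does not state how the per-class curves were built. The sketch uses the rank interpretation of the AUC, namely the probability that a randomly chosen positive cell receives a higher score than a randomly chosen negative cell. The scores and labels are hypothetical.

```python
def auc_one_vs_rest(scores, labels, positive_class):
    """AUC as the probability that a positive cell outranks a negative one."""
    pos = [s for s, y in zip(scores, labels) if y == positive_class]
    neg = [s for s, y in zip(scores, labels) if y != positive_class]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

# Hypothetical predicted probabilities of the high-demand class (class 2).
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [2, 2, 2, 1, 0, 0]
auc_high = auc_one_vs_rest(scores, labels, positive_class=2)
```

In this toy example, eight of the nine positive-negative pairs are ranked correctly, so `auc_high` equals 8/9.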

The detailed performance metrics for the different classes of EVCS demand further illustrate the SAML model’s capabilities (see Table 3). For the grid cells with No EVCS demand (class 0), the SAML model achieves a precision of 0.88, a recall of 0.92, and an F1-score of 0.90. For the grid cells with Low EVCS demand (class 1), it achieves a precision of 0.71, a recall of 0.73, and an F1-score of 0.72. For the grid cells with High EVCS demand (class 2), it achieves a precision of 0.60, a recall of 0.62, and an F1-score of 0.61. The overall accuracy of the SAML model is 0.81.
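Because the F1-score is the harmonic mean of precision and recall (Equation (13)), the three values reported above can be cross-checked directly. The short sketch below recomputes them from the quoted precision and recall. It uses only the rounded figures in the text, so it is a consistency check rather than a reproduction of Table 3.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1) per class, as quoted in the text above.
reported = {
    "no demand": (0.88, 0.92, 0.90),
    "low demand": (0.71, 0.73, 0.72),
    "high demand": (0.60, 0.62, 0.61),
}
recomputed = {name: round(f1_score(p, r), 2) for name, (p, r, _) in reported.items()}
```

All three recomputed values agree with the reported F1-scores to two decimal places.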

4.3. Ablation Tests

Ablation tests were conducted in this study to evaluate the effectiveness of integrating spatial distance into the loss function of the SAML model, with a focus on its impact on the model’s accuracy and generalization across various grid cells for predicting EVCS demand. Two configurations were considered: the SAML model with a geodesic distance matrix embedded into the loss function, and a baseline model that uses the standard categorical cross-entropy loss function alone (that is, without the geodesic distance matrix). Both configurations were trained on the same dataset with identical hyperparameters. Performance metrics, such as accuracy, precision, recall, and F1-score, were utilized to assess the predictions of EVCS demand.

The results from the ablation tests reveal that the inclusion of spatial distance or spatial correlations in the loss function significantly enhances the model’s performance. Compared with the baseline model, the precision of the SAML model improved from 0.52 to 0.60 for High EVCS demand, from 0.70 to 0.71 for Low EVCS demand, and from 0.82 to 0.88 for No EVCS demand (see Figure 10a). Recall improved from 0.60 to 0.62 for High EVCS demand, from 0.63 to 0.73 for Low EVCS demand, and from 0.90 to 0.92 for No EVCS demand (see Figure 10b). The F1-score rose from 0.56 to 0.61 for High EVCS demand, from 0.66 to 0.72 for Low EVCS demand, and from 0.86 to 0.90 for No EVCS demand (see Figure 10c). The overall accuracy of the model improved from 0.78 to 0.81 (see Figure 10d), illustrating the efficacy of spatial distance integration in enhancing EVCS predictions.

4.4. Contributions of Urban Activities to EVCS Demand

After the SAML model was trained, we used the SHAP technique to evaluate the contributions of independent variables (such as socioeconomic factors, land use, and travel patterns of electric vehicle drivers) to the predictions of EVCS demand. The higher the mean SHAP value, the greater the contribution of an independent variable to the prediction of EVCS demand. Figure 11 shows the mean SHAP value of each variable, that is, its average impact on the magnitude of the SAML model’s output, with the blue and yellow colors representing the grid cells with Low EVCS demand and High EVCS demand, respectively. The PS, PD, NLD, CR, and RD variables have high SHAP values, and thus a high impact on EV charging demand, while the other variables have relatively small impacts.

Figure 12a,b shows SHAP summary plots for the two classes (Low EVCS demand and High EVCS demand), illustrating the overall impact of each independent variable on EVCS demand. The independent variables are listed on the y-axis in descending order of their impact, while the SHAP values are plotted on the x-axis. In the Low EVCS demand grid cells, road density (RD), nighttime light density (NLD), commercial residence (CR), shopping center (SC), and government agency (GA) are observed to have positive impacts on EVCS demand. Interestingly, parking site (PS) and population density (PD) have significant negative impacts on EVCS demand (see Figure 12a). In the High EVCS demand grid cells, parking site (PS), road density (RD), population density (PD), and commercial residence (CR) are observed to be the key factors driving EVCS demand (see Figure 12b).

5. Discussion

This section presents a sensitivity analysis to justify the selected grid resolution and describes a detailed case study to further validate the accuracy of the SAML model’s predictions. Additionally, a discussion of the model’s policy implications, its potential for transferability to other urban contexts, and the study’s limitations and directions for future research is provided.

5.1. Sensitivity Analysis of Grid Resolution

The selection of an appropriate spatial resolution is a critical step in geospatial modeling, involving a trade-off between capturing fine-grained spatial heterogeneity and maintaining computational feasibility. To determine the optimal scale for our analysis, we conducted a sensitivity analysis on four candidate grid resolutions: 300 m, 500 m, 900 m, and 1 km.

Finer resolutions, such as 300 m (95,401 cells) and 500 m (34,329 cells), were evaluated first. While the 300 m grid offered the highest level of detail, it was computationally prohibitive, with an input dimension over 11 times larger than the 900 m grid, causing memory overflow on a 24 GB GPU. Similarly, the 500 m grid, while less intensive, still posed a significant computational burden, especially given our model’s need to encode a spatial relationship matrix. Conversely, the coarser 1 km resolution (8547 cells), while computationally efficient, resulted in a loss of spatial detail. This over-aggregation caused small but significant high-demand areas to be merged with neighboring low-demand zones, degrading model performance, as reflected by a lower F1 score.

The 900 m grid (11,099 cells) emerged as the optimal choice, providing the best balance between these competing factors. This resolution was computationally manageable and achieved high predictive accuracy, yielding an F1 score of 0.83 and an AUC of 0.91. Crucially, this choice is further validated by its alignment with local planning standards; the 900 m grid is the finest operational scale used by the Wuhan Municipal Transportation Administration. Therefore, any finer resolution, while academically interesting, would not translate into more actionable planning decisions. Based on this analysis of both model performance and practical relevance, the 900 m grid resolution was adopted for all subsequent experiments.
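The memory pressure described above can be illustrated with a back-of-the-envelope estimate. The sketch assumes that the spatial relationship matrix is stored as a dense n × n array of 32-bit floats. This storage format is an assumption for illustration and is not stated in the text. The cell counts are those reported above, and `dense_matrix_bytes` is a hypothetical helper.

```python
def dense_matrix_bytes(n_cells, bytes_per_value=4):
    """Memory of a dense n x n matrix (float32 by default, an assumption)."""
    return n_cells * n_cells * bytes_per_value

grids = {"300 m": 95_401, "500 m": 34_329, "900 m": 11_099, "1 km": 8_547}
gpu_budget = 24 * 10**9  # 24 GB, ignoring memory for weights and activations

size_gb = {name: dense_matrix_bytes(n) / 10**9 for name, n in grids.items()}
fits = {name: dense_matrix_bytes(n) < gpu_budget for name, n in grids.items()}
```

Under these assumptions, the 300 m matrix alone needs about 36 GB, which exceeds the 24 GB GPU, whereas the 500 m, 900 m, and 1 km matrices need about 4.7 GB, 0.5 GB, and 0.3 GB, respectively.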

5.2. Case Study for Model Validation

A ΔDemand₀₂ grid cell from Figure 6 was selected for in-depth analysis of the SAML model’s performance. As shown in Figure 13, the subject grid cell has four parking sites with 270 parking spaces (PS = 4), a road density of 5.33 km/km² (RD = 5.33), a population density of 19,014 persons/km² (PD = 19,014), and 10 commercial residences (CR = 10). The SHAP force graph in Figure 13 indicates that PS, RD, PD, and CR are the four key factors behind the prediction that this grid cell needs 4+ EVCS facilities. In the SHAP force graph, the size of a variable’s impact is represented by the length of its bar, and the direction is distinguished by color. The variables in red have a positive impact and push the predicted value up, while the variables in blue have a negative impact and push the predicted value down.

A structured survey of 50 EV users within this grid cell was conducted using stratified sampling. The EV users, representing diverse groups, include ride-hailing drivers, local EV owners, commercial vehicle operators, and commuters, ensuring representative feedback across different EV usage patterns and charging needs. The survey was conducted across various time periods on both weekdays and weekends to capture temporal variations in demand. The survey adopts a three-point Likert scale (high demand (3), moderate demand (2), and low demand (1)) to assess EVCS demand (see Appendix A). The questionnaire was validated with Cronbach’s α > 0.85, indicating high reliability. The results of a t-test (t(49) = 8.48, p < 0.001) confirm that the grid cell experiences high EVCS demand. These findings support the ΔDemand₀₂ transition predicted by the model.
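The t statistic with 49 degrees of freedom reported above is consistent with a one-sample t-test on 50 responses. The sketch below shows how such a statistic is computed. The responses are hypothetical and are not the study's data, and the reference value of 2 (the scale midpoint) is an assumption, since the text does not state the null hypothesis.

```python
import math

def one_sample_t(values, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)  # sample variance
    t = (mean - mu0) / math.sqrt(var / n)
    return t, n - 1

# HYPOTHETICAL responses on the three-point scale (not the study's data).
responses = [3] * 35 + [2] * 10 + [1] * 5
t_stat, dof = one_sample_t(responses, mu0=2.0)
```

A large positive statistic indicates that the mean rating lies well above the assumed reference value.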

Additionally, the survey uses another three-point Likert scale (high impact (3), moderate impact (2), and low impact (1)) to assess the impacts of factors (or variables) on EVCS demand. One-way analysis of variance reveals that the importance scores of the four factors (parking site (PS), population density (PD), road density (RD), and commercial residence (CR)) are significantly higher than those of the other factors (F(4, 245) = 27.865, p < 0.001). The statistical results also indicate that parking site (PS) received the highest impact score, followed by population density (PD), road density (RD), and commercial residence (CR), an ordering that is consistent with the results of the SHAP technique.

5.3. Future Policy Implications

Our findings reveal that the 20 ΔDemand₀₂ grid cells are predominantly concentrated within the central districts of Wuhan. These grid cells are strongly influenced by parking site (PS), population density (PD), road density (RD), and commercial residence (CR). This pattern differs from those reported in Europe and North America, where high-demand EVCSs are primarily located at transportation hubs and highway exits [42,43,44].

Parking site (PS), emerging as the most influential factor in driving ΔDemand₀₂, indicates that existing parking infrastructure provides natural locations for EVCS deployment [45]. For instance, Wuhan’s recent planning guidelines, which require 20% of parking spaces in new developments to be equipped with EVCSs, align well with our model predictions. City planners can utilize these ΔDemand₀₂ grid cells to identify optimal parking facilities for EVCS placement.

Population density (PD) acts as the second most significant factor triggering EVCS deployment. In Chinese cities, high-rise residential areas with high population densities often lack private parking and charging facilities. Our findings suggest that policymakers should prioritize public EVCS deployment in areas with high-rise communities, particularly where renovations are being planned to transform old communities into new communities with high-rise buildings.

Road density (RD) also impacts EVCS placement. Unlike traditional highway-oriented EVCS deployment at ramp exits [46], our findings support EVCS placement in areas with high road density. Adding EVCSs or converting existing gas stations to EVCSs within high-road-density grid cells could help meet EVCS demand.

Moreover, commercial residence (CR) emerges as the fourth crucial factor for EVCS placement. The CR zones, combining residential and commercial functions, experience intensive urban activities, generating substantial EV charging demand. Such zones attract both resident EV owners and commuting workers. Accordingly, policymakers should prioritize EVCS deployment in parking facilities within new CR developments and redevelopments.

It is noted that PS, PD, RD, and CR play a similar role in the 41 ΔDemand₁₂ grid cells. These grid cells, also concentrated within the central districts, already have 1–3 EVCS sites in operation. However, the existing EVCSs cannot accommodate the high EV charging demand. Adding more EVCSs to existing EVCS sites and creating new EVCS sites within these grid cells should be given high priority.

The results from the SAML model also indicate that there are 843 ΔDemand₀₁ grid cells. Since no EVCS sites are available in these grid cells, EVCS deployment should consider the driving factors of road density (RD), nighttime light density (NLD), commercial residence (CR), shopping center (SC), and government agency (GA), as identified by the SAML model. It is noted from Figure 8 that the ΔDemand₀₁ grid cells spread over central urban and suburban regions where road density is high. These grid cells are primarily located along major national and provincial highways leading to shopping centers, governmental agencies, commercial residences, and areas with high nighttime light density. Interestingly, parking site (PS) is not a key factor in these ΔDemand₀₁ grid cells. This finding indicates that EVCS sites in these grid cells should be placed to align with traffic patterns rather than with parking sites.

It is important to note that EVCS deployment is a resource-intensive process that requires significant capital investment and long-term infrastructure planning. To mitigate the risks associated with potential misclassification or demand uncertainty, we recommend that policymakers adopt a phased deployment strategy—starting with small-scale pilot installations in targeted areas. In addition, we suggest that decision-makers conduct localized user surveys or demand assessments before large-scale deployment, ensuring that infrastructure investment aligns with genuine user needs. Such measures can help validate model predictions, reduce the risk of misallocation, and enhance the overall effectiveness of EVCS deployment strategies.

5.4. Model Transferability

The SAML model is designed for high transferability, as its methodology relies on globally accessible and periodically updated open-source datasets (e.g., OpenStreetMap and census data). Its data-driven MLP core ensures adaptability, allowing the model to learn the unique relationships between urban features and demand in any given city, as well as to support longitudinal analysis of demand evolution over time. However, applying the model to a new urban environment requires two crucial steps to ensure local relevance and accuracy. First, the optimal grid resolution must be determined for the new context, following a sensitivity analysis similar to the one presented for Wuhan in Section 5.1. Second, and most critically, the model must be retrained using the new city’s local data. This is because the proxy for demand (the distribution of existing EVCSs) and the learned relationships between features and demand are inherently context-dependent. Therefore, while the SAML method provides a generalizable and replicable approach, the resulting predictive model is always tailored to the specific city on which it is trained.

5.5. Limitations and Future Research

There are three promising future directions to address the limitations of the SAML method. First, the SAML model captures potential EVCS hotspots or demands by selecting existing grid cells with EVCS sites. However, in these grid cells, not all existing EVCS sites are optimally located. Future research, therefore, can be focused on developing new methods to obtain more balanced labeled datasets of EVCSs (instead of EVCS sites) for model training. Second, based on the prediction results of the SAML model, this study selected a sample grid cell in Hongshan District and verified the EVCS demand and the key factors driving EVCS deployment using 50 questionnaires. However, this sample size may not fully represent the distribution of charging demand in the subject grid cell. Additionally, while the nighttime light data utilized in this study offers global coverage and a strong correlation with economic activity, it is subject to inherent limitations, such as its native spatial resolution and the well-documented effects of signal “blooming” (light spillover) and saturation in the brightest city centers. These factors might impact the model’s accuracy [47]. Third, due to data constraints, this study focuses solely on EVCS placement without addressing other constraints such as power network capacity and EV movement patterns. Future research would benefit from incorporating other datasets, including power grids, EV trajectories, and temporal demand variations. Once suitable datasets are obtained for model training, the expanded model will provide more precise information for the planning of public EVCSs, thus enhancing the practicality of the model. Furthermore, to rigorously evaluate its relative performance, the SAML framework should be benchmarked in future work against other established approaches, such as traditional gravity models or alternative machine learning algorithms.

6. Conclusions

This study develops an innovative spatially aware machine learning (SAML) method for EVCS placement. Using the collected data on socio-economic characteristics, EV charging behaviors, and other urban activities, we determine the key factors driving the model’s prediction of EVCS location. After partitioning the city of Wuhan into 11,099 grid cells and applying the SAML model to them, we predict that 20 grid cells (currently not equipped with EVCS sites) and 41 grid cells (currently equipped with 1–3 EVCS sites) show high EVCS deployment potential. Moreover, a total of 843 grid cells without EVCSs are predicted to have low EVCS demand (that is, a need for 1–3 EVCS sites).

The SAML model incorporates the SHAP technique and a spatial loss function to optimize EVCS sites. Using the SHAP technique, we investigated the nonlinear relationships embedded in EVCS placement and found that parking site (PS), road density (RD), population density (PD), and commercial residential (CR) areas are key factors in determining optimal EVCS sites. Our ablation tests demonstrate that the inclusion of spatial distance or spatial correlations in the loss function significantly enhances the model’s performance. Additionally, results from case studies validate that the model provides an effective tool for spatial identification of EVCSs in other metropolitan cities.

The SAML model classifies grid cells into no EVCS demand (0 EVCS), low EVCS demand (from 1 to 3 EVCSs), and high EVCS demand (4+ EVCSs) classes. The model achieves an overall accuracy of 0.81. For grid cells with no EVCS demand, it achieves a precision of 0.88, a recall of 0.92, and an F1-score of 0.90; for grid cells with low EVCS demand, a precision of 0.71, a recall of 0.73, and an F1-score of 0.72; and for grid cells with high EVCS demand, a precision of 0.60, a recall of 0.62, and an F1-score of 0.61.

Author Contributions

Conceptualization, H.R., X.J. and X.Y.; methodology, H.R. and X.J.; software, H.R.; validation, H.R., Y.H. and D.X.; formal analysis, Y.Y.; investigation, Y.Y.; resources, Y.H.; data curation, Y.Z.; writing—original draft preparation, H.R. and X.J.; writing—review and editing, X.J. and Y.H.; visualization, X.Y. and D.C.; supervision, Y.H. and X.J.; project administration, Y.H. and X.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Ministry of Science and Technology of the People’s Republic of China under the National Foreign Experts Project: “Research on the Planning and Layout of Electric Vehicle Charging Facilities,” grant number G2023027008L.

Institutional Review Board Statement

This study was conducted in accordance with the Declaration of Helsinki, and was approved by the Institutional Review Board of Hubei University of Technology (HBUT20250029 31/07/2025).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The data is available upon request to the authors.

Acknowledgments

We gratefully acknowledge Hubei University of Technology for providing the instruments for testing, and we also express our gratitude to Wuyi University for their valuable assistance in the development of the model. All individuals acknowledged in this section have given their explicit consent to be mentioned.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

Nomenclature	Description
EV	Electric vehicle
EVCS	Electric Vehicle Charging Stations
I	Institution
SS	Scenic Spot
MC	Medical Care
LE	Leisure/Entertainment
GA	Government Agency
FS	Food Service
C	Company
CR	Commercial Residence
SC	Shopping Center
PS	Parking Sites
TN	Traffic Node
RD	Road Density
PD	Population Density
NLD	Nighttime Light Density
APU	Annual Power Use
SHAP	SHapley Additive exPlanation
MLP	Multi-layer Perceptron
OD	Origin-Destination
MGPR	Modified Geographic PageRank
NZE	Net-zero energy
PV	Photovoltaic
GIS	Geographic Information System
SAML	spatially aware machine learning method
M	Mean
SD	Standard Deviation
t	t-statistic
p	p-value
F	F (ANOVA Statistic)
ReLU	Rectified Linear Unit

Appendix A

Survey instrument for electric vehicle charging infrastructure demand assessment.
Please evaluate the impact level of the following factors on charging demand in your area.
(3 = High impact, 2 = Moderate impact, 1 = Low impact).

No.	Factor	Impact Level
1	Institution (I)	□3 □2 □1
2	Scenic Spot (SS)	□3 □2 □1
3	Medical Care (MC)	□3 □2 □1
4	Leisure/Entertainment (LE)	□3 □2 □1
5	Government Agency (GA)	□3 □2 □1
6	Food Service (FS)	□3 □2 □1
7	Company (C)	□3 □2 □1
8	Commercial Residence (CR)	□3 □2 □1
9	Shopping Center (SC)	□3 □2 □1
10	Parking Sites (PS)	□3 □2 □1
11	Traffic Node (TN)	□3 □2 □1
12	Road Density (RD)	□3 □2 □1
13	Population Density (PD)	□3 □2 □1
14	Nighttime Light Density (NLD)	□3 □2 □1
15	Annual Power Use (APU)	□3 □2 □1

Overall Assessment:
16. How would you rate the overall charging demand in this area?
□ High (3) □ Moderate (2) □ Low (1)
Note: This survey adopts a three-point Likert scale to assess the impact level of various factors on EV charging demand. The questionnaire has been validated with Cronbach’s α > 0.85, indicating high reliability. Statistical analyses, including t-tests and ANOVA, were performed to ensure validity.

References

Adenaw, L.; Lienkamp, M. Multi-Criteria, Co-Evolutionary Charging Behavior: An Agent-Based Simulation of Urban Electromobility. World Electr. Veh. J. 2021, 12, 18.
Lopez, N.S.; Allana, A.; Biona, J.B.M. Modeling Electric Vehicle Charging Demand with the Effect of Increasing EVSEs: A Discrete Event Simulation-Based Model. Energies 2021, 14, 3734.
Xing, Q.; Chen, Z.; Zhang, Z.; Xu, X.; Zhang, T.; Huang, X.; Wang, H. Urban Electric Vehicle Fast-Charging Demand Forecasting Model Based on Data-Driven Approach and Human Decision-Making Behavior. Energies 2020, 13, 1412.
Yi, Z.; Chen, B.; Liu, X.C.; Wei, R.; Chen, J.; Chen, Z. An Agent-Based Modeling Approach for Public Charging Demand Estimation and Charging Station Location Optimization at Urban Scale. Comput. Environ. Urban Syst. 2023, 101, 101949.
Wagner, S.; Götzinger, M.; Neumann, D. Optimal Location of Charging Stations in Smart Cities: A Point of Interest Based Approach. In Proceedings of the International Conference on Information Systems (ICIS 2013): Reshaping Society Through Information Systems Design, Milan, Italy, 15–18 December 2013; Volume 3, pp. 2838–2855.
Yi, Z.; Liu, X.C.; Wei, R.; Chen, X.; Dai, J. Electric Vehicle Charging Demand Forecasting Using Deep Learning Model. J. Intell. Transp. Syst. 2022, 26, 690–703.
Yi, Z.; Liu, X.C.; Wei, R. Electric Vehicle Demand Estimation and Charging Station Allocation Using Urban Informatics. Transp. Res. Part D Transp. Environ. 2022, 106, 103264.
Mortimer, B.J.; Hecht, C.; Goldbeck, R.; Sauer, D.U.; De Doncker, R.W. Electric Vehicle Public Charging Infrastructure Planning Using Real-World Charging Data. World Electr. Veh. J. 2022, 13, 94.
Kontou, E.; Liu, C.; Xie, F.; Wu, X.; Lin, Z. Understanding the Linkage between Electric Vehicle Charging Network Coverage and Charging Opportunity Using GPS Travel Data. Transp. Res. Part C Emerg. Technol. 2019, 98, 1–13.
Vazifeh, M.M.; Zhang, H.; Santi, P.; Ratti, C. Optimizing the Deployment of Electric Vehicle Charging Stations Using Pervasive Mobility Data. Transp. Res. Part A Policy Pract. 2019, 121, 75–91.
Shuai, C.; Zhang, X.; Ouyang, X.; Liu, K.; Yang, Y. Research on Charging Demands of Commercial Electric Vehicles Based on Voronoi Diagram and Spatial Econometrics Model: An Empirical Study in Chongqing China. Sustain. Cities Soc. 2024, 105, 105335.
Roy, A.; Law, M. Examining Spatial Disparities in Electric Vehicle Charging Station Placements Using Machine Learning. Sustain. Cities Soc. 2022, 83, 103978.
Dong, G.; Ma, J.; Wei, R.; Haycox, J. Electric Vehicle Charging Point Placement Optimisation by Exploiting Spatial Statistics and Maximal Coverage Location Models. Transp. Res. Part D Transp. Environ. 2019, 67, 77–88.
Wang, S.; Chen, A.; Wang, P.; Zhuge, C. Predicting Electric Vehicle Charging Demand Using a Heterogeneous Spatio-Temporal Graph Convolutional Network. Transp. Res. Part C Emerg. Technol. 2023, 153, 104205.
Ren, Q.; Sun, M. Predicting the Spatial Demand for Public Charging Stations for EVs Using Multi-Source Big Data: An Example from Jinan City, China. Sci. Rep. 2025, 15, 6991.
Mutua, A.M.; De Fréin, R. Sustainable Mobility: Machine Learning-Driven Deployment of EV Charging Points in Dublin. Sustainability 2024, 16, 9950.
Yang, X.; Yun, J.; Zhou, S.; Lie, T.T.; Han, J.; Xu, X.; Wang, Q.; Ge, Z. A Spatiotemporal Distribution Prediction Model for Electric Vehicles Charging Load in Transportation Power Coupled Network. Sci. Rep. 2025, 15, 4022.
Huo, H.; Cai, H.; Zhang, Q.; Liu, F.; He, K. Life-Cycle Assessment of Greenhouse Gas and Air Emissions of Electric Vehicles: A Comparison between China and the U.S. Atmos. Environ. 2015, 108, 107–116.
Hodgson, M.J. A Flow-Capturing Location-Allocation Model. Geogr. Anal. 1990, 22, 270–279.
Church, R.; Velle, C.R. The Maximal Covering Location Problem. Pap. Reg. Sci. 1974, 32, 101–118.
Zhou, Y.; Liu, X.C.; Wei, R.; Golub, A. Bi-Objective Optimization for Battery Electric Bus Deployment Considering Cost and Environmental Equity. IEEE Trans. Intell. Transport. Syst. 2021, 22, 2487–2497.
Kınay, Ö.B.; Gzara, F.; Alumur, S.A. Full Cover Charging Station Location Problem with Routing. Transp. Res. Part B Methodol. 2021, 144, 1–22.
Zhang, L.; Hong, J.; Nasri, A.; Shen, Q. How Built Environment Affects Travel Behavior: A Comparative Analysis of the Connections between Land Use and Vehicle Miles Traveled in US Cities. J. Transp. Land Use 2012, 5, 40–52.
Liu, X.; Sun, L.; Sun, Q.; Gao, G. Spatial Variation of Taxi Demand Using GPS Trajectories and POI Data. J. Adv. Transp. 2020, 2020, 1–20.
Mwale, M.; Luke, R.; Pisa, N. Factors That Affect Travel Behaviour in Developing Cities: A Methodological Review. Transp. Res. Interdiscip. Perspect. 2022, 16, 100683.
Chen, J.; Gao, M.; Cheng, S.; Hou, W.; Song, M.; Liu, X.; Liu, Y. Global 1 Km × 1 Km Gridded Revised Real Gross Domestic Product and Electricity Consumption during 1992–2019 Based on Calibrated Nighttime Light Data. Sci. Data 2022, 9, 202.
Chen, Z.; Wei, Y.; Shi, K.; Zhao, Z.; Wang, C.; Wu, B.; Qiu, B.; Yu, B. The Potential of Nighttime Light Remote Sensing Data to Evaluate the Development of Digital Economy: A Case Study of China at the City Level. Comput. Environ. Urban Syst. 2022, 92, 101749.
Long, P.D.; Ngoc, B.H.; My, D.T.H. The Relationship between Foreign Direct Investment, Electricity Consumption and Economic Growth in Vietnam. Int. J. Energy Econ. Policy 2018, 8, 267–274.
Su, L.; Jia, J. The Relationship between Nighttime Light Intensity and GDP in Shanghai Districts. J. Comput. Methods Sci. Eng. 2023, 23, 3–8. [Google Scholar] [CrossRef]
Hornik, K.; Stinchcombe, M.; White, H. Multilayer Feedforward Networks Are Universal Approximators. Neural Netw. 1989, 2, 359–366. [Google Scholar] [CrossRef]
Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
Yuan, Q.; Shen, H.; Li, T.; Li, Z.; Li, S.; Jiang, Y.; Xu, H.; Tan, W.; Yang, Q.; Wang, J.; et al. Deep Learning in Environmental Remote Sensing: Achievements and Challenges. Remote Sens. Environ. 2020, 241, 111716. [Google Scholar] [CrossRef]
Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021, 8, 53. [Google Scholar] [CrossRef]
Tobler, W.R. A Computer Movie Simulating Urban Growth in the Detroit Region. Econ. Geogr. 1970, 46, 234. [Google Scholar] [CrossRef]
Chang, M.; Li, Q.; Feng, H.; Xu, Z. Spatial-Adaptive Network for Single Image Denoising. In Computer Vision—ECCV 2020; Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M., Eds.; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2020; Volume 12375, pp. 171–187. ISBN 978-3-030-58576-1. [Google Scholar]
Zhu, F.; Li, H.; Ouyang, W.; Yu, N.; Wang, X. Learning Spatial Regularization with Image-Level Supervisions for Multi-Label Image Classification. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2027–2036. [Google Scholar]
Anselin, L. Local Indicators of Spatial Association—LISA. Geogr. Anal. 1995, 27, 93–115. [Google Scholar] [CrossRef]
Getis, A.; Ord, J.K. The Analysis of Spatial Association by Use of Distance Statistics. Geogr. Anal. 1992, 24, 189–206. [Google Scholar] [CrossRef]
Lundberg, S.M.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777. [Google Scholar]
Ullah, I.; Liu, K.; Yamamoto, T.; Zahid, M.; Jamal, A. Prediction of Electric Vehicle Charging Duration Time Using Ensemble Machine Learning Algorithm and Shapley Additive Explanations. Int. J. Energy Res. 2022, 46, 15211–15230. [Google Scholar] [CrossRef]
Tien Bui, D.; Shirzadi, A.; Chapi, K.; Shahabi, H.; Pradhan, B.; Pham, B.; Singh, V.; Chen, W.; Khosravi, K.; Bin Ahmad, B.; et al. A Hybrid Computational Intelligence Approach to Groundwater Spring Potential Mapping. Water 2019, 11, 2013. [Google Scholar] [CrossRef]
Ai, N.; Zheng, J.; Chen, X. Electric Vehicle Park-Charge-Ride Programs: A Planning Framework and Case Study in Chicago. Transp. Res. Part D Transp. Environ. 2018, 59, 433–450. [Google Scholar] [CrossRef]
He, S.Y.; Kuo, Y.-H.; Sun, K.K. The Spatial Planning of Public Electric Vehicle Charging Infrastructure in a High-Density City Using a Contextualised Location-Allocation Model. Transp. Res. Part A Policy Pract. 2022, 160, 21–44. [Google Scholar] [CrossRef]
Morrissey, P.; Weldon, P.; O’Mahony, M. Future Standard and Fast Charging Infrastructure Planning: An Analysis of Electric Vehicle Charging Behaviour. Energy Policy 2016, 89, 257–270. [Google Scholar] [CrossRef]
Xie, D.; Gou, Z. Dissipating Surplus Solar Photovoltaics Capacity from Net-Zero Energy Buildings to Electric Vehicle Charging Stations in Nearby Parking Lots: A Study in New York City. Energy Build. 2024, 303, 113818. [Google Scholar] [CrossRef]
Gönül, Ö.; Duman, A.C.; Güler, Ö. A Comprehensive Framework for Electric Vehicle Charging Station Siting along Highways Using Weighted Sum Method. Renew. Sustain. Energy Rev. 2024, 199, 114455. [Google Scholar] [CrossRef]
Tziokas, N.; Zhang, C.; Tziokas, A.; Wang, Q.; Atkinson, P.M. Downscaling Satellite Night-Time Light Imagery While Addressing the Blooming Effect. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 13678–13693. [Google Scholar] [CrossRef]

Figure 1. The SAML model.

Figure 2. A GIS-based grid system showing EVCSs, Wuhan, China.

Figure 3. Distribution of Points of Interest (POIs) within a representative grid cell in Wuhan, China.

Figure 4. Spatial distribution of key socio-economic and traffic characteristics across grid cells in Wuhan: (a) road density; (b) population density; (c) nighttime light density; (d) annual power use.

Figure 5. Prediction for the SAML model.

Figure 6. ΔDemand₀₂ spatial distribution in Wuhan.

Figure 7. ΔDemand₁₂ spatial distribution in Wuhan.

Figure 8. ΔDemand₀₁ spatial distribution in Wuhan.

Figure 9. ROC and AUC metrics in predicting different demand categories.

Figure 10. Results of ablation tests: (a) precision performance comparison; (b) recall performance comparison; (c) F1-score performance comparison; (d) overall accuracy comparison.

Figure 11. The mean SHAP values of the independent variables.

Figure 12. SHAP summary plot: (a) low EVCS demand; (b) high EVCS demand.

Figure 13. Spatial distribution of key features (PS, PD, RD, and CR) in a grid cell with ΔDemand₀₂.

Table 1. Geodatabases and variables.

Category	No.	Geodatabase	Description	Number of Records	Data Source
Land Use	1	Institution (I)	Libraries, colleges, universities, etc.	8618	Baidu Map (https://map.baidu.com/ (accessed on 6 August 2024))
	2	Scenic Spot (SS)	Cultural relics, scenic spots, etc.	3699
	3	Medical Care (MC)	General hospitals, special hospitals, etc.	3873
	4	Leisure/Entertainment (LE)	Cinemas, sport facilities, KTV, etc.	15,278
	5	Government Agency (GA)	Provincial government, municipal government, etc.	11,360
	6	Food Service (FS)	Chinese restaurants, foreign restaurants, cafe or coffee shops, tea houses, etc.	40,610
	7	Company (C)	Office, personal business, etc.	55,101
	8	Commercial Residence (CR)	Office buildings, residential areas, hotels, etc.	15,899
	9	Shopping Center (SC)	Shopping malls, department stores, large market stores, etc.	38,711
Traffic	10	Parking Site (PS)	Public parking	9288
	11	Traffic Node (TN)	Gas stations, railway stations, bus stations, etc.	8658
	12	Road Density (RD)	Length of road segments per square kilometer	/	OpenStreetMap (https://www.openstreetmap.org/ (accessed on 6 August 2024))
Socio-economic	13	Population Density (PD)	Number of people per square kilometer in each grid cell	/	Statistical Yearbook
	14	Nighttime Light Density (NLD)	Average value of nighttime light density	/	SNPP-VIIRS (https://www.earthdata.nasa.gov/ (accessed on 6 August 2024))
	15	Annual Power Use (APU)	Annual use of electricity	/	Chen et al. (https://doi.org/10.6084/m9.figshare.17004523.v1 (accessed on 6 August 2024))
EVCS	16	Electric Vehicle Charging Station (EVCS)	0: the grid cell has no EVCS; 1: the grid cell has 1–3 EVCSs; 2: the grid cell has 4 or more EVCSs.	2557	Baidu Map (https://map.baidu.com/ (accessed on 6 August 2024))
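
The three-level EVCS label in row 16 of Table 1 can be illustrated with a short Python sketch. The thresholds (0, 1–3, 4 or more) come from the table; the function name `evcs_demand_class` and the sample counts are hypothetical and serve only as an illustration.

```python
def evcs_demand_class(station_count: int) -> int:
    """Map the number of EVCSs in a grid cell to the demand label of Table 1."""
    if station_count < 0:
        raise ValueError("station_count must be non-negative")
    if station_count == 0:
        return 0  # no EVCS demand
    if station_count <= 3:
        return 1  # low EVCS demand (1-3 EVCSs)
    return 2      # high EVCS demand (4 or more EVCSs)


# Hypothetical EVCS counts for five grid cells (illustrative values only).
counts = [0, 1, 3, 4, 12]
labels = [evcs_demand_class(c) for c in counts]
print(labels)  # [0, 1, 1, 2, 2]
```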

Table 2. Training hyperparameters for the SAML model.

Hyperparameter	Description	Value
Optimizer	Optimization algorithm used	Adam
Learning Rate	Step size for the optimizer	0.001
Batch Size	Number of samples per gradient update	64
Epochs	Number of passes through the training dataset	50
λ (lambda)	Weight of the spatial loss term	0.12
σ (sigma)	Distance decay parameter in spatial weights	1 km
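
To show how the values in Table 2 might fit together, the following pure-Python sketch stores them in a configuration object and evaluates a combined loss on a toy batch. The hyperparameter values are taken from Table 2. Everything else is an assumption made for illustration and is not the authors' implementation: the Gaussian form of the distance decay, the pairwise form of the spatial penalty, and all names (`TrainConfig`, `gaussian_weight`, `spatial_penalty`, `combined_loss`) as well as the toy data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from Table 2.
    optimizer: str = "Adam"
    learning_rate: float = 0.001
    batch_size: int = 64
    epochs: int = 50
    lam: float = 0.12      # weight of the spatial loss term
    sigma_km: float = 1.0  # distance decay parameter in spatial weights


def gaussian_weight(dist_km: float, sigma_km: float) -> float:
    # ASSUMPTION: Gaussian distance decay; the kernel used in the paper may differ.
    return math.exp(-(dist_km ** 2) / (2.0 * sigma_km ** 2))


def cross_entropy(probs, labels):
    # Mean negative log-likelihood of the true class.
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)


def spatial_penalty(probs, coords_km, sigma_km):
    # ASSUMPTION: distance-weighted mean squared difference between the
    # predicted class probabilities of every pair of grid cells.
    num = den = 0.0
    n = len(probs)
    for i in range(n):
        for j in range(i + 1, n):
            w = gaussian_weight(math.dist(coords_km[i], coords_km[j]), sigma_km)
            num += w * sum((a - b) ** 2 for a, b in zip(probs[i], probs[j]))
            den += w
    return num / den if den > 0 else 0.0


def combined_loss(probs, labels, coords_km, cfg):
    # Classification loss plus the lambda-weighted spatial term.
    return cross_entropy(probs, labels) + cfg.lam * spatial_penalty(
        probs, coords_km, cfg.sigma_km
    )


cfg = TrainConfig()
# Hypothetical toy batch: three grid cells, three demand classes.
probs = [[0.8, 0.15, 0.05], [0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
labels = [0, 0, 2]
coords_km = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
loss = combined_loss(probs, labels, coords_km, cfg)
print(round(loss, 4))
```

Under these assumptions, a larger λ penalizes disagreement between nearby cells more strongly, and a larger σ lets more distant cells contribute to the penalty.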

Table 3. Performance of the SAML model.

Demand Class	Overall Accuracy	Precision	Recall	F1-Score
High EVCS demand (Class 2)	0.81 (all classes)	0.60	0.62	0.61
Low EVCS demand (Class 1)		0.71	0.73	0.72
No EVCS demand (Class 0)		0.88	0.91	0.90
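
As a quick consistency check on Table 3, the F1-score of each class can be recomputed from its precision and recall using the standard definition F1 = 2PR / (P + R). The sketch below is illustrative only. The numbers are copied from Table 3, while the helper name `f1_score` and the dictionary layout are assumptions.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1-score) copied from Table 3.
table3 = {
    "High EVCS demand (Class 2)": (0.60, 0.62, 0.61),
    "Low EVCS demand (Class 1)": (0.71, 0.73, 0.72),
    "No EVCS demand (Class 0)": (0.88, 0.91, 0.90),
}

for name, (p, r, reported) in table3.items():
    recomputed = f1_score(p, r)
    print(f"{name}: recomputed F1 = {recomputed:.3f}, reported = {reported:.2f}")
```

The recomputed values agree with the reported ones to within 0.01. Small differences are expected because the precision and recall values in Table 3 are already rounded to two decimals.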

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Published by MDPI on behalf of the World Electric Vehicle Association. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Huang, Y.; Ren, H.; Jia, X.; Yu, X.; Xie, D.; Zou, Y.; Chen, D.; Yang, Y. A Spatially Aware Machine Learning Method for Locating Electric Vehicle Charging Stations. World Electr. Veh. J. 2025, 16, 445. https://doi.org/10.3390/wevj16080445