Dynamic Demand Forecasting for Bike-Sharing E-Fences Using a Hybrid Deep Learning Framework with Spatio-Temporal Attention

Deng, Chen; Li, Yunxuan

doi:10.3390/su17177586

by Chen Deng ¹ and Yunxuan Li ²,*

¹ School of Arts, Qilu University of Technology (Shandong Academy of Sciences), Jinan 250353, China

² Beijing Key Laboratory of Traffic Engineering, College of Metropolitan Transportation, Beijing University of Technology, Beijing 100124, China

* Author to whom correspondence should be addressed.

Sustainability 2025, 17(17), 7586; https://doi.org/10.3390/su17177586

Submission received: 25 June 2025 / Revised: 15 August 2025 / Accepted: 19 August 2025 / Published: 22 August 2025

(This article belongs to the Section Sustainable Transportation)

Abstract

The rapid expansion of bike-sharing systems has introduced significant management challenges related to spatio-temporal demand fluctuations and inefficient e-fence capacity allocation. This study proposes the Spatio-Temporal Graph Attention Transformer Network (STGATN), a novel hybrid deep learning framework for dynamic demand forecasting in bike-sharing e-fence systems. The model integrates a Graph Convolutional Network (GCN) to capture complex spatial dependencies among urban functional zones, a Bi-LSTM network to model temporal patterns with periodic variations, and an attention mechanism to dynamically incorporate weather impacts. By constructing a city-level graph based on POI-derived e-fences and fusing multi-source features through a Transformer architecture, the STGATN addresses the limitations of static capacity allocation strategies. Experimental results from Shenzhen’s Nanshan District show that the STGATN achieves an overall Mean Absolute Error (MAE) of 0.0992 and a Coefficient of Determination (R²) of 0.8426, substantially outperforming baseline models such as LSTM (R²: 0.6215) and a GCN (R²: 0.5488). Ablation studies confirm that the model’s key components are critical: removing the GCN module lowered R² from 0.8426 to 0.7411 (a relative decrease of about 12%), while removing the weather attention mechanism lowered R² to 0.8034 (a relative decrease of nearly 5%). The framework provides a scientific basis for dynamic e-fence capacity management, advancing spatio-temporal prediction methodologies for sustainable transportation.

Keywords:

bike-sharing; e-fences; demand prediction; STGATN; sustainable transportation

1. Introduction

In recent years, bike-sharing has seen explosive growth and widespread adoption globally [1,2], particularly in China, where it is regarded as a convenient and environmentally friendly mode for short-distance travel [3]. Bike-sharing effectively addresses the “last mile” problem in public transportation [4], reduces private car usage, and alleviates urban traffic congestion and environmental pollution. These benefits contribute significantly to building a green, low-carbon urban transportation system. According to statistics, the fleet of shared bikes in China exceeds 12 million, serving hundreds of millions of users and profoundly changing the travel habits of urban residents [5]. Beyond the rapid expansion of bike-sharing systems in China, cycling has long been a key mode of daily transportation in many cities worldwide. Previous research has shown that urban cycling not only supports sustainable mobility but is also deeply embedded in commuting patterns for work, study, and leisure across diverse cultural contexts [6]. However, the rapid development of bike-sharing has also introduced a series of new urban management challenges. The most prominent issue is disorderly parking [7], which not only affects public space and the city’s appearance but can also obstruct sidewalks and bike lanes, threatening traffic order and pedestrian safety. Furthermore, a mismatch between bike deployment and actual demand causes additional problems [8]: some areas suffer from bike accumulation and low utilization rates, while others face a shortage of available bikes [9]. This imbalance results in low resource efficiency, increased operational costs, and a diminished user experience.

To regulate parking behavior and improve management, city administrators and operating companies have widely adopted e-fence technology. An e-fence is a virtual parking area that is not physically marked [10]. Defining these virtual zones guides user parking behavior, reduces indiscriminate parking, and enhances the efficiency of bike-sharing operations and maintenance [11]. Although e-fences improve the orderly management of shared bikes to some extent, most current systems still rely on a static capacity allocation strategy, in which the maximum capacity of an e-fence is determined at the design stage based on experience or historical average data [12,13]. In dense urban areas like Beijing or New York, static capacity allocations often fail to match dynamic tidal flows, which leads to overflow and underuse at different times of day. In contrast, cities with decentralized or sprawling layouts, such as Los Angeles or Berlin, struggle with low user compliance and poor spatial granularity in fence placement. More generally, the demand for bike rentals and returns exhibits significant spatio-temporal variation that static capacity allocation cannot accommodate. Accurate demand forecasting not only improves the capacity planning of e-fences but also plays a vital role in downstream operational tasks such as bike rebalancing, dispatch scheduling, and infrastructure optimization. As demonstrated by Zhang et al., demand-aware multi-objective dispatching strategies can significantly enhance system-level efficiency in shared bicycle operations [14].

Forecasting demand for bike-sharing e-fences presents several key challenges. The first is the inadequate modeling of complex spatial correlations: many existing approaches do not capture the intricate interdependencies among different urban regions, particularly those involving diverse types of Points of Interest (POIs). The second is the insufficient representation of temporal heterogeneity in demand patterns: current models often fail to reflect periodic variations and the dynamic, non-linear influence of external factors such as weather conditions. The third is the integration of spatio-temporal variables with the dynamic demand of bike-sharing: most methods lack a unified framework capable of jointly modeling spatial relationships, temporal dynamics, and the adaptive integration of external information [15]. These shortcomings limit predictive accuracy and robustness and hinder the effectiveness of dynamic capacity management for bike-sharing e-fences.

To address the limitations of traditional methods in modeling intricate spatio-temporal dependencies and external influences, this study develops a hybrid deep learning framework named the STGATN. A Graph Convolutional Network (GCN) captures the spatial relationships among different urban regions, with particular attention given to areas surrounding e-fences and Points of Interest (POIs), so that the model learns spatial demand interdependencies more effectively. To model temporal dynamics, including periodic fluctuations, long-term trends, and non-linear patterns, the framework incorporates a bidirectional Long Short-Term Memory (Bi-LSTM) network, an architecture family that has proven effective in time-series forecasting tasks. In addition, an attention mechanism enhances the model’s sensitivity to external variables such as weather conditions, allowing predictions to be adjusted in response to sudden or irregular changes in the environment. These components are integrated into a Transformer-based architecture, which facilitates parallel computation and captures global dependencies. Through this unified design, the model handles complex spatial structures, temporal behaviors, and external disturbances in a cohesive and efficient manner. The main contributions of this study are as follows:

A novel, hybrid deep learning architecture is proposed, by which the spatial correlation and temporal heterogeneity features of bike-sharing data are effectively fused for dynamic demand forecasting.
An attention mechanism is innovatively applied for the integration of external factors like weather, so that predictions can be more intelligently adjusted by the model in response to dynamic environmental changes.
Technical support is provided for the dynamic capacity management of bike-sharing e-fences, with the potential for operational efficiency to be improved, resource allocation to be optimized, and urban traffic problems to be alleviated.

By constructing and validating this hybrid model, this study aims to provide more scientific and fine-grained decision support for bike-sharing operations and urban transportation planning, thereby promoting the healthy and sustainable development of the bike-sharing industry.

The remainder of this study is organized as follows: In Section 2, existing studies are reviewed, and research gaps are highlighted. In Section 3, the methodology is established, including the model framework of the STGATN. In Section 4, a numerical experiment from Shenzhen’s Nanshan District and the results analysis are presented. Finally, in Section 5, conclusions for this study are drawn.

2. Literature Review

As an emerging mode of urban transport, the efficient operation and fine-grained management of bike-sharing have become important research topics in the field of smart cities [2]. E-fence technology was developed to address the challenges of disorderly parking and resource mismatch. The effectiveness of this technology, however, depends on the accurate prediction of bike demand.

In recent years, with the development of big data and artificial intelligence [16], relevant forecasting models have evolved from traditional statistical methods to complex deep learning architectures [15]. These new models place a greater emphasis on the detailed modeling of spatio-temporal dependencies, external factors, and multi-source data fusion. Early research often employed classic time-series models like ARIMA (Autoregressive Integrated Moving Average) [17]. While these methods are effective at capturing linear trends and periodic patterns, they struggle to handle the non-linear relationships and complex spatial influences inherent in bike-sharing data [18]. To overcome these limitations, researchers turned to deep learning, especially LSTM networks and their variants, such as the Gated Recurrent Unit (GRU) [19]. Such models can effectively learn the long-term temporal dependencies of bike-sharing demand, significantly improving prediction accuracy. However, urban traffic demand is inherently spatially correlated; demand in one area is inevitably affected by adjacent or functionally related areas.

To address the spatial modeling deficiencies of traditional time-series models, spatio-temporal fusion models became a primary research focus. Convolutional Neural Networks (CNNs) were used to extract spatial proximity features from grid-based city data [20]. The Convolutional LSTM architecture, which combines CNNs with LSTMs [21], can simultaneously capture local spatio-temporal correlations. However, simple grid divisions cannot represent the complex topological relationships between urban functional zones. Consequently, researchers introduced Graph Neural Networks (GNNs) [22]. By abstracting stations or regions as graph nodes and defining their connectivity, distance, or functional similarity as edges, GNNs can more effectively model dependencies in non-Euclidean space [23].

Among GNNs, the Graph Convolutional Network (GCN) was one of the first to prove its effectiveness. The GCN-GRU model constructed by Jiang et al. laid a foundation for subsequent research [23]. The Graph Attention Network (GAT) further improved upon this by assigning different attention weights to neighboring nodes. This allows for a dynamic perception of the strength of spatial correlations, giving GATs greater expressive power than GCNs [24]. To better handle the dynamic nature of spatio-temporal data, the Spatio-Temporal Graph Convolutional Network (STGCN) was proposed [25]. It integrates graph convolution and temporal convolution modules to learn from graph-structured spatio-temporal data in an end-to-end manner [26].

Meanwhile, the application of the attention mechanism expanded from the spatial dimension to temporal and multi-factor fusion. The work of Huang et al. demonstrated the role of attention in evaluating the importance of different historical time steps and adjacent stations [27]. More importantly, the attention mechanism provides a tool for integrating heterogeneous external factors such as weather, holidays, and large-scale events [28]. For example, Wu et al. designed an attention-based spatio-temporal network with an external factor fusion module [29]. Their work proved that dynamically weighting this information is crucial for improving model robustness during abnormal weather or special events.

Recently, the Transformer model has been introduced to the bike-sharing prediction field due to its global information-capturing ability and parallel processing advantages. Its core self-attention mechanism can directly calculate the dependency between any two positions in a sequence, effectively overcoming the vanishing gradient problem that affects RNNs when processing long sequences [30]. Xu et al. proposed a Spatio-Temporal Transformer model that effectively captures global spatio-temporal dependencies by designing separate spatial and temporal self-attention modules, achieving state-of-the-art performance on multiple datasets [31]. This indicates that Transformer-based architectures offer a new framework for uniformly modeling complex spatio-temporal relationships.

Overall, bike-sharing demand forecasting methods have evolved from focusing on a single temporal dimension to spatio-temporal fusion, and further to dynamic weighting and global information capture based on attention mechanisms. Combinations of GCN/GAT, LSTM/GRU [16], attention, and Transformer components continue to advance prediction accuracy and model interpretability.

Despite significant progress in bike-sharing demand forecasting and e-fence planning, a thorough review of the literature reveals several shortcomings in the current state of research.

Although existing research widely uses GNNs (e.g., GCNs and GATs) to capture spatial dependencies between stations or regions, this modeling approach has limitations. Most models construct graph networks primarily based on geographical proximity or historical traffic correlations, and they fail to fully explore and differentiate the deep semantic associations between urban functional zones as characterized by their POIs. For example, a subway station influences the demand patterns of surrounding office areas differently from those of residential areas, and this influence has directional and time-varying characteristics [29]. Existing models often treat these heterogeneous spatial relationships homogeneously, which weakens their ability to understand complex urban structures. As a result, they struggle to accurately capture the demand propagation and interaction effects driven by urban functions.

Bike-sharing demand exhibits high heterogeneity in the temporal dimension, including multiple periodicities (daily and weekly), trends, and random fluctuations. While recurrent neural networks like LSTM can effectively capture temporal dependencies, they often integrate dynamic external factors (e.g., weather, holidays, and major events) through simple feature concatenation [29]. The fundamental flaw of this static fusion strategy is that it assumes the influence of an external factor on demand is constant at all times. In reality, the inhibiting effect of a sudden rainstorm on demand during the morning rush hour is far greater than its effect at midnight. Current research generally lacks a dynamic, context-aware mechanism to weigh the actual influence of different external factors at specific moments. This leads to insufficient model robustness when dealing with sudden events and environmental changes, often resulting in distorted prediction results.

In summary, existing forecasting models have evolved from traditional statistical methods to deep learning approaches that incorporate spatio-temporal and external features. However, most models still lack a unified architecture capable of dynamically capturing complex spatial dependencies, heterogeneous temporal patterns, and the adaptive influence of external factors. By constructing an innovative hybrid architecture that integrates a GCN, LSTM, Attention, and Transformer, this paper aims to overcome the aforementioned shortcomings and provide more accurate and reliable decision support for the dynamic capacity management of bike-sharing e-fences.

3. Methodology

3.1. Model Framework

To achieve precise demand forecasting at the e-fence level for bike-sharing, this study constructs a novel, end-to-end deep learning framework named the STGATN [22]. This framework systematically analyzes the complex dynamic characteristics of bike-sharing demand through the synergy of three core modules. The overall architecture is illustrated in Figure 1.

Spatial Dependency Module: This module aims to capture the inherent geographical correlations within the urban spatial structure. First, using urban Point of Interest (POI) data, the DBSCAN density-based clustering algorithm identifies natural hot-spot areas for bike-sharing usage [13]. The centroids of these clusters are defined as the locations for the e-fences. Subsequently, all e-fences are abstracted as nodes in a graph. A city-level weighted undirected virtual graph is constructed, where edges do not represent physical infrastructure but rather reflect proximity-based spatial correlations and functional relationships derived from POI data. The edge weights are determined using a Gaussian kernel function applied to the Euclidean distances between e-fence centroids. Finally, a multi-layer Graph Convolutional Network (GCN) module, named SpatialGCN, aggregates neighborhood information and performs non-linear transformations to learn the high-dimensional spatial dependency features of each node (e-fence) embedded in the urban topology.

Temporal and Weather Module: In the temporal dimension, the model extracts temporal demand patterns and quantifies the dynamic impact of external factors in parallel. Specifically, a Bi-directional Long Short-Term Memory (Bi-LSTM) network module encodes the historical demand sequence and related time variables (e.g., hour of the day and day of the week) for each e-fence [30]. This effectively captures multi-scale temporal patterns in the demand, including periodicity, trends, and non-linear fluctuations. To dynamically integrate the non-linear effects of external factors like weather, this study designs a WeatherAttention module. This module uses the temporal features output by the Bi-LSTM as the Query, and the weather features from the same time period as the Key and Value. This process dynamically generates a context-aware meteorological impact vector.

Feature Fusion and Prediction Module: This module is responsible for the deep fusion of multi-source heterogeneous features and the generation of the final prediction. First, the model concatenates the spatial features from the GCN module, the temporal features from the Bi-LSTM module, and the weather context vector from the attention module. This high-dimensional combined vector is then passed through a Multi-Layer Perceptron (MLP) to achieve deep non-linear interaction and fusion. The fused feature sequence is finally fed into a Transformer Encoder. The encoder’s core self-attention mechanism captures the global dependencies among different feature dimensions, thereby refining the most critical comprehensive representation for the prediction task. Finally, a linear output layer decodes the Transformer’s output into precise predictions for pickup and dropoff counts for multiple future time steps.

3.2. Spatial Correlation and Graph Construction

Spatial dependency is central to urban traffic flow prediction. The demand at an e-fence is not only related to its own attributes but is also profoundly influenced by the functions and traffic conditions of surrounding areas. To ensure that the layout of e-fences aligns with the actual origins and destinations of urban residents, we generate virtual e-fences using a data-driven approach rather than relying on predefined administrative divisions.

All POI data points are collected within the study area using open platforms like the Amap API. Since bike-sharing parking behavior exhibits natural spatial clustering, the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm is used to aggregate the POI points. Compared to K-Means, DBSCAN does not require pre-specifying the number of clusters and can identify clusters of arbitrary shapes, making it highly suitable for identifying parking hot-spots [13]. The dense POI points are grouped into multiple clusters, and the centroid of each cluster is defined as the location of a virtual e-fence. Each e-fence is abstracted as a graph node v_i, and its initial feature vector x_i is composed of static attributes such as the functional mix entropy E_i:

$$E_{i} = -\sum_{k=1}^{K} p_{ik} \log_{2}(p_{ik}) \tag{1}$$

where p_ik is the proportion of the k-th category of POI within e-fence i. The connections between nodes are determined by geographical proximity, and their weights are calculated using a Gaussian kernel function:

$$\omega_{ij} = \exp\left(-\frac{\mathrm{dist}(v_{i}, v_{j})^{2}}{2\sigma^{2}}\right) \tag{2}$$

where $\mathrm{dist}(v_{i}, v_{j})$ is the Euclidean distance between e-fences i and j, and σ is a hyperparameter representing the bandwidth of the Gaussian kernel, which controls the decay rate of the weight as distance increases. Based on the constructed graph G = (V, E, W), a multi-layer SpatialGCN is used to learn spatial dependencies. The core of this module is the graph convolution operation, with the following propagation rule:

$$H^{(l+1)} = \sigma\left(\hat{D}^{-\frac{1}{2}} \hat{A} \hat{D}^{-\frac{1}{2}} H^{(l)} W^{(l)}\right) \tag{3}$$

where $H^{(l)}$ is the node feature matrix at layer l, $W^{(l)}$ is a learnable weight matrix, $\hat{A}$ is the adjacency matrix with added self-loops, and $\hat{D}$ is its degree matrix. Here σ(⋅) denotes a non-linear activation function, not the kernel bandwidth σ of Equation (2).
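Before any propagation can take place, the node feature of Equation (1) and the edge weights of Equation (2) have to be computed. The following is a minimal sketch of both, not the authors' implementation. The function names are hypothetical, centroid coordinates are assumed to be in a projected planar system (e.g., metres) so that Euclidean distance is meaningful, and diagonal weights are set to zero on the assumption that self-loops are added later when forming $\hat{A}$.

```python
import math


def functional_mix_entropy(poi_counts):
    """Eq. (1): base-2 Shannon entropy of the POI category proportions of one e-fence."""
    total = sum(poi_counts)
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in poi_counts:
        if count > 0:  # categories with p = 0 contribute nothing
            p = count / total
            entropy -= p * math.log2(p)
    return entropy


def gaussian_kernel_weights(centroids, sigma):
    """Eq. (2): symmetric weight matrix from pairwise Euclidean distances.

    Assumption: the diagonal is left at 0 because self-loops are
    added separately when building the normalized adjacency.
    """
    n = len(centroids)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(centroids[i], centroids[j])
                weights[i][j] = math.exp(-(d * d) / (2.0 * sigma ** 2))
    return weights
```

With five POI categories, an e-fence whose POIs are split evenly between two categories has entropy 1 bit, while a perfectly even mix across all five reaches the maximum of log₂ 5. A smaller σ makes the graph sparser in effect, because weights decay faster with distance.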

After propagation through multiple graph convolution layers and non-linear transformations, SpatialGCN outputs a high-dimensional feature vector for each e-fence node in the graph. This vector contains rich spatial context information. In the subsequent forward propagation, SpatialGCN retrieves the corresponding spatial features from this global tensor based on the e-fence ID of each sample in a batch.
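The propagation rule of Equation (3) can be illustrated with a small pure-Python sketch. It is a simplified illustration under stated assumptions: ReLU is assumed as the activation (the text does not name the one used in SpatialGCN), the helper names are hypothetical, and dense lists stand in for the tensors a deep learning library would use.

```python
import math


def matmul(a, b):
    """Dense matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalized_adjacency(weights):
    """Return D^-1/2 (A + I) D^-1/2 for a weighted adjacency matrix A."""
    n = len(weights)
    a_hat = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
            for i in range(n)]


def gcn_layer(a_norm, h, w):
    """One propagation step of Eq. (3); ReLU is an assumed activation."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, value) for value in row] for row in z]


# Toy graph: nodes 0 and 1 are connected, node 2 is isolated.
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
layer_weights = [[1.0, -1.0], [0.0, 1.0]]
output = gcn_layer(normalized_adjacency(adjacency), features, layer_weights)
```

In this toy graph the two connected nodes end up with identical features because the normalized adjacency averages them, while the isolated node only sees its own self-loop.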

3.3. Temporal and Weather Modeling

Bike-sharing demand is not only spatially correlated but also exhibits complex dynamic patterns over time and is significantly affected by external factors like weather. This section explains how a Bi-LSTM and an innovative weather attention module are used to capture these dynamics.

This study uses a TemporalLSTM module to process time-series data and designs a WeatherAttention module to dynamically fuse weather information. The core of the TemporalLSTM module is a Bi-LSTM network, which captures dual contextual information at each time step through forward and backward recurrent computations. For an input sequence X = (x₁, …, x_T), the LSTM unit’s computation involves a forget gate f_t, an input gate i_t, an output gate o_t, and updates to the cell state c_t and hidden state h_t. Unlike standard unidirectional LSTM networks that process sequences in a single temporal direction, the Bi-LSTM architecture consists of two parallel LSTM layers: one processing the input in the forward direction (t = 1→T) and the other in the backward direction (t = T→1). The outputs from both directions are concatenated to form a comprehensive temporal representation that enhances the model’s ability to capture complex sequential dynamics in bike-sharing demand:

$$h_{t} = [\overrightarrow{h_{t}}; \overleftarrow{h_{t}}] \tag{4}$$
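The concatenation in Equation (4) can be made concrete with a small sketch of a bidirectional LSTM written in plain Python. This is an illustration only: the weights are random rather than trained, the gate ordering (f, i, o, g) and the helper names are assumptions, and a real model would use a deep learning library instead of nested lists.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_dim, hidden_dim, seed):
    """Random, untrained weights for the four gates (illustrative only)."""
    rng = random.Random(seed)
    rows, cols = 4 * hidden_dim, input_dim + hidden_dim
    w = [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    b = [0.0] * rows
    return w, b


def lstm_step(params, x, h_prev, c_prev):
    """One LSTM update: gates f, i, o and candidate g, then new c and h."""
    w, b = params
    d = len(h_prev)
    z_in = list(x) + list(h_prev)
    z = [sum(wk * zk for wk, zk in zip(row, z_in)) + bias
         for row, bias in zip(w, b)]
    f = [sigmoid(v) for v in z[0:d]]
    i = [sigmoid(v) for v in z[d:2 * d]]
    o = [sigmoid(v) for v in z[2 * d:3 * d]]
    g = [math.tanh(v) for v in z[3 * d:4 * d]]
    c = [f[k] * c_prev[k] + i[k] * g[k] for k in range(d)]
    h = [o[k] * math.tanh(c[k]) for k in range(d)]
    return h, c


def run_lstm(params, xs, hidden_dim):
    """Run the cell over a sequence from zero initial state."""
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    outputs = []
    for x in xs:
        h, c = lstm_step(params, x, h, c)
        outputs.append(h)
    return outputs


def bilstm(fwd_params, bwd_params, xs, hidden_dim):
    """Eq. (4): concatenate forward and backward hidden states per time step."""
    forward = run_lstm(fwd_params, xs, hidden_dim)
    backward = run_lstm(bwd_params, xs[::-1], hidden_dim)[::-1]
    return [hf + hb for hf, hb in zip(forward, backward)]
```

Each output vector has twice the hidden size: the first half summarizes the sequence up to step t, and the second half summarizes it from step t to the end.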

Weather is one of the most influential external factors affecting users’ intention to ride, yet its impact is inherently dynamic and non-linear [1]. Simply appending weather variables to the input features is insufficient, as it does not enable the model to discern when and to what extent weather conditions influence demand. To address this, the present study introduces a specialized module, WeatherAttention, designed to dynamically capture the relevance of weather information based on the current demand pattern. The core idea is inspired by the attention mechanism widely used in natural language processing, which enables models to focus selectively on the most relevant parts of the input. The WeatherAttention module adopts the standard Query–Key–Value framework. The Query (Q) is derived from the output of the TemporalLSTM, representing the model’s temporal understanding of demand at each time step. Conceptually, it reflects the question: “Given the current demand fluctuations, how much attention should be paid to weather conditions?” The Key (K) consists of input weather features, such as wind speed and precipitation, which are compared against the Query to compute relevance scores. The Value (V) carries the actual weather information to be integrated into the model’s prediction, weighted by the attention scores.

$$\alpha = \mathrm{softmax}\left(\frac{(Q W_{Q})(K W_{K})^{T}}{\sqrt{d_{attn}}}\right) \tag{5}$$

where softmax(⋅) is the activation function, W_Q and W_K are learnable weight matrices, and $\sqrt{d_{attn}}$ is the scaling factor. The attention weight matrix is applied to the value vector V to generate a context-aware meteorological impact vector, $C_{weather} = \alpha (V W_{V})$. This vector represents the intensity of the weather’s impact, dynamically adjusted according to the current demand pattern.
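As a rough illustration of Equation (5) and the weather context vector, the sketch below implements single-head scaled dot-product attention in plain Python. The function names are hypothetical, the projection matrices are passed in explicitly rather than learned, and the shapes (one row per time step) are an assumption about how Q, K, and V are laid out.

```python
import math


def matmul(a, b):
    """Dense matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def weather_attention(q, k, v, w_q, w_k, w_v):
    """Return (alpha, context) following Eq. (5) and C_weather = alpha (V W_V).

    q: temporal features, one row per time step (assumed layout)
    k, v: weather features, one row per time step (assumed layout)
    """
    q_proj, k_proj, v_proj = matmul(q, w_q), matmul(k, w_k), matmul(v, w_v)
    d_attn = len(q_proj[0])
    scores = [[sum(qi[c] * kj[c] for c in range(d_attn)) / math.sqrt(d_attn)
               for kj in k_proj] for qi in q_proj]
    alpha = [softmax(row) for row in scores]
    context = matmul(alpha, v_proj)
    return alpha, context
```

Each row of the returned attention matrix sums to one, so every row of the context is a convex combination of the projected weather rows. When all keys are identical, the weights become uniform and the context reduces to the plain average of the values.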

3.4. Feature Fusion and Prediction

This module, as the final stage of the model, is responsible for the deep fusion and decoding of the multi-source heterogeneous features extracted by the preceding modules to generate the final prediction sequence. This process involves two core steps: first, a fusion layer integrates the spatial, temporal, and external factor features into a unified high-dimensional representation; second, a Transformer Encoder captures the global dependencies within this representation, and an output layer generates the predictions.

To effectively integrate multi-modal information, the spatial features $H_{spatial}$ extracted by the GCN, the temporal context representation $h_{T}$ from the final time step of the Bi-directional LSTM, and the weather impact vector $c_{T}$ generated by the WeatherAttention mechanism are concatenated to form an initial fused representation $H_{fused} = \mathrm{Concat}(H_{spatial}, h_{T}, c_{T})$. This fused vector encapsulates spatial, temporal, and contextual information relevant to bike-sharing demand at the current time step. To model the complex, non-linear interactions among these heterogeneous features, $H_{fused}$ is passed through an MLP for deep feature fusion. The process can be formally expressed as

$$H_{interactive} = \mathrm{MLP}(H_{fused}) = \sigma(H_{fused} W_{1} + b_{1}) W_{2} + b_{2} \tag{6}$$

where W₁, b₁, W₂, b₂ are learnable parameters and σ is the ReLU activation function.
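The concatenation and the two-layer MLP of Equation (6) can be sketched for a single sample as follows. The function names and the tiny dimensions are illustrative assumptions; in the model the parameters would be learned and the vectors far longer.

```python
def linear(x, w, b):
    """Row vector x times matrix w, plus bias b."""
    return [sum(x[k] * w[k][j] for k in range(len(x))) + b[j]
            for j in range(len(b))]


def fuse(h_spatial, h_t, c_t, w1, b1, w2, b2):
    """Eq. (6): Concat -> Linear -> ReLU -> Linear for one sample."""
    h_fused = list(h_spatial) + list(h_t) + list(c_t)
    hidden = [max(0.0, value) for value in linear(h_fused, w1, b1)]
    return linear(hidden, w2, b2)


# Toy example with 1-dimensional spatial, temporal, and weather vectors.
w1 = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]]  # 3 fused inputs -> 2 hidden units
b1 = [0.5, 0.0]
w2 = [[2.0], [5.0]]                        # 2 hidden units -> 1 output
b2 = [1.0]
h_interactive = fuse([1.0], [2.0], [-3.0], w1, b1, w2, b2)
```

In this toy case the second hidden unit is negative before the activation, so ReLU zeroes it and only the first hidden unit contributes to the output.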

To further capture global, long-range dependencies within the fused features, the interacted representation $H_{interactive}$ is processed by a Transformer Encoder. Prior to encoding, positional encoding is applied to $H_{interactive}$ to preserve the sequential ordering of the data, which is otherwise not explicitly modeled by the Transformer’s architecture. The sequence is then passed through a stack of Transformer Encoder layers. Each encoder layer consists of two key components: a Multi-Head Self-Attention mechanism and a Feed-Forward Network (FFN). The self-attention mechanism enables the model to attend to different positions in the sequence simultaneously, thereby capturing contextual relationships over long distances. To enhance training stability and model convergence, each sub-layer within the encoder is equipped with residual connections and followed by layer normalization. This architectural design allows the model to effectively learn hierarchical representations of complex spatio-temporal interactions.

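$$X' = \mathrm{LayerNorm}(X_{enc} + \mathrm{MultiHead}(X_{enc})) \tag{7}$$

$$H_{enc} = \mathrm{LayerNorm}(X' + \mathrm{FFN}(X')) \tag{8}$$

where $X_{enc}$ denotes the positionally encoded $H_{interactive}$ and FFN is the feed-forward network of the encoder layer.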

The Multi-Head Self-Attention mechanism enables the model to attend to information from different representation subspaces in parallel, thereby enhancing its ability to capture complex dependencies.

Finally, the refined representation vector $H_{enc}$, obtained from the Transformer Encoder, is passed through a linear projection layer to produce demand forecasts for the next n time steps. To ensure that the predicted values comply with real-world constraints, a Rectified Linear Unit (ReLU) activation function is applied to the output:

$$\hat{Y} = \mathrm{ReLU}(\mathrm{Reshape}(H_{enc} W_{out} + b_{out})) \tag{9}$$
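A minimal sketch of the output head in Equation (9) is given below for a single sample. The reshape target of n steps by two values (pickup, dropoff) is an assumption inferred from the description of the prediction targets, and the function name and parameters are hypothetical.

```python
def predict_head(h_enc, w_out, b_out, n_steps):
    """Eq. (9): linear projection, reshape to (n_steps, 2), then ReLU.

    Assumption: each future step holds [pickup, dropoff], so the
    projection must produce exactly 2 * n_steps values.
    """
    flat = [sum(h_enc[k] * w_out[k][j] for k in range(len(h_enc))) + b_out[j]
            for j in range(len(b_out))]
    if len(flat) != 2 * n_steps:
        raise ValueError("projection width must equal 2 * n_steps")
    reshaped = [flat[2 * t:2 * t + 2] for t in range(n_steps)]
    return [[max(0.0, value) for value in row] for row in reshaped]


# Toy example: a 2-dimensional encoding projected to 2 future steps.
w_out = [[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]
b_out = [0.0, 0.0, 0.0, 0.0]
forecast = predict_head([1.0, -1.0], w_out, b_out, n_steps=2)
```

The ReLU clamps the negative raw projections to zero, which is what keeps the forecast counts non-negative.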

The Transformer Encoder module consists of 2 stacked encoder layers, each with 4 attention heads. The dimensionality of the fused input representation is 128, and the internal feed-forward layer in each encoder uses 256 hidden units. We apply sinusoidal positional encoding to the input sequence prior to encoding, and each encoder layer includes residual connections and layer normalization following the original Transformer implementation. A dropout rate of 0.1 is applied after the multi-head attention and feed-forward sublayers. These hyperparameters were selected based on validation performance and computational efficiency considerations.
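The stated hyperparameters and the sinusoidal positional encoding can be written down as a short sketch. The encoding follows the standard formulation from the original Transformer, which the text says it adopts. The dictionary keys, the function name, and the sequence length of 24 used in the usage line are hypothetical; the input window length is not given in this section.

```python
import math

# Hyperparameters as reported in the text; key names are illustrative.
ENCODER_CONFIG = {
    "num_layers": 2,
    "num_heads": 4,
    "d_model": 128,
    "dim_feedforward": 256,
    "dropout": 0.1,
}


def sinusoidal_positional_encoding(seq_len, d_model):
    """Standard sinusoidal encoding: sin on even indices, cos on odd indices."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # i is already the even index, so the exponent is i / d_model.
            angle = pos / (10000.0 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


# Hypothetical sequence length of 24 steps.
encoding = sinusoidal_positional_encoding(24, ENCODER_CONFIG["d_model"])
head_dim = ENCODER_CONFIG["d_model"] // ENCODER_CONFIG["num_heads"]
```

With a model dimension of 128 and 4 heads, each attention head works on a 32-dimensional slice, which is why the model dimension has to be divisible by the number of heads.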

By sequentially integrating spatial, temporal, and contextual information through specialized modules, this study develops a structurally comprehensive and logically coherent deep learning framework. The proposed model effectively captures these intricate demand patterns and external influences, offering a robust methodological foundation for high-precision, dynamic demand forecasting in bike-sharing systems.

4. Experimental Results and Analysis

4.1. Data Preprocessing

For this study, Nanshan Street in Nanshan District, Shenzhen, was selected as the experimental area (geographical coordinate range: 113.85° E–113.95° E, 22.48° N–22.55° N). This area, covering a total of 185.22 square kilometers, is characterized as a typical high-density urban built-up area. It is one of Shenzhen’s key commercial, residential, and technological innovation centers, featuring high population density, well-developed transportation facilities, and a rich diversity of POI types. The frequent use and complex demand patterns of bike-sharing in this area provide a representative scenario for this study. The data required for this study primarily include bike-sharing order data, POI data, and weather data, as summarized in Table 1.

The original bike-sharing order data were sourced from the Shenzhen Government Data Open Platform (https://opendata.sz.gov.cn/data/dataSet/toDataDetails/29200_00403627, accessed on 20 January 2025). This dataset covers bike-sharing order information from January to August 2021, including fields such as user ID, start time, start longitude, start latitude, end time, end longitude, end latitude, and company ID. The data have been appropriately anonymized to ensure user privacy. To focus the scope of the study and manage computational resources, all bike-sharing order data within the Nanshan Street area from 1 July 2021, 00:00:00 to 31 July 2021, 12:59:59 were extracted. After spatial and temporal filtering, a total of 1,784,936 valid order records were obtained. Within this period, 922,724 pickup orders and 980,405 dropoff orders were recorded (the difference is due to vehicle rebalancing or cross-regional trips). These order data contain precise spatio-temporal information on bike-sharing usage and form the basis for analyzing user travel patterns and forecasting regional demand.

To capture the potential influence of different functional zones on bike-sharing demand and their spatial correlations, POI data for the Nanshan Street area were collected (see Figure 2). The POI data were classified into five categories based on their functional type: traffic facilities (Figure 2a), residential (Figure 2b), public services (Figure 2c), commercial services (Figure 2d), and education and parks (Figure 2e). A total of 11,176 POIs are located within the Nanshan Street area. A high concentration of commercial (d) and residential (b) POIs is evident in the Qianhai and Houhai districts, highlighting the area’s role as a dense hub for economic and living activities. These zones are interconnected by a network of traffic facilities (a) situated along major transportation corridors, which facilitate “last mile” journeys. Public services (c) and education/park areas (e) are more dispersed but contribute to the diverse urban functional mix that drives varied demand patterns. Most importantly, Figure 2f displays the 1013 virtual e-fences generated for this study. These locations are not arbitrary; they represent the centroids of high-density POI clusters identified via the DBSCAN algorithm. This data-driven approach ensures that the e-fences, which serve as the foundational nodes in the spatial graph network, are strategically positioned in natural hot-spots of urban activity, thereby providing a robust basis for modeling bike-sharing demand.

Weather conditions are a key external factor influencing users’ decisions to use shared bikes [1]. For this study, weather data corresponding to the study period (July 2021) in Nanshan Street were collected. The weather data were updated every 60 min, resulting in a total of 744 time intervals with meteorological records. To analyze the impact of precipitation and wind speed on bike-sharing demand, features were extracted and classified from the weather data. Precipitation is one of the most direct negative factors affecting riding comfort and willingness. According to the meteorological data, precipitation occurred in approximately 11.6% of the time intervals during the selected study period (86 of 744; see Table 1). This provides the model with representative samples for learning the inhibitory effect of rain on riding demand. Wind speed is another important factor affecting the riding experience. As wind speed increases, a downward trend in bike-sharing demand is observed, with the inhibitory effect becoming more pronounced after the wind speed exceeds a certain threshold. The weather data are input into the prediction model as external features to improve the model’s adaptability to environmental changes and its prediction accuracy.

To map discrete order data points to meaningful regional units, the DBSCAN algorithm was utilized for spatial analysis of the POI data in the study area. With parameters set to eps = 0.0003 and min_samples = 5, the DBSCAN algorithm identified high-density POI clusters, which were then used to generate a set of optimized e-fences as the basic spatial units for this study. Ultimately, 1013 e-fences were identified and generated from the original 11,176 POI points (see Figure 2f). These e-fences represent areas with potentially high demand for bike-sharing pickups or dropoffs.
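The clustering step described above can be sketched in plain Python, as a rough illustration rather than the authors' implementation. The sketch assumes that eps is measured in decimal degrees on raw longitude/latitude pairs and that an e-fence is placed at the mean coordinate of each cluster; the text confirms neither detail. The function names and toy coordinates are hypothetical.

```python
from collections import deque
from math import hypot


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; -1 marks noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        xi, yi = points[i]
        # The point itself counts as its own neighbor.
        return [j for j, (xj, yj) in enumerate(points)
                if hypot(xi - xj, yi - yj) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # noise for now; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(nbrs)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(j_nbrs)
    return labels


def cluster_centroids(points, labels):
    """Mean coordinate of every non-noise cluster (assumed e-fence position)."""
    sums = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, count = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, count + 1)
    return {lab: (sx / c, sy / c) for lab, (sx, sy, c) in sums.items()}


# Toy POIs (longitude, latitude): two dense groups and one isolated point.
offsets = [(0.0, 0.0), (0.0001, 0.0), (0.0, 0.0001), (0.0001, 0.0001), (0.0002, 0.0)]
pois = [(113.9000 + dx, 22.5000 + dy) for dx, dy in offsets]
pois += [(113.9200 + dx, 22.5200 + dy) for dx, dy in offsets]
pois.append((113.9500, 22.4900))

labels = dbscan(pois, eps=0.0003, min_samples=5)  # parameters from the text
fences = cluster_centroids(pois, labels)
```

With these toy points, each dense group becomes one cluster and the isolated POI is labeled as noise, so two candidate e-fence positions are returned.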

To construct the spatio-temporal sequence dataset, the order data were aggregated using a fixed time interval of 10 min. For each e-fence (spatial unit) and each 10 min time interval, the number of bike-sharing pickup orders and dropoff orders occurring within that time and location was counted. This aggregation process transformed the original discrete order events into continuous regional time-series data, forming a spatio-temporal tensor containing pickup and dropoff demand features. The processed hourly weather data were then aligned with the 10 min aggregated order data. Since the granularity of the weather data is coarser, the hourly weather data were replicated and applied to all six 10 min intervals within that hour. These weather features were treated as exogenous variables affecting regional demand and were input into the model along with the regional order demand.
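The aggregation and weather alignment described above can be illustrated with a short sketch. It assumes that each order has already been assigned to an e-fence; the tuple layout and the helper names `bin_index`, `aggregate_orders`, and `expand_hourly` are hypothetical and not taken from the paper.

```python
from collections import Counter
from datetime import datetime

BIN_MINUTES = 10  # aggregation interval stated in the text


def bin_index(timestamp, origin):
    """Index of the 10 min interval that contains `timestamp`."""
    return int((timestamp - origin).total_seconds() // (BIN_MINUTES * 60))


def aggregate_orders(orders, origin):
    """Count orders per (e-fence id, interval index)."""
    counts = Counter()
    for fence_id, timestamp in orders:
        counts[(fence_id, bin_index(timestamp, origin))] += 1
    return counts


def expand_hourly(hourly_values):
    """Repeat each hourly weather value for the six 10 min intervals in that hour."""
    per_hour = 60 // BIN_MINUTES
    return [value for value in hourly_values for _ in range(per_hour)]


origin = datetime(2021, 7, 1, 0, 0, 0)
pickups = [
    (3, datetime(2021, 7, 1, 0, 4, 0)),
    (3, datetime(2021, 7, 1, 0, 9, 59)),
    (3, datetime(2021, 7, 1, 0, 10, 0)),
    (7, datetime(2021, 7, 1, 1, 25, 0)),
]
pickup_counts = aggregate_orders(pickups, origin)
rain_per_bin = expand_hourly([0.0, 1.2])  # two hours of toy precipitation values
```

The same counting would be run separately for pickups and dropoffs to obtain the two demand channels of the tensor.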

Based on the time-aggregated demand and external features, a spatio-temporal sequence dataset was constructed for model training using a sliding time window method. A large time-series dataset containing 4,535,201 samples was ultimately built. To evaluate the model’s generalization ability and prediction performance, this dataset was strictly divided chronologically into training, validation, and test sets, with a split ratio of 70%, 15%, and 15%, respectively.
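As a minimal sketch of the sliding-window construction and the chronological split, the following code builds input/target pairs from one e-fence's demand series. The window lengths of 6 input steps and 2 output steps are those reported in the paper's parameter settings; the function names and the integer-percentage split are illustrative assumptions.

```python
def sliding_windows(series, n_in=6, n_out=2):
    """Build (input, target) pairs by sliding a window along one time series."""
    samples = []
    for start in range(len(series) - n_in - n_out + 1):
        x = series[start:start + n_in]
        y = series[start + n_in:start + n_in + n_out]
        samples.append((x, y))
    return samples


def chronological_split(samples, train_pct=70, val_pct=15):
    """Split time-ordered samples without shuffling (integer math avoids float rounding)."""
    n = len(samples)
    i = n * train_pct // 100
    j = n * (train_pct + val_pct) // 100
    return samples[:i], samples[i:j], samples[j:]


demand = list(range(28))  # toy demand series for a single e-fence
samples = sliding_windows(demand)
train, val, test = chronological_split(samples)
```

Because the samples are never shuffled, every validation and test window starts later than every training window, which is what keeps the evaluation free of look-ahead at the window-start level.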

The model training and inference for this study were conducted on a computing platform equipped with a 13th Gen Intel (R) Core (TM) i7-13700K CPU and an NVIDIA GeForce RTX 4080 GPU. The key model parameters were set as follows: a time interval of 10 min, an input sequence length of 6 time steps, and a prediction sequence length of 2 time steps. To comprehensively measure the model’s prediction performance, the following three widely recognized evaluation metrics were adopted:

Mean Absolute Error (MAE): Directly measures the average magnitude of the errors between predicted and true values. Its formula is

MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_{i} - {\hat{y}}_{i} |

(10)

Root Mean Square Error (RMSE): An error metric that gives higher weight to larger errors. Its formula is

RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - {\hat{y}}_{i})^{2}}

(11)

Coefficient of Determination (R-squared, R²): Measures the proportion of the variance in the true values that is explained by the model’s predictions. Its maximum is 1, with values closer to 1 indicating a better fit; on held-out data it can fall below 0 when the model predicts worse than the mean of the true values. Its formula is

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - {\hat{y}}_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}}

(12)

where y_{i} is the true value, {\hat{y}}_{i} is the predicted value, \bar{y} is the mean of the true values, and n is the total number of samples.
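Equations (10)–(12) translate directly into code. The following is a small plain-Python sketch with toy values; the function names are illustrative.

```python
from math import sqrt


def mae(y_true, y_pred):
    """Equation (10): mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Equation (11): root mean square error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Equation (12): one minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [0, 1, 2, 3]
y_pred = [0, 1, 2, 5]  # a single large error on the last sample
scores = (mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

For these toy values the MAE is 0.5, the RMSE is 1.0, and R² is about 0.2, which shows how one large error raises the RMSE more than the MAE.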

4.2. Model Performance

To optimize model performance, a grid search over key hyperparameters was conducted using the validation set. The following parameters were tuned: learning rate ∈{1 × 10⁻⁴,5 × 10⁻⁴,1 × 10⁻³,5 × 10⁻³}, batch size ∈{32,64,128}, number of GCN layers ∈{1,2,3}, number of Transformer attention heads ∈{2,4,8}, dropout rate ∈{0.1,0.3,0.5}, and number of hidden units in the MLP layer ∈{64,128,256}. The best-performing configuration was determined based on validation loss with early stopping. Figure 3 shows the curves of the training loss and validation loss as a function of the training epochs.
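The search space listed above can be enumerated as in the following sketch, which shows that the grid contains 4 × 3 × 3 × 3 × 3 × 3 = 972 configurations. The dictionary keys and the `evaluate` callback are hypothetical. In practice `evaluate` would train the model and return its validation loss, and the toy scoring function below only stands in for that step.

```python
from itertools import product

GRID = {
    "learning_rate": [1e-4, 5e-4, 1e-3, 5e-3],
    "batch_size": [32, 64, 128],
    "gcn_layers": [1, 2, 3],
    "attention_heads": [2, 4, 8],
    "dropout": [0.1, 0.3, 0.5],
    "mlp_hidden_units": [64, 128, 256],
}


def iter_configs(grid):
    """Yield every combination of hyperparameter values as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(grid, evaluate):
    """Return the configuration with the lowest value of `evaluate(config)`."""
    best_config, best_loss = None, float("inf")
    for config in iter_configs(grid):
        loss = evaluate(config)
        if loss < best_loss:
            best_config, best_loss = config, loss
    return best_config, best_loss


n_configs = sum(1 for _ in iter_configs(GRID))


def toy_validation_loss(config):
    # Placeholder score, NOT a real training run.
    return abs(config["dropout"] - 0.3) + abs(config["gcn_layers"] - 2)


best_config, best_loss = grid_search(GRID, toy_validation_loss)
```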

As can be seen in Figure 3, in the initial phase of training, both the training loss and validation loss show a rapid downward trend. This indicates that the model is able to effectively learn from the training data and capture the fundamental patterns of bike-sharing demand. As training progresses, the model’s performance continues to improve. Around the 46th epoch, the validation loss reaches its minimum value (0.1244), while the training loss is 0.9293. According to the early stopping strategy, the model at this point exhibits the best generalization ability on the validation set, achieving a good balance between fitting ability and predictive power on unseen data. Therefore, the model parameters from this epoch were saved and used as the final model for evaluation.

After the 46th epoch, although the training loss continued to decrease slowly, the validation loss showed signs of fluctuation and even a slight rebound. This is typically an indication of overfitting, where the model begins to learn noise and specific details from the training data that are not present in the unseen validation data, leading to a decline in performance on the validation set. The application of the early stopping mechanism effectively prevented the model from overfitting and ensured it possessed good generalization performance.
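The early stopping rule can be sketched as follows. The patience value is an assumption, since the text does not report one, and the loss values are toy numbers rather than the curve in Figure 3.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (best_epoch, best_loss); stop after `patience` epochs without improvement."""
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            # New best: this is the checkpoint that would be saved.
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break  # validation loss has stopped improving
    return best_epoch, best_loss


toy_val_losses = [0.50, 0.30, 0.20, 0.25, 0.22, 0.21, 0.30]
best_epoch, best_loss = early_stopping_epoch(toy_val_losses, patience=3)
```

Here the loop stops at epoch 6 and keeps the checkpoint from epoch 3, mirroring how the model from the best validation epoch is retained for evaluation.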

To comprehensively assess the final prediction performance of the STGATN model, the best model saved by the early stopping mechanism was evaluated on an independent test set. Three common evaluation metrics for prediction tasks were used: MAE, RMSE, and R². The MAE and RMSE measure the average deviation between the predicted and true values, with lower values indicating higher prediction accuracy. R² measures the model’s ability to explain the variance in the data, with values closer to 1 (or 100%) indicating a better goodness of fit. The evaluation results are summarized in Table 2.

As shown by the overall performance metrics in Table 2, the STGATN model performs well on the bike-sharing e-fence demand forecasting task. The model’s R² reached 0.8426, indicating that the proposed model explains over 84% of the variance in the test data. This high R² value indicates that the model captures the complex dynamics of bike-sharing demand and has a high fitting capability. The model’s MAE is only 0.0992. This means that on the test set, the average prediction error for each e-fence in a future time step is less than 0.1 shared bikes. Such a low MAE in predicting vehicle counts indicates high prediction accuracy, which has practical value for guiding dynamic adjustments of e-fence capacity and fine-grained vehicle dispatching.

It is noteworthy that the model’s RMSE (1.1996) is much larger than its MAE (0.0992). The RMSE is more sensitive to larger prediction errors and can be significantly inflated by a small number of samples with large errors. This phenomenon suggests that although the model achieves high accuracy in the vast majority of prediction scenarios, it may produce relatively large prediction errors in a few specific situations. These situations may involve outliers or be strongly influenced by sudden factors (e.g., extreme weather events and unplanned large-scale activities causing sharp demand fluctuations). The contribution of these large error points to the RMSE is much greater than their contribution to the MAE, thus causing the RMSE value to be much higher. An in-depth analysis of these large-error samples can help to further improve the model’s robustness in extreme or abnormal conditions.
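A short numerical illustration of this effect follows. The first part uses toy errors (99 exact predictions and one error of 10 bikes) to show how a single outlier can produce an MAE near 0.1 alongside an RMSE near 1. The second part checks that the overall row of Table 2 is reproduced from the pickup and dropoff rows under the assumption, not stated in the text, that both tasks contribute the same number of samples.

```python
from math import sqrt


def mae(errors):
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    return sqrt(sum(e * e for e in errors) / len(errors))


# Toy case: 99 perfect predictions and one prediction that is off by 10 bikes.
toy_errors = [0.0] * 99 + [10.0]
toy_mae = mae(toy_errors)
toy_rmse = rmse(toy_errors)

# Per-task values from Table 2.
pickup = {"mae": 0.0920, "rmse": 1.4027}
dropoff = {"mae": 0.1064, "rmse": 0.9543}

# Assumption: equal sample counts per task, so MAE averages directly
# and RMSE combines through the mean of the squared values.
overall_mae = (pickup["mae"] + dropoff["mae"]) / 2
overall_rmse = sqrt((pickup["rmse"] ** 2 + dropoff["rmse"] ** 2) / 2)
```

Rounded to four decimals, the combined values match the overall MAE (0.0992) and RMSE (1.1996) reported in Table 2.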

Figure 4a,b visually demonstrate the error distribution characteristics of the model for pickup and dropoff predictions. The blue scatter points in both plots are roughly distributed along the red dashed line. This indicates a strong positive correlation between the predicted and true values, showing that the model has good predictive ability for both pickup and dropoff demand. Specifically, the overall pattern of pickup demand is more stable than that of dropoff demand, but its volume is also susceptible to larger, instantaneous “pulse-like” peaks caused by sudden events or specific origin attributes (e.g., demand at a subway exit during the morning rush hour can far exceed normal levels). These sudden peaks may cause the model to produce larger deviations at those times. Dropoff behavior may be relatively more dispersed and influenced by destination attractiveness, resulting in smoother demand fluctuations and a lower frequency or magnitude of large errors compared to pickup behavior.

The STGATN model performed well on both pickup and dropoff prediction tasks, indicating that it can jointly model complex spatio-temporal dependencies and external factors.

4.3. Comparative Experiments and Ablation Analysis

To further validate the performance advantages of the STGATN model and to verify the contribution of its core internal modules to prediction accuracy, a series of comparative experiments and ablation studies was conducted. The STGATN model was compared with several classic baseline models and its own variants on the independent test set. The evaluation metrics used were the MAE, RMSE, and R². The experimental results are summarized in Table 3.

From the overall performance comparison results shown in Table 3, the following key conclusions can be drawn. The traditional statistical time-series model ARIMA performs poorly on the bike-sharing demand forecasting task (R² of only 0.3542). In contrast, deep learning-based models such as LSTM and the GCN show much higher prediction accuracy and fitting ability. This illustrates the advantage of deep learning models in capturing complex non-linear spatio-temporal data such as bike-sharing demand.

When relying solely on a time-series model like LSTM (considering only the temporal dimension) or a graph neural network like a GCN (considering only the spatial dimension), the model’s performance is limited. The R² for LSTM is 0.6215, and for a GCN, it is 0.5488. In stark contrast, the STGATN model proposed in this study achieves an overall R² of 0.8426 by effectively fusing spatio-temporal features. Compared to the time-only LSTM model, the R² of the STGATN is improved by approximately 35.6%. Compared to the space-only GCN model, the R² is improved by approximately 53.5%. This result strongly demonstrates that simultaneously modeling the interplay between space and time in bike-sharing demand is crucial for achieving high-precision forecasts.

To quantify the contribution of the SpatialGCN module and the weather attention module in the STGATN model, ablation variants with these modules removed were constructed for comparison. When the GCN module was removed, the model’s R² dropped from 0.8426 to 0.7411, the MAE increased from 0.0992 to 0.2145, and the RMSE increased from 1.1996 to 1.5881. This relative decrease in R² of approximately 12% (about 10 percentage points) quantifies the contribution of the SpatialGCN module to the overall performance. It supports the view that capturing the topological relationships and spatial interactions between e-fences via a GCN provides critical spatial context and is a key factor in improving prediction accuracy. When the weather attention module was removed, the model’s R² decreased from 0.8426 to 0.8034, the MAE increased from 0.0992 to 0.1583, and the RMSE increased from 1.1996 to 1.3992. This relative decrease in R² of approximately 5% (about 4 percentage points) shows the benefit of dynamically fusing external weather factors. Integrating weather information into the prediction process through an attention mechanism enables the model to capture weather-driven changes in demand more accurately, thereby improving its robustness and prediction accuracy.
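The percentages quoted in this comparison are relative changes in R². The following sketch recomputes them from the Table 3 values and also reports the absolute differences in percentage points; the helper names are illustrative.

```python
R2 = {  # values from Table 3
    "LSTM": 0.6215,
    "GCN": 0.5488,
    "STGATN without GCN": 0.7411,
    "STGATN without Attention": 0.8034,
    "STGATN": 0.8426,
}


def relative_gain_pct(full, baseline):
    """Improvement of `full` over `baseline`, relative to the baseline."""
    return (full - baseline) / baseline * 100


def relative_drop_pct(full, ablated):
    """Loss caused by an ablation, relative to the full model."""
    return (full - ablated) / full * 100


def absolute_drop_pp(full, ablated):
    """Loss caused by an ablation in percentage points of R²."""
    return (full - ablated) * 100


full = R2["STGATN"]
gain_vs_lstm = relative_gain_pct(full, R2["LSTM"])            # about 35.6
gain_vs_gcn = relative_gain_pct(full, R2["GCN"])              # about 53.5
drop_no_gcn = relative_drop_pct(full, R2["STGATN without GCN"])        # about 12.0
drop_no_att = relative_drop_pct(full, R2["STGATN without Attention"])  # about 4.7
pp_no_gcn = absolute_drop_pp(full, R2["STGATN without GCN"])           # about 10.15
pp_no_att = absolute_drop_pp(full, R2["STGATN without Attention"])     # about 3.92
```

Keeping the two measures apart matters here: removing the GCN costs about 12% of the full model's R² in relative terms, which corresponds to roughly 10 percentage points in absolute terms.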

Figure 5 provides a more detailed analysis of the performance of each model on the two prediction tasks (pickup and dropoff). As shown in Figure 5a, the STGATN model achieves the lowest MAE. Figure 5b presents RMSE comparisons, while Figure 5c illustrates the R² values. The time-series model LSTM performs slightly better than the GCN model on the pickup task. This may be because pickup behavior often exhibits stronger periodic commute patterns, which LSTM is adept at capturing. Conversely, the GCN model has a slight advantage in predicting dropoffs. This is consistent with dropoff behavior being shaped by destination attractiveness, which would make spatial dependencies play a more critical role in dropoff prediction. However, the performance of both of these basic models is far inferior to that of the STGATN, further highlighting the limitations of relying on single-dimension modeling.

Figure 5d shows the inference time of each model. Inference efficiency is crucial for the practical deployment of a model, as it directly affects whether the system can perform near-real-time dynamic dispatching. As can be seen, there is a clear negative correlation between the computational efficiency of the models and their structural complexity. ARIMA, being the simplest model, has the fastest inference speed. As model complexity increases, such as with the introduction of LSTM, a GCN, and the complete Transformer framework, the inference time also increases accordingly. The STGATN, as the most structurally complex model, has the highest computational cost, but the accuracy improvement it brings is substantial. Compared to the STGATN without Attention variant, the full STGATN requires about 17% more computation time but yields an approximate 5% relative improvement in R², along with a lower MAE and RMSE.

Overall, on both the pickup and dropoff tasks, the STGATN model proposed in this study achieved the best prediction performance metrics (the lowest MAE and RMSE, the highest R²) among all compared models and variants. This indicates that the STGATN, through its complete and synergistic architectural design, effectively fuses spatial correlations, temporal heterogeneity, and external weather factors to achieve precise dynamic demand forecasting for bike-sharing e-fences.

4.4. Further Analysis of Influencing Factors

In addition to the overall performance of the model architecture, an in-depth analysis of the impact of specific external factors (like weather) and internal urban structures (like POI distribution) on bike-sharing demand forecasting is helpful for understanding the model’s working mechanism and sources of prediction errors.

In the ablation study in Section 4.3, removing the weather attention module (the STGATN without Attention) led to a decline in overall model performance across all metrics. The overall R² dropped from 0.8426 to 0.8034, quantifying the contribution of weather information to prediction accuracy. As shown in Figure 5c, after removing the weather attention, the R² for the pickup task dropped by about 6.2% (from 0.8479 to 0.7951), while the R² for the dropoff task decreased by only about 2.2% (from 0.8296 to 0.8117). This difference suggests that rainy weather primarily affects a user’s decision to “start a trip.” When faced with rain, many potential users will forgo using a shared bike and choose other modes of transport, leading to a sharp decline in pickup orders [31]. However, for users already in transit, most will still complete their trip and find the nearest location to return the bike, even if it starts raining midway. The WeatherAttention module assigns moderate attention weights to wind speed, with larger weights when wind speed exceeds 6 m/s. This aligns with behavioral studies showing a drop in cycling activity under adverse wind conditions.

Different types of POIs in a city attract different population activities, thereby generating different bike-sharing demand patterns. For example, traffic facilities like subway stations and bus stops are important transfer points, and their surrounding e-fences typically experience high demand for both pickups and dropoffs during commuting peak hours [10]. Residential areas mainly generate pickup demand during the morning peak and dropoff demand during the evening peak [3]. Areas with public services show strong demand on weekdays, while commercial service areas see prominent demand on weekends or at other specific times [29]. The model in this study, by constructing a graph network from POI data and using GCN to learn the representations of nodes (e-fences) in the graph (Figure 2f), is able to implicitly capture the functional attributes of different POI types and their impact on the demand of surrounding areas. For example, the learned spatial features of an e-fence node closely connected to a subway station will reflect demand patterns related to commuting.

Bike-sharing trips are an important way to connect different POIs. For instance, a user might ride from a residential area to a subway station (residential POI to transport hub POI), and then from another subway station to an office district (transport hub POI to office district POI). The distance, accessibility, and functional complementarity between POIs collectively determine the origin–destination (OD) flow between regions. The GCN module, by propagating information through the graph network, is able to capture this cross-regional demand propagation effect. For example, the model can learn that when a certain office district shows high pickup demand on a workday, functionally related residential areas or transport hubs may have already generated corresponding dropoff demand shortly beforehand. This modeling of spatial correlation enables the model to more accurately predict the demand linkage between regions.
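To make the idea of demand information propagating across the graph concrete, the sketch below applies one graph-convolution propagation step to a three-node toy graph (residential area, subway station, office district). It uses the widely used symmetric normalization with self-loops and omits learnable weights and activations, so it is an assumption-laden illustration and may differ from the exact layer used in the STGATN. The node ordering and function name are hypothetical.

```python
from math import sqrt


def gcn_propagate(adjacency, features):
    """One propagation step: D^(-1/2) (A + I) D^(-1/2) x, for scalar node features."""
    n = len(adjacency)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adjacency[i][j] + (1 if i == j else 0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    return [
        sum(a_hat[i][j] / sqrt(degree[i] * degree[j]) * features[j]
            for j in range(n))
        for i in range(n)
    ]


# Nodes: 0 = residential area, 1 = subway station, 2 = office district.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
signal = [1.0, 0.0, 0.0]  # toy demand signal observed only at the residential node

one_hop = gcn_propagate(adjacency, signal)
two_hop = gcn_propagate(adjacency, one_hop)
```

After one step the signal reaches only the subway station; after two steps it also reaches the office district, which has no direct edge to the residential node. Stacking layers is what lets a GCN relate e-fences that are connected only through intermediate zones.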

A detailed temporal analysis, presented in Figure 6, provides a granular view of the STGATN model’s predictive dynamics and highlights the synergistic contributions of its core modules under varying real-world conditions. Figure 6a illustrates the model’s performance over a 16 h period, comparing predictions against actual pickup orders on both a sunny and a rainy day. On rainy days, when actual demand is significantly suppressed, the full STGATN model’s predictions (red line) adeptly track this decline, demonstrating its robustness to external disruptions. This adaptability is further emphasized by the magnified views in Figure 6b,c, which clearly show that the WeatherAttention module is critical; its removal (the STGATN without Attention, green line) leads to a substantial overestimation of demand during rainy periods, as the model fails to account for the weather’s inhibitory effect on ridership. This confirms the attention mechanism’s role in dynamically weighing the influence of external factors, which primarily affects a user’s initial decision to start a trip.

Conversely, on sunny days, the performance of the STGATN without GCN variant (orange line) underscores the importance of the spatial component. This variant struggles to capture the sharpness of the morning demand peak, indicating that the GCN module is essential for modeling the complex demand propagation effects that arise from interactions between different urban functional zones, such as residential areas and transport hubs. Collectively, the visualizations in Figure 6 demonstrate the clear advantage of the integrated STGATN framework: by fusing spatial context from the GCN, temporal patterns, and dynamic external factors via the attention mechanism, the model achieves superior accuracy and resilience, accurately forecasting demand fluctuations through different times of the day and under diverse weather scenarios.

5. Conclusions

This study addresses the critical challenge of dynamic demand forecasting for bike-sharing e-fences, which is essential for overcoming the limitations of static capacity allocation and improving the overall operational efficiency of bike-sharing systems. This study proposed a novel hybrid deep learning framework named the STGATN. The model effectively integrates a GCN to capture spatial dependencies based on urban POI distributions, a Bi-LSTM network to learn temporal demand patterns, a weather-focused attention mechanism for the dynamic inclusion of external factors, and a Transformer Encoder for robust feature fusion and prediction.

Comprehensive experiments on a real-world dataset from Shenzhen validated the model’s performance. The STGATN achieved an overall R² of 0.8426 and an MAE of 0.0992, outperforming traditional models like ARIMA (R²: 0.3542) and single-modality deep learning models like LSTM (R²: 0.6215). The ablation studies quantitatively confirmed the importance of the model’s architecture. The explicit modeling of spatial context via the GCN module was proven essential, as its removal caused the R² to drop from 0.8426 to 0.7411. Similarly, the dynamic integration of external factors was validated, as removing the weather attention mechanism resulted in a performance decrease to an R² of 0.8034. These results underscore the necessity of a holistic approach that fuses spatial, temporal, and external information for accurate demand prediction. While future work can explore additional external factors and enhance robustness to extreme outliers, the STGATN framework provides a powerful and validated tool for optimizing bike-sharing operations and supporting sustainable urban mobility.

One limitation of this study is the reliance on data from a single district and a one-month observation window. Although Nanshan District provides a rich and diverse urban environment, further evaluations across multiple cities, suburban zones, and seasonal periods are needed to assess the model’s transferability and robustness under varied spatio-temporal conditions. Several areas for future research also remain. These include integrating other relevant external factors (e.g., events, road closures, and public transport status) more explicitly and enhancing the model’s robustness to extreme outlier events. The practical implementation challenges and computational efficiency optimizations required for real-time operational deployment at scale should also be investigated.

Author Contributions

Conceptualization, Y.L.; methodology, C.D. and Y.L.; validation, Y.L.; formal analysis, C.D. and Y.L.; writing—review and editing, C.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in Shenzhen Government Data Open Platform at https://opendata.sz.gov.cn/data/dataSet/toDataDetails/29200_00403627 (accessed on 20 January 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Eren, E.; Uz, V.E. A review on bike-sharing: The factors affecting bike-sharing demand. Sustain. Cities Soc. 2020, 54, 101882.
2. Jiang, W. Bike sharing usage prediction with deep learning: A survey. Neural Comput. Appl. 2022, 34, 15369–15385.
3. Deng, C.; Ma, H. A Sustainable Dynamic Capacity Estimation Method Based on Bike-Sharing E-Fences. Sustainability 2024, 16, 6210.
4. Li, X.; Xu, Y.; Zhang, X.; Shi, W.; Yue, Y.; Li, Q. Improving short-term bike sharing demand forecast through an irregular convolutional neural network. Transp. Res. Part C Emerg. Technol. 2023, 147, 103984.
5. Shi, Y.; Zhang, L.; Lu, S.; Liu, Q. Short-Term Demand Prediction of Shared Bikes Based on LSTM Network. Electronics 2023, 12, 1381.
6. Macioszek, E.; Jurdana, I. Bicycle traffic in the cities. Zesz. Nauk. Transp. Politech. Śląska 2022, 117, 115–127.
7. Ferrari, G.; Tan, Y.; Diana, P.; Palazzo, M. The platformisation of cycling—The development of bicycle-sharing systems in China: Innovation, Urban and Social regeneration and sustainability. Sustainability 2024, 16, 5011.
8. Ma, C.; Liu, T. Demand forecasting of shared bicycles based on combined deep learning models. Phys. A Stat. Mech. Its Appl. 2024, 635, 129492.
9. Ma, X.; Yin, Y.; Jin, Y.; He, M.; Zhu, M. Short-term prediction of bike-sharing demand using multi-source data: A spatial-temporal graph attentional LSTM approach. Appl. Sci. 2022, 12, 1161.
10. Wang, Z.; Yu, D.; Zheng, X.; Meng, F.; Wu, X. A Model-Data Dual-Driven Approach for Predicting Shared Bike Flow near Metro Stations. Sustainability 2025, 17, 1032.
11. Cai, Y.; Ong, G.P.; Meng, Q. Bicycle sharing station planning: From free-floating to geo-fencing. Transp. Res. Part C Emerg. Technol. 2023, 147, 103990.
12. Mangold, M.; Zhao, P.; Haitao, H.; Mansourian, A. Geo-fence planning for dockless bike-sharing systems: A GIS-based multi-criteria decision analysis framework. Urban Inform. 2022, 1, 17.
13. Wei, Z.; Ma, H.; Li, Y.; Tang, J. A Multiscale Approach for Free-Float Bike-Sharing Electronic Fence Location Planning: A Case Study of Shenzhen City. J. Adv. Transp. 2024, 2024, 1783038.
14. Zhang, W.; Zhang, S. Research on multi-objective optimisation for shared bicycle dispatching. Int. J. Veh. Inf. Commun. Syst. 2024, 9, 372–392.
15. Wirtgen, C.; Kowald, M.; Luderschmidt, J.; Hünemohr, H. Multivariate demand forecasting for rental bike systems based on an unobserved component model. Electronics 2022, 11, 4146.
16. Subramanian, M.; Cho, J.; Easwaramoorthy, S.V.; Murugesan, A.; Chinnasamy, R. Enhancing sustainable transportation: AI-driven bike demand forecasting in smart cities. Sustainability 2023, 15, 13840.
17. Jaber, A.; Csonka, B.; Juhasz, J. Long term time series prediction of bike sharing trips: A cast study of Budapest city. In Proceedings of the 2022 Smart City Symposium Prague (SCSP), Prague, Czech Republic, 26–27 May 2022.
18. Liu, X.; Gherbi, A.; Li, W.; Cheriet, M. Multi features and multi-time steps LSTM based methodology for bike sharing availability prediction. Procedia Comput. Sci. 2019, 155, 394–401.
19. Qiao, S.; Han, N.; Huang, J.; Yue, K.; Mao, R.; Shu, H.; He, Q.; Wu, X. A dynamic convolutional neural network based shared-bike demand forecasting model. ACM Trans. Intell. Syst. Technol. 2021, 12, 70.
20. Jiang, M.; Chen, W.; Li, X. S-GCN-GRU-NN: A novel hybrid model by combining a Spatiotemporal Graph Convolutional Network and a Gated Recurrent Units Neural Network for short-term traffic speed forecasting. J. Data Inf. Manag. 2021, 3, 1–20.
21. Belkessa, L.; Ameli, M.; Ramezani, M.; Zargayouna, M. Multi-Channel Spatio-Temporal Graph Convolutional Networks for Accurate Micromobility Demand Prediction Integrating Public Transport Data. In Proceedings of the 32nd ACM International Conference on Advances in Geographic Information Systems, Atlanta, GA, USA, 29 October–1 November 2024.
22. Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv 2018, arXiv:1709.04875.
23. Ali, A.; Zhu, Y.; Zakarya, M. Exploiting dynamic spatio-temporal graph convolutional neural networks for citywide traffic flows prediction. Neural Netw. 2022, 145, 233–247.
24. Huang, L.; Ma, Y.; Wang, S.; Liu, Y. An attention-based spatiotemporal LSTM network for next POI recommendation. IEEE Trans. Serv. Comput. 2019, 14, 1585–1597.
25. Hong, Y.; Zhu, H.; Shou, T.; Wang, Z.; Chen, L.; Wang, L.; Wang, C.; Chen, L. STORM: A Spatio-temporal context-aware model for predicting event-triggered abnormal crowd traffic. IEEE Trans. Intell. Transp. Syst. 2024, 25, 13051–13066.
26. Wu, H.; Liang, Y.; Zuo, J. Human-inspired spatiotemporal feature extraction and fusion network for weather forecasting. Expert Syst. Appl. 2022, 207, 118089.
27. Katharopoulos, A.; Vyas, A.; Pappas, N.; Fleuret, F. Transformers are RNNs: Fast autoregressive transformers with linear attention. In Proceedings of the 37th International Conference on Machine Learning, Online, 13–18 July 2020.
28. Xu, Y.; Zhao, X.; Zhang, X.; Paliwal, M. Real-time forecasting of dockless scooter-sharing demand: A spatio-temporal multi-graph transformer approach. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8507–8518.
29. Sathishkumar, V.E.; Park, J.; Cho, Y. Using data mining techniques for bike sharing demand prediction in metropolitan city. Comput. Commun. 2020, 153, 353–366.
30. Lee, S.-H.; Ku, H.-C. A dual attention-based recurrent neural network for short-term bike sharing usage demand prediction. IEEE Trans. Intell. Transp. Syst. 2022, 24, 4621–4630.
31. Campbell, A.A.; Cherry, C.R.; Ryerson, M.S.; Yang, X. Factors influencing the choice of shared bicycles and shared electric bikes in Beijing. Transp. Res. Part C Emerg. Technol. 2016, 67, 399–414.

Figure 1. The STGATN framework.

Figure 2. Spatial distribution of Points of Interest (POIs) in Nanshan Street, Shenzhen, and resulting e-fence clusters based on DBSCAN.

Figure 3. Model training results.

Figure 4. Error distribution characteristics of prediction.

Figure 5. Performance of each model on pickup and dropoff.

Figure 6. Influencing factors of STGATN model prediction results.

Table 1. Study data description.

Order data (items)	Value	POI data (points)	Count
Study period start	1 July 2021 00:00:00	Traffic Facilities	1452
Study period end	31 July 2021 12:59:59	Residential	1201
Research area (km²)	185.22	Public Services	475
Pickup orders	922,724	Commercial Services	7441
Dropoff orders	980,405	Education and Parks	607
Total orders	1,784,936	Total POIs	11,176
Weather (60 min intervals)	Total periods	Rainy periods	Non-rainy periods
Number of periods	744	86	658
Wind speed (m/s)	Minimum	Maximum	Average
Value	0	10	4.15

Table 2. Model performance results.

Task	MAE (↓)	RMSE (↓)	R² (↑)
Pickup	0.0920	1.4027	0.8479
Dropoff	0.1064	0.9543	0.8296
Overall	0.0992	1.1996	0.8426

Note. ↑ indicates that larger values are better; ↓ indicates that smaller values are better.
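
The Overall row of Table 2 can be cross-checked against the Pickup and Dropoff rows. The sketch below assumes, which the source does not state, that both tasks are evaluated on the same number of samples, so the overall MAE is the plain mean of the two task MAEs and the overall RMSE is the root of the mean of the two squared task RMSEs. R² is left out because it depends on each task's variance and cannot be pooled this way. The names `pooled_mae` and `pooled_rmse` are illustrative, not taken from the paper.

```python
import math

# Per-task values copied from Table 2.
TABLE2 = {
    "pickup": {"mae": 0.0920, "rmse": 1.4027},
    "dropoff": {"mae": 0.1064, "rmse": 0.9543},
}


def pooled_mae(maes):
    """Mean of per-task MAEs (assumes equally sized tasks)."""
    return sum(maes) / len(maes)


def pooled_rmse(rmses):
    """Root of the mean of squared per-task RMSEs (assumes equally sized tasks)."""
    return math.sqrt(sum(r * r for r in rmses) / len(rmses))


overall_mae = pooled_mae([t["mae"] for t in TABLE2.values()])
overall_rmse = pooled_rmse([t["rmse"] for t in TABLE2.values()])

print(f"pooled MAE  = {overall_mae:.4f}")
print(f"pooled RMSE = {overall_rmse:.4f}")
```

Under this equal-size assumption, both pooled values round to the Overall row of Table 2 (MAE 0.0992, RMSE 1.1996).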

Table 3. Experimental results of the compared models.

Model	MAE (↓)	RMSE (↓)	R² (↑)
ARIMA	1.1523	2.6581	0.3542
Random Forest	0.4876	1.8041	0.5923
XGBoost	0.4219	1.7225	0.6137
GCN	0.7891	2.1503	0.5488
LSTM	0.4312	1.9334	0.6215
Spatio-Temporal Transformer	0.1295	1.3102	0.8123
STGATN without GCN	0.2145	1.5881	0.7411
STGATN without Attention	0.1583	1.3992	0.8034
STGATN	0.0992	1.1996	0.8426

Note. ↑ indicates that larger values are better; ↓ indicates that smaller values are better.
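
One way to read the baseline and ablation rows of Table 3 is to express each model's R² as an absolute gap to the full STGATN. The following is a minimal sketch that uses only values from the table; the helper name `r2_drop` is illustrative.

```python
# R² values copied from Table 3.
R2 = {
    "STGATN": 0.8426,
    "STGATN without GCN": 0.7411,
    "STGATN without Attention": 0.8034,
    "Spatio-Temporal Transformer": 0.8123,
    "LSTM": 0.6215,
    "GCN": 0.5488,
}


def r2_drop(variant, reference="STGATN"):
    """Absolute R² difference between the reference model and a variant."""
    return R2[reference] - R2[variant]


for name in R2:
    if name != "STGATN":
        print(f"{name}: R² lower by {r2_drop(name):.4f}")
```

With the tabulated values, the variant without the GCN module trails the full model by 0.1015 in R², and the variant without the attention mechanism trails it by 0.0392.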

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
