Dynamic Graph Convolution-Based Spatio-Temporal Feature Network for Urban Water Demand Forecasting

Jia, Zhiwei; Li, Honghui; Yan, Jiahe; Sun, Jing; Han, Chengshan; Qu, Jingqi

doi:10.3390/app131810014


by Zhiwei Jia ¹, Honghui Li ¹,²,*, Jiahe Yan ¹, Jing Sun ³, Chengshan Han ¹ and Jingqi Qu ¹

¹ School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China

² China Engineering Research Center of Network Management Technology for High Speed Railway of MOE, Beijing 100044, China

³ Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(18), 10014; https://doi.org/10.3390/app131810014

Submission received: 6 July 2023 / Revised: 29 August 2023 / Accepted: 31 August 2023 / Published: 5 September 2023


Abstract

Urban water demand forecasting is a key component of smart water, which plays an important role in building a smart city. Although various methods have been proposed to improve forecast accuracy, most of them lack the ability to model spatio-temporal correlations, so they struggle to achieve the desired prediction results on the rich water demand monitoring data now available. To address this issue by improving the extraction of temporal and spatial features, we propose a dynamic graph convolution-based spatio-temporal feature network (DG-STFN) model. Our model contains two major components: the dynamic graph generation module, which builds the dynamic graph structure based on the attention mechanism, and the spatio-temporal feature block, which extracts the spatial and temporal features through graph convolution and conventional convolution. On the Shenzhen urban water supply dataset, DG-STFN is compared with five models: SARIMAX, LSTM, STGCN, DCRNN, and ASTGCN. The results show that DG-STFN outperforms the other models.

Keywords:

smart water; urban water demand forecasting; GCN; self-attention

1. Introduction

With the shortage of water resources and rapid urbanization, fine-grained and highly efficient management of water supply services has become the primary problem to be solved in smart water. As the foundation of smart water [1], water demand forecasting plays an important role in water supply services. More specifically, accurate water demand forecasts can not only guide water allocation but also provide data support for water leakage detection. However, water demand is usually characterized by complex nonlinearities and is influenced by numerous factors. Accurate water demand forecasting therefore remains a very challenging problem.

In the field of water demand forecasting, many methods have been proposed. These methods can be categorized into statistical and machine learning methods according to their underlying principles. In early studies, statistical methods were widely used in water demand forecasting because of their simplicity and good interpretability. Typical models are the autoregressive integrated moving average (ARIMA) [2] and vector autoregressive (VAR) [3] models. However, changes in water demand exhibit intricate nonlinear patterns, which make it challenging for statistical methods based on linear assumptions to predict water demand accurately. To address this issue, researchers applied conventional machine learning methods to water demand forecasting. Two representative methods are random forest (RF) [4] and support vector machine (SVM) [5]. Although these methods can describe nonlinear relationships, they typically rely on complex feature engineering, and accurate prediction is difficult when the key features affecting water demand changes cannot be extracted effectively. The advantages of deep learning methods, such as automatic feature learning and representation, have increasingly made them the preferred choice for researchers in water demand forecasting. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) generally outperform traditional forecasting methods in predicting water demand [6]. However, these methods mainly focus on the analysis of temporal autocorrelation and cannot fully exploit the rich water demand monitoring data. To further improve predictive accuracy, some researchers have begun to incorporate spatial correlation by combining CNN and RNN to extract spatio-temporal features from data [7]. However, this approach can only handle standard grid data.

In this paper, we propose a Dynamic Graph convolution based Spatio-Temporal Feature Network (DG-STFN) to jointly predict the water demand of multiple residential neighborhoods in a region. Our model does not have any specific limitations on data inputs and can efficiently capture the spatial and temporal features of water demand monitoring data. The main contributions of our work are as follows.

(1) We design a Dynamic Graph Generation module (DG) based on an attention mechanism, which explores the global dependencies and local dynamic changes in water demand among multiple residential neighborhoods in a fully data-driven manner.

(2) We build a Spatial-Temporal Feature Network (STFN) for water demand forecasting based on graph convolutional networks (GCN) and one-dimension convolutional neural networks (1D-CNN).

(3) We conduct comparative experiments on two real-world datasets and the results show that DG-STFN outperforms the baseline methods.

2. Related Work

2.1. Water Demand Forecasting

Water demand forecasting has been a popular research topic in recent decades. In earlier studies, statistics-based methods were widely used in water demand forecasting because of their good interpretability. For example, Oliveira et al. [2] used ARIMA based on the harmony search algorithm to forecast water demand for a district metering area. Guo Bingtuo [3] applied VAR to incorporate relevant factors affecting forecast results into the analysis to address short-term agricultural irrigation water use forecasting. While the results of these models are more interpretable, these models usually require the data to satisfy a certain mathematical distribution. In realistic forecasting scenarios, the data are often more complex, which makes these models less effective.

Compared to statistics-based methods, machine learning methods have the ability to deal with complex nonlinear relationships hidden in the data. Machine learning methods can be divided into traditional machine learning and deep learning. Overall, traditional machine learning methods have made good progress in water demand forecasting. Li et al. [4] used Random Forest Regression to forecast monthly water deficit. Candelieri et al. [5] designed parallel global optimization algorithms to optimize SVM parameters for complex water demand forecasting. However, such methods often require complex feature engineering and feature extraction, which is challenging in practice. In contrast, deep learning methods can learn features automatically in addition to handling complex nonlinear relationships. As a result, more and more researchers are applying deep learning methods to water demand forecasting. Mu et al. [6] predicted short-term urban water demands based on LSTM models. Xu et al. [8] proposed a method to predict daily urban water demands by integrating multiple base models such as TCN and ARIMA to capture the complex nonlinear correlation. Hu et al. [7] proposed a hybrid CNN-Bi-LSTM model for predicting urban water demand, in which the CNN extracted features from the weather data and the Bi-LSTM used the historical data and the extracted features to make predictions.

Most of the above methods are aimed at one-dimensional time series, so they only perform correlation analysis in the time dimension. When applied to the multi-dimensional time series now available, these methods are often unable to fully extract the data features because of model limitations, which leads to poor forecasting results.

2.2. Graph Convolution-Based Forecasting

Because of the outstanding performance of graph convolutional networks in dealing with correlation between sequences, some researchers apply graph convolutional networks to time series prediction. Such graph-convolution-based forecasting methods are widely used in the field of traffic flow forecasting. Yu et al. [9] used graph convolution to extract valuable patterns and features in the spatial domain for traffic forecasting. Li et al. [10] captured spatial dependence by bidirectional random walks on graphs and temporal dependence by Seq2Seq. Cao et al. [11] addressed the problem by transferring the time series into the spectral domain via the graph Fourier transform. This type of graph-convolution-based forecasting method is largely fixed in structure. It usually consists of a graph convolutional network for extracting spatial correlation and a convolutional neural network [12,13,14] or recurrent neural network [15,16] for extracting temporal correlation. This idea also provides theoretical guidance for new types of urban water demand forecasting. Recently, Zanfei et al. [17] implemented regional water demand forecasting on graph-structured data through a combination of graph convolution and recurrent neural networks. However, all the aforementioned methods rely on predefined static graphs and thus lack the ability to model the dynamic correlation of time series. Recent work on dynamic graph generation follows two research directions: feature embedding [18,19,20,21] and multi-stage graph training [22].

In this paper, we construct a dynamic graph generation module to capture the dynamic correlation of time series.

3. Materials and Methods

In this section, we first introduce the background knowledge related to our proposed method. Subsequently, we present the framework of the model, followed by a detailed description of each component of the model. Lastly, we elaborate on the preparatory work conducted prior to the experiments.

3.1. Preliminaries

Notations used in the paper are shown in Table 1. Functions are shown in Table 2.

Problem definition: Given the historical water demand monitoring data for $N$ residential neighborhoods in a region, our task is to predict the future water demand for each residential neighborhood. Inspired by relevant work, we define the relationship between the $N$ residential neighborhoods belonging to a region as an undirected weighted graph $G = (V, E, A)$, where $V$ is a set of nodes with $|V| = N$, $E$ is a set of edges, and $A \in \mathbb{R}^{N \times N}$ is the weighted adjacency matrix. We denote the historical water demand record on $G$ at time $t$ as a graph signal $X_{(t)} \in \mathbb{R}^{N \times F}$. Our task is to predict $P$ future graph signals using the $H$ historical graph signals, as expressed formally in Equation (1).

$$[X_{(t - H + 1)}, X_{(t - H + 2)}, \dots, X_{(t)}] \overset{f(\cdot)}{\Rightarrow} [\hat{X}_{(t + 1)}, \hat{X}_{(t + 2)}, \dots, \hat{X}_{(t + P)}] \tag{1}$$
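As an illustration of Equation (1), the sketch below slices a series of graph signals into (history, future) training pairs with a sliding window. It is a minimal example rather than the authors' data pipeline: the helper name `make_windows` and the toy series are hypothetical, and a stride of one time step between consecutive windows is an assumption.

```python
from typing import List, Tuple

Signal = List[List[float]]  # one graph signal X_(t): N nodes x F features


def make_windows(series: List[Signal], H: int, P: int) -> List[Tuple[List[Signal], List[Signal]]]:
    """Pair every run of H consecutive graph signals with the P signals that follow it."""
    samples = []
    # The last valid window must leave room for P future steps.
    for start in range(len(series) - H - P + 1):
        history = series[start:start + H]
        future = series[start + H:start + H + P]
        samples.append((history, future))
    return samples


# Toy series (hypothetical): 6 time steps, N = 2 nodes, F = 1 feature.
toy = [[[float(t)], [float(10 * t)]] for t in range(6)]
pairs = make_windows(toy, H=3, P=2)
print(len(pairs))      # number of (history, future) samples
print(pairs[0][1][0])  # first future signal of the first sample
```

In the experiments reported later, the history length is 24 steps and the horizon is 3, 6, 9, or 12 steps.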

Graph convolution network: The correlation analysis in the spatial dimension relies on graph convolution operations, which can be categorized into spatial-based and spectral-based methods [23]. Bruna et al. [24] introduced spectral graph theory into graph convolution, which offers faster processing and wider versatility than spatial-based methods. The graph convolution is then represented as the product of the graph signal $X_{(t)}$ and the graph convolution kernel $g_{\theta}$, as shown in Equation (2).

$$g_{\theta} * X_{(t)} = U g_{\theta}(\Lambda) U^{T} X_{(t)} \tag{2}$$

where $U$ is the matrix constructed from the eigenvectors of the Laplacian matrix of the graph $G$, $\Lambda$ is the diagonal matrix formed by the eigenvalues of this Laplacian matrix, and $g_{\theta}(\Lambda)$ is the convolution kernel. To reduce the complexity of graph convolution, Chebyshev polynomials are used as convolution kernels [25], as shown in Equation (3).

$$g_{\theta}(\Lambda) = \sum_{k = 0}^{K - 1} \alpha_{k} T_{k}(\tilde{\Lambda}) \tag{3}$$

where $T_{k}(\cdot)$ is the Chebyshev polynomial of order $k$, $\alpha_{k}$ is the coefficient of the polynomial of order $k$, and $\tilde{\Lambda}$ is the diagonal matrix of scaled eigenvalues.

Self-attention mechanism: The basic idea is to calculate the correlation between the positions in a sequence and normalize the correlations into weights [26]. The correlation between positions within a sequence is usually represented by the result of a vector dot product. The computation of attention weights can be viewed as a query-key-value mechanism, where the query is the current input vector, while the keys and values are the other input vectors. The output is given in Equation (4).

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{4}$$
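Equation (4) can be made concrete with a small pure-Python sketch of scaled dot-product attention. The helper names (`matmul`, `softmax`, `attention`) and the toy matrices are illustrative only. A practical model would use a tensor library and learned projections for Q, K, and V.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (m x n) and b (n x p)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """softmax(Q K^T / sqrt(d_k)) V, with the softmax taken over the keys."""
    d_k = len(K[0])
    K_t = [list(col) for col in zip(*K)]
    scores = matmul(Q, K_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, V)


# With identical keys every value gets the same weight,
# so each output row is the mean of the rows of V.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = attention(Q, K, V)
print(out)
```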

3.2. The Framework of DG-STFN

Figure 1 shows the framework of our proposed DG-STFN, which consists of a dynamic graph generation module (DG) and a spatio-temporal feature network (STFN). The DG is used to explore the global dependencies and local dynamics of the correlations between the historical information of the nodes in the graph and generate the graph adjacency matrix and graph convolution kernel. The whole process is completely data-driven. The STFN mainly consists of two spatio-temporal feature blocks (STF Block) for extracting temporal and spatial features from the data. The detailed description of each component is shown in the following.

3.3. Dynamic Graph Generation

GCN-based prediction models rely on the graph structure. However, a static graph structure predefined from prior knowledge struggles to capture the true connectivity between nodes, because this connectivity changes with factors such as weather and social activities. To address this issue, we design a dynamic graph generation module (DG), which contains three parts, namely Initial Dynamic Graph Generation (Initial DG), Spatio-Temporal Correlation Matrix Generation (ST Matrix), and Graph Convolution Kernel Generation (GC Kernel). These three parts are described in detail below.

3.3.1. Initial Dynamic Graph Generation

For some neighborhoods, water demand can show extremely high correlations due to factors such as population composition and size. However, this stable long-term correlation may experience temporary dynamic change due to factors such as working and visiting family. Therefore, we design the Initial DG to analyze the correlations between long-term relatively stable and local dynamic changes in time series.

Due to the relative stability of spatial correlations between nodes in the graph, we use predefined prior knowledge to represent this global stability. Prior knowledge can be correlation coefficients, distance relationships, etc. For water demand, the variation trend is mainly influenced by the composition and size of the population. When the population composition is similar, water demand typically exhibits similar patterns of change, while distance alone cannot reflect this. For example, two residential neighborhoods may be close to each other while their daily water demand trends are completely different. In this case, using distance as a weighting factor would introduce erroneous information, whereas correlation coefficients avoid such issues. Therefore, we use the correlation coefficient to represent the long-term relative stability and eliminate connections with low correlation by setting a threshold to obtain the long-term correlation matrix $A_{long} \in \mathbb{R}^{N \times N}$.
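The construction of the long-term correlation matrix can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the text does not say which correlation coefficient or threshold value is used, so the Pearson coefficient, the threshold of 0.5, and the function names are all hypothetical choices.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # a constant series carries no correlation information
    return cov / (sx * sy)


def long_term_matrix(series: List[List[float]], threshold: float) -> List[List[float]]:
    """Correlation matrix with entries below the threshold set to zero."""
    n = len(series)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            r = pearson(series[i], series[j])
            A[i][j] = r if r >= threshold else 0.0
    return A


# Three toy neighborhoods: the first two move together, the third moves oppositely.
demand = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
A_long = long_term_matrix(demand, threshold=0.5)  # threshold value is an assumption
print(A_long)
```

Here the edge between the first two neighborhoods survives the threshold, while the negatively correlated third neighborhood is disconnected from both.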

Of course, the long-term correlation matrix does not contain all the information. The main reason is that local correlations can be altered by factors such as weather conditions and social activities. Considering that this influence is ultimately reflected in the time series, we use the data at each time step to analyze local spatial correlation. Specifically, the data within the time window are directly used as the embedding vector of the corresponding node, and the weights are then computed using the attention mechanism. The calculation is shown in Equation (5).

$$A_{batch} = \mathrm{softmax}\left(\frac{(\chi^{T} \cdot W_{Q}) \cdot (\chi^{T} \cdot W_{K})^{T}}{\sqrt{d_{k}}}\right) \tag{5}$$

where $\chi^{T} \in \mathbb{R}^{B \times F \times N \times H}$ is the transpose of $\chi$, $W_{Q}, W_{K} \in \mathbb{R}^{H \times K}$ are randomly initialized weight matrices, and $d_{k}$ is the dimension of the node features.

Then, we perform feature dimensionality reduction on $A_{batch} \in \mathbb{R}^{B \times F \times N \times N}$ using a fully connected layer (FC) and further compress the dimensions using the squeeze function to obtain the local spatial correlation matrix $A_{short} \in \mathbb{R}^{N \times N}$, denoted as follows:

$$A_{short} = \mathrm{squeeze}(\mathrm{FC}(A_{batch}^{T})) \tag{6}$$

Finally, we obtain the output $DG_{ori}$ of the Initial DG by performing a weighted fusion of $A_{long}$ and $A_{short}$, the latter of which is generated dynamically from the data.

$$DG_{ori} = \mathrm{topK}(\mathrm{softmax}(\alpha A_{long} \oplus \beta A_{short})) \tag{7}$$

where $\alpha$ and $\beta$ are both randomly initialized learnable parameters. The $\mathrm{topK}(\cdot)$ function retains, for each node, the K nodes with the largest weights and discards the remaining connections. This enforces sparsity in the initial dynamic graph and prevents redundant information from interfering with the prediction task.
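The fusion and sparsification in Equation (7) can be illustrated with a short sketch. Several details are assumptions made for the example: the fusion operator is read as element-wise addition, the softmax is taken row-wise, dropped entries are set to zero, and the weights are fixed numbers rather than learned parameters. The function name `initial_dynamic_graph` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def row_softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def initial_dynamic_graph(A_long: Matrix, A_short: Matrix,
                          alpha: float, beta: float, k: int) -> Matrix:
    """Fuse both matrices, normalize each row, and keep the k largest entries per row."""
    graph = []
    for long_row, short_row in zip(A_long, A_short):
        fused = row_softmax([alpha * a + beta * b for a, b in zip(long_row, short_row)])
        ranked = sorted(range(len(fused)), key=lambda j: fused[j], reverse=True)
        keep = set(ranked[:k])
        graph.append([w if j in keep else 0.0 for j, w in enumerate(fused)])
    return graph


# Toy matrices for three nodes (values are made up for the example).
A_long = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]
A_short = [[0.5, 0.3, 0.2], [0.3, 0.6, 0.1], [0.2, 0.1, 0.7]]
DG_ori = initial_dynamic_graph(A_long, A_short, alpha=0.6, beta=0.4, k=2)
for row in DG_ori:
    print(row)
```

Because the softmax output is strictly positive, each row of the result has exactly k nonzero weights.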

3.3.2. Spatio-Temporal Correlation Matrix Generation

In the previous section, we ignored the influence of long-term dependency in the temporal dimension on spatial correlations. However, these effects are common in real life. For instance, frequent water usage over several time steps can directly affect future water demand. Therefore, we analyze the impact of information transmission in the time dimension on spatial correlation at each time step and generate a spatio-temporal correlation matrix to quantify this impact.

Firstly, we use long short-term memory (LSTM) to handle the time series, with the aim of capturing long-term dependencies and conducting positional encoding. Then, we apply a self-attention mechanism to calculate the temporal attention weights for the output of the LSTM. Subsequently, the obtained attention weights are used to weight the LSTM output $\chi_{E}$, resulting in the output $\chi_{Tem}$. Finally, we transform $\chi_{Tem}$ into the embedding vectors of the corresponding nodes and then calculate spatial attention to obtain the spatio-temporal correlation matrix $W_{TS}$. The formulas are as follows:

$$\chi_{E} = \mathrm{LSTM}\left((\mathrm{squeeze}(\mathrm{permute}(\chi) W_{s}))^{T}\right) \in \mathbb{R}^{B \times H \times N_{h}} \tag{8}$$

$$\chi_{E} = \mathrm{permute}(\mathrm{unsqueeze}(\mathrm{FC}(\chi_{E})) W_{u}) \in \mathbb{R}^{B \times N \times H \times F_{1}} \tag{9}$$

$$\chi_{Tem} = \mathrm{softmax}\left(\frac{(\chi_{E} W_{Q}) (\chi_{E} W_{K})^{T}}{\sqrt{d_{k}}}\right) \cdot (\chi_{E} W_{V}) \in \mathbb{R}^{B \times N \times H \times F_{2}} \tag{10}$$

$$\chi_{Tem} = \mathrm{permute}(\mathrm{FC}(\chi_{Tem})) \in \mathbb{R}^{B \times N \times F_{1} \times H} \tag{11}$$

$$W_{TS} = \mathrm{softmax}\left(\sigma\left((\chi_{Tem} W_{1}) W_{2} (W_{3} \chi_{Tem})^{T} + \mathrm{bias}\right)\right) \in \mathbb{R}^{B \times N \times N} \tag{12}$$

where $W_{s} \in \mathbb{R}^{F \times 1}$, $W_{u} \in \mathbb{R}^{1 \times F_{1}}$, $W_{Q} \in \mathbb{R}^{F_{1} \times F_{2}}$, $W_{K} \in \mathbb{R}^{F_{1} \times F_{2}}$, $W_{V} \in \mathbb{R}^{F_{1} \times F_{2}}$, $W_{1} \in \mathbb{R}^{H}$, $W_{2} \in \mathbb{R}^{F_{1} \times H}$ and $W_{3} \in \mathbb{R}^{F_{1}}$ are all randomly initialized weight matrices.

3.3.3. Graph Convolution Kernel Generation

In this section, we integrate the outputs of the Initial DG and the ST Matrix to generate graph convolution kernels, which will be used in the Spatio-Temporal Feature Network (STFN). First, a fully connected layer is used to perform dimensionality reduction on the output $W_{TS}$ of the ST Matrix. Subsequently, the result and the output $DG_{ori}$ of the Initial DG are fused by weight to obtain the final dynamic graph adjacency matrix $DG$. Then, we calculate the normalized Laplacian matrix $\tilde{L}$ of $DG$. Finally, we compute Chebyshev polynomials based on $\tilde{L}$. The calculation process is as follows:

$$DG = \theta_{1} \, \mathrm{squeeze}(\mathrm{FC}(W_{TS})) + \theta_{2} DG_{ori} \tag{13}$$

$$\tilde{L} = D^{-1/2} (D - DG) D^{-1/2} \tag{14}$$

$$T_{0}(\tilde{L}) = I, \quad T_{1}(\tilde{L}) = \tilde{L}, \quad T_{k}(\tilde{L}) = 2 \tilde{L} T_{k - 1}(\tilde{L}) - T_{k - 2}(\tilde{L}) \tag{15}$$

where $\theta_{1}$ and $\theta_{2}$ are randomly initialized learnable parameters, $D$ is the degree matrix of $DG$, and $T_{k}(\cdot)$ denotes a Chebyshev polynomial of order $k$.
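The kernel generation step can be sketched in plain Python for a small graph. The sketch assumes the standard symmetric normalization $D^{-1/2}(D - A)D^{-1/2}$ of the Laplacian and applies the Chebyshev recurrence directly to it, without the eigenvalue rescaling that some implementations add. The function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_laplacian(A: Matrix) -> Matrix:
    """D^{-1/2} (D - A) D^{-1/2}, where D is the diagonal matrix of row sums of A."""
    n = len(A)
    deg = [sum(row) for row in A]
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[inv_sqrt[i] * ((deg[i] if i == j else 0.0) - A[i][j]) * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def chebyshev_polynomials(L: Matrix, K: int) -> List[Matrix]:
    """Return [T_0(L), ..., T_{K-1}(L)] using T_k = 2 L T_{k-1} - T_{k-2}."""
    n = len(L)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    polys = [identity, [row[:] for row in L]]
    for _ in range(2, K):
        prod = matmul(L, polys[-1])
        polys.append([[2.0 * prod[i][j] - polys[-2][i][j] for j in range(n)]
                      for i in range(n)])
    return polys[:K]


# Two connected nodes with unit edge weight.
A = [[0.0, 1.0], [1.0, 0.0]]
L = normalized_laplacian(A)
polys = chebyshev_polynomials(L, K=3)
print(L)
print(polys[2])
```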

3.4. Spatio-Temporal Feature Network

Considering the complexity of nonlinear water demand monitoring data, we design the Spatio-Temporal Feature Network (STFN) to achieve multi-step forecasting. The STFN can be seen on the left of Figure 1. It consists of two spatio-temporal feature blocks (STF blocks) and an output layer. The STF Block is used to extract spatio-temporal features in time series and the Output Layer is used to integrate the features captured by STF Block for multistep prediction.

3.4.1. Spatio-Temporal Feature Block

To solve the problem of water demand forecasting, the key step is to extract the temporal and spatial correlation in the series data. However, because the correlation between nodes is uncertain, it is difficult for CNN-based methods to handle such irregular connections. Therefore, we use a graph convolution network to extract spatial correlations. The underlying mechanism aggregates the features of neighboring nodes through Chebyshev polynomials to update the features of each node. To enhance the ability to capture nonlinear relationships, we apply the sigmoid function to the output of the GCN, and we dynamically update the time series in the time window at each time step by using the output of the DG. This process is shown in Equation (16).

$$\chi_{gcn} = \sigma\left(\sum_{k = 0}^{K - 1} \alpha_{k} T_{k}(\tilde{L}) X_{(t)}\right) \in \mathbb{R}^{B \times N \times F_{1} \times H} \tag{16}$$

where $\alpha_{k} \in \mathbb{R}^{F \times F_{1}}$ is the learnable weight, and $K$ is the order of the Chebyshev polynomial.
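A stripped-down version of Equation (16) for one feature channel and one time step is sketched below. To keep the example small, the weights are scalars rather than $F \times F_{1}$ matrices and the Chebyshev polynomials are passed in as precomputed matrices. These simplifications and the function name `cheb_graph_conv` are assumptions for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]


def cheb_graph_conv(polys: List[Matrix], alphas: List[float], x: List[float]) -> List[float]:
    """sigmoid( sum_k alphas[k] * polys[k] @ x ) for a single-channel graph signal x."""
    n = len(x)
    out = [0.0] * n
    for T_k, a_k in zip(polys, alphas):
        for i in range(n):
            out[i] += a_k * sum(T_k[i][j] * x[j] for j in range(n))
    return [1.0 / (1.0 + math.exp(-v)) for v in out]


# Two-node toy graph: T_0 is the identity and T_1 is the Laplacian [[1, -1], [-1, 1]].
T0 = [[1.0, 0.0], [0.0, 1.0]]
T1 = [[1.0, -1.0], [-1.0, 1.0]]
y = cheb_graph_conv([T0, T1], alphas=[0.5, 0.5], x=[1.0, 3.0])
print(y)
```

The order-0 term passes each node's own value through, while the order-1 term mixes in the difference from its neighbor, which is how neighboring features enter the update.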

As for the temporal correlation analysis, RNN-based and CNN-based methods are two widely used approaches for analyzing the autocorrelation of time series. RNN-based methods can capture time dependency but suffer from high computational cost and vanishing gradients. In contrast, CNN-based methods require little computation, run fast, and can effectively extract local and global features from time series. Here we choose the CNN-based method to extract temporal features from $\chi_{gcn}$. Specifically, we turn a two-dimensional convolution into a one-dimensional convolution along the time axis by changing the size of the convolution kernel. Equation (17) shows the temporal feature extraction process.

$$\chi_{cnn} = \mathrm{Conv2d}(\mathrm{permute}(\chi_{gcn})) \in \mathbb{R}^{B \times F_{1} \times N \times H} \tag{17}$$
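The effect of a two-dimensional convolution whose kernel spans a single node and three time steps can be reproduced with a one-dimensional sliding window applied to each node separately. The sketch below uses a single channel and fixed kernel weights. The zero padding of one step on each side is an assumption, chosen because Equation (17) keeps the time length H unchanged.

```python
from typing import List


def temporal_conv(series: List[float], kernel: List[float], pad: bool = True) -> List[float]:
    """Slide the kernel along the time axis with stride 1 (cross-correlation, as in deep learning libraries)."""
    k = len(kernel)
    half = k // 2
    padded = [0.0] * half + list(series) + [0.0] * half if pad else list(series)
    return [sum(w * padded[t + i] for i, w in enumerate(kernel))
            for t in range(len(padded) - k + 1)]


def temporal_conv_per_node(signals: List[List[float]], kernel: List[float]) -> List[List[float]]:
    """Apply the same temporal kernel to every node independently, like a (1, k) 2D kernel."""
    return [temporal_conv(node_series, kernel) for node_series in signals]


# Two nodes, four time steps each; a summing kernel of width 3.
signals = [[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]]
features = temporal_conv_per_node(signals, kernel=[1.0, 1.0, 1.0])
print(features)
```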

To address the issue of vanishing gradients when stacking multiple STF Blocks, a residual connection is introduced at the output of each STF Block. Additionally, to further enhance the nonlinear modeling capability, we apply the Rectified Linear Unit ($\mathrm{ReLU}$) function to the output of the STF Block to obtain $\chi_{stgcn}$.

3.4.2. Output Layer

Finally, we apply a convolutional neural network to obtain the prediction results for the next $P$ time steps, as shown below.

$$\hat{y} = \mathrm{Conv2d}(\mathrm{permute}(\chi_{stgcn})) \in \mathbb{R}^{N \times F \times P} \tag{18}$$

3.5. Experimental Preparation

In this section, we primarily focus on the preparatory work before the experiment, including dataset introduction, data preprocessing, and evaluation metrics for experimental results.

3.5.1. Datasets Introduction

Two datasets derived from real-world application scenarios are used for the experiments. Detailed descriptions of these datasets are provided below.

WSD (https://www.datafountain.cn/competitions/603/datasets (accessed on 7 September 2022)): This dataset is sourced from a water company in Shenzhen, China. The dataset comprises historical water consumption records from 20 residential neighborhoods in a region of Shenzhen, spanning from 1 January 2022 to 21 August 2022. There are three kinds of time resolution data in the dataset, which are sampled by smart meters in 5 min, 1 h, and 1 day. The hourly records are selected for our experiment.

APD (https://doi.org/10.7910/DVN/RGWV8X (accessed on 21 February 2023)): The dataset is sourced from Harvard Dataverse and records the hourly average concentrations of air pollutants at 35 air monitoring stations in Beijing. The time span is from 1 January 2017 to 31 January 2018. In this experiment, the dataset is used to further validate the predictive performance of the proposed DG-STFN and explore its potential applications in other prediction problems in smart cities.

3.5.2. Data Preprocessing

Missing value imputation: For the WSD dataset, it is considered that the water consumption of each neighborhood at a certain time point may be highly similar to that of the same time point a few days ago. Therefore, we use the median water consumption for three days before and after the same time point to fill in the missing values. For the APD dataset, considering the high similarity between the missing values and the data of the two moments before and after, we use the average value of these two moments to fill in the missing values.
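The two imputation rules can be sketched as follows. The sketch reads "three days before and after the same time point" as the readings at the same hour on the three preceding and three following days, which is an interpretation rather than a confirmed detail. The function names and the toy series are hypothetical.

```python
from statistics import median
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing reading


def fill_same_hour_median(values: Series, idx: int,
                          period: int = 24, days: int = 3) -> Optional[float]:
    """Median of the readings at the same hour on up to `days` days before and after idx."""
    candidates = []
    for d in range(-days, days + 1):
        j = idx + d * period
        if d != 0 and 0 <= j < len(values) and values[j] is not None:
            candidates.append(values[j])
    return median(candidates) if candidates else None


def fill_neighbor_mean(values: Series, idx: int) -> Optional[float]:
    """Mean of the readings immediately before and after idx, if both exist."""
    if 0 < idx < len(values) - 1:
        prev, nxt = values[idx - 1], values[idx + 1]
        if prev is not None and nxt is not None:
            return (prev + nxt) / 2
    return None


# Toy hourly series over 7 days: reading = hour of day + day index.
hourly: Series = [float(h + d) for d in range(7) for h in range(24)]
gap = 3 * 24 + 5  # hour 5 on the fourth day
hourly[gap] = None
wsd_fill = fill_same_hour_median(hourly, gap)
apd_fill = fill_neighbor_mean([1.0, None, 3.0], 1)
print(wsd_fill, apd_fill)
```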

Outlier removal: The box plot method and the 3σ rule are commonly used to remove outliers, but the 3σ rule requires the data to follow a normal distribution. For this reason, we use the box plot method for both datasets.

To improve the model accuracy and accelerate the convergence of the model, we perform Z-Score normalization on both datasets.
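A minimal sketch of box-plot outlier filtering followed by Z-Score normalization is given below. The whisker factor of 1.5 is the conventional box-plot choice and is an assumption here, as is the use of the population standard deviation. The text does not state either detail, and the function names are hypothetical.

```python
from statistics import mean, pstdev, quantiles
from typing import List


def remove_outliers_iqr(data: List[float], whisker: float = 1.5) -> List[float]:
    """Keep only values inside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [x for x in data if low <= x <= high]


def z_score(data: List[float]) -> List[float]:
    """Shift to zero mean and scale to unit standard deviation."""
    mu, sigma = mean(data), pstdev(data)
    if sigma == 0:
        return [0.0 for _ in data]  # a constant series has no spread to scale
    return [(x - mu) / sigma for x in data]


readings = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
clean = remove_outliers_iqr(readings)
normalized = z_score(clean)
print(clean)
print(normalized)
```

In a forecasting setting the mean and standard deviation are usually computed on the training split only and reused for validation and test data, so that no information leaks from the evaluation sets.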

3.5.3. Evaluation Metrics

To measure the performance of different models, the mean absolute error (MAE) and root mean square error (RMSE) are used as metrics. The equations for each metric are as follows, where $y_{i}$ is the true value, $\hat{y}_{i}$ is the predicted value, and $n$ is the number of samples:

(1) Mean Absolute Error (MAE)

$$\mathrm{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}| \tag{19}$$

MAE measures the average magnitude of the error between the true value and the predicted value. The lower its value, the more accurate the model.

(2) Root Mean Square Error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{20}$$

RMSE also measures the average magnitude of the error between the true value and the predicted value, but squaring each error before averaging gives large deviations from the true value more weight. The lower its value, the better the prediction of the model.
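Equations (19) and (20) translate directly into code. The short example below, with made-up values, also shows how a single large error raises RMSE more than MAE.

```python
import math
from typing import List


def mae(y_true: List[float], y_pred: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: List[float], y_pred: List[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 5.0]  # one prediction is off by 2
print(mae(y_true, y_pred))
print(rmse(y_true, y_pred))
```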

4. Experiments

To validate the performance of DG-STFN, several baselines are set, including traditional statistical models and cutting-edge deep learning models.

SARIMAX: A statistic-based method for time series, which introduces a seasonal term based on the ARIMA model polynomial.
LSTM: A special recurrent neural network (RNN) model for solving the long-term dependency.
STGCN [9]: A spatio-temporal graph convolutional network, which uses a convolutional structure to extract the spatio-temporal correlation of time series.
DCRNN [10]: A diffusion convolution recurrent neural network that uses diffusion convolution to capture spatial correlation combined with a Seq2Seq architecture to capture temporal correlation.
ASTGCN [12]: An attention-based spatio-temporal graph convolutional network, originally designed for traffic flow prediction, which uses an attention mechanism to analyze the spatio-temporal features of time series.

In this paper, all experiments are executed on an NVIDIA GeForce GTX 1050 GPU with 4 GB memory. All of the models are trained in Python 3.8.13 with PyTorch 1.12.1. Experiments are conducted with all models on the same dataset, which is split into training, validation, and test sets in an 8:1:1 ratio. As for the parameter settings of the models, the common parameter settings are shown in Table 3, and the other parameter settings are consistent with their original papers. For the other parameter settings of DG-STFN, we use a third-order Chebyshev polynomial as the graph convolution kernel, set the numbers of input and output channels of the 1D-CNN to 64, and specify the convolution kernel size as (1, 3) with a stride of 1.

For the prediction task, the time series of 24 historical time steps are used to predict the data of the next 3, 6, 9, and 12 time steps, respectively. The results are obtained by averaging several rounds of training. The detailed results are shown in Table 4.

As shown in Table 4, SARIMAX performs the worst on both datasets. This model is suited to stationary, linear time series, whereas the time series in this experiment have complex nonlinear relationships. Among the deep learning models, the GCN-based methods generally perform better than LSTM, which suggests that solving the prediction problem requires considering not only the dependence in the time dimension but also the spatial correlation between series. Among the GCN-based methods, both DCRNN and STGCN rely on a static predefined graph structure, and DCRNN performs better than STGCN on both datasets. This may be because STGCN is a single-step prediction model: it achieves multi-step prediction by chaining single-step predictions, which causes errors to accumulate and in turn degrades performance. ASTGCN outperforms DCRNN and STGCN because it includes an attention mechanism that can correct the predefined graph structure. Compared with the baselines, DG-STFN fully analyses the correlation of the time series in the spatial dimension and the long-term dependence in the temporal dimension. The dynamic graph generation module handles both the long-term and the local spatial correlations in the time series, enabling the model to capture changes in local spatial correlations in a timely manner. By dynamically adjusting the local connections, DG-STFN fits the time series more closely, which improves prediction accuracy. Moreover, the results in Table 4 show that DG-STFN outperforms the baselines for all prediction tasks on both datasets. In particular, on the WSD dataset, DG-STFN achieved an average MAE improvement across the four prediction horizons of approximately 33.82% compared to SARIMAX, 26.20% compared to LSTM, 15.98% compared to STGCN, 13.97% compared to DCRNN, and 4.37% compared to ASTGCN.

To further compare the performance between DG-STFN and the GCN-based baselines, we visualize the predicted and actual results on the test set of the WSD dataset. Specifically, we focus on a one-week time span and intercept the results for comparative analysis, as shown in Figure 2.

As can be seen from Figure 2, the values predicted by DG-STFN, ASTGCN, and DCRNN are largely evenly dispersed around the ground truth, while the values predicted by STGCN are relatively concentrated. It is therefore intuitive that STGCN has the poorest fit to the true values and that DG-STFN, ASTGCN, and DCRNN fit better. The predicted values of DG-STFN and the ground truth are evenly distributed within the interval bounded by the minimum of the ground truth and the maximum of the predicted values. In contrast, ASTGCN, STGCN, and DCRNN all have ground truth values that fall outside this interval, indicating that these baselines are not as effective as DG-STFN in predicting data peaks. This provides further evidence of the validity of the DG-STFN model.

5. Conclusions

In this paper, we propose a dynamic graph convolution-based spatio-temporal feature network (DG-STFN) to solve the problem of water demand forecasting. Given that existing methods generally lack the ability to jointly extract spatio-temporal correlations, our model introduces graph convolution to achieve joint extraction of spatio-temporal features in time series. To ensure effective aggregation of spatial features through graph convolution, we design a dynamic graph generation module. Through an attention mechanism, this module captures spatial correlations that are relatively stable globally but change dynamically at the local level. Finally, extensive experiments on two datasets demonstrate that DG-STFN outperforms the baselines. Our work currently has some shortcomings. In time series forecasting problems, the sparsity of the graph has a significant impact on forecasting accuracy. However, in our study, we simply use threshold filtering and sampling methods to ensure the sparsity of the generated dynamic graph. In future work, we aim to quantify the impact of graph sparsity and then design a corresponding graph sparsity loss function to further improve the prediction accuracy of the model.
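As an illustration of the sparsification step mentioned above, the following minimal sketch applies threshold filtering to a small attention-style adjacency matrix and then keeps at most `k` entries per row. Top-k selection is used here as a deterministic stand-in for the sampling step, whose details are not given in this section; the names `sparsify` and `density`, the threshold, and the matrix values are all hypothetical.

```python
def sparsify(adj, threshold, k):
    """Zero out weights below `threshold`, then keep at most `k` largest per row."""
    sparse = []
    for row in adj:
        # Threshold filtering: keep (weight, column) pairs that are large enough.
        kept = [(w, j) for j, w in enumerate(row) if w >= threshold]
        # Stand-in for sampling: largest weights first, ties broken by column index.
        kept.sort(key=lambda item: (-item[0], item[1]))
        keep_idx = {j for _, j in kept[:k]}
        sparse.append([w if j in keep_idx else 0.0 for j, w in enumerate(row)])
    return sparse


def density(adj):
    """Share of non-zero entries in the matrix."""
    total = sum(len(row) for row in adj)
    nonzero = sum(1 for row in adj for w in row if w != 0.0)
    return nonzero / total


# Hypothetical row-normalized attention scores between four nodes.
attention = [
    [0.50, 0.30, 0.15, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.25, 0.25, 0.25, 0.25],
    [0.02, 0.08, 0.40, 0.50],
]

sparse = sparsify(attention, threshold=0.15, k=2)
for row in sparse:
    print(row)
print("density:", density(sparse))
```

The resulting density depends directly on the hand-picked `threshold` and `k`, which is the kind of sensitivity that a learned sparsity loss would aim to replace.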

Author Contributions

Conceptualization, Z.J. and H.L.; methodology, Z.J. and J.Y.; validation, Z.J., H.L. and J.Y.; formal analysis, Z.J. and J.Y.; investigation, J.S.; resources, H.L.; data curation, C.H.; writing—original draft preparation, Z.J.; writing—review and editing, Z.J.; visualization, J.Q.; supervision, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the China Guangdong Basic and Applied Basic Research Foundation, grant number 2021A1515011913.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Framework of the proposed DG-STFN.

Figure 2. Visualization of the prediction results for the same week using DG-STFN and the GCN-based baseline methods, on the test set of the WSD dataset. (a) Comparison of actual and predicted values of DG-STFN. (b) Comparison of actual and predicted values of ASTGCN. (c) Comparison of actual and predicted values of STGCN. (d) Comparison of actual and predicted values of DCRNN.

Table 1. Notations and descriptions.

Notation	Description
N	Number of nodes
F	Dimension of node attributes
$H, P$	Window sizes of the historical input and the future prediction horizon
$χ \in R^{B \times N \times F \times H}$	Node attributes that record historical water demand data
$y, \hat{y} \in R^{N \times F \times P}$	Real and predicted water demand data
$G = (V, E, A)$	A graph defined by nodes, edges, and adjacency matrix

Table 2. Functions and descriptions.

Function	Description
$\mathrm{softmax}(\cdot)$	The softmax activation function
$\sigma(\cdot)$	The sigmoid activation function
$\mathrm{permute}(\cdot)$	The dimension conversion function
$\mathrm{squeeze}(\cdot), \mathrm{unsqueeze}(\cdot)$	The dimension compression and expansion functions

Table 3. The common parameter settings of each model.

	WSD				APD
Model	Epoch	Batch Size	Optimizer	Learning Rate	Epoch	Batch Size	Optimizer	Learning Rate
LSTM	100	8	Adam	0.001	100	16	Adam	0.005
STGCN	100	8	Adam	0.005	100	16	Adam	0.005
DCRNN	100	8	Adam	0.005	100	16	Adam	0.001
ASTGCN	100	8	Adam	0.005	100	16	Adam	0.005
DG-STFN	100	8	Adam	0.002	100	16	Adam	0.001

Table 4. Performance comparison of different methods.

		H = 24, P = 3		H = 24, P = 6		H = 24, P = 9		H = 24, P = 12
Dataset	Model	MAE	RMSE	MAE	RMSE	MAE	RMSE	MAE	RMSE
WSD	SARIMAX	3.22	8.65	3.53	9.08	4.16	9.27	4.23	8.33
	LSTM	2.99	7.23	3.32	7.90	3.59	8.68	3.39	8.23
	DCRNN	2.55	6.70	2.83	7.19	3.03	7.41	3.10	7.47
	STGCN	2.78	7.31	2.99	7.71	2.99	7.63	2.98	7.70
	ASTGCN	2.52	6.73	2.56	6.92	2.61	7.04	2.62	7.06
	DG-STFN	2.38	6.69	2.47	6.85	2.51	6.97	2.50	7.01
APD	SARIMAX	12.36	15.62	13.50	19.97	15.12	20.22	17.68	25.30
	LSTM	10.02	15.06	12.95	18.35	14.34	19.95	15.24	21.14
	DCRNN	9.63	14.51	12.20	17.84	13.75	19.64	15.17	21.10
	STGCN	10.60	15.12	12.59	17.86	14.59	20.44	17.14	23.16
	ASTGCN	9.60	14.49	12.22	17.82	13.72	19.51	14.89	20.78
	DG-STFN	9.42	14.28	12.05	17.59	13.61	19.33	14.82	20.68


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
