*2.2. Initial Forecasting Model by LSSVM*

SVM has been widely applied in several areas including pattern recognition, regression, nonlinear classification, and function estimation. LSSVM originated from SVM and was first proposed by Suykens and Vandewalle [21]; it is believed to hold a computational advantage over standard SVM because it converts the quadratic optimization problem into a set of linear equations. In the field of water demand forecasting, the LSSVM is used to establish the nonlinear relationship between model inputs and outputs.

Consider a given training set of *N* samples (**X**<sub>*i*</sub>, *y<sub>i</sub>*) (*i* = 1, ... , *N*), where **X**<sub>*i*</sub> denotes the *i*th input vector in *n*-dimensional space (**X**<sub>*i*</sub> = (*X*<sub>1*i*</sub>, ... , *X<sub>ni</sub>*) ∈ **R**<sup>*n*</sup>) and *y<sub>i</sub>* is the corresponding desired output value (i.e., the observed value) of the *i*th sample. The nonlinear function between the inputs and outputs can be given as below [19,26,36]:

$$\hat{y}\_i = \boldsymbol{\omega}^T \boldsymbol{\phi}(\mathbf{X}\_i) + b \tag{3}$$

where *ŷ<sub>i</sub>* is the model output corresponding to sample *i*, the nonlinear transformation function ϕ(·) maps **X**<sub>*i*</sub> to the *m*-dimensional feature space, **ω** is the *m*-dimensional weight parameter vector, and *b* is the bias parameter (**ω** ∈ **R**<sup>*m*</sup>, *b* ∈ **R**).

Equation (3) provides the initial forecasting model of water demand, in other words, the relationship between the model input and output, where the input is **X**<sub>*i*</sub> = (*Q<sub>t</sub>*, *Q*<sub>*t*−1</sub>, *Q*<sub>*t*−2</sub>, *Q*<sub>*t*−95</sub>, *Q*<sub>*t*−191</sub>, *Q*<sub>*t*−671</sub>) and the output *ŷ<sub>i</sub>* is the forecasted water demand *Q*<sub>*t*+1</sub> at the target time *t* + 1. A detailed description of the model input data selection is presented in Section 3.1.
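As a concrete illustration (not taken from the paper), the training pairs implied by this input structure can be assembled from a single demand series as follows; the function name and the assumption of a one-dimensional series sampled at a fixed interval are ours:

```python
import numpy as np

def build_samples(q, lags=(0, 1, 2, 95, 191, 671)):
    """Assemble (X_i, y_i) pairs from a demand series q.

    Each input vector is (Q_t, Q_{t-1}, Q_{t-2}, Q_{t-95}, Q_{t-191},
    Q_{t-671}) and the target is Q_{t+1}, matching the input structure
    described for Equation (3). Helper name is illustrative.
    """
    max_lag = max(lags)
    X, y = [], []
    # Start at the first index where all lagged values exist,
    # and stop one step before the end so Q_{t+1} is available.
    for t in range(max_lag, len(q) - 1):
        X.append([q[t - lag] for lag in lags])
        y.append(q[t + 1])
    return np.array(X), np.array(y)
```

For a series of length *L*, this yields *L* − 1 − 671 samples, each with six features.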

Considering the complexity of minimizing the model errors between *y<sub>i</sub>* and *ŷ<sub>i</sub>*, in the LSSVM the parameters **ω** and *b* in Equation (3) can be estimated according to the structural risk minimization principle [19,36]:

$$\min J(\boldsymbol{\omega}, \boldsymbol{\xi}) = \frac{1}{2} \boldsymbol{\omega}^T\boldsymbol{\omega} + \frac{1}{2}\gamma\sum\_{i=1}^{N}\xi\_i^2 \quad \text{subject to } y\_i = \boldsymbol{\omega}^T\boldsymbol{\phi}(\mathbf{X}\_i) + b + \xi\_i,\; i = 1, \ldots, N \tag{4}$$

where γ is the regularization constant that determines the tradeoff between the training error and the generalization performance, and ξ<sub>*i*</sub> is a slack variable that denotes the model error of the *i*th sample.

The solution of the optimization problem (Equation (4)) can be obtained via the Lagrange function [19,36]. The LSSVM model for the nonlinear function in Equation (3) is then finally turned into:

$$\hat{y}(\mathbf{X}) = f(\mathbf{X}) = \sum\_{i=1}^{N} \alpha\_i K(\mathbf{X}\_i, \mathbf{X}) + b \tag{5}$$

where α<sub>*i*</sub> (*i* = 1, ... , *N*) is the Lagrange multiplier, which can be evaluated for a given γ by solving the resulting set of linear equations, and *K*(**X**<sub>*i*</sub>, **X**) is the kernel function. The radial basis function (RBF) kernel is one of the most popular kernel functions, and is used in this study as below:

$$K(\mathbf{X}\_i, \mathbf{X}) = \exp\left(\frac{-||\mathbf{X}\_i - \mathbf{X}||^2}{2\sigma^2}\right) \tag{6}$$

where σ is the width parameter, which reflects the radius of the region enclosed by the kernel's closed boundary.
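As a brief sketch, Equation (6) can be evaluated directly for a pair of input vectors (the function name is ours):

```python
import numpy as np

def rbf_kernel(Xi, X, sigma):
    """Equation (6): K(Xi, X) = exp(-||Xi - X||^2 / (2 * sigma^2))."""
    diff = np.asarray(Xi, dtype=float) - np.asarray(X, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))
```

The value equals 1 when the two vectors coincide and decays toward 0 as their Euclidean distance grows, at a rate set by σ.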

It is worth mentioning that, at this point, Equation (3) is transformed into Equation (5), which can be directly established through the training samples (**X**<sub>*i*</sub>, *y<sub>i</sub>*) (*i* = 1, ... , *N*) and the model parameters σ and γ. Therefore, establishing an LSSVM model with the RBF kernel involves the selection of the RBF kernel width σ and the regularization constant γ. Among the available methods for LSSVM parameter tuning, such as cross-validation [19], grid search [26], and Bayesian framework-based inference [13,37], the Bayesian approach with three levels of inference is chosen for parameter tuning of the LSSVM in this study.
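To make the training step concrete, the following is a minimal sketch of fitting and applying Equations (5) and (6), assuming the standard LSSVM dual formulation in which α and *b* are obtained from a single linear system; the function names and toy data are ours, and fixed σ and γ stand in for the Bayesian tuning described above:

```python
import numpy as np

def lssvm_fit(X, y, gamma, sigma):
    """Solve the standard LSSVM dual linear system (sketch, names assumed):
        [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y]
    where K_ij is the RBF kernel of Equation (6)."""
    N = len(y)
    # Pairwise squared distances and RBF kernel matrix.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq / (2.0 * sigma ** 2))
    A = np.zeros((N + 1, N + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(N) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    b, alpha = sol[0], sol[1:]
    return alpha, b

def lssvm_predict(X_train, alpha, b, sigma, x_new):
    """Equation (5): f(x) = sum_i alpha_i * K(X_i, x) + b."""
    sq = np.sum((X_train - np.asarray(x_new, dtype=float)) ** 2, axis=1)
    return alpha @ np.exp(-sq / (2.0 * sigma ** 2)) + b
```

Because the dual problem is linear, no iterative quadratic-programming solver is needed; this is the computational advantage over standard SVM noted earlier in the section.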
