**3. Mathematical Model**

City-scale analyses involve enormous complexity, so present UBEM tools usually require significant computational resources. Various simplifying assumptions are therefore necessary when examining an issue at the urban scale, and selecting appropriate methods is required for valid calculations. This section describes the TEAC software, in particular its mathematical model and the applied methodologies. Among all modules of the TEAC software, the ANN application is the most important one; it is presented in detail in this section.

Whenever research focuses on energy consumption at the building level or the whole-city scale, numerous variables are involved. These variables usually interact with each other in ways that are not fully understood, and some of them (e.g., outdoor climate conditions) are highly unpredictable. Such problems are well suited to Artificial Intelligence (AI) applications, which are based on input-output parameters and the functional relationships between them. In general, Artificial Neural Networks (ANNs) can be classified into two main groups: Feed Forward Neural Networks (FFNNs) and Feed Backward Neural Networks (FBNNs); a comprehensive classification of ANNs can be found in [50]. ANNs have proven to be universal approximators in various fields of application; state-of-the-art overviews can be found in [50–52]. ANNs have been successfully applied to energy load forecasting at the building scale [53,54], as well as at the urban scale [34,55].

The structure of the defined ANN was investigated in order to provide the best data regression with a reasonably short calculation time for the analyzed issue. Following the procedure published in [56], different numbers of neurons within a single hidden layer were examined; the analysis started with 2 and ended with 24 neurons. The final structure of the applied network includes 14 input neurons, 12 neurons within a single hidden layer, and one output neuron (see Figure 3). The output expresses the heating demand, while the input parameters define: the analysis timestep period (TP), outdoor temperature (DBT), total solar radiation (ITH), building heating area (A0), building volume (V0), total window area (Awin), air-change rate (ntot), U-values of exterior walls (Uwall), roofs (Uroof), ground floors (Ufloor), and windows (Uwin), heating system efficiency (HCOP), as well as the building orientation (OV) and closest-surroundings (SV) variants.

**Figure 3.** The structure of the defined ANN.
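The selected 14-12-1 architecture can be illustrated with a short forward-pass sketch. This is a minimal illustration only: the tanh hidden activation, linear output neuron, and random placeholder weights are assumptions, since the text does not specify the activation functions or the trained weights.

```python
import numpy as np

# Minimal forward pass of the 14-12-1 network described in the text.
# ASSUMPTIONS: tanh hidden activation, linear output, random placeholder
# weights -- the source does not specify activations or trained weights.
rng = np.random.default_rng(0)

n_in, n_hidden, n_out = 14, 12, 1
W1 = rng.normal(size=(n_hidden, n_in))   # input -> hidden weights
b1 = np.zeros(n_hidden)                  # hidden biases
W2 = rng.normal(size=(n_out, n_hidden))  # hidden -> output weights
b2 = np.zeros(n_out)                     # output bias

def predict(x):
    """Map one 14-element input vector (TP, DBT, ITH, A0, V0, Awin, ntot,
    Uwall, Uroof, Ufloor, Uwin, HCOP, OV, SV) to a heating-demand estimate."""
    h = np.tanh(W1 @ x + b1)   # hidden layer, 12 neurons
    return (W2 @ h + b2)[0]    # single linear output neuron

print(predict(rng.normal(size=n_in)))
```

In practice the inputs would be normalized before training, and the weights would be fitted with the L-M procedure described below.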

The ANN module in the TEAC software is based on a network trained using the Levenberg-Marquardt (L-M) method [57–59]. The L-M method was developed in the early 1960s for solving nonlinear problems. During the definition process, the Bayesian Regularization [60] and Conjugate Gradient [61] methods were also examined, but the L-M network was characterized by the best prediction accuracy. The L-M method is based on a gradient vector and a Jacobian matrix; it may be considered a combination of two minimization approaches, namely the Gauss-Newton [62] and gradient descent [63] methods. The L-M method behaves more like a gradient descent method when the parameters are far from their optimal values, and more like the Gauss-Newton method when the parameters are close to their optimal values. Because the L-M method is a hybrid approach, it can trade off the best features of different algorithms to solve a variety of problems. The L-M algorithm is particularly effective in solving nonlinear equations; thus, it was effective for heating demand predictions of an urban area. Below, for the convenience of the reader, the L-M method is briefly explained.

If the fitting model is a function *ŷ*(*ti*; *p*) of an independent variable *ti* and a vector of parameters *p*, fitted to the data points (*ti*, *yi*), the goal is to minimize the sum of the weighted squares of the errors, as follows:

$$X^2(p) = \sum\_{i=1}^{m} \left[ \frac{y(t\_i) - \hat{y}(t\_i; p)}{\sigma\_{y,i}} \right]^2 \tag{1}$$

where *σy,i* is the measurement error for datum *y(ti)*. Equation (1) can be rewritten using the weighting matrix *W*, as follows:

$$X^2(p) = (y - \hat{y}(p))^T \mathcal{W}(y - \hat{y}(p))\tag{2}$$

$$X^2(p) = y^T \mathcal{W} y - 2y^T \mathcal{W} \hat{y} + \hat{y}^T \mathcal{W} \hat{y} \tag{3}$$
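The equivalence of the quadratic form in Equation (2) and its expansion in Equation (3) can be verified numerically. The data, model values, and measurement errors in this sketch are arbitrary illustrations; the weighting matrix is built as *W* = diag(1/*σy,i*²), consistent with Equation (1).

```python
import numpy as np

# Numerical check that the quadratic form of Eq. (2) equals its
# expansion in Eq. (3). All numbers are arbitrary illustrations.
rng = np.random.default_rng(1)

m = 6
y = rng.normal(size=m)                 # measured data y(t_i)
y_hat = rng.normal(size=m)             # model predictions y^(t_i; p)
sigma = rng.uniform(0.5, 2.0, size=m)  # measurement errors sigma_{y,i}
W = np.diag(1.0 / sigma**2)            # weighting matrix W

r = y - y_hat                          # residual vector
chi2_eq2 = r @ W @ r                                           # Eq. (2)
chi2_eq3 = y @ W @ y - 2 * y @ W @ y_hat + y_hat @ W @ y_hat   # Eq. (3)
print(chi2_eq2, chi2_eq3)
```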

If the function *ŷ*(*ti*; *p*) is nonlinear in the model parameters *p*, then the minimization of *X²(p)* is carried out iteratively.

Using the gradient descent method for the minimization task, the gradient of the objective function with respect to the parameters can be expressed with the following equation:

$$\frac{\partial}{\partial p}X^2 = 2(y - \hat{y}(p))^T \mathcal{W} \frac{\partial}{\partial p}(y - \hat{y}(p))\tag{4}$$

$$\frac{\partial}{\partial p}X^2 = -2(y - \hat{y}(p))^T \mathcal{W} \left[ \frac{\partial \hat{y}(p)}{\partial p} \right] \tag{5}$$

where the term [∂*ŷ*(*p*)/∂*p*] is the Jacobian matrix, assigned as *J*; thus:

$$\frac{\partial}{\partial p}X^2 = -2(y - \hat{y}(p))^T \mathcal{W} J \tag{6}$$
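Equation (6) can be checked against a finite-difference approximation of the gradient. The one-parameter model *ŷ*(*t*; *p*) = e^(−*pt*) below is an assumed illustration, not taken from the source; unit weights are used for simplicity.

```python
import numpy as np

# Finite-difference check of Eq. (6): d X^2 / d p = -2 (y - y^(p))^T W J.
# ASSUMPTION: illustrative one-parameter model y^(t; p) = exp(-p t).
t = np.linspace(0.0, 2.0, 20)
y = np.exp(-1.5 * t)             # synthetic noise-free data
W = np.eye(len(t))               # unit weights for simplicity

def model(p):
    return np.exp(-p * t)

def chi2(p):
    r = y - model(p)
    return r @ W @ r

p = 0.8
J = (-t * np.exp(-p * t)).reshape(-1, 1)        # Jacobian d y^ / d p
grad_analytic = -2.0 * (y - model(p)) @ W @ J   # Eq. (6)

eps = 1e-6                                      # central difference step
grad_numeric = (chi2(p + eps) - chi2(p - eps)) / (2 * eps)
print(grad_analytic[0], grad_numeric)
```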

Finally, the parameter update *hGD* for the gradient descent method, which moves the parameters in the direction of steepest descent, is expressed as follows:

$$h\_{\rm GD} = \alpha J^T \mathcal{W}(y - \hat{y}) \tag{7}$$

where *α* is a positive scalar determining the length of the steps in the steepest descent direction.
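The steepest-descent update of Equation (7) can be sketched as follows. The one-parameter model *ŷ*(*t*; *p*) = e^(−*pt*), the step length *α* = 0.5, and the synthetic data are illustrative assumptions, not from the source.

```python
import numpy as np

# Repeated gradient-descent updates h_GD = alpha * J^T W (y - y^), Eq. (7).
# ASSUMPTION: illustrative one-parameter model y^(t; p) = exp(-p t).
t = np.linspace(0.0, 2.0, 20)
p_true = 1.5
y = np.exp(-p_true * t)          # synthetic noise-free data
W = np.eye(len(t))               # unit weights for simplicity

def model(p):
    return np.exp(-p * t)

def jacobian(p):
    # d y^ / d p, an m x 1 Jacobian for the single parameter
    return (-t * np.exp(-p * t)).reshape(-1, 1)

p = 0.5                          # starting guess, far from p_true
alpha = 0.5                      # fixed step length (assumed)
for _ in range(200):
    r = y - model(p)             # residual vector
    J = jacobian(p)
    h_gd = alpha * (J.T @ W @ r) # Eq. (7): steepest-descent update
    p = p + h_gd[0]
print(p)                         # approaches p_true = 1.5
```

As the text notes, this direction is reliable far from the optimum but converges slowly near it, which motivates the Gauss-Newton step below.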

The Gauss-Newton method is used for minimizing a sum-of-squares objective function. Typically, it is much faster than gradient descent methods for moderately sized problems. Let us assume that the function may be locally approximated using a first-order Taylor series, as follows:

$$\hat{y}(p+h) \approx \hat{y}(p) + \left[ \frac{\partial \hat{y}(p)}{\partial p} \right] h = \hat{y}(p) + Jh \tag{8}$$

Substituting the approximation *ŷ*(*p* + *h*) ≈ *ŷ*(*p*) + *Jh* into Equation (3) gives:

$$X^2(p+h) \approx y^T \mathcal{W} y + \hat{y}^T \mathcal{W} \hat{y} - 2y^T \mathcal{W} \hat{y} - 2(y-\hat{y})^T \mathcal{W} Jh + h^T J^T \mathcal{W} Jh \tag{9}$$

which can be rewritten as a normal equation for the Gauss-Newton formula:

$$\left[J^T \mathcal{W} J\right] h\_{\rm GN} = J^T \mathcal{W} (y - \hat{y}) \tag{10}$$
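A Gauss-Newton iteration built on Equation (10) might look as follows. The two-parameter model *ŷ*(*t*; *p*) = *p1*·e^(−*p2·t*), the synthetic data, and the starting guess are assumed illustrations.

```python
import numpy as np

# Gauss-Newton iteration solving the normal equation (10):
# [J^T W J] h_GN = J^T W (y - y^).
# ASSUMPTION: illustrative model y^(t; p) = p1 * exp(-p2 * t).
t = np.linspace(0.0, 2.0, 20)
p_true = np.array([2.0, 1.5])
y = p_true[0] * np.exp(-p_true[1] * t)   # synthetic noise-free data
W = np.eye(len(t))                       # unit weights for simplicity

def model(p):
    return p[0] * np.exp(-p[1] * t)

def jacobian(p):
    # Columns are d y^ / d p1 and d y^ / d p2
    e = np.exp(-p[1] * t)
    return np.column_stack([e, -p[0] * t * e])

p = np.array([1.0, 1.0])                 # starting guess
for _ in range(10):
    r = y - model(p)
    J = jacobian(p)
    h_gn = np.linalg.solve(J.T @ W @ J, J.T @ W @ r)  # Eq. (10)
    p = p + h_gn
print(p)                                 # converges toward [2.0, 1.5]
```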

It is important to mention that, for both the gradient descent and Gauss-Newton methods, the right-hand-side vectors in the update formulas, Equations (7) and (10) respectively, are identical (up to the scalar *α*).

Therefore, the L-M algorithm adaptively varies the parameter updates between the gradient descent and the Gauss-Newton methods. The L-M formula can be expressed as follows:

$$\left[J^T \mathcal{W} J + \lambda I\right] h\_{\rm LM} = J^T \mathcal{W}(y - \hat{y}) \tag{11}$$

where *λ* is the damping parameter, *I* is the identity matrix, and *hLM* is the parameter update for the L-M method. If the values of *λ* are normalized to the values of *JTWJ*, then the L-M formula for nonlinear least squares is as follows:

$$\left[J^T \mathcal{W} J + \lambda \text{diag}\left(J^T \mathcal{W} J\right)\right] h\_{\rm LM} = J^T \mathcal{W}(y - \hat{y}) \tag{12}$$
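Putting the pieces together, a minimal L-M loop based on Equation (12) could be sketched as follows, with the standard accept/reject adaptation of *λ*. The model, data, starting point, and *λ* schedule are all illustrative assumptions and do not represent the TEAC implementation.

```python
import numpy as np

# Minimal Levenberg-Marquardt loop using the scaled update of Eq. (12):
# [J^T W J + lambda * diag(J^T W J)] h_LM = J^T W (y - y^).
# ASSUMPTION: illustrative model y^(t; p) = p1 * exp(-p2 * t) and a
# simple multiply/divide-by-10 schedule for the damping parameter.
t = np.linspace(0.0, 2.0, 20)
p_true = np.array([2.0, 1.5])
y = p_true[0] * np.exp(-p_true[1] * t)   # synthetic noise-free data
W = np.eye(len(t))                       # unit weights for simplicity

def model(p):
    return p[0] * np.exp(-p[1] * t)

def jacobian(p):
    e = np.exp(-p[1] * t)
    return np.column_stack([e, -p[0] * t * e])

def chi2(p):
    r = y - model(p)
    return r @ W @ r

p = np.array([0.5, 4.0])        # deliberately poor starting guess
lam = 1e-2                      # initial damping parameter
for _ in range(200):
    r = y - model(p)
    J = jacobian(p)
    A = J.T @ W @ J
    h = np.linalg.solve(A + lam * np.diag(np.diag(A)), J.T @ W @ r)  # Eq. (12)
    if chi2(p + h) < chi2(p):
        p, lam = p + h, lam / 10.0   # good step: shift toward Gauss-Newton
    else:
        lam *= 10.0                  # bad step: shift toward gradient descent
print(p)
```

Raising *λ* after a rejected step shrinks and rotates the update toward the steepest-descent direction; lowering it after an accepted step recovers the fast Gauss-Newton behavior near the optimum, which is exactly the hybrid character described above.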

The L-M method is used to solve nonlinear least squares problems. In the TEAC software, the L-M algorithm was used during the ANN training process, allowing for heating demand predictions. The heating demand of a building is a complex and multilayered issue, for the analysis of which the L-M method is appropriate.
