*Validation of the Defined ANN*

The data used for the ANN training process were divided into three sets of samples: training, validation and testing, in fixed shares of 70%, 15% and 15%, respectively. The testing data, which are unknown to the network, are used to test the trained network (adjusted according to its error during training) and to measure its performance. Three networks were trained, for monthly, daily and hourly predictions. Figure 4 shows the training results as regression plots for the test group of samples. A good match was observed for the hourly study, where the correlation coefficient (*R*) equals 0.9083, while the match was even closer for the daily and monthly studies, where *R* equals 0.9958 and 0.9838, respectively.
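The 70/15/15 division described above can be sketched as follows. This is a minimal illustration, not the procedure used by the authors: the random shuffling, the fixed seed and the function name `split_indices` are assumptions introduced here.

```python
import numpy as np

def split_indices(n_samples, shares=(0.70, 0.15, 0.15), seed=0):
    """Return shuffled index arrays for the training, validation and test sets.

    Hypothetical helper: the shares follow the text (70/15/15 %), while the
    random permutation and the seed are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)           # random order of all samples
    n_train = int(round(shares[0] * n_samples))  # 70 % for training
    n_val = int(round(shares[1] * n_samples))    # 15 % for validation
    train = order[:n_train]
    val = order[n_train:n_train + n_val]
    test = order[n_train + n_val:]               # remainder (about 15 %) for testing
    return train, val, test

train_idx, val_idx, test_idx = split_indices(1000)
```

The three index arrays are disjoint and together cover every sample, so no test sample is ever seen during training.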

**Figure 4.** Regression plots for the test data of the ANN analysis for different calculation periods: (**a**) hourly, (**b**) daily and (**c**) monthly.

The correlation coefficient *ρ* of a pair of random variables (*X*, *Y*) in a population is defined as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} \tag{13}$$

where *cov* is the covariance, *σ<sub>X</sub>* is the standard deviation of *X* and *σ<sub>Y</sub>* is the standard deviation of *Y*. Using relationship (14),

$$\operatorname{cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]\tag{14}$$

Equation (13) can be rewritten as follows:

$$\rho_{X,Y} = \frac{E[XY] - E[X]E[Y]}{\sqrt{E[X^2] - \left(E[X]\right)^2}\sqrt{E[Y^2] - \left(E[Y]\right)^2}}\tag{15}$$

where *E* is the expectation, *μ<sub>X</sub>* is the mean of *X* and *μ<sub>Y</sub>* is the mean of *Y*.
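The step from Equations (13) and (14) to Equation (15) is not spelled out in the text. It can be reconstructed as follows, using only the linearity of expectation and the identity *μ<sub>X</sub>* = *E*[*X*]. Expanding the product in (14) gives

$$\operatorname{cov}(X, Y) = E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y = E[XY] - E[X]E[Y]$$

and, since the variance is the covariance of a variable with itself,

$$\sigma_X^2 = \operatorname{cov}(X, X) = E[X^2] - \left(E[X]\right)^2, \qquad \sigma_Y^2 = \operatorname{cov}(Y, Y) = E[Y^2] - \left(E[Y]\right)^2$$

Substituting these expressions into the numerator and denominator of (13) yields (15).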

Finally, when applied to a sample, *R* is commonly denoted by *r<sub>XY</sub>*, and for *n* paired data points (*x<sub>i</sub>*, *y<sub>i</sub>*) it is defined as:

$$r_{XY} = \frac{\sum_{i=1}^{n} \left(x_i - \bar{x}\right) \left(y_i - \bar{y}\right)}{\sqrt{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} \sqrt{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}}\tag{16}$$
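Equation (16) can be evaluated directly from paired targets and network outputs. The sketch below is illustrative only: the function name `pearson_r` and the sample values are assumptions, not data from the study.

```python
import numpy as np

def pearson_r(x, y):
    """Sample correlation coefficient r_XY, computed term by term as in Eq. (16)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()  # deviations of x_i from the sample mean
    dy = y - y.mean()  # deviations of y_i from the sample mean
    return float(np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2))))

# Made-up example values: reference heating demand vs. ANN prediction
targets = [10.0, 12.5, 9.0, 15.0, 11.0]
predictions = [10.4, 12.0, 9.5, 14.2, 11.3]
r = pearson_r(targets, predictions)
```

The result agrees with the off-diagonal entry of `np.corrcoef(targets, predictions)`, which computes the same quantity.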

The predefined ANN is a key module of the TEAC software. It predicts the heating demand of the examined region, which consists of RSFH in Poland. Thanks to the application of AI, the heating demand of an urban area can be predicted almost effortlessly, using only a sequence of data lines describing the analyzed neighborhood. The obtained heating demand, together with the electricity demand, is then used as a basis for further analysis of the cluster. It is also important to mention that heating demand is the main component of the total energy consumption of the reference residential buildings in Poland. Thus, predicting heating demand by means of the ANN, from data describing the built environment of the examined cluster, greatly facilitates studies of this type.

An exemplary line for a single building is presented below. Each color represents one group of parameters: **orange** for the calculation step, **green** for the localization (exterior climate), **blue** for the building location, **black** for the building enclosure variant, and **yellow** for the heating system variant. The definition can be given using keywords (first line) or using the actual values describing the building. The TEAC software uses the building's coordinates to load the parameters of the analyzed object (building placement is defined earlier, in the first module of the software). To perform urban-scale analyses, all buildings within the examined region must be described in this format, with the parameter groups in the exact order given. Using that type of data, heating demand predictions can be made for the whole urban-scale region. The whole process is described in detail in [43].
