**2. Materials and Methods**

Before the main objective of the study was addressed, analyses were carried out to establish a list of potential explanatory variables (conditional attributes). During the research, a database was created covering 109 buildings constructed at the end of the last century and thermally improved in 2010–2015. Energy audits had been prepared for these buildings; on their basis, the optimal thermal modernization variants were selected, the partitions to be modernized were indicated, and the appropriate thicknesses of the thermal insulation layers were chosen. The analyzed buildings were described by many parameters.

For the purposes of the experiment, the most relevant characteristics were selected. Some of them were measured and others were calculated, as indicated in Table 1.


**Table 1.** Characteristics of the selected values that influence the reduction of annual energy demand for buildings subjected to thermal efficiency improvement.

The average values of the parameters describing the examined buildings are comparable to the data in the TABULA building typology for Poland. These parameters are typical of an "average" multi-family residential building built in 1967–1985 [46,47]. Therefore, the surveyed group of buildings can be considered representative.

The variables retained after preliminary selection were used to build sets of input variables, with which the usefulness of a method based on rough set theory (RST) for estimating the energy consumption of a building after thermal modernization was tested. Four sets of input data were developed; they differ in their degree of influence on energy consumption and in how difficult they are to obtain, and they are summarized in Table 2. For the first set of variables explaining changes in energy consumption for heating after thermal modernization, a very limited set of indicators was selected: the thermal power demand for heating the building before thermal modernization, the actual unit indicator of final energy consumption for heating, and information about the improvement actions taken, that is, which partitions are to be insulated. In the second set, the information on the thermal power demand for heating was removed and replaced with the characteristic dimensions of the individual building components, that is, the areas of the individual building partitions, the area and cubic capacity of the building, the building shape coefficient, and indicators characterizing the building (number of occupants and number of dwellings). This set contained information that can be obtained relatively easily for any residential building, but it lacked a very important parameter characterizing the thermal insulation of the individual partitions in their current state. The third set was therefore supplemented with the heat transfer coefficients of the individual partitions. Collecting such an extensive range of information allows the building to be characterized precisely, but preparing it reliably requires considerable work.
Because the third set was very extensive and gathering such a wide range of data is time-consuming, the last set of input data retained only the variables that directly influence heat losses in the building, namely the heat transfer coefficients of the partitions and the areas of the partition surfaces through which heat is lost. Sets 2, 3, and 4 also included information on the improvement measures taken, that is, which partitions are to be insulated.
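The nesting of the four variable sets can be summarized as set operations. The sketch below is one reading of the description above, assuming that only the thermal power demand was dropped between sets 1 and 2; all attribute names are hypothetical placeholders, and Table 2 remains the authoritative list.

```python
# HYPOTHETICAL attribute names; Table 2 gives the actual variables.
IMPROVEMENT_FLAGS = {  # 0-1 information: is the partition to be insulated?
    "ins_gable_wall", "ins_external_wall", "ins_floor",
    "ins_ground_floor", "ins_windows", "ins_flat_roof",
}
PARTITION_AREAS = {"A_gable_wall", "A_external_wall", "A_floor",
                   "A_ground_floor", "A_windows", "A_flat_roof"}
U_VALUES = {"U_gable_wall", "U_external_wall", "U_floor",
            "U_ground_floor", "U_windows", "U_flat_roof"}
BUILDING_INDICATORS = {"building_area", "cubic_capacity", "shape_coefficient",
                       "occupants", "dwellings"}

# Set 1: thermal power demand, actual final energy indicator, improvement flags.
SET_1 = {"heat_power_demand", "fe_before"} | IMPROVEMENT_FLAGS
# Set 2: drop the thermal power demand, add dimensions and building indicators.
SET_2 = (SET_1 - {"heat_power_demand"}) | PARTITION_AREAS | BUILDING_INDICATORS
# Set 3: add the heat transfer coefficients of the partitions.
SET_3 = SET_2 | U_VALUES
# Set 4: keep only variables that directly drive heat losses, plus the flags.
SET_4 = U_VALUES | PARTITION_AREAS | IMPROVEMENT_FLAGS

print(SET_4 < SET_3)  # True: set 4 is a strict subset of set 3
```

Writing the sets this way makes the trade-off visible: set 4 drops the building indicators and the pre-modernization energy indicator from set 3, but keeps everything that enters a transmission heat loss calculation.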

The groups of variables presented in Table 2, constituting the conditional attributes, were used to build a model for predicting the final energy demand for building heating based on rough set theory. The theory was introduced in the 1980s by Professor Zdzisław Pawlak [45]. It is used as a tool for synthesizing advanced and effective methods of analysis and for reducing datasets [48]. Rough sets serve as a methodology for knowledge discovery in databases: they are a tool for describing imprecise and uncertain knowledge, for modeling decision-making systems, and for approximate reasoning [49]. Inference based on classical rough set theory applies only to qualitative object characteristics. This causes limitations and difficulties when features occur in quantitative rather than qualitative form. The attributes of the surveyed buildings are encoded in a great variety of ways, mostly in quantitative form. In this case, integrating a valued tolerance relation proves helpful [50]: it introduces more flexibility into data mining with rough set theory and makes it possible to analyze observations expressed in quantitative form. Classical RST is based on the indiscernibility relation, which is an exact equivalence relation, i.e., objects are indiscernible only if they have identical attribute values (a 0–1 system). Introducing a valued tolerance relation into RST makes it possible to determine the lower and upper approximations of a set for different levels of the indiscernibility ratio [51]. The relation compares two objects and returns a value between 0 and 1, which is their degree of indiscernibility; this range corresponds to a membership function as defined in fuzzy set theory.
The closer the result is to 1, the more similar (indiscernible) the objects are with respect to the analyzed attribute, and the closer it is to 0, the more distinguishable they are [50,51]. A detailed description of the prediction model based on quantitative variables is given in [51,52]. The general procedure for constructing the model using rough set theory is presented in Figure 1.
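The idea of a valued tolerance relation and the approximations it induces can be sketched as follows. This is a minimal illustration, not the model of [50,51]: the linear membership function, the per-attribute thresholds, the aggregation of attributes with the minimum, and the cut level `lam` are all assumptions, and the data are hypothetical. The relation is reflexive and symmetric but not transitive, which is why it is a tolerance rather than an equivalence relation.

```python
from typing import Dict, List, Set, Tuple

Building = Dict[str, float]  # attribute name -> quantitative value

def attribute_similarity(a: float, b: float, threshold: float) -> float:
    """Degree in [0, 1] to which two values of one attribute are indiscernible.

    ASSUMPTION: linear membership, 1 for identical values, falling to 0 once
    the absolute difference reaches `threshold`.
    """
    return max(0.0, 1.0 - abs(a - b) / threshold)

def valued_tolerance(x: Building, y: Building, thresholds: Dict[str, float]) -> float:
    """Overall degree of indiscernibility of two objects.

    ASSUMPTION: per-attribute degrees are aggregated with min (a T-norm).
    """
    return min(attribute_similarity(x[k], y[k], t) for k, t in thresholds.items())

def tolerance_class(i: int, objects: List[Building],
                    thresholds: Dict[str, float], lam: float) -> Set[int]:
    """Indices of objects indiscernible from object i at cut level `lam`."""
    return {j for j, y in enumerate(objects)
            if valued_tolerance(objects[i], y, thresholds) >= lam}

def approximations(target: Set[int], objects: List[Building],
                   thresholds: Dict[str, float], lam: float) -> Tuple[Set[int], Set[int]]:
    """Lower and upper approximations of `target` under the lam-cut relation."""
    lower, upper = set(), set()
    for i in range(len(objects)):
        cls = tolerance_class(i, objects, thresholds, lam)
        if cls <= target:  # every indiscernible object belongs to the target
            lower.add(i)
        if cls & target:   # at least one indiscernible object belongs to it
            upper.add(i)
    return lower, upper

# Hypothetical data: wall heat transfer coefficients, W/(m^2 K).
objects = [{"U_wall": 1.10}, {"U_wall": 1.15}, {"U_wall": 0.60}, {"U_wall": 0.62}]
thresholds = {"U_wall": 0.2}
target = {0, 1, 2}  # e.g. buildings in one decision class

print(approximations(target, objects, thresholds, lam=0.5))   # ({0, 1}, {0, 1, 2, 3})
print(approximations(target, objects, thresholds, lam=0.95))  # ({0, 1, 2}, {0, 1, 2})
```

Raising the cut level shrinks the tolerance classes, so the boundary region (upper minus lower approximation) shrinks as well; with `lam = 0.95` the target set becomes exactly definable in this toy example.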

**Table 2.** Sets of input variables for the analyzed predictive models. The variables describe the state before thermal modernization; the planned improvement measures are recorded as 0–1 information indicating whether the gable (peak) wall, external walls, floors, ground floors, windows, and flat roof are to be thermally modernized.


**Figure 1.** Diagram of building an inference model based on the core of the set of conditional attributes using rough set theory.

After the list of potential independent variables (conditional attributes) was selected, the database was divided into a training (didactic) set, to which 80% of the tested buildings were randomly assigned, and a test set comprising the remaining buildings. A schematic view of the workspace with its individual blocks, illustrating the methodology for determining the energy demand indicator for heating after improvements from the inference model, is presented in Figure 2.
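The random 80/20 division can be sketched with Python's standard library. The text does not state how 80% of 109 buildings was rounded or which random generator was used, so the rounding rule, the fixed seed, and the building identifiers below are assumptions for illustration only.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")

def split_buildings(items: Sequence[T], train_fraction: float = 0.8,
                    seed: int = 0) -> Tuple[List[T], List[T]]:
    """Randomly split objects into a training set and a test set."""
    rng = random.Random(seed)  # fixed seed -> reproducible split
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))  # ASSUMPTION: nearest integer
    return shuffled[:n_train], shuffled[n_train:]

buildings = [f"B{i:03d}" for i in range(1, 110)]  # 109 hypothetical building IDs
train, test = split_buildings(buildings, train_fraction=0.8, seed=42)
print(len(train), len(test))  # 87 22
```

Fixing the seed makes the split reproducible, so the four variable sets can be compared on the same training and test buildings.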

The quality of the developed model was evaluated for each group of variables. To assess the ex post forecasts, the mean absolute error (MAE), the mean absolute percentage error (MAPE), also known as the mean absolute percentage deviation (MAPD) [53,54], the mean bias error (MBE), and the coefficient of variation of the root mean square error (CV(RMSE)) were used; the last two are accepted as statistical calibration standards by ASHRAE Guideline 14-2002 [14,55,56]:

$$\mathrm{MAE} = \frac{1}{n_g} \sum_{m=1}^{n_g} \left| O_r - O_{pr} \right| \qquad m = 1, 2, 3, \ldots, n_g \tag{1}$$

$$\mathrm{MAPE} = \frac{1}{n_g} \sum_{m=1}^{n_g} \left| \frac{O_r - O_{pr}}{O_r} \right| \cdot 100\% \qquad m = 1, 2, 3, \ldots, n_g \tag{2}$$

$$\mathrm{MBE} = \frac{\sum_{m=1}^{n_g} \left( O_r - O_{pr} \right)}{\sum_{m=1}^{n_g} O_r} \cdot 100\% \qquad m = 1, 2, 3, \ldots, n_g \tag{3}$$

$$\mathrm{CV(RMSE)} = \frac{\sqrt{\frac{1}{n_g} \sum_{m=1}^{n_g} \left( O_r - O_{pr} \right)^2}}{\frac{1}{n_g} \sum_{m=1}^{n_g} O_r} \cdot 100\% \qquad m = 1, 2, 3, \ldots, n_g \tag{4}$$

where $O_r$ is the real value of the index of final energy demand for heating after modernization (FE1), $O_{pr}$ is the forecast value of this index, and $n_g$ is the number of buildings covered by the study.

**Figure 2.** View of the working space.

According to ASHRAE Guideline 14, models are considered calibrated when MBE values are within ±10% and CV(RMSE) values are within 30% [14,57].
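Equations (1) to (4) and the calibration check can be sketched directly in Python. The function names and the example values are hypothetical, and treating the ASHRAE limits as inclusive bounds is an assumption; the text only says "within".

```python
import math
from typing import Sequence

def mae(o_r: Sequence[float], o_pr: Sequence[float]) -> float:
    """Eq. (1): mean absolute error, in the units of the index."""
    return sum(abs(r - p) for r, p in zip(o_r, o_pr)) / len(o_r)

def mape(o_r: Sequence[float], o_pr: Sequence[float]) -> float:
    """Eq. (2): mean absolute percentage error, %. Undefined if any O_r is 0."""
    return 100.0 * sum(abs((r - p) / r) for r, p in zip(o_r, o_pr)) / len(o_r)

def mbe(o_r: Sequence[float], o_pr: Sequence[float]) -> float:
    """Eq. (3): normalized mean bias error, %; errors of opposite sign cancel."""
    return 100.0 * sum(r - p for r, p in zip(o_r, o_pr)) / sum(o_r)

def cv_rmse(o_r: Sequence[float], o_pr: Sequence[float]) -> float:
    """Eq. (4): RMSE divided by the mean of the real values, %."""
    n = len(o_r)
    rmse = math.sqrt(sum((r - p) ** 2 for r, p in zip(o_r, o_pr)) / n)
    return 100.0 * rmse / (sum(o_r) / n)

def is_calibrated(o_r: Sequence[float], o_pr: Sequence[float],
                  mbe_limit: float = 10.0, cv_limit: float = 30.0) -> bool:
    """Limits quoted in the text; inclusive bounds are an ASSUMPTION."""
    return abs(mbe(o_r, o_pr)) <= mbe_limit and cv_rmse(o_r, o_pr) <= cv_limit

# Hypothetical FE1 values (real vs forecast) for three test buildings.
real = [100.0, 120.0, 80.0]
forecast = [90.0, 126.0, 84.0]
print(round(mae(real, forecast), 2), round(mape(real, forecast), 2))  # 6.67 6.67
print(mbe(real, forecast), round(cv_rmse(real, forecast), 2))          # 0.0 7.12
print(is_calibrated(real, forecast))                                   # True
```

The example shows why MBE is reported together with CV(RMSE): the over- and underestimates cancel to an MBE of 0%, while CV(RMSE) still reflects the spread of the individual errors.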

The novelty of this research is the attempt to develop a universal model, based on rough set theory, that describes the final energy demand indicator for heating buildings, using groups of variables that differ in how difficult they are to obtain. The algorithm for building an energy demand forecasting model presented in this article allows decision-makers to assess the potential for real energy savings resulting from planned actions to improve the thermal performance of existing buildings. It should be emphasized that, as the literature review shows, this approach has not yet been used in the energy assessment of buildings. The presented method based on rough set theory (RST) should not be considered a competitor to statistical analysis, but an alternative method of data analysis. A common disadvantage of classical statistical and data-based analyses is that they are time-consuming and costly (equipment and the collection of a sufficient, generally large number of representative observations) and that their procedures are highly complex: they require preliminary analyses (i.e., checking the assumptions of randomness of variables, examining the probability distribution, and correctly interpreting the results of statistical analysis), as well as the ability to conduct and interpret statistical tests. In many cases, this results in the data being fitted to the model rather than the model to the data, as it should be. When rough set methods are used, the observations "speak for themselves" and are not corrected in any way, either before the method is applied or during the analysis.
Moreover, unlike regression models, the RST-based method is not constrained by the number of representative observations (it handles both small and large samples), nor does it require a statistical model to be constructed; decisions are made on the basis of dependencies of the form "if certain conditions are met, a specific decision is taken" (a Boolean implication). The presented method does not impose complicated rules for checking the features taken into account or the results of the analyses. Only two main coefficients are used to assess the significance of the conditional attributes with respect to the decision attribute and of the generated decision rules: the quality and the accuracy of approximation, both of which are easy to apply and interpret.
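The two coefficients and the rule form mentioned above can be illustrated with Pawlak's standard definitions: the accuracy of approximation of a set X is |lower(X)| / |upper(X)|, and the quality of approximation of the classification is the share of all objects that fall into the lower approximations of the decision classes. The sketch below is a simplification, assuming the classical indiscernibility relation on already discretized attributes rather than the valued tolerance relation used in the study; the attribute names and data are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]

def indiscernibility_classes(table: List[Row], attributes: Sequence[str]) -> List[Set[int]]:
    """Classical RST: group objects with identical values on `attributes`."""
    classes: Dict[tuple, Set[int]] = {}
    for i, row in enumerate(table):
        classes.setdefault(tuple(row[a] for a in attributes), set()).add(i)
    return list(classes.values())

def lower_upper(target: Set[int], classes: List[Set[int]]) -> Tuple[Set[int], Set[int]]:
    lower, upper = set(), set()
    for c in classes:
        if c <= target:  # class lies entirely inside the target
            lower |= c
        if c & target:   # class overlaps the target
            upper |= c
    return lower, upper

def accuracy(target: Set[int], classes: List[Set[int]]) -> Fraction:
    """Accuracy of approximation |lower| / |upper| (target must be non-empty)."""
    lower, upper = lower_upper(target, classes)
    return Fraction(len(lower), len(upper))

def quality(table: List[Row], conditions: Sequence[str], decision: str) -> Fraction:
    """Quality of approximation: share of objects classified with certainty."""
    classes = indiscernibility_classes(table, conditions)
    positive: Set[int] = set()
    for d in indiscernibility_classes(table, [decision]):
        positive |= lower_upper(d, classes)[0]
    return Fraction(len(positive), len(table))

def certain_rules(table: List[Row], conditions: Sequence[str], decision: str):
    """'If conditions then decision' rules from consistent condition classes."""
    rules = []
    for c in indiscernibility_classes(table, conditions):
        decisions = {table[i][decision] for i in c}
        if len(decisions) == 1:
            i = min(c)  # any member carries the same condition values
            rules.append(({a: table[i][a] for a in conditions}, decisions.pop()))
    return rules

# Hypothetical 0-1 improvement flags and a discretized decision attribute.
COND = ["walls_insulated", "roof_insulated"]
table = [
    {"walls_insulated": 1, "roof_insulated": 1, "savings": "high"},
    {"walls_insulated": 1, "roof_insulated": 1, "savings": "high"},
    {"walls_insulated": 1, "roof_insulated": 0, "savings": "high"},
    {"walls_insulated": 1, "roof_insulated": 0, "savings": "low"},
    {"walls_insulated": 0, "roof_insulated": 0, "savings": "low"},
]
classes = indiscernibility_classes(table, COND)
print(accuracy({0, 1, 2}, classes), accuracy({3, 4}, classes))  # 1/2 1/3
print(quality(table, COND, "savings"))                          # 3/5
for cond, dec in certain_rules(table, COND, "savings"):
    print(cond, "->", dec)
```

Buildings 2 and 3 have identical condition values but different decisions, so they fall into the boundary region; they lower both coefficients and produce no certain rule.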
