#### *5.1. Assessment*

Phase 5 of the methodology assesses the models built in the previous sections and the accuracy of the modeling. Splitting the data into training and testing sets is an important step in evaluating the accuracy of a decision tree; the higher the accuracy, the better the model performs. The data were therefore divided into two groups: 70% of the records were taken as the training set and the remaining 30% as the testing set. In addition to accuracy, the lift criterion, which indicates the degree of correlation, was calculated for the C5.0 tree branches. A lift value greater than 1 indicates a positive correlation. For all the branches presented in Tables 2 and 4, the lift value was greater than 1, indicating a positive correlation.
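The lift of a single branch can be illustrated with a minimal sketch. It assumes the common definition of lift as the share of the predicted class among the records covered by the branch, divided by the share of that class in the whole dataset; the function name `branch_lift`, the field names, and the toy records are hypothetical and are not taken from the study's data or software.

```python
from typing import Callable, Mapping, Sequence

Record = Mapping[str, object]


def branch_lift(records: Sequence[Record],
                condition: Callable[[Record], bool],
                target: str,
                predicted_class: str) -> float:
    """Lift of a tree branch: P(class | branch) / P(class)."""
    if not records:
        raise ValueError("records must not be empty")
    covered = [r for r in records if condition(r)]
    if not covered:
        raise ValueError("the branch covers no records")
    # Confidence: share of covered records that carry the predicted class.
    confidence = sum(r[target] == predicted_class for r in covered) / len(covered)
    # Prior: share of all records that carry the predicted class.
    prior = sum(r[target] == predicted_class for r in records) / len(records)
    if prior == 0:
        raise ValueError("the predicted class never occurs in the data")
    return confidence / prior


# Hypothetical toy records (illustrative only, not the study's dataset).
toy = [
    {"age": "old", "group": "F"},
    {"age": "old", "group": "F"},
    {"age": "old", "group": "D"},
    {"age": "new", "group": "C"},
    {"age": "new", "group": "C"},
    {"age": "new", "group": "F"},
]

# Branch "age == old" predicting group F: confidence 2/3, prior 3/6.
lift = branch_lift(toy, lambda r: r["age"] == "old", "group", "F")
```

In this toy case the lift is (2/3) / (1/2) ≈ 1.33, so the branch condition is positively associated with the predicted class, which is the situation reported for the branches in Tables 2 and 4.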

As stated, the purpose of the proposed approach is to obtain more precise and detailed results. The accuracy of the models in Sections 3 and 4 should therefore be compared to assess the efficiency and effectiveness of the approach. Table 6 gives the accuracy of the C5.0 tree on the entire dataset and in each cluster. As can be seen, modeling each cluster separately (the proposed approach) is more accurate than analyzing all the data together. This means that the proposed approach uncovers unknown information more accurately and that the model's bias toward the majority of records has declined.

Because this is a multi-class classification problem in which the target feature (energy efficiency group) takes six values ((A and B), C, D, E, F and G), the percentages obtained are acceptable. If the tree were guessing at random among the six states of the target field, its accuracy would be about 17% (1/6). Table 4, however, reports percentages that differ from this. The percentages indicate that the algorithm is more accurate when each cluster is analyzed separately than when the data are analyzed as a whole. The approach is thus capable of providing knowledge and discovering unknown information more accurately.
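The chance-level figure can be reproduced with a short sketch. The 1/6 value assumes that the six groups are equally likely; when the classes are imbalanced, always predicting the most frequent class is a stricter baseline. The label list below is hypothetical and serves only to illustrate the calculation, and both function names are assumptions introduced here.

```python
from collections import Counter
from typing import Sequence

# The six target groups named in the text.
CLASSES = ["A-B", "C", "D", "E", "F", "G"]


def uniform_chance_accuracy(n_classes: int) -> float:
    """Expected accuracy of uniform random guessing among n_classes."""
    if n_classes <= 0:
        raise ValueError("n_classes must be positive")
    return 1.0 / n_classes


def majority_baseline_accuracy(labels: Sequence[str]) -> float:
    """Accuracy obtained by always predicting the most frequent label."""
    if not labels:
        raise ValueError("labels must not be empty")
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


uniform = uniform_chance_accuracy(len(CLASSES))  # 1/6, about 17%

# Hypothetical label distribution (not the study's data).
toy_labels = ["D"] * 5 + ["C"] * 3 + ["E"] * 2
majority = majority_baseline_accuracy(toy_labels)  # 5 of 10 records are "D"
```

Comparing a model's accuracy with both values shows whether it improves on random guessing and on simply following the majority of records.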

> **Table 6.** Accuracy of the C5.0 algorithm on the entire dataset and in each cluster.


Data clustering was also evaluated with the silhouette index (Table 3). Based on the values of this index in different scenarios and the analysis of characteristics for different values of k, three clusters were selected as the most suitable number of clusters (see Section 4.1).
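For reference, the silhouette index can be computed as in the following minimal sketch. It assumes Euclidean distance and the standard definition s = (b - a) / max(a, b), where a is the mean distance from a point to the other members of its own cluster and b is the smallest mean distance to the members of any other cluster. The function name and the toy points are hypothetical; the study's own software and distance measure are not specified here.

```python
from math import dist
from statistics import mean
from typing import Hashable, Sequence, Tuple

Point = Tuple[float, ...]


def silhouette_score(points: Sequence[Point], labels: Sequence[Hashable]) -> float:
    """Mean silhouette value over all points (Euclidean distance)."""
    clusters = {}
    for p, label in zip(points, labels):
        clusters.setdefault(label, []).append(p)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for p, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from p to itself is 0, so divide by len(own) - 1).
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(mean(dist(p, q) for q in members)
                for other, members in clusters.items() if other != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return mean(scores)


# Hypothetical toy points: two well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # natural grouping
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # groups mixed together
```

Values close to 1 indicate compact, well-separated clusters, while values near or below 0 indicate overlapping or misassigned points. Repeating the calculation for several values of k and comparing the scores is one common way to support the choice of the number of clusters.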

#### *5.2. Deployment and Discussion*

This research was conducted to explore the factors affecting energy efficiency in residential homes. To this end, a hybrid approach was proposed to reveal findings with greater detail and accuracy. This section reviews the findings. Some of them were obtained both with and without the proposed approach. These findings are as follows.


Interestingly, in addition to the results described above, the proposed approach could also reveal new findings. In fact, this approach offers more detailed results (Table 7). Scrutinizing the data with the proposed approach yields the following new findings.

• In homes with similar attributes, not using this tariff has resulted in a better energy efficiency group. However, electricity tariff 7 has different effects in each cluster. The proposed approach shows that in the low-consumption cluster, energy efficiency is poor, particularly in small (less than 51 m²) and old houses which do not use this tariff.

In the medium-consumption cluster in particular, large (over 151 m²) and old houses which use this tariff have better energy efficiency.

• The building structure has different effects under the proposed approach. In the medium-consumption cluster, flats have a more favorable energy efficiency group than other structures, even among newly built homes. Old detached homes also have a good energy efficiency group in this cluster, although it was seen earlier that old houses do not have good energy efficiency.

In the high-consumption cluster, buildings with mid-terrace and end-terrace structures which have installed boilers belong to a better energy efficiency group.

• In the high-consumption cluster, homes in Wales have better energy efficiency than homes with similar attributes in other areas.

Table 7 shows the findings from analyzing the entire dataset and each cluster separately. Findings obtained only through the proposed approach are marked as new findings. This suggests that the proposed approach can discover findings in more detail, which demonstrates its ability and usefulness in discovering unknown information. These detailed findings can be very helpful for policymakers, architects, and engineers.


> **Table 7.** Assessing the results of the proposed approach.
