#### **1. Introduction**

In today's world, energy is supplied through various carriers such as oil and gas (and products derived from them), electricity and renewable energy. Given limited energy resources and population growth, the annual increase in energy consumption affects life, the economy, the environment, politics, and so on. Managing energy is therefore a complicated task and has become an important issue in the modern world. The residential sector has the largest share of energy consumption in most countries. As each house has its own behavior, energy consumption patterns depend on several factors. Hence, decision making concerning the domestic sector's energy management and efficiency requires taking advantage of modern scientific capabilities.

Data mining can extract useful knowledge that is hidden in data. Using its methods and algorithms, data mining techniques analyze huge amounts of data automatically [1,2]. In general, data mining is regarded as the process of Knowledge Discovery in Databases (KDD). Knowledge obtained in this way can be verified by conventional analysis [3]. As data mining has proven its potential in solving a wide variety of problems [4], using it to extract knowledge about domestic buildings for energy management is reasonable.

Scholars aim to drive innovation in the field of energy, so many studies have been conducted on the use of data mining in energy management and efficiency. Energy performance certificates (EPCs) measure energy performance and give recommendations on how to improve energy efficiency [5,6]. They try to find inefficient building properties and improve them. In other words, they help to locate properties whose energy performance is poor so that their energy efficiency can be improved [7]. Pasichnyi et al. review existing applications of EPCs and present a method of EPC data quality assurance using data analytics [8]. EPCs have also been further developed using data mining in recent years [9,10].

Important insights into the energy management requirements for saving energy in buildings have been presented in [11,12]. Data mining tasks that can be used to mine building-related data have also been shown in [13]. Other studies focus on discovering the factors that influence energy consumption. Yan analyzed the impact of psychological, family and contextual aspects on residential energy consumption; the results indicate that saving money, energy concern, and behavioral barriers have a major impact on residential energy consumption behavior [14]. Factors affecting home electrical energy demand have been analyzed by developing a model using series prediction methods. The result demonstrates that houses using pool pumps and ducted air-conditioning have an increased electricity consumption, whereas houses with gas hot water systems have a lower power consumption than homes that do not use these systems [15]. The adaptive neuro-fuzzy inference system (ANFIS) has been used to discover major factors influencing energy consumption. The results indicate that insulating materials are the most important parameters in building energy consumption. Attributes such as the type of materials and their thickness, wall structures, roofs and their ability to retain heat or cold, the location of walls and windows, and geographic area have a major impact on energy saving [16]. Unsupervised learning has been applied to discover electricity consumption patterns in a Spanish public university. The authors found different clusters in which several buildings were identified. These clusters were interpreted and rules for saving energy were proposed [17].

Clustering is the process of grouping data so that items within a cluster are as similar as possible and very dissimilar from items in other clusters. Many studies have been done on clustering [18–21]. Clustering is also used in the field of energy. In one study, the dataset was classified into low, medium and high energy demand categories to show the factors influencing heating and hot water. A detailed analysis was performed using the k-means algorithm in the high consumption category. The output model presents good energy demand patterns and optimal ways to design buildings. The average U-value of the opaque envelope, followed by the aspect ratio, is the most important variable [22]. The energy consumption characteristics of 134 LEED-NC certified office buildings have been examined using cluster analysis. The buildings, grouped into three clusters (low, medium and high), are very different, and each cluster has distinctive attributes. A lower U-value of the roof and a lower ratio of windows are the factors that most influence lower consumption. The HVAC system has a similar performance in all the clusters. The internal process load has a significant impact on clusters [23]. A data mining framework used CART classification and k-means clustering to analyze the pattern of energy consumption in a large data set of flats. Four influencing attributes (aspect ratio, U-value of the vertical opaque envelope and windows, and the average global efficiency of the system for space heating and domestic hot water (DHW)) were analyzed. High consumption flats were clustered and a reference flat was identified. These can be used to propose different energy retrofit actions [24].
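
As an illustration of the clustering step described above, the following minimal sketch groups buildings into low, medium and high energy demand categories with a one-dimensional k-means. It is not taken from any of the cited studies: the function name `kmeans_1d`, the deterministic initialization and the demand values are assumptions chosen only to make the idea concrete.

```python
def kmeans_1d(values, k=3, iterations=50):
    """Lloyd's k-means for one-dimensional data (illustrative sketch)."""
    ordered = sorted(values)
    # Deterministic start: pick k evenly spaced points of the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical annual energy demand per building (arbitrary units).
demand = [52.0, 48.5, 55.1, 110.3, 118.9, 105.0, 190.2, 201.7, 185.4]
labels, centroids = kmeans_1d(demand, k=3)

# Centroids start in ascending order, so label 0/1/2 maps to low/medium/high here.
names = ["low", "medium", "high"]
categories = [names[lab] for lab in labels]
print(list(zip(demand, categories)))
```

In practice, studies such as those cited cluster on several building attributes at once; the single-variable case is used here only because it keeps the assignment and update steps easy to follow.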

The consumption of electricity and heating in six schools was studied using k-means clustering and self-organizing maps to evaluate energy efficiency. The schools have different construction years, areas, numbers of students and heating systems. Schools 1 and 4 have the highest cost of energy on working days and schools 2 and 3 have the lowest cost of energy at weekends compared to the others. The newest schools are generally better than the old ones in terms of energy efficiency [25]. Energy efficiency in 132 countries was estimated using a data envelopment analysis model followed by k-means clustering, which specifies whether countries in a cluster are developing or not. The results show that countries could improve energy efficiency by changing energy-related indicators [26]. A cluster analysis was used to analyze the regulations on the energy efficiency of buildings in South America and Europe. This showed that buildings located in a similar climate zone but in different regions (countries) have different energy performances. It indicated that the tendencies of energy performance differ between various countries' regulations and climate zones. The results confirmed the ability of cluster analysis to highlight similarity patterns between various regions of the same climate [27]. In the field of energy management systems, ISO 50001:2011, a systematic approach to improving energy performance, plays an important role. A study that classified, gathered and clustered data and then applied data analysis techniques identified strategic decisions for improving energy performance. The idea was applied in an oil refinery, and the resulting improvements in energy management were shown [28]. Also in [29,30], efforts were made to improve energy efficiency in industrial buildings. A fuzzy clustering technique was developed to rate school buildings in Greece. The methodology demonstrates that the energy consumption and global environmental quality of school buildings can be significantly improved, but the indoor air quality of these buildings causes some problems for them [31]. Wind is an energy source for which the identification and assessment of training needs is very important. The Analytic Hierarchy Process has been used to specify the training of wind farm employees. The research prioritized the tasks and provided appropriate training courses tailored to the indicators [32].

Discovering rules is a major challenge in data mining [33–36]. Yu et al. present associations between building operational data. The methodology, applied to HVAC system data, offers some "if-then" rules that are useful in the energy conservation field. Finding and repairing faulty equipment, offering cost-efficient conservation strategies and gaining a better understanding of building operation are suitable solutions for energy saving [37]. In another study, the geographical and temperature variables in electricity consumption were analyzed. Energy consumption and monthly average temperature data were clustered using k-means, and then the Apriori algorithm was employed to discover association rules. These produced "if-then" rules describing the influence of different regions and physiographic objects. The results show that the most important factors increasing electricity consumption are highways and then the ground, whereas rivers and farms (natural elements) decrease electricity consumption [38]. A combined framework using clustering and association rules was developed to discover unusual energy use patterns. Benchmarking the rules identifies different waste patterns for different lifestyles [39]. A multi-objective algorithm has been proposed to mine rules without the need to determine a minimum support threshold and a confidence threshold. This algorithm was applied to three different datasets and demonstrated its ability to mine quantitative association rules [40]. A hybrid algorithm combining the genetic algorithm and particle swarm optimization was used to discover rules in continuous numeric datasets. It showed its ability on five different numerical interval datasets compared to other algorithms [41].
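
To make the notion of "if-then" association rules concrete, the sketch below computes support and confidence for item pairs in the spirit of the Apriori algorithm, restricted to rules with one antecedent and one consequent. The transactions, item names (`near_highway`, `use_high`, and so on) and thresholds are hypothetical and do not reproduce the data or results of the cited studies.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_pair_rules(transactions, min_support=0.4, min_confidence=0.8):
    """Return rules (antecedent, consequent, support, confidence) over item pairs."""
    items = sorted(set().union(*transactions))
    # Level 1: keep only single items that are frequent enough.
    frequent = [i for i in items if support(frozenset([i]), transactions) >= min_support]
    rules = []
    # Level 2: candidate pairs are built only from frequent items (Apriori pruning).
    for a, b in combinations(frequent, 2):
        pair_support = support(frozenset([a, b]), transactions)
        if pair_support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset([antecedent]), transactions)
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    return rules


# Hypothetical discretized observations: location, temperature level, consumption level.
transactions = [
    {"near_highway", "temp_high", "use_high"},
    {"near_highway", "temp_low", "use_high"},
    {"near_river", "temp_high", "use_low"},
    {"near_river", "temp_low", "use_low"},
    {"near_highway", "temp_high", "use_high"},
]

rules = mine_pair_rules(transactions)
for antecedent, consequent, s, c in rules:
    print(f"if {antecedent} then {consequent} (support={s:.2f}, confidence={c:.2f})")
```

Note that the multi-objective and hybrid algorithms mentioned above [40,41] avoid fixed thresholds such as `min_support` and `min_confidence`; the thresholds here belong to the classical formulation only.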

Due to irregular growth in the energy consumption of homes, analyzing their energy efficiency is unavoidable. Each building has its own unique attributes, functions and energy-related behavior. To improve the energy efficiency of buildings, it is necessary to identify which factors and properties influence it, considering the unique behavior of homes. Analyzing all homes together tends to emphasize certain information while ignoring other information, and the findings are applicable to fewer buildings. This article proposes a hybrid approach that combines clustering and decision trees to identify factors affecting energy efficiency and consumption in residential buildings while reducing the loss of important data. The idea is that by clustering houses so that similar patterns fall in the same cluster, and then analyzing the factors in each cluster separately, findings will be extracted with more detail and accuracy than by analyzing all houses together.
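
The intuition behind the hybrid approach can be sketched as follows. This is only an illustrative toy, not the implementation used in this paper: the house records, the feature names `u_value` and `window_ratio`, the two-cluster split and the use of a single variance-reducing split (a one-level stand-in for a full decision tree) are all assumptions.

```python
def two_means(values, iterations=50):
    """Split one-dimensional values into two clusters (label 0 = lower, 1 = higher)."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        g0 = [v for v, lab in zip(values, labels) if lab == 0]
        g1 = [v for v, lab in zip(values, labels) if lab == 1]
        if not g1:
            break  # all values identical: a single cluster
        new_lo, new_hi = sum(g0) / len(g0), sum(g1) / len(g1)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels


def sse(ys):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_split(rows, features, target):
    """Return (gain, feature, threshold) of the split that most reduces the SSE."""
    base = sse([r[target] for r in rows])
    best = None
    for f in features:
        vals = sorted({r[f] for r in rows})
        for a, b in zip(vals, vals[1:]):
            thr = (a + b) / 2
            left = [r[target] for r in rows if r[f] <= thr]
            right = [r[target] for r in rows if r[f] > thr]
            gain = base - sse(left) - sse(right)
            if best is None or gain > best[0]:
                best = (gain, f, thr)
    return best


# Hypothetical houses: envelope U-value, window ratio, annual consumption.
houses = [
    {"u_value": 0.30, "window_ratio": 0.10, "consumption": 50},
    {"u_value": 0.35, "window_ratio": 0.12, "consumption": 52},
    {"u_value": 0.32, "window_ratio": 0.30, "consumption": 78},
    {"u_value": 0.28, "window_ratio": 0.32, "consumption": 80},
    {"u_value": 0.80, "window_ratio": 0.20, "consumption": 150},
    {"u_value": 0.85, "window_ratio": 0.25, "consumption": 155},
    {"u_value": 1.40, "window_ratio": 0.22, "consumption": 205},
    {"u_value": 1.50, "window_ratio": 0.24, "consumption": 210},
]
features = ["u_value", "window_ratio"]

# Analysis of all houses together.
pooled = best_split(houses, features, "consumption")

# Hybrid analysis: cluster first, then analyze each cluster separately.
labels = two_means([h["consumption"] for h in houses])
per_cluster = {}
for c in sorted(set(labels)):
    members = [h for h, lab in zip(houses, labels) if lab == c]
    per_cluster[c] = best_split(members, features, "consumption")

print("pooled:", pooled[1])
for c, (gain, feature, thr) in per_cluster.items():
    print(f"cluster {c}: {feature} <= {thr:.3f}")
```

On this toy data the pooled analysis selects only `u_value`, whereas the per-cluster analysis additionally reveals that `window_ratio` is the dominant factor within the low-consumption cluster, which is the kind of detail the pooled analysis can hide.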

The rest of this paper is organized as follows. The next section presents the methodology and the proposed approach. To set forth the paper's purpose and to examine the ability of the new approach, data analysis is first performed without the hybrid approach in Section 3, and then again with the proposed approach in Section 4, where the data are clustered and each cluster is modeled separately. The evaluation and deployment are described in Section 5, and the conclusion is presented in Section 6, along with a discussion of the findings.

#### **2. Methodology and Approach Presented**
