1. Introduction
In recent years, the building sector has adopted energy-saving technologies and policies to meet energy efficiency requirements [1]. Although energy consumption in the building sector has remained at the 2019 level, carbon emissions have reached their highest level (10 GtCO2) as construction activity and electricity intensity have increased [2,3]. According to the Global Status Report for the building sector, carbon emissions from buildings must be reduced by 6% per year from 2020 to 2030 [2]. To reach this target, it is important to reduce highly intensive electricity use in building operations. Over the past decade, city governments in more than 40 cities in the United States have adopted building benchmarking to combat inefficient energy use in buildings [4].
With the development of data mining technologies and the large amount of power data now available, a great deal of building energy efficiency research has been conducted. In this domain, unsupervised data mining offers clear advantages for analyzing building operation data, because regularities discovered in operation data can inform new energy-saving strategies for buildings. Among unsupervised algorithms, clustering and association rule mining (ARM) are the most widely used. This paper aims to make full use of data mining methods to study building performance data. Hidden information and operational issues are identified with the goal of improving energy efficiency, and, based on the discovered features, targeted power-saving suggestions are provided for building operation.
Clustering algorithms mainly aim to recognize patterns of variation in data based on the relationships among data points. In building research, clustering is primarily used to characterize building operation data, such as power load. Chicco [5] compared the performance of different clustering algorithms in recognizing building electricity load profiles. The results showed that k-means had a better recognition capability than the other algorithms. Klemp et al. used k-means clustering to successfully predict the U-value ranges of building materials [6]. In addition, k-means has shown excellent performance in heating load classification [7]. Miller et al. improved the k-means clustering approach for identifying daily building power variation patterns [8]. The k-means algorithm is not limited to a particular building type; it has also been applied to hotel buildings [9,10]. Gao used k-means to group twelve buildings according to a performance characteristic [11]. Jaeger et al. constructed a building clustering approach by hierarchically replacing the urban energy simulation flow [12]. Andrews and Jain built a mixed dataset integrating the attributes of grid-interactive and efficient buildings [13]; k-medoids with Gower’s distance was then used to cluster these buildings for demand flexibility benchmarking. Walsh et al. proposed a performance-driven method for climate zoning that leverages archetypes, building performance simulation, and GIS [14]. Coupled with k-means clustering, the final results supported the validity of that method.
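To make the idea of clustering load profiles concrete, the following is a minimal sketch of k-means (Lloyd's algorithm) applied to daily load profiles. It is not the implementation used in any of the cited studies. The six four-point profiles (night, morning, midday, evening, in kW) are hypothetical values invented for illustration, and the deterministic farthest-first initialisation is an assumption made only so that the result is reproducible.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(profiles, k, max_iter=100):
    """Lloyd's k-means on a list of equal-length numeric profiles."""
    # Assumption: deterministic farthest-first initialisation. Start from the
    # first profile, then repeatedly add the profile farthest from the
    # centroids chosen so far.
    centroids = [list(profiles[0])]
    while len(centroids) < k:
        farthest = max(profiles,
                       key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in profiles]
        if new_labels == labels:  # no change means convergence
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels, centroids


# Hypothetical daily profiles in kW: night, morning, midday, evening.
daily_profiles = [
    [2, 4, 9, 3], [2, 5, 10, 3], [3, 4, 9, 4],   # midday-peak days
    [2, 3, 4, 9], [2, 3, 5, 10], [3, 3, 4, 9],   # evening-peak days
]
labels, centroids = kmeans(daily_profiles, k=2)
print(labels)
print(centroids)
```

Note that k must be supplied by the caller, which is the limitation of plain k-means that the adaptive variants discussed in the next paragraph try to remove.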
Apart from building energy consumption, clustering can also be used to recognize occupant behavior. Lavin and Klabjan used k-means to distinguish the daily electricity consumption patterns of 1000 commercial consumers [15]. For the indoor environment, three indicators (temperature, humidity, and light) were taken to represent indoor conditions and were analyzed using k-means clustering [16]. In general, the k-means algorithm requires the number of clusters, k, to be specified in advance. To simplify this process, improved algorithms that eliminate this step have been developed. For example, Kwac et al. proposed an adaptive k-means algorithm to determine the daily energy consumption patterns of residential buildings [17]. In addition, for non-numerical data, the fuzzy c-means clustering algorithm has also been employed to identify building patterns. To evaluate the distance measure, which is the core of a clustering algorithm, Iglesias and Kastner compared four similarity measures: Euclidean distance, Mahalanobis distance, Pearson correlation-based distance, and Dynamic Time Warping distance [18]. The results showed that Euclidean distance performed best in identifying daily building electricity patterns. Qin and Zhang applied the same approach to office building energy consumption data [19]. Santamouris et al. grouped the load data from 320 school buildings into five types using fuzzy c-means clustering [20]. Additionally, some clustering approaches do not rely on distance calculation, such as those based on support vector machines and decision trees. Chicco and Ilie found that the support vector machine clustering method performed well when the number of clusters was small [21]. Petcharat, Chungpaibulpatana, and Rakkwamsuk used a new expectation maximization clustering algorithm to investigate lighting energy consumption patterns in commercial buildings [22]. Liu et al. proposed an innovative decision tree clustering algorithm and achieved load pattern recognition for a variable refrigerant flow system [23]. Culaba et al. clustered time-series energy consumption data using k-means to reveal differences in household behavior, as a preparatory step for forecasting power intensity with a support vector machine [24]. Xu et al. constructed a k-means clustering algorithm integrated with a probability distribution model to study residential building electricity curves and occupant behavior [25]. Nisa, Kuan, and Lai applied the Apriori method to water-cooled chiller data to identify related parameters for constructing a prediction model [26].
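The choice of distance measure discussed above (the comparison by Iglesias and Kastner) can be illustrated with a short sketch. It implements three of the four measures (Euclidean, Pearson correlation-based, and Dynamic Time Warping) in plain Python. Mahalanobis distance is omitted because it needs a covariance matrix estimated from a full dataset. The three load curves are hypothetical, and defining the Pearson-based distance as 1 minus the correlation coefficient is one common convention assumed here, not necessarily the definition used in the cited work.

```python
import math


def euclidean(a, b):
    """Point-by-point Euclidean distance between equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_distance(a, b):
    """1 - Pearson correlation: ignores scale and offset, compares shape."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def dtw(a, b):
    """Dynamic Time Warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical load curves (kW) sampled at six points of the day.
base = [1, 1, 5, 9, 5, 1]
shifted = [1, 1, 1, 5, 9, 5]      # same shape, peak one step later
scaled = [2 * x for x in base]    # same shape, twice the magnitude

print(euclidean(base, shifted), dtw(base, shifted))
print(euclidean(base, scaled), pearson_distance(base, scaled))
```

In this toy case DTW penalizes the time-shifted curve less than Euclidean distance does, and the Pearson-based distance treats the scaled curve as essentially identical to the original. Which behavior is desirable depends on whether peak timing and magnitude are meant to separate clusters.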
Association rule mining (ARM) is an increasingly popular unsupervised algorithm in big data research. It discovers associations among items by counting how frequently they occur together. The final output consists of rules, each made up of an antecedent and a consequent. Initially, this type of method was mainly used to support purchasing decisions in the retail industry. However, with the development of big data technology, it has gradually been adopted in other fields, such as healthcare and finance [27].
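As a minimal illustration of how such rules are scored, the sketch below computes the three standard ARM measures for single-item rules: support (the share of transactions containing both items), confidence (the share of transactions containing the antecedent that also contain the consequent), and lift (confidence divided by the support of the consequent). The retail baskets are hypothetical, the thresholds are arbitrary, and the brute-force enumeration is used for clarity only. It is not the Apriori or FP-Growth algorithm used in the studies reviewed here.

```python
from itertools import permutations

# Hypothetical retail baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def mine_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules 'antecedent -> consequent' (one item each)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        both = count({a, b}, transactions)
        support = both / n
        if support < min_support:
            continue
        confidence = both / count({a}, transactions)
        if confidence < min_confidence:
            continue
        # Lift above 1 means the items co-occur more often than expected
        # if they were independent.
        lift = confidence / (count({b}, transactions) / n)
        rules.append((a, b, support, confidence, lift))
    return rules


rules = mine_rules(transactions, min_support=0.4, min_confidence=0.7)
for a, b, support, confidence, lift in rules:
    print(f"{a} -> {b}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

In a building context, the items would be discretized operating states (for example, a sub-entry energy level at a given time) rather than products, but the scoring of rules works the same way.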
In building energy-saving research, ARM has been used to discover relationships among building operation data. Yu et al. used ARM to mine the operating patterns of domestic appliances and found many relationships between the use of different appliances, such as TVs and ventilators. Based on the mined information, energy efficiency suggestions could be proposed for residents [28]. Yu et al. investigated critical associations for the HVAC system via the ARM algorithm [29]. In this case, plant operating faults and energy waste patterns were successfully recognized. D’Oca and Hong used ARM to reveal correlations between window opening and related behaviors [30]. Cabrera and Zareipour identified lighting waste patterns in an educational institution using ARM [31]. The association rules found illustrated the relationship between energy consumption and variables such as season, time, and occupancy status. Similarly, Wang and Shao focused on the lighting system, revealing energy waste patterns and the factors that significantly affect them. To formalize the general logic of association rule mining [32], Xiao and Fan proposed a framework for mining energy waste patterns with ARM [33]. Fan et al. then developed this framework further [34]. Within this framework, Li et al. (2017) discovered association rules for a refrigerant flow system [35]. Wang et al. used ARM to study electricity consumption patterns in residential buildings [36]. Xue et al. first used three clustering methods (k-means, k-medoids, and hierarchical clustering) to recognize daily operation patterns for heating in winter [37]. Meanwhile, the Apriori ARM algorithm was used to mine energy waste patterns. Chaobo et al. used the FP-Growth ARM method to investigate typical operating issues in chiller plants [38]. Qiang and Xiaodong et al. studied the associations among sub-entry energy uses in elementary school buildings using the Apriori algorithm [39]. Similarly, Qiang, Ying et al. analyzed time-series energy data from small hotels using curve features [40]. Xue, Shu, and Da revealed relationships among occupant age, employment, and occupancy patterns as socio-demographic features [41].
Building on the conventional ARM method, improved variants have gradually been developed to enhance recognition precision and broaden the scope of application. For example, Qiu et al. proposed a new ARM method that integrates a weight determination mechanism, increasing the precision of the algorithm’s judgments [42]. Lighting system, chiller operation, and coordinated control schemes were identified and refined. To expand the scope of application, a quantitative ARM method was implemented. Fan and Xiao compared conventional and quantitative ARM and found that the latter could handle both numerical and categorical data rather than being limited to a single data type [43]. Fan et al. developed an ARM method focused on the temporal features of building energy systems and, in further work, discovered dynamic HVAC operation patterns using an improved algorithm [44,45]. To increase recognition capability, Fan and Song et al. aimed to achieve graph identification using ARM [46]. Along these lines, Fan, Xiao, and Song et al. successfully proposed an image association rule mining approach that improved the interpretability of the mined information [47]. Gunay, Shen, and Yang extracted text data from HVAC work records to determine component fault frequencies [48]. Unlike the aforementioned research, Zhang et al. studied an innovative post-mining method and discovered association rules relating to anomalies by matching the mined rules [49,50]. In this way, rules of little value for HVAC equipment energy efficiency were filtered out. Liu et al. combined clustering and association rule mining to mine the operation patterns of office buildings [51].
Table 1 summarizes related investigations from recent years. Data mining is currently the main trend in these studies, but fault detection can also be achieved with the relevant data mining algorithms.
The above review shows that many data mining investigations have been conducted from different perspectives. However, energy time-series data have still not received enough attention from investigators. Time-series data contain significant hidden information that reflects a building’s power usage habits; these habits can expose energy consumption issues and thereby suggest new ways to promote energy efficiency. Based on the features found, energy-saving approaches can therefore be proposed for a specific type of building. The purpose of this paper is to mine time-series energy variation patterns using clustering and ARM.
This first section has introduced the relevant previous studies. The second section, Methodology, describes the operating principles of the algorithms. The Results and Discussion section presents the results and patterns found and explains the specific reasons behind them. Finally, the Conclusion section summarizes the findings of the research.
5. Conclusions
This paper studies restaurant energy usage patterns using a data mining approach. Simulation data from official building models make up the dataset for investigation. Time-series clustering and ARM algorithms are adopted to analyze energy consumption curves. The main conclusions are as follows:
A workflow combining time-series clustering and ARM can successfully discover building operation patterns. Based on these patterns, better-founded energy efficiency suggestions are proposed.
In this workflow, clustering mines energy consumption patterns over time, which reflect the building’s characteristics and problems, while association rule mining discovers relationships between different energy types at the same moments, which helps explain the reasons behind the observed patterns.
Restaurant time-series energy consumption curves can be clustered into four types: Invert U, M, Invert V, and Multiple M. Each type has its own variation characteristics. Two aspects, intensity and peak shift, are proposed for achieving energy savings for the different curve types.
Among the sub-entry energy types, cooling and refrigeration have the greatest influence on total energy. Outdoor conditions and occupant flow are the two significant factors influencing the energy usage pattern. Seasonally, outdoor temperature primarily affects interior energy consumption in summer, while occupant flow becomes the major factor in winter.
For canteen buildings, the key to energy conservation is filling the valleys of the power curve at off-peak times, for example by changing the refrigerator’s operating mode. In addition, eliminating unnecessary energy use is another effective measure, such as switching off heating equipment at peak times in winter when many people are present.
The results establish a new workflow for related investigations and reveal several common problems in restaurant operation, which can serve as references for energy policy making. This study analyzes simulated energy consumption data that accurately represent building operational features, but future research should use field measurements to identify specific irregular patterns in actual power load data and the impact of building area on performance. Future work should also compare energy data clustering results at different time resolutions.