**An Approach to Data Acquisition for Urban Building Energy Modeling Using a Gaussian Mixture Model and Expectation-Maximization Algorithm**

**Mengjie Han <sup>1</sup>, Zhenwu Wang <sup>2</sup> and Xingxing Zhang <sup>1,\*</sup>**


**Abstract:** In recent years, building energy performance has become increasingly uncertain because of factors such as climate change, the COVID-19 pandemic, stochastic occupant behavior and inefficient building control systems. Sufficient measurement data are essential to predict and manage a building's performance. Assessing the energy performance of buildings at an urban scale requires even larger data samples to support accurate analysis at an aggregated level. However, data are expensive, and acquiring large amounts of real energy data can be a serious challenge for communities. Yet inadequate knowledge of the full population leads to biased learning and can prevent a data pipeline from being established. This paper therefore proposes a Gaussian mixture model (GMM), fitted with the Expectation-Maximization (EM) algorithm, to produce synthetic building energy data. The method is tested on real datasets. The results show that the model's parameter estimates are stable and close to the true values. The bivariate model gives better classification accuracy. Synthetic data points generated by the models are consistent with the real data. The approach developed here can support building simulations and optimizations with spatio-temporal mapping.

**Keywords:** Gaussian mixture model; Expectation-Maximization; urban building energy modeling; data acquisition
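The core idea in the abstract is to fit a GMM with EM and then sample from the fitted mixture to create synthetic records. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It is univariate for brevity, whereas the paper also uses a bivariate model. The "observed" data are simulated from two hypothetical Gaussian components rather than taken from the paper's datasets. The quantile-based initialization and the names `fit_gmm_em` and `sample_gmm` are hypothetical choices made for this sketch.

```python
import numpy as np


def fit_gmm_em(x, k, n_iter=500, tol=1e-8):
    """Fit a univariate k-component Gaussian mixture to x with EM (illustrative sketch)."""
    n = x.size
    # Initialization (an assumption of this sketch): means at evenly spaced
    # quantiles, every variance equal to the pooled variance, equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    variances = np.full(k, x.var())
    weights = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: weighted normal densities, shape (n, k), then responsibilities.
        dens = weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
        dens /= np.sqrt(2.0 * np.pi * variances)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total
        ll = np.log(total).sum()  # log-likelihood under the current parameters
        # M-step: closed-form updates of weights, means and variances.
        nk = resp.sum(axis=0)
        weights = nk / n
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        if ll - prev_ll < tol:  # stop once the log-likelihood stops improving
            break
        prev_ll = ll
    order = np.argsort(means)  # report components in increasing order of mean
    return weights[order], means[order], variances[order]


def sample_gmm(weights, means, variances, size, seed=1):
    """Draw synthetic points: pick a component by weight, then sample its normal."""
    rng = np.random.default_rng(seed)
    labels = rng.choice(len(weights), size=size, p=weights)
    return rng.normal(means[labels], np.sqrt(variances[labels]))


# Hypothetical "observed" data: two simulated consumption regimes, not real data.
rng = np.random.default_rng(42)
observed = np.concatenate([rng.normal(20.0, 3.0, 600), rng.normal(45.0, 5.0, 400)])

weights, means, variances = fit_gmm_em(observed, k=2)
synthetic = sample_gmm(weights, means, variances, size=5000)
print("weights:", weights.round(3))
print("means:", means.round(2))
print("std devs:", np.sqrt(variances).round(2))
```

A bivariate version would replace the scalar variances with 2×2 covariance matrices in both the E-step densities and the M-step updates. The responsibilities `resp` are what would drive classifying each point into a component.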
