#### *2.2. The Technical Systems*

The swimming facility at Jøa is a state-of-the-art facility that complies with the Norwegian passive house standard [22]. It includes a ventilation heat recovery system equipped with a heat pump, as recommended in the literature [5,43], and conventional water treatment, which research has found to be the most effective water treatment train [44].

#### *2.3. The Dataset*

The dataset spans November 2017 to June 2019 and is divided into two parts: the training dataset covers November 2017 to June 2018, and the validation dataset covers September 2018 to June 2019. The size of the datasets was decided based on three main factors: (1) The training dataset should not be too large, because the purpose of the study is a dynamic energy benchmark for swimming facilities that is quick and easy to implement. (2) The validation dataset should be large enough to cover all the seasons and several operation disruptions. (3) The data should preferably come from continuous operation, without lockdowns for maintenance.
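The date-based split described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the text gives the periods by month only, so the exact first and last days are assumptions, and the function names `assign_split` and `split_records` are hypothetical.

```python
from datetime import date

# Period boundaries follow the months stated in the text.
# ASSUMPTION: each period runs from the first day of its start month
# to the last day of its end month.
TRAIN_START, TRAIN_END = date(2017, 11, 1), date(2018, 6, 30)
VALID_START, VALID_END = date(2018, 9, 1), date(2019, 6, 30)


def assign_split(day):
    """Return 'train', 'validation', or None for a calendar day."""
    if TRAIN_START <= day <= TRAIN_END:
        return "train"
    if VALID_START <= day <= VALID_END:
        return "validation"
    return None  # outside both periods, e.g. July and August 2018


def split_records(records):
    """Split an iterable of (day, value) pairs into two lists."""
    train, validation = [], []
    for day, value in records:
        label = assign_split(day)
        if label == "train":
            train.append((day, value))
        elif label == "validation":
            validation.append((day, value))
    return train, validation
```

Records that fall between the two periods are dropped rather than assigned to either set, which keeps the training and validation data strictly separated in time.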

#### *2.4. The Variables*

The objective of the study is to predict the energy use (dependent variable) as a function of several independent variables. The selected independent variables used in this study are listed in Table 1.

**The dependent variable** was defined by applying the energy conservation Equation (1) at the boundary defining the swimming facility as presented in Figure 1.

$$\frac{dE_{\text{net}}}{dt} = \dot{E}_{\text{net}} = \dot{E}_{\text{ea}} + \dot{E}_{\text{ta}} + \dot{E}_{\text{ep}} + \dot{E}_{\text{tp}} \tag{1}$$

where $\dot{E}_{\text{net}}$ is the net delivered energy to the facility, $\dot{E}_{\text{ea}}$ is the delivered electricity to the air handling unit, $\dot{E}_{\text{ta}}$ is the delivered thermal energy to the air handling unit, $\dot{E}_{\text{ep}}$ is the delivered electricity to the pool circuit and $\dot{E}_{\text{tp}}$ is the delivered thermal energy to the pool circuit. The units for the variables are given in Table 1.
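As a small illustration of Equation (1), the dependent variable is the sum of the four delivered-energy terms at each time step. The sketch below assumes all four terms are logged in the same unit; the function names and the sample values are hypothetical and are not taken from the dataset.

```python
def net_delivered_energy(e_ea, e_ta, e_ep, e_tp):
    """Sum the four delivered-energy terms of Equation (1).

    e_ea: electricity delivered to the air handling unit
    e_ta: thermal energy delivered to the air handling unit
    e_ep: electricity delivered to the pool circuit
    e_tp: thermal energy delivered to the pool circuit
    ASSUMPTION: all inputs share one unit (see Table 1).
    """
    return e_ea + e_ta + e_ep + e_tp


def net_series(rows):
    """Apply Equation (1) to a sequence of per-time-step dicts."""
    return [
        net_delivered_energy(r["e_ea"], r["e_ta"], r["e_ep"], r["e_tp"])
        for r in rows
    ]


# Hypothetical readings for two time steps (illustrative values only).
sample = [
    {"e_ea": 10.0, "e_ta": 20.0, "e_ep": 5.0, "e_tp": 15.0},
    {"e_ea": 8.0, "e_ta": 12.0, "e_ep": 4.0, "e_tp": 16.0},
]
net = net_series(sample)
```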

**The independent variables** were defined as the meteorological data (ambient air temperature and relative humidity) and the usage data. This choice was based on the availability of these data in the building and on the known correlation between energy use and both outdoor climate [45] and user interference [7,36,45]. In addition, this group of indicators is represented as logged values in conventional building automation systems (BASs). Due to the highly insulated building envelope and the orientation of the façades, the effects of wind pressure and solar radiation were assumed to be negligible.

The dataset was created by:


Due to implications within the BAS, extracting data prior to November 2017 was not possible. In addition, only a limited subset of the variables was logged in June 2018. Table 1 summarizes the variables in the dataset, their units and the origin of the data.


**Table 1.** The selected variables that have been used in the analysis.
