**3. Results**

We utilized the framework explained in Section 2 to record and label the Fully-labeled hIgh-fRequency Electricity Disaggregation (FIRED) dataset which was first introduced by Völker et al. in [17]. Version 2 of the FIRED dataset extends its time period to 101 days. The data includes aggregated three phase current and voltage measurements sampled at 8 kHz as well as 21 time synced individual appliance measurements sampled at 2 kHz from a residential apartment in Germany. Furthermore, it includes sensor readings such as room temperatures and additional state information of certain appliances and each light bulb in the apartment. Annoticity has been used to fully label all state changes of the sub-metered appliances over a time period of two weeks.

The FIRED dataset was collected in an apartment building constructed in 2017 with seven apartments on four floors. Data was collected from a three-room apartment with 79 m<sup>2</sup> of space (open combined kitchen and living room, bedroom, child's room partly used as office, hallway, bathroom, and storage room). The apartment is inhabited by two adults and one child. The building is heated via district heating, and most rooms are equipped with air filters with built-in recuperators. According to the building's energy certificate, it requires a primary energy consumption of 12 kWh/(m<sup>2</sup>a). The apartment's power grid is a three-phase 50 Hz system consisting of *L*1, *L*2, *L*3, and neutral (*N*) wires. *L*1–*L*3 are phase-shifted by 120° relative to each other. Access to the apartment's electrical system is given through a fuse box located in the hallway. All lights installed in the apartment are off-the-shelf smart light bulbs with a built-in *ZigBee* module. This allows the lights to be turned on or off via a smartphone application, voice assistant, or regular wall light switch. It further allowed us to log all state changes during the recording of the dataset as explained in Section 2.3. The washing machine, dryer, and freezer are located in the basement of the building and are not part of the recording.

A SmartMeter (see Section 2.1) was installed in the apartment's fuse box. Split core current transformers were attached to the three incoming supply legs. For voltage measurements, *L*1, *L*2, *L*3, and *N* were connected in parallel. The meter is supplied with power by an additional *L*1 leg that is secured by a separate 16 A fuse. The final installation is shown in Figure 6 (left).

We further deployed 21 PowerMeters (see Section 2.2) in the apartment and connected them to WiFi. We verified that the WiFi signal quality (*RSSI*) of each PowerMeter exceeded −60 dBm to ensure that data could be transmitted reliably. Some devices, such as the oven and the exhaust hood, were directly connected to the mains. To measure those appliances, we used a special version of our PowerMeters with screw terminals. Figure 6 (right) shows two PowerMeters connected to the espresso machine and coffee grinder.

Modern households can easily include more than 40 appliances (68 in the FIRED household). Many of these devices are only plugged in occasionally and sometimes at a different socket than before. Therefore, connecting a continuously sensing meter to each appliance is infeasible. Instead, we connected devices of the same category (e.g., routers) or devices which are only used simultaneously (e.g., monitor and PC) to the same PowerMeter. Devices which are only plugged in occasionally and typically not at the same time (e.g., mixer or vacuum cleaner) were connected to a dedicated PowerMeter (*pm11*). If an appliance was connected or disconnected, a corresponding entry was manually added to a log file. This means that, even if the socket was continuously sampling data, the appliance connected to it changed.

Moreover, temperature and humidity sensors were installed in most of the rooms, a ZigBee logger was set up, and both a 433 MHz and an infrared bridge were installed (as described in Section 2.3).

To properly connect all individual sensors to a central recording PC, the apartment was equipped with four WiFi access points. The power consumption of the recording PC and of the access points was recorded individually and also contributes to the apartment's aggregated consumption. The recording PC gathered the data of all electricity meters and sensors, frequently stored them in files, and pushed the files to a cloud server for persistent storage.

**Figure 6.** (**Left**) SmartMeter installed in the apartment's fuse box. (**Right**) PowerMeters with ID 13 and 15 connected to the coffee grinder and the espresso machine.

## *3.1. Data Records*

The provided data include voltage and current measurements at high sampling rates taken from the aggregated mains signal and 21 individual sockets. Furthermore, FIRED contains per-day and per-device summary files with derived power measurements to get a quick insight into the data. The root directory of the dataset contains folders with the *raw* and *summary* data. The data are stored as multiple *MKV* containers in sub-folders named *powermeter<ID>* and *smartmeter001*, respectively. We used multimedia containers to store the data, as we previously explored their benefits in [30]. Although they are optimized for audio and video streams, these containers allow regularly sampled sensor data to be stored as time-synced audio streams. Text-based labels can further be stored as subtitle streams in the same file. The FIRED data of each metering device are stored as a single *WavPack* [31] encoded audio stream inside the multimedia container. Each stream has multiple channels for the voltage and current signals. As stated in [30], WavPack allows for lossless compression while maintaining high compression ratios for time series data. In particular, we achieved a compression ratio of 1.46 for the voltage and current measurements using WavPack (only 1.42 could be achieved with *hdf5* [32], which has been used, e.g., for UK-DALE [11] and BLOND [12]). We also store metadata for each of the streams, including the start timestamp (with microsecond resolution), the particular meter used, the sampling frequency, codec information, the names of the measured attributes, and the stream's duration. Each file is therefore self-descriptive and can be used without prior knowledge. File sizes, and thus file loading times, are kept reasonable by splitting all files at regular time intervals. The local time of the first sample is appended to each filename as <*year*>\_<*month*>\_<*day*>\_\_<*hour*>\_<*min*>\_<*sec*>.
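The timestamp suffix described above can be recovered from a file name with a few lines of Python. This is a minimal sketch: the example file name, the `.mkv` extension, and the helper name `start_time_from_filename` are illustrative assumptions; only the `<year>_<month>_<day>__<hour>_<min>_<sec>` pattern is taken from the text.

```python
import re
from datetime import datetime

# Suffix documented in the text: <year>_<month>_<day>__<hour>_<min>_<sec>.
# Accepting one- or two-digit fields is an assumption; zero padding is not
# specified in the description.
_STAMP = re.compile(
    r"(\d{4})_(\d{1,2})_(\d{1,2})__(\d{1,2})_(\d{1,2})_(\d{1,2})"
)


def start_time_from_filename(filename: str) -> datetime:
    """Return the local time of the first sample encoded in a file name."""
    match = _STAMP.search(filename)
    if match is None:
        raise ValueError(f"no timestamp found in {filename!r}")
    return datetime(*map(int, match.groups()))


# Hypothetical file name; only the timestamp suffix follows the text.
print(start_time_from_filename("powermeter15_2020_06_09__16_00_00.mkv"))
```

Because the suffix encodes local time, the result is a naive `datetime`. The start timestamp stored in the stream metadata has microsecond resolution and is the better choice when exact alignment matters.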

Table 2 shows the mapping of the recorded appliances to the PowerMeter used (ID). For more information about each appliance, its brand and model are shown together with its power rating (*P*) according to the device manufacturer as well as the average power (*P̄*) and maximum power (*P*<sub>max</sub>) observed during recording. Φ shows the live wire (*L*1, *L*2, or *L*3) to which the PowerMeter was connected. A complete list of all appliances in the apartment is part of the dataset. It contains additional information and website links for each appliance.

#### 3.1.1. Voltage and Current Data

All PowerMeters sampled the current and voltage waveforms at a rate of 2 kHz. While in theory data can be sampled and sent at up to ≈8 kHz using our PowerMeters, the available WiFi bandwidth limits the amount of data that can be sent simultaneously by all meters. Therefore, we chose a sampling rate of 2 kHz as a trade-off between reliability and temporal data resolution (see Section 3.3.3 for more information). The SmartMeter recorded voltage and current waveforms of *L*1, *L*2, and *L*3 with a sampling rate of 8 kHz. The ADC installed in the SmartMeter allows for sampling these waveforms at a rate of up to 32 kHz, but, again, we preferred higher reliability over better time resolution. This is in line with Armel et al. [7], who stated that sampling rates between 15 kHz and 40 kHz will not improve the performance of NILM algorithms, as higher-frequency signal components are masked by noise in real buildings.

Each file contains 600 s of data stored in a single audio stream inside the multimedia container. For the aggregated data, each audio stream has six channels (*v\_l1*, *i\_l1*, *v\_l2*, *i\_l2*, *v\_l3*, *i\_l3*) representing the voltage and current waveforms for the three supply legs. The audio streams for the individual appliance data contain two channels (*v*, *i*). The number of samples in each file should match the time distance to the next file. If this is not the case, no data are available for this particular meter during the uncovered period. This occurred during a reliability reset each day at midnight and, rarely, for single meters due to occasional data loss, as described in Section 3.3.3.
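The consistency rule above (sample count versus time distance to the next file) can be checked programmatically. The following sketch assumes that the start time and sample count of each file have already been read from the container metadata; the function name `find_gaps`, the tuple layout, and the 1 ms tolerance are illustrative choices, not part of the dataset tooling.

```python
from datetime import datetime, timedelta


def find_gaps(files, sampling_rate_hz, tolerance_s=0.001):
    """Return (gap_start, gap_end) for each hole between consecutive files.

    `files` is a list of (start_time, num_samples) tuples sorted by start
    time. A gap is reported when the samples of one file end earlier than
    the next file begins (beyond `tolerance_s`).
    """
    gaps = []
    for (start, n_samples), (next_start, _) in zip(files, files[1:]):
        covered_until = start + timedelta(seconds=n_samples / sampling_rate_hz)
        if (next_start - covered_until).total_seconds() > tolerance_s:
            gaps.append((covered_until, next_start))
    return gaps


# Synthetic example for a 2 kHz PowerMeter: the second file is 50 s short.
files = [
    (datetime(2020, 6, 9, 16, 0, 0), 1_200_000),   # 600 s of data
    (datetime(2020, 6, 9, 16, 10, 0), 1_100_000),  # only 550 s of data
    (datetime(2020, 6, 9, 16, 20, 0), 1_200_000),
]
for gap_start, gap_end in find_gaps(files, sampling_rate_hz=2000):
    print(f"no data from {gap_start} to {gap_end}")
```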

Data pre-processing is typically not required, as both the SmartMeter and all PowerMeters calculate the physical quantities (*Volt* for voltage and *Milliampere* for current measurements) from the raw ADC samples.

**Table 2.** Appliances recorded via PowerMeters. *ID* represents the identifier of the PowerMeter used for recording. For *PowerMeter11*, the connected appliance changed during recording. *P* is the power according to the device manufacturer, Φ is the *Live Wire* the device is connected to (*L*1, *L*2, or *L*3), *P*<sub>max</sub> is the maximum power averaged over a duration of one second, and *P̄* is the per-day average power seen during the recording. The unit of all power measurements is Watts.


The provided voltage and current data without any pre-processing can be seen in Figure 7. Plots 1–3 show data of the SmartMeter, while plots 4 and 5 show the simultaneous measurements of two additional PowerMeters. The figure highlights not only the high temporal resolution of the data but also the achieved clock synchronization. The inrush current shown in the PowerMeter data (Figure 7, plot 4) matches the inrush current seen in *L*3 of the SmartMeter (Figure 7, plot 3). A time shift between the measurement devices of around 10 ms can be observed. Even after 16 h of continuous recording, the offset between the SmartMeter and PowerMeters is below one mains cycle (20 ms at 50 Hz), highlighting the effectiveness of the realized clock synchronization (see Section 3.3.4).

**Figure 7.** Voltage (red) and current (blue) waveforms of the SmartMeter, powermeter15, and powermeter27. The recording was taken on 9 June 2020 at around 4:00 p.m. The same switch-on event of the espresso machine is visible in the recording of *L*3 of the SmartMeter and of powermeter15.

#### 3.1.2. Power Data

The power data was derived from the voltage (*V*) and current (*I*) waveforms. We calculated active, reactive, and apparent power from the raw voltage and current data of all recording devices. The data is stored as a single file for each day of the recording. The formulas used to calculate the individual powers based on the mains frequency *f*<sub>l</sub> = 50 Hz are shown in (4), (5), and (6), respectively:

$$P(n) = \frac{1}{N} \cdot \sum_{i=0}^{N-1} V(i) \cdot I(i) \tag{4}$$

$$S(n) = I_{RMS}(n) \cdot V_{RMS}(n) \tag{5}$$

$$Q(n) = \sqrt{S(n)^2 - P(n)^2} \tag{6}$$

*P*, *Q*, and *S* are calculated for each non-overlapping window *n*. The length of the window is *N* = *f*<sub>s</sub>/*f*<sub>l</sub>, where *f*<sub>s</sub> is the sampling rate of the voltage and current measurements. For the 8 kHz SmartMeter data, *N* is 8000 Hz / 50 Hz = 160. *I*<sub>RMS</sub> and *V*<sub>RMS</sub> are calculated as follows:

$$I_{RMS}(n) = \sqrt{\frac{1}{N} \cdot \sum_{i=0}^{N-1} I(i)^2} \tag{7}$$

$$V_{RMS}(n) = \sqrt{\frac{1}{N} \cdot \sum_{i=0}^{N-1} V(i)^2} \tag{8}$$
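Equations (4)–(8) can be sketched in plain Python as follows. This is an illustrative implementation under stated assumptions, not the code used to produce the dataset: the function name `cycle_powers`, dropping a trailing incomplete window, and clamping the radicand of (6) at zero are choices made here. The signals are assumed to be given in Volt and Ampere; FIRED stores current in Milliampere, so it would need to be scaled first.

```python
import math


def cycle_powers(voltage, current, fs=8000, fl=50):
    """Return one (P, Q, S) tuple per non-overlapping mains cycle."""
    n = fs // fl  # window length N, e.g. 8000 Hz / 50 Hz = 160 samples
    results = []
    # A trailing window with fewer than N samples is dropped (assumption).
    for start in range(0, len(voltage) - n + 1, n):
        v = voltage[start:start + n]
        i = current[start:start + n]
        p = sum(vk * ik for vk, ik in zip(v, i)) / n     # Eq. (4)
        i_rms = math.sqrt(sum(ik * ik for ik in i) / n)  # Eq. (7)
        v_rms = math.sqrt(sum(vk * vk for vk in v) / n)  # Eq. (8)
        s = i_rms * v_rms                                # Eq. (5)
        # Rounding can make S^2 - P^2 slightly negative, so clamp at zero.
        q = math.sqrt(max(s * s - p * p, 0.0))           # Eq. (6)
        results.append((p, q, s))
    return results


# Synthetic test signal: 230 V RMS and 1 A RMS, current lagging by 60 degrees.
fs, fl = 8000, 50
t = [k / fs for k in range(fs)]  # one second of samples
v = [230 * math.sqrt(2) * math.sin(2 * math.pi * fl * tk) for tk in t]
i = [math.sqrt(2) * math.sin(2 * math.pi * fl * tk - math.pi / 3) for tk in t]
p, q, s = cycle_powers(v, i, fs, fl)[0]
print(f"P = {p:.1f} W, Q = {q:.1f} var, S = {s:.1f} VA")
```

Note that (6) yields only the magnitude of the reactive power; its sign (inductive versus capacitive) cannot be recovered from *S* and *P* alone.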

Since commercial smart meters provide data at sampling frequencies between 0.01 Hz and 1 Hz, an additional 1 Hz version of the power data is provided.
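The text does not state how the 1 Hz version is derived from the 50 Hz data. One plausible approach, consistent with the one-second averaging mentioned in the caption of Table 2, is to average each block of 50 consecutive values. The sketch below assumes exactly that; the function name `downsample_to_1hz` is hypothetical.

```python
def downsample_to_1hz(power_50hz, rate_hz=50):
    """Average each block of `rate_hz` consecutive values into one value.

    Block averaging is an assumption; a trailing block with fewer than
    `rate_hz` values is dropped.
    """
    n_seconds = len(power_50hz) // rate_hz
    return [
        sum(power_50hz[k * rate_hz:(k + 1) * rate_hz]) / rate_hz
        for k in range(n_seconds)
    ]


# Two seconds of synthetic 50 Hz active power: 100 W, then 200 W.
power_50hz = [100.0] * 50 + [200.0] * 50
print(downsample_to_1hz(power_50hz))
```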

The 50 Hz and 1 Hz power data are stored for each meter individually, and each file contains one day of data. Times for which the power could not be calculated because no voltage and current data were available are filled with a constant power of zero to maintain equidistant spacing between samples.

Figure 8 shows the apparent power consumption of the apartment over a single day. The contributions of the six appliances that consumed the most power on this day are shown as individual colored blocks. All other appliances are summed and plotted as *Others*. The aggregated power consumption of the SmartMeter is shown as *mains*. Ideally, the superposition of the apparent power of all individual meters should match the aggregated apparent power. Nevertheless, a small gap can be observed in Figure 8. This gap is caused by hard-wired appliances such as lights and the ventilation system, which are not individually monitored (see Section 3.3.2 for more information).
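The gap between the aggregate and the sum of the sub-metered appliances can be estimated sample by sample. The sketch below assumes time-aligned power series of equal length and uses made-up numbers; the function name `unmetered_power` is hypothetical. Strictly speaking, only active power adds up exactly across loads, so applying this to apparent power is an approximation.

```python
def unmetered_power(mains, appliances):
    """Return mains minus the sum of all appliance series, per sample.

    `mains` is a list of power values and `appliances` is a list of
    equally long, time-aligned lists (one per sub-metered appliance).
    """
    return [
        total - sum(series[k] for series in appliances)
        for k, total in enumerate(mains)
    ]


# Synthetic values in Watts, not taken from the dataset.
mains = [500.0, 520.0, 90.0]
appliances = [
    [300.0, 300.0, 0.0],   # e.g., espresso machine
    [150.0, 160.0, 50.0],  # e.g., recording PC
]
print(unmetered_power(mains, appliances))
```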
