**3. System Architecture**

The first module of the SMH system is a data acquisition module that automatically collects data on electrical variables (voltage, current, power factor, active power, and reactive power) every half second. These data are written directly to a data storage system that several devices can access at once without performance problems. This large amount of data is then processed asynchronously for analysis and selection. The data of interest for the different control panels (dashboards) are loaded into a web portal, where any user can study, analyse, or download them. Figure 1 shows the system architecture, and each of its modules is detailed below.

**Figure 1.** System architecture.

#### *3.1. Data Acquisition*

The flow chart in Figure 2 shows the data acquisition performed with the SMH by the Arduino Nano Rev3 (ANR3) microcontroller. The first task is the initialization of the system. Then the continuous measurement of the fundamental electrical variables (*v*, *i*) is performed using the analogue inputs A0 (*v*), A4, and A5 (*i*). Once the variables are processed, they are sent through the serial port to the Arduino Wemos D1 mini (WD1m) microcontroller and a backup copy is made to the SD memory card.

**Figure 2.** Flowchart for the measurement and computation of electrical variables: ANR3 board.
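The text does not say how ANR3 derives the power quantities from the sampled *v* and *i*. A common approach, sketched below in Python as an assumption rather than the authors' firmware, computes the RMS values and the mean instantaneous power over the sample window, then derives apparent power, reactive power, and PF from them. The function name and the synthetic 50 Hz test signals are illustrative only.

```python
import math

def power_quantities(v, i):
    """Compute Vrms, Irms, P, S, Q and PF from equal-length sample lists.

    Assumes the window spans an integer number of mains cycles
    (e.g. 200 samples at 1 kHz = 10 cycles of 50 Hz).
    """
    n = len(v)
    if n == 0 or n != len(i):
        raise ValueError("v and i must be non-empty and of equal length")
    v_rms = math.sqrt(sum(x * x for x in v) / n)
    i_rms = math.sqrt(sum(x * x for x in i) / n)
    p = sum(a * b for a, b in zip(v, i)) / n      # active power: mean of v*i
    s = v_rms * i_rms                              # apparent power
    q = math.sqrt(max(s * s - p * p, 0.0))         # reactive power magnitude (unsigned)
    pf = p / s if s > 0 else 0.0
    return {"v_rms": v_rms, "i_rms": i_rms, "p": p, "s": s, "q": q, "pf": pf}

# Usage: synthetic 230 V / 5 A (RMS) signals, current lagging by 30 degrees.
fs, f, n = 1000, 50, 200
v = [230 * math.sqrt(2) * math.sin(2 * math.pi * f * k / fs) for k in range(n)]
i = [5 * math.sqrt(2) * math.sin(2 * math.pi * f * k / fs - math.pi / 6) for k in range(n)]
result = power_quantities(v, i)
# result["pf"] is approximately 0.866 (cos 30 degrees)
```

Because *Q* is obtained from *S* and *P*, it is unsigned and, for distorted waveforms, also absorbs distortion power; a firmware implementation might choose a different definition.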

Each task requires a certain processing time: (i) 10-cycle measurement of the input signals (200 ms); (ii) computation of the fundamental and derived variables (30 ms); (iii) sending the data to WD1m (1 ms); (iv) storing the data on the SD card (9 ms); and (v) waiting until the next measurement (10 ms). The chosen sampling frequency is 1 kHz (one sample every 1 ms), so the 200 ms measurement window yields 200 samples over ten cycles of the measured signal. The timeline for the measurement process is shown in Figure 3. In the first part of the process, ANR3 obtains the fundamental and derived electrical variables; they are then sent to WD1m and, finally, saved on the SD card. In parallel, WD1m receives the data from ANR3 and uploads them to the cloud over the Wi-Fi connection.
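Adding up the task durations listed above gives the length of one ANR3 cycle, and the sampling figures follow from the same numbers. The short Python check below only restates that arithmetic; the 50 Hz mains frequency is an assumption implied by ten cycles taking 200 ms.

```python
# Per-task durations of one ANR3 cycle, as listed in the text (ms).
ANR3_TASKS_MS = {
    "measurement (10 cycles)": 200,
    "fundamental and derived variables": 30,
    "send to WD1m over serial": 1,
    "store on SD card": 9,
    "wait until next measurement": 10,
}

SAMPLING_HZ = 1000   # 1 kHz sampling frequency
MAINS_HZ = 50        # assumed: implied by 10 cycles in 200 ms
MEASUREMENT_MS = ANR3_TASKS_MS["measurement (10 cycles)"]

cycle_ms = sum(ANR3_TASKS_MS.values())                 # total time of one cycle
samples = SAMPLING_HZ * MEASUREMENT_MS // 1000         # samples per window
mains_cycles = MAINS_HZ * MEASUREMENT_MS // 1000       # mains cycles per window
samples_per_cycle = samples // mains_cycles            # samples per mains cycle

print(cycle_ms, samples, mains_cycles, samples_per_cycle)  # 250 200 10 20
```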

**Figure 3.** Process timeline for the SM.

Given the data acquisition interval of 0.5 s, upload times of at most 0.25 s are required. The free version of Firebase meets this requirement, with storage times of 0.1 s. Figure 4 shows the flowchart of data uploading to the cloud. WD1m performs this process continuously in the following steps: (i) system initialization; (ii) reading of the serial port; (iii) uploading to the cloud; and (iv) confirmation of the data upload.

**Figure 4.** Flowchart for cloud data uploading: WD1m board.
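Steps (ii) to (iv) of the WD1m loop can be illustrated with the REST interface of the Firebase Realtime Database, in which a POST to `<database URL>/<path>.json` stores a new child and the server replies with `{"name": "<push key>"}`. The Python sketch below only mirrors the logic for readability; the board itself runs Arduino firmware, and the database URL, the `measurements/<house_id>` path, and the comma-separated `v,i,p,q,pf` serial frame are hypothetical assumptions not taken from the text.

```python
import json
import urllib.request

# Hypothetical database URL and path layout (not given in the text).
DB_URL = "https://example-smh.firebaseio.com"

def parse_frame(line: str) -> dict:
    """Step (ii): parse one serial frame, assumed to be 'v,i,p,q,pf'."""
    v, i, p, q, pf = (float(x) for x in line.strip().split(","))
    return {"v": v, "i": i, "p": p, "q": q, "pf": pf}

def build_request(record: dict, house_id: str) -> urllib.request.Request:
    """Step (iii): POST the record; Firebase assigns a unique push key."""
    return urllib.request.Request(
        f"{DB_URL}/measurements/{house_id}.json",
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def upload_frame(line: str, house_id: str, send=urllib.request.urlopen) -> bool:
    """Steps (ii)-(iv): read, upload and confirm one frame.

    `send` is injectable so the logic can be exercised without a network.
    """
    request = build_request(parse_frame(line), house_id)
    with send(request, timeout=0.25) as response:   # assumed 0.25 s upload budget
        reply = json.loads(response.read())
    # Step (iv): a successful POST answers {"name": "<push key>"}.
    return "name" in reply
```

Injecting `send` keeps the network call at the edge of the function, so the parsing and request construction can be tested offline.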

The timeline for the process performed by WD1m is also shown in Figure 3. The execution times are: (i) serial port data reading (1 ms); (ii) data upload to the cloud (150 ms); (iii) data upload confirmation (50 ms); and (iv) waiting time (49 ms).
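As with ANR3, the WD1m durations can be checked against the 0.25 s upload requirement with a few lines of arithmetic; the numbers are the ones listed above.

```python
# WD1m cycle durations from the text (ms).
WD1M_STEPS_MS = {
    "serial port read": 1,
    "cloud upload": 150,
    "upload confirmation": 50,
    "waiting time": 49,
}
UPLOAD_REQUIREMENT_MS = 250   # required upload time stated in the text

cycle_ms = sum(WD1M_STEPS_MS.values())
upload_ms = WD1M_STEPS_MS["cloud upload"]

assert cycle_ms == UPLOAD_REQUIREMENT_MS    # the whole WD1m cycle equals 0.25 s
assert upload_ms < UPLOAD_REQUIREMENT_MS    # the upload step alone leaves a margin
print(cycle_ms, UPLOAD_REQUIREMENT_MS - upload_ms)   # 250 100
```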

#### *3.2. Data Processing*

Huge amounts of data are generated every millisecond by thousands of connected devices. These data, which constantly arrive in the cloud, potentially have great business value, so effective data processing is needed. In our system, once the data are obtained and stored in a "Not Only SQL" (NoSQL) database, they must be analysed and processed to extract as much information as possible. The term NoSQL refers to storage structures designed for situations in which relational databases run into problems, mainly the scalability and performance limitations that arise when thousands of concurrent users connect and millions of queries are issued per day.

At this point, when dealing with an information system that stores a huge amount of data, one of the first questions is whether data and information are the same thing. We define data as a symbolic representation of a situation or event that describes a concrete fact without any semantic meaning. Information, by contrast, is a set of data that have been processed so that they convey a message that supports decision making when solving a problem or facing a situation that requires a decision.

The main objective of this module is to turn the large amount of data obtained from the sensors of the data acquisition module into information that the end user can understand and that supports decision making, among other uses. The system automatically extracts the data stored in the NoSQL database at a configurable interval, then processes and filters them and loads them into the data system of the visualization platform described below.
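As one illustration of turning raw readings into information, the hypothetical Python sketch below aggregates active-power readings taken every 0.5 s into energy per house and per hour, E = Σ p·Δt. The record fields (`house_id`, `timestamp`, `p`) and the hourly granularity are assumptions; the text does not state which aggregations the processing module computes.

```python
from collections import defaultdict
from datetime import datetime, timezone

SAMPLE_PERIOD_S = 0.5   # one reading every half second, as stated in the text

def hourly_energy_kwh(records):
    """Sum p * dt per (house, hour) and convert joules to kWh.

    Each record is assumed to look like
    {"house_id": str, "timestamp": unix_seconds, "p": watts}.
    """
    energy_j = defaultdict(float)
    for r in records:
        hour = datetime.fromtimestamp(r["timestamp"], tz=timezone.utc).replace(
            minute=0, second=0, microsecond=0)
        energy_j[(r["house_id"], hour)] += r["p"] * SAMPLE_PERIOD_S
    return {key: joules / 3.6e6 for key, joules in energy_j.items()}

# Usage: a constant 1 kW load for one full hour (7200 readings) gives 1 kWh.
start = 1_700_000_000 - 1_700_000_000 % 3600   # align to the top of an hour
readings = [{"house_id": "H1", "timestamp": start + k * 0.5, "p": 1000.0}
            for k in range(7200)]
energy = hourly_energy_kwh(readings)
```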

#### *3.3. Dashboard Design*

The last module of the system, which serves the end user in visualizing aggregated information in various ways, is a set of dashboards with different purposes.

A dashboard is a dynamic, purpose-built visual interface that displays one-to-many database linkages, so that information can be presented for a single time period or monitored dynamically over time. This allows users to quickly define the areas of interest for their analysis.

Two dashboards were designed: a general dashboard and a device comparison dashboard. The purpose of the general dashboard is to analyse and display the load profiles of each household and of electric vehicle (EV) consumption. This information can be used in studies of consumption habits, load forecasting, and demand estimation. The purpose of the device comparison dashboard is to determine consumption patterns in different inhabited areas, analyse the different loads used, and support studies for the development and planning of electrical networks.
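A load profile of the kind shown on the general dashboard can be obtained by averaging active power by hour of day across several days. The Python sketch below is a hypothetical illustration; the input format (`timestamp` in UTC seconds, `p` in W) and the hour-of-day averaging are assumptions, not a description of the dashboards' actual queries.

```python
from collections import defaultdict
from datetime import datetime, timezone

def daily_load_profile(records):
    """Mean active power (W) for each hour of the day, 0-23.

    Records are assumed to be dicts with 'timestamp' (unix seconds, UTC) and 'p' (W).
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for r in records:
        hour = datetime.fromtimestamp(r["timestamp"], tz=timezone.utc).hour
        totals[hour] += r["p"]
        counts[hour] += 1
    return {h: totals[h] / counts[h] for h in sorted(totals)}

# Usage: two evenings at 19:00 and one morning at 08:00 (UTC).
sample = [
    {"timestamp": 19 * 3600, "p": 2000.0},          # day 1, 19:00
    {"timestamp": 86400 + 19 * 3600, "p": 3000.0},  # day 2, 19:00
    {"timestamp": 8 * 3600, "p": 400.0},            # day 1, 08:00
]
profile = daily_load_profile(sample)   # {8: 400.0, 19: 2500.0}
```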

**4. System Integration**

The technologies used in each module of the system and their integration are described below.

#### *4.1. Automatic Data Collection*

The SMH is based on the Arduino ANR3 and WD1m boards, with local and cloud storage. Data are captured with voltage (*v*) and current (*i*) sensors connected to the analogue inputs of ANR3, from which the active power *p*, reactive power *q*, and power factor *PF* are calculated. The data are then uploaded to the cloud by WD1m. Figure 5 shows the SMH block diagram.

**Figure 5.** Hardware block diagram of the SMH.

The SMH is powered from the mains through a dual 12 V output transformer. The supply signal is rectified to 12 V DC, which is within the supply voltage range (7–12 V DC) supported by ANR3 and WD1m. The ZMPT101b voltage sensor is connected to the 230 V mains and has a 5 V output, which is supported by the analogue input of ANR3. The current sensor used, SCT-013, has a measuring range of 30 A, which corresponds to a 1 V output. Since the analogue input of the Arduino works with 5 V, the 1 V signal is raised to the 5 V range by means of the ADS1115 analogue-to-digital converter.

Microcontroller: ANR3 is based on the ATmega328P microcontroller and is an open-source platform for prototype development. Its clock speed of 16 MHz allows measurements at short time intervals (0.25 s).

Wireless communication: WD1m uses the ESP8266 platform for Wi-Fi access, which works with the IEEE 802.11 b/g/n wireless standard. This board achieves cloud upload times of less than 0.15 s, below the required 0.25 s.

Current sensor: The current sensor used in this research (SCT-013) is of the noninvasive type, which means that it does not modify the monitored electrical installation. The 30 A version was chosen, which is suitable for households with loads of up to 6600 W. For higher currents, a 100 A version is available, with a maximum power of 23,000 W. To raise the 1 V output of the current sensor to the 5 V analogue input range of ANR3, an ADS1115 analogue-to-digital converter is used.
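Converting an ADS1115 reading back to current needs the converter's full-scale range and the sensor ratio (30 A per 1 V for the 30 A version). The Python sketch below assumes the ±1.024 V programmable-gain setting, for which one count of the 16-bit signed result corresponds to 1.024 V / 32768 = 31.25 µV; the gain choice is an assumption, since the text does not state it. In practice the RMS current would be computed from a window of such instantaneous values.

```python
FULL_SCALE_V = 1.024          # assumed ADS1115 PGA setting (+/-1.024 V)
COUNTS = 32768                # 16-bit signed output: -32768 .. 32767
AMPS_PER_VOLT = 30.0 / 1.0    # 30 A sensor version: 30 A -> 1 V

def counts_to_amps(raw: int) -> float:
    """Instantaneous current for one raw ADS1115 conversion result."""
    if not -COUNTS <= raw < COUNTS:
        raise ValueError("raw value outside the 16-bit signed range")
    volts = raw * FULL_SCALE_V / COUNTS
    return volts * AMPS_PER_VOLT

# 32000 counts = 1.000 V at this gain, i.e. the sensor's 30 A rating.
print(round(counts_to_amps(32000), 6))   # 30.0
```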

Voltage sensor: The voltage sensor used is the ZMPT101b voltage transformer. It steps the 230 V AC mains voltage down to the 5 V range of the ANR3 analogue input, so the reading is obtained directly without any intermediate equipment.

Datalogger shield: To prevent data loss when the Internet connection fails, an SD card mounted on a datalogger shield is used as a local backup. An 8 GB SD card was used, which allows data to be stored for 3.76 years.
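The 3.76-year figure depends on the size of each stored record, which the text does not give. The back-of-the-envelope Python check below assumes 8 × 10⁹ bytes of usable capacity and one record every 0.5 s; under those assumptions the stated duration corresponds to roughly 34 bytes per record. Both assumptions are labelled in the code and can be changed.

```python
CAPACITY_BYTES = 8e9               # 8 GB card, decimal gigabytes assumed
RECORD_PERIOD_S = 0.5              # one record per acquisition interval (assumed)
SECONDS_PER_YEAR = 365.25 * 86400

def years_of_storage(record_bytes: float) -> float:
    """Time until the card is full for a given record size."""
    records = CAPACITY_BYTES / record_bytes
    return records * RECORD_PERIOD_S / SECONDS_PER_YEAR

def implied_record_bytes(years: float) -> float:
    """Record size that would fill the card in the given number of years."""
    records = years * SECONDS_PER_YEAR / RECORD_PERIOD_S
    return CAPACITY_BYTES / records

print(round(implied_record_bytes(3.76), 1))   # 33.7
```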

Storage: At an interval configured on the Arduino device, the data are sent to the central data system. This massive data storage system is Firebase, a platform for the development of web and mobile applications acquired by Google in 2014.

Specifically, the Firebase Realtime Database is a cloud-hosted NoSQL database that provides high performance for this type of connection. For the volume of data produced by these devices, SQL-based queries are not efficient.

A printed circuit board (PCB) was designed and built for the SMH. All components are soldered to the board, which results in a solid and robust final system. The PCB measures 88 mm × 75 mm. Figure 6 shows the front side of the PCB design and the board assembled with the SMH components.

**Figure 6.** Printed circuit board (PCB) of the SMH with battery power supply: (**a**) front side; (**b**) back side; and (**c**) assembled with real components.

To determine whether the SMH is suitable for mass production, the cost of the materials used must be assessed. The budget also serves to validate the low-cost objective proposed earlier. The prices shown are from the manufacturers' official shops; since these are open-source components, compatible alternatives can be found to further reduce the price. Table 2 shows the cost of the different components used.

**Table 2.** Cost of the components for a SMH with a battery power supply.


#### *4.2. Characteristics of Big Data*


#### *4.3. Big Data Framework for Households*

The framework presented covers the data life cycle of the proposed network, from the data generation phase to the analysis phase. Figure 7 shows the framework that serves as the basis for handling the big data of the proposed network.

**Figure 7.** The framework to deal with electrical consumption in households with big data.

An integrated architecture based on big data and cloud computing is proposed. The system contains the following parts: the cloud environment, the big data tools and the database. Figure 1 shows the architecture of the proposed system.

#### 4.3.1. Data Generation

The generated data stream comes from SMs installed in houses and PV installations, with measurements taken every 0.5 s. The monitored locations are diverse (houses, houses with PV, houses with PV and EV, and houses with EV). The electrical variables can be complemented with data from sensors that record meteorological variables. In addition to houses, commercial buildings, factories, and other renewable sources can also be included.

#### 4.3.2. Data Acquisition

Data acquisition in the designed platform comprises three tasks: data collection, data transmission, and data pre-processing. The data collected by the developed SM were described in the previous subsection.

The collected data are transmitted through one or more master nodes of the Hadoop cluster to the data storage system, where they are processed in the following phases.

Data integration uses techniques that combine data from different SMs in order to unify the information. The files can be transferred in different formats, such as CSV or JSON.

The information generated by the SMs contains the time stamp, the ID of each house, voltage, current, active power, apparent power, and PF. In the data pre-processing phase, erroneous information is modified or removed to improve data quality.
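The text does not list the rules used to detect erroneous information. The Python sketch below shows one plausible pre-processing filter under assumed, hypothetical rules: records with missing or non-numeric fields, negative RMS values, a power factor outside [−1, 1], or active power exceeding apparent power are discarded. The field names and the 1 % tolerance are illustrative, and only the removal case (not the modification of values) is shown.

```python
import math

REQUIRED = ("timestamp", "house_id", "v", "i", "p", "s", "pf")
NUMERIC = ("v", "i", "p", "s", "pf")

def is_valid(record: dict) -> bool:
    """Apply simple, assumed plausibility checks to one SM record."""
    if any(record.get(k) is None for k in REQUIRED):
        return False
    try:
        v, i, p, s, pf = (float(record[k]) for k in NUMERIC)
    except (TypeError, ValueError):
        return False
    if not all(math.isfinite(x) for x in (v, i, p, s, pf)):
        return False
    if v < 0 or i < 0 or s < 0:          # RMS values and S cannot be negative
        return False
    if not -1.0 <= pf <= 1.0:
        return False
    return abs(p) <= s * 1.01            # |P| cannot exceed S (1 % tolerance assumed)

def clean(records):
    """Keep only the records that pass every check."""
    return [r for r in records if is_valid(r)]

sample = [
    {"timestamp": 1, "house_id": "H1", "v": 230, "i": 2, "p": 440, "s": 460, "pf": 0.96},
    {"timestamp": 2, "house_id": "H1", "v": 230, "i": 2, "p": 440, "s": 460, "pf": 1.7},
    {"timestamp": 3, "house_id": "H1", "v": None, "i": 2, "p": 440, "s": 460, "pf": 0.96},
]
cleaned = clean(sample)   # only the first record survives
```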

The data acquisition system must fulfil the collection function: it must collect, aggregate, and send large volumes of data from the SMs to the Hadoop master node. The data are stored as files, in the formats described above, within an HDFS (Hadoop Distributed File System) repository.

#### 4.3.3. Data Storage and Processing

HDFS stores the data for further processing. An HDFS cluster consists of a NameNode, which manages the file system metadata, and a set of DataNodes, which store the actual data. Hadoop YARN is the resource manager for data analysis. YARN runs alongside HDFS on the same set of nodes, which allows computation to be scheduled on the nodes that hold the data.

#### 4.3.4. Data Query

Once the data are stored in Firebase, a Python script, executed automatically at a configurable interval, extracts, processes, and filters all the information. This information is required by the next module, which handles system analysis and visualization. The processed information is then uploaded to HDFS, and the data can be consulted from Kibana.
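The extraction step of such a script can be sketched as flattening the JSON tree exported from Firebase into one record per line (newline-delimited JSON), a format that can then be copied to HDFS. The sketch below assumes a hypothetical `measurements/<house_id>/<push_key>` layout and hypothetical field names; the text does not describe the authors' script, the database layout, or how the upload to HDFS is performed.

```python
import json

def flatten_export(tree: dict):
    """Yield one flat record per measurement from a Firebase JSON export.

    Assumed layout: {"measurements": {house_id: {push_key: {...fields...}}}}.
    """
    for house_id, entries in tree.get("measurements", {}).items():
        for key, fields in entries.items():
            yield {"house_id": house_id, "key": key, **fields}

def to_ndjson(records) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)

export = {
    "measurements": {
        "H1": {"-Na1": {"timestamp": 1700000000.0, "p": 420.0},
               "-Na2": {"timestamp": 1700000000.5, "p": 425.0}},
        "H2": {"-Nb1": {"timestamp": 1700000000.0, "p": 1310.0}},
    }
}
lines = to_ndjson(flatten_export(export)).splitlines()
print(len(lines))   # 3
```

Once written to a local file, the output could be copied into HDFS with a command such as `hdfs dfs -put measurements.ndjson /data/smh/`, where the file name and target directory are hypothetical.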
