**Influence of Data Sampling Frequency on Household Consumption Load Profile Features: A Case Study in Spain**

### **J. C. Hernandez <sup>1</sup>, F. Sanchez-Sutil <sup>2</sup>, A. Cano-Ortega <sup>2,\*</sup> and C. R. Baier <sup>3</sup>**


Received: 13 September 2020; Accepted: 20 October 2020; Published: 23 October 2020

**Abstract:** Smart meter (SM) deployment in the residential context provides a vast amount of data of high granularity at the individual household level. In this context, the choice of temporal resolution for describing household load profile features has a crucial impact on the results of any action or assessment. This study presents a methodology that makes two new contributions. Firstly, it proposes periodograms along with autocorrelation and partial autocorrelation analyses and an empirical distribution-based statistical analysis, which are able to describe household consumption profile features with greater accuracy. Secondly, it proposes a framework for data collection in households at a high sampling frequency. This methodology is able to analyze the influence of data granularity on the description of household consumption profile features. Its effectiveness was confirmed in a case study of four households in Spain. The results indicate that high-resolution data should be used to consider the full range of consumption load fluctuations. Nonetheless, the accuracy of these features was found to largely depend on the load profile analyzed. Indeed, in some households, accurate descriptions were obtained with coarse-grained data. In any case, an intermediate data resolution of 5 s yielded feature characterizations close to those obtained at 0.5 s.

**Keywords:** smart meter; temporal data granularity; electric load profile; time slices; time series; advanced metering infrastructure

#### **1. Introduction**

Large temporal datasets for household electricity consumption, provided by smart meters (SMs), offer significant potential for energy time-series scientists. These datasets permit increased resolution and analysis at the level of individual households. Recent studies by Zhou and co-workers [1,2] focus on the challenges and opportunities that SMs provide for smarter energy management where SM data are an essential component.

Traditionally, consumption load metering in the residential context has been conducted at a low time resolution. Thus, consumption profiles are generally gathered for different dwelling types, based on a sampling frequency that provides a data granularity from 1 to 30 min [3]. However, current SCADA systems can sample consumption data at a higher frequency (typically 1 Hz), though standard practice is to store averaged values over intervals of 1 min or longer [4].

Reference [5] highlights the importance of taking into account both a wide time slice and frequency spectrum for an accurate description of the load profile features of household consumption.

More specifically, the choice of a temporal data granularity (data sampling frequency) for specifying consumption load profile features has a crucial impact on the results of any action or assessment, as discussed in the literature [6–48]; see Table 1. This table summarizes, for each potential action or assessment, the time resolution (data granularity) and time horizon (time slice) envisaged in works related to load profiles in households. The impact of temporal data granularity is important because the consumption profile is known to fluctuate at a high temporal resolution (i.e., in the 0.01–5 Hz range [16,49,50]). Therefore, when a longer time resolution is envisaged, the profile dynamics become increasingly biased. This means that the profile should be sampled at a more fine-grained level, which will more accurately describe its behavior. Nonetheless, there is a trade-off between the computational burden and the accuracy of any action or assessment, which is determined by this discrete time resolution [51]. Moreover, the effective measurement of the electrical variables (root-mean-square (RMS) values) requires at most a 5-Hz sampling frequency [52]; a higher frequency would provide merely instantaneous values.

The dynamic nature and frequently high variation typical of household consumption load profiles in the residential context have been analyzed for different temporal data granularities, as shown in Table 1. As disclosed, most of the time resolutions were longer than 1 min. Few references focused on granularities finer than 15 s. High data granularity naturally implies larger amounts of data to be locally stored (either hard disk or memory card) or uploaded to the cloud. This has led many researchers and industry practitioners to develop and survey a vast number of analytical tools that could help to segment and cluster SM big data so that they can be analyzed in real time [11,32]. On the other hand, uploading these data to the cloud (data traffic with the cloud) is another important limitation [11,44,53]. High granularity requires a large bandwidth, which is not always available in individual households. Since data cannot be transmitted at such high resolutions, data compression algorithms are required; compression ratios of 10:1 or even 1000:1 can be achieved. For example, Kelly [8] performed measurements every 6.25 × 10<sup>−5</sup> s (16 kHz) and used the free lossless audio codec (FLAC) compression algorithm, which reduced the daily data volume from 28.8 GB to 4.8 GB, a 6:1 compression ratio. Also, reference [54] developed a new method of data compression via a stacked convolutional sparse auto-encoder. These algorithms are usually time consuming, and thus the gathered data are not available in real time.

The scientific community currently has very limited access to consumption load data in the residential context. The available information is either free or must be purchased. Free options are available at different web sites that provide records for homes. Pecan Street [55] is a web site that provides data from 1115 houses with PV and/or EVs. Mack [56] developed a web site on SMs for homes, with the aim of saving electricity. Wilcox [53] built a Hadoop-scaled SM analytics platform that supports large datasets at a 20 TB scale. Furthermore, other web sites provide household consumption load data with a granularity from 1 s to 1 min [57–64].

Some studies have evaluated the feature bias due to the use of coarse-grained data when assessing consumption load profiles in households. Murray [38] compared time resolutions of 1 min and 15 min in 21 houses in the UK and demonstrated the damping effect of working with coarser temporal resolutions. Naspolini [39] showed that a 15-min data granularity was not well suited to the operation of electric water heaters because of load fluctuations with periods shorter than 15 min (around 5 min). Bucher [40] studied 1-s and 15-min data granularities in domestic PV-household load profiles. Shi [41] analyzed the accuracy of predictions based on different data granularities. Hoevenaars [64] showed that using a 1-h time step hid the within-hour load variability in models of renewable power systems. Regarding optimization purposes, Van der Meer [42] concluded that a 5-min time resolution provided a good balance between accuracy and data-size burden, whereas [45] showed that using hourly data led to large biases compared to 1-min data. However, coarser data could be sufficient for household aggregation [48]. Shorter fluctuations, at a granularity of only 4 s, were investigated in [65].



**Table 1.** Time resolution (data granularity) and time horizon (time slice) envisaged in works related to household load profiles, for each potential action or assessment.

In this context, it was found that the availability of household consumption load data at a fine granularity (high sampling frequency), obtained from SM measurements, allows users to perform the actions or assessments listed in Table 1.


To date, the major obstacle to accurately describing household consumption load profile features is the fact that this type of profile has not been sampled at a high temporal resolution (high sampling frequency) that captures its intrinsic dynamics. Thus, most of the examples mentioned in Table 1 used an hourly, or even a 30-min or 15-min, time resolution, which does not provide enough information to accurately support actions or assessments. Furthermore, even though the current trend is to use more temporally granular data sets in household applications, the influence of temporal granularity has not yet been analyzed using a comprehensive, high-resolution data set. Quite a few papers [38–42,45,48,65] evaluated the bias due to the use of coarse-grained data, but never compared resolutions finer than 1 to 5 min. Still another shortcoming is the fact that most studies cover short time slices. These involve a reduced timespan chosen to characterize key aspects of temporal variability, for example, covering weekdays and weekends, different times of day, and different seasons. The time horizon (time slice) envisaged in Table 1 was typically restricted to minutes, a few hours, or a few days. However, the resulting description is not accurate since it does not take seasonality into account. Lastly, SM data only reflect a few electrical variables (usually energy or power), which means that very little information regarding electrical behavior can be derived.

To fill this gap, the new methodology presented in this paper makes two major contributions. It first proposes periodograms along with autocorrelation and partial autocorrelation analyses and an empirical distribution-based statistical analysis, which are able to describe household consumption profile features with greater accuracy. This type of analysis reveals key issues about the granularity impact on the load fluctuation, such as the accurate description of its constituent signals. In contrast, the temporal analysis usually found in literature only offers information regarding the granularity impact on the change in the magnitude of the peak and trough load. Secondly, it proposes a framework for data collection in households at a high sampling frequency (>4 Hz) that provides data to be used in the proposed methodology.

A case study of four households in Spain, using thirteen data granularities from a half-second to 30 min (0.5, 1, 2, 5, 10, 15, and 30 s, and 1, 2, 5, 10, 15, and 30 min), provided valuable insights into the influence of data granularity on the description of consumption load profile features. The data set, collected over almost two years, covered households with different consumption features, namely varying characteristics in terms of the relation between the peak and base load and in load fluctuations, which made it possible to take the heterogeneity of real-world load profiles into account. We acknowledge that conducting our analysis with a data sample from four households is a limitation of this study. However, this data sample was adequate to achieve our primary objective of demonstrating the usefulness of the proposed methodology, namely to highlight the information loss regarding profile features when coarse-grained data are used.

The remainder of the paper is organized as follows: Section 2 describes the methodology implemented in this study. Section 3 discusses the results that reflect the influence of data granularity (sampling frequency) on consumption load profile features. Finally, Section 4 presents the conclusions that can be derived from this research.

#### **2. Methodology**

This section first presents a set of variables for a full description of features for the stochastic dataset derived from household consumption load profiles. The variables are defined from estimated time series models. This is followed by an explanation of the concepts of granularity and time slices. An outline is then provided of the framework for consumption data collection in households at high sampling frequency and its post-processing.

#### *2.1. Time-Series Theory*

For stationary stochastic data, the theory of time series models provides estimated models, which include the description of the probability mass function (PMF), power spectral density function, and autocorrelation function [66,67].

The features of a stationary stochastic dataset are fully described by the joint probability density function of the observations [66,67]. If this density could be calculated on the basis of the observations, it would provide all of the information pertaining to the signal. Nevertheless, this is usually not feasible without a great deal of additional knowledge about how such observations were obtained. Features that can always be estimated include the power spectral density and the autocorrelation function. In addition, knowledge of the spectrum (power spectral density function) or the autocorrelation function, along with the first two statistical moments, makes it possible to accurately describe the joint probability density function of normally distributed observations [66,67]. Even when the assumptions of normality and strict stationarity are not confirmed, these estimators still provide a sound basis for further research [67]. Nonetheless, in the case of other distributions, higher-order moments provide more information.

#### 2.1.1. Stationarity

A time series *x* (and thus the underlying stochastic process) is considered stationary if the process is in a certain state of statistical equilibrium. Accordingly, the properties of the stochastic process are assumed to be invariant under translation through time. This signifies that the joint probability distribution associated with *m* observations (*x*<sub>1</sub>, *x*<sub>2</sub>, ... , *x*<sub>*m*</sub>), for any set of measurement times (*t*<sub>1</sub>, *t*<sub>2</sub>, ... , *t*<sub>*m*</sub>), is the same as that for the *m* observations (*x*<sub>1+*k*</sub>, *x*<sub>2+*k*</sub>, ... , *x*<sub>*m*+*k*</sub>) at times (*t*<sub>1+*k*</sub>, *t*<sub>2+*k*</sub>, ... , *t*<sub>*m*+*k*</sub>). Therefore, the joint distribution must not change when all of the observation times are shifted backward or forward by any integer amount *k*.

Household consumption load profiles are usually not stationary; there is typically daily, weekly, and monthly seasonality, as well as an upward trend as the number of appliances in the household rises. However, as stationary datasets are easier to analyze, numerous techniques can be applied to a time series to make it stationary, e.g., transformations, deseasonalisation, and differencing [68].

There are several ways to check whether a time series is stationary or non-stationary: (i) looking at plots; (ii) summary statistics; and (iii) statistical tests. The most rigorous approach to detecting stationarity in time series data is to use statistical tests developed to detect specific types of stationarity, based on simple parametric models of the generating stochastic process. Among them, it is important to mention the following: (i) the Augmented Dickey-Fuller (ADF) test [69]; (ii) the Kwiatkowski, Phillips, Schmidt, and Shin (KPSS) test [70,71]; (iii) the variance ratio test [72]; (iv) the Leybourne-McCabe (LMC) test [73]; and (v) the Phillips-Perron (PP) test [74].

The Dickey-Fuller test was the first statistical test developed to check the null hypothesis that a unit root is present in an autoregressive model of a given time series, and that the process is thus not stationary. An extension of this test (the ADF test) was developed to accommodate more complex models and data. In contrast to the ADF test, the KPSS test takes stationarity around a constant level or a linear trend as the null hypothesis, with the presence of a unit root as the alternative. The variance ratio test belongs to the semi-parametric tests, unlike the two previous ones, which are parametric; its null hypothesis implies non-stationarity, and the alternative hypothesis indicates that the process is stationary. The LMC test allows for additional autoregressive lags, similar to the ADF test. Although both tests have the same asymptotic distribution, the statistics of the LMC test converge at a higher rate. The PP test states the null hypothesis of non-stationarity in the time series data, rejecting the unit-root null in favor of the alternative stationary model.

#### 2.1.2. Metrics for Statistics and Probability Analysis

In probability theory, the moments of a stochastic dataset (random variable) *x* are the expected values of certain functions of *x*. They are a set of descriptive measurements associated with the probability distribution of *x*; when all the moments of *x* are known, they characterize its distribution.

Let *x* be a discrete univariate dataset with a finite number of outcomes (*x*<sub>1</sub>, *x*<sub>2</sub>, ... , *x*<sub>*m*</sub>) occurring with probabilities *p*<sub>*x*</sub>(*x*<sub>*i*</sub>), i.e., with PMF *p*<sub>*x*</sub>(*x*<sub>*i*</sub>) = *P*(*x* = *x*<sub>*i*</sub>). Its moments of order one, two, and *r* can be specified as follows [75,76]:

$$m_x^1 = E[x]; \quad m_x^2 = E[x^2]; \quad m_x^r = E[x^r] = \sum_{i=1}^{m} x_i^r \cdot p_x(x_i) \tag{1}$$

The cumulants of a stochastic dataset constitute an alternative to the moments described above [76]. Unlike moments, cumulants cannot be directly obtained by summation or integration, as in (1). They can only be found by first identifying the moments and then applying relationship formulas [75,76]. Accordingly, the first cumulant is the expected value; the second cumulant is the variance; the third cumulant measures asymmetry; and the fourth cumulant measures the tailedness of the probability distribution.
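A minimal numerical sketch of (1), using a fair six-sided die as the discrete distribution (the example and helper name are ours):

```python
import numpy as np

def raw_moment(values, pmf, r):
    """r-th raw moment m_x^r = sum_i x_i^r * p_x(x_i), as in Equation (1)."""
    values = np.asarray(values, dtype=float)
    pmf = np.asarray(pmf, dtype=float)
    return float(np.sum(values**r * pmf))

# Fair six-sided die: outcomes 1..6, each with probability 1/6.
xs = np.arange(1, 7)
p = np.full(6, 1 / 6)
m1 = raw_moment(xs, p, 1)   # expected value (first cumulant): 3.5
m2 = raw_moment(xs, p, 2)   # second raw moment: 91/6
var = m2 - m1**2            # second cumulant (variance): 35/12
```

The last line illustrates the moment-to-cumulant relationship mentioned above: the variance is the second raw moment minus the square of the first.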

Equation (1) describes how to find the expected value, variance, skewness, and kurtosis of discrete random variables according to probability theory. However, some of these variables, such as the expected value and variance, may strike the reader as very similar to the sample mean and sample variance, respectively, in descriptive statistics. The sample mean and sample variance are random variables because their values depend on the particular random sample drawn. In other words, if we know the frequency distribution, i.e., how many times a data value is repeated in the dataset, the following formulas can be used to determine the sample mean, *x̄*, and sample variance, *s*<sup>2</sup>:

$$\overline{x} = \frac{\sum f \cdot x_i}{\sum f}; \quad s^2 = \frac{\sum f \cdot (x_i - \overline{x})^2}{\sum f} \tag{2}$$

Notice that *m* = ∑ *f* is the dataset sample size.

The distinction between variables in statistics and in probability analysis is that statistics vary with each sample dataset, whereas probability variables are fixed once the dataset's probability distribution is known. The law of large numbers states that, as the sample size grows to infinity, statistics provide an increasingly accurate picture of the moments of the distribution.
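The sample statistics in (2) can be sketched as follows (a minimal illustration; the data values and function name are ours):

```python
import numpy as np

def freq_mean_var(values, freqs):
    """Sample mean and variance from a frequency distribution, Equation (2)."""
    values = np.asarray(values, dtype=float)
    freqs = np.asarray(freqs, dtype=float)
    n = freqs.sum()                                  # m = sum(f), the sample size
    mean = np.sum(freqs * values) / n
    var = np.sum(freqs * (values - mean) ** 2) / n
    return mean, var

# Value 1 repeated twice, 2 three times, 3 five times (m = 10).
mean, var = freq_mean_var([1, 2, 3], [2, 3, 5])      # mean = 2.3, var = 0.61
```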

#### 2.1.3. Autocorrelation and Spectrum

The covariance between two observations *x*<sub>*n*</sub> and *x*<sub>*n*+*k*</sub> of a stationary stochastic dataset is formulated as follows:

$$r(k) = \mathrm{cov}(x_n, x_{n+k}) = E[(x_n - \mu_x)(x_{n+k} - \mu_x)] \tag{3}$$

where μ<sub>*x*</sub> is the mean of the dataset.

The quantity *r*(*k*) is specified for each integer value of *k*, and the combination of all of these quantities is known as the autocovariance function of *x*<sub>*n*</sub>. It quantifies the covariance between pairs at a distance or lag *k*, for all values of *k*. This signifies that it is a function of lag *k*.

The autocovariance function expresses all knowledge pertaining to Gaussian stochastic data. In combination with the first two statistical moments, it fully characterizes the joint probability distribution function of the data. Only when the distribution departs significantly from normality does it become interesting to study higher-order moments or other features.

In the same way as the covariance between two variables, it is also possible to normalize the autocovariance function *r*(*k*) and thus obtain the autocorrelation function ρ(*k*):

$$\rho(k) = \frac{r(k)}{r(0)} = \frac{r(k)}{\sigma\_x^2} \tag{4}$$

where σ<sup>2</sup><sub>*x*</sub> is the variance of the dataset.

The autocorrelation function reveals how rapidly a signal can change over time. At lag 0, the autocorrelation value is 1. For most physical processes, the autocorrelation function progressively diminishes at greater lags; the relation at a short temporal distance is greater than at longer distances. A long correlation lag in the autocovariance function is indicative of slowly varying data, whereas a short lag signifies that only data at short distances are correlated. Nonetheless, a high value in the autocorrelation function at some lag signifies a repetition pattern, and thus reveals a constituent signal in the analyzed dataset. Therefore, the set of high values of the autocorrelation function provides a means for describing the features of a stationary stochastic dataset.

The partial autocorrelation function α(*k*) represents the autocorrelation between *x*<sub>*n*</sub> and *x*<sub>*n*+*k*</sub> with the linear dependence of *x*<sub>*n*</sub> on *x*<sub>*n*+1</sub> through *x*<sub>*n*+*k*−1</sub> removed. In other words, it denotes the autocorrelation between *x*<sub>*n*</sub> and *x*<sub>*n*+*k*</sub> that is not explained by lags 1 through *k*−1, inclusive [77]:

$$\begin{aligned} \alpha(1) &= \operatorname{corr}[x_{n+1}, x_n] \text{ for } k = 1\\ \alpha(k) &= \operatorname{corr}[x_{n+k} - P_{n,k}(x_{n+k}),\, x_n - P_{n,k}(x_n)] \text{ for } k \ge 2 \end{aligned} \tag{5}$$

where *P*<sub>*n*,*k*</sub>(*x*) is the operator of orthogonal projection of *x* onto the linear subspace of the Hilbert space spanned by *x*<sub>*n*+1</sub>, ... , *x*<sub>*n*+*k*−1</sub>.

There are algorithms for estimating the partial autocorrelation based on the sample autocorrelations [78].
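A minimal sketch of the sample estimator behind (3) and (4) (using the biased normalization by *N*; the function name and test signal are ours, and library routines such as those in [78] would normally be preferred):

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation rho(k) = r(k)/r(0), Equations (3)-(4)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    r0 = np.dot(xc, xc) / n                     # r(0) = sample variance
    return np.array([np.dot(xc[: n - k], xc[k:]) / (n * r0)
                     for k in range(max_lag + 1)])

# A signal with a 20-sample period shows a high ACF value at lag 20,
# revealing the constituent signal discussed in the text.
idx = np.arange(2000)
acf = sample_acf(np.sin(2 * np.pi * idx / 20), 25)
```

For this periodic signal, `acf[0]` is exactly 1 and `acf[20]` is close to 1, while lags a quarter-period away (e.g., lag 5) are close to 0.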

The Discrete Fourier Transform (DFT) of the autocovariance function constitutes the spectrum, or power spectral density function, *h*(ω). The Wiener-Khintchine theorem [79,80] establishes the conditions under which a valid autocovariance function has a transform that is non-negative everywhere; see [81]:

$$\begin{array}{ll} h(\omega) = \frac{1}{2\pi} \sum\_{k=-\infty}^{\infty} r(k) e^{-j\omega k}, & -\pi \le \omega \le \pi\\ r(k) = \int\_{-\pi}^{\pi} h(\omega) e^{j\omega k} d\omega, & k = 0, \pm 1, \pm 2, \dots \end{array} \tag{6}$$

The reason why this is known as the 'power spectral density function' is evident in the integral for the value *k* = 0:

$$r(0) = \int\_{-\pi}^{\pi} h(\omega)\, d\omega = \sigma\_x^2 \tag{7}$$

The variance represents the total power in the signal, and the power spectral density function provides the distribution of that total power over the frequency range. When the data are characterized by a strong quasi-periodicity with a specific period, they show a narrow peak in the power spectral density instead of one exact frequency, thus revealing a constituent signal in the analyzed dataset. Therefore, the set of narrow peaks of the power spectral density function provides a means for describing the features of a stationary stochastic dataset.

The fast Fourier transform (FFT) is an algorithm for computing the DFT, described in Equation (6) as an infinite sum, more efficiently. The FFT systematizes the redundant calculations in a very efficient way by taking advantage of the algebraic properties of the Fourier matrix. Applying the FFT to a signal thus yields its power spectrum, namely the periodogram of the signal. When a high number of observations (*N*) is involved, the FFT results ensure a high accuracy.
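A minimal FFT-based periodogram sketch (the helper name and the 2-Hz test signal, mimicking a 0.5-s granularity, are ours):

```python
import numpy as np

def periodogram(x, fs):
    """Periodogram of x via the FFT; returns frequencies (Hz) and power density."""
    x = np.asarray(x, dtype=float) - np.mean(x)   # remove the DC component
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / (fs * n)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, power

# A 0.25-Hz oscillation sampled at 2 Hz (0.5-s granularity) peaks at 0.25 Hz.
fs = 2.0
t = np.arange(4096) / fs
freqs, power = periodogram(np.sin(2 * np.pi * 0.25 * t), fs)
peak_freq = freqs[np.argmax(power)]
```

The location of the narrow peak in `power` identifies the constituent signal, exactly the feature the methodology extracts from the periodograms.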

When the autocovariance function (3) is divided by the variance of the signal, the result is the normalized autocorrelation function ρ(*k*) of (4); its transform is the normalized power spectral density ϕ(ω):

$$\begin{array}{ll}\phi(\omega) = \frac{1}{2\pi} \sum\_{k=-\infty}^{\infty} \rho(k) e^{-j\omega k}, & -\pi \le \omega \le \pi \\ \rho(k) = \int\_{-\pi}^{\pi} \phi(\omega) e^{j\omega k} d\omega, & k = 0, \pm 1, \pm 2, \ldots \end{array} \tag{8}$$

#### *2.2. Temporal Granularity and Time Slices*

Granularity is the temporal resolution of a recorded measurement set, i.e., the measurement interval used to capture the variability of the data set. The time slice is the temporal framework of a given study. The quality of the results depends on choosing the appropriate granularity and time slice for establishing the household consumption load profile features.

The temporal framework for analyzing the electrical energy system differs, depending on the action or assessment performed [19], as shown in Figure 1. Very short-term analysis involves a temporal framework from 0.5 s to 1 min, and mainly includes transient analysis, demand response in real time, angle stability, and frequency control. Short-term analysis is associated with system operation from several seconds to thirty minutes. It includes day-to-day system operation, hour-ahead scheduling, studies of probabilistic load forecasting, and seasonal prediction of demand. A mid-term analysis involves a temporal framework from several days to one year. This includes maintenance of system assets, unit commitment, energy trading, and energy sales. Finally, long-term analyses, which range from 3 years to more than a decade [19], cover new capacity addition [82], system planning, and energy policies covering future demand growth.

**Figure 1.** Temporal framework for analyzing the electrical energy system.

#### *2.3. Planned Framework for Consumption Data Collection in Households*

Figure 2 shows the planned framework for the remote real-time collection of consumption data at high sampling frequency (>4 Hz) in households with SMs and the uploading of this information to the cloud. This development is one of the most important issues targeted by the SEREDIS project ('Nuevos servicios de red para microredes renovables inteligentes. Contribución a la generación distribuida residencial': Grant No. ENE 2017-83860-R [44,49,83,84]). In this framework, the SM designed in [44] is installed in the general protection box, and the data gathered are dumped directly into two data storage solutions: (i) the cloud and (ii) a local storage in SD card.

#### 2.3.1. SM

As shown in Figure 3, the SM is composed of a data acquisition block and a data-to-cloud upload block (see [44] for a more in-depth explanation). This SM was calibrated and tested to ensure its reliability and accuracy [44]. The SM has two Arduino boards: (i) the Arduino Uno Rev3 (AUR3 [85]) and (ii) the Wemos Arduino D1R1 (AD1R1 [86]). The AUR3 board was used for the measurement process, whereas the AD1R1 board uploaded data to the cloud. This reduced the time needed to process data and upload them to the cloud.

The SM simultaneously performed two processes. In the first process, the AUR3 microcontroller software determined the fundamental and derived electrical variables, sent data via serial port to AD1R1, and then stored information in the local data logger. The AD1R1 software read from the serial port and uploaded the data to the cloud via Wi-Fi in a parallel process.

**Figure 2.** Remote real-time data collection schematic.

**Figure 3.** Block diagram for the data acquisition and uploading.

The timeline of the processes is shown in Figure 4. In the first process, the required time for measuring electrical variables is a 10-cycle time interval for a 50-Hz power system (Class-A performance [52]). In order to be accurately measured, the sampling frequency must be at least twice the frequency of the signal. Therefore, 200 samples were used for the 10-cycle interval with a sampling frequency of 1 kHz. Derived variables are then calculated, which takes about 30 ms. The transmission of the information to the serial port only takes 1 ms, and data storage in SD memory, 9 ms. This leaves 10 ms of waiting time. In the second process, the SM reads the data received in 1 ms and uploads the data in 150 ms. About 50 ms are required to confirm the data upload, which leaves 49 ms of waiting time.
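The timing budget above can be sanity-checked with a few lines (the figures are taken from the text; the variable names are ours):

```python
# Process 1 (AUR3): measurement window is 10 cycles of a 50-Hz system = 200 ms,
# plus derived-variable computation, serial transmission, SD storage, and waiting.
measure_ms = 10 * (1000 / 50)                 # 200 ms
process1 = measure_ms + 30 + 1 + 9 + 10       # = 250 ms

# Process 2 (AD1R1): serial read, cloud upload, upload confirmation, and waiting.
process2 = 1 + 150 + 50 + 49                  # = 250 ms

# Both processes fill the same 250-ms (0.25-s) acquisition cycle exactly.
```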

**Figure 4.** Process timeline for the SM.

#### Data Acquisition

Figure 3 shows the hardware for the data acquisition process. Analog voltage and current sensors measured the electrical variables, which were then processed in the Arduino AUR3 [85] microcontroller. Once the fundamental variables were obtained, the derived variables were computed. More specifically, the non-invasive current sensor SCT-013 [87] was used together with the voltage sensor ZMPT101b [88]. An ADS1115 analog-to-digital converter was planned to adapt the 1 V DC output of the current sensor to the 5 V analog input of the AUR3 [89].

#### Data-To-Cloud Upload

The cloud provides a cost-effective method of supporting big data analytics. Therefore, the cloud data storage solution is suitable in scenarios where a real-time response from a given stream of SM data is required. This real-time data availability aids in personalizing applications that benefit both household owners and the scientific community when analyzing consumer profiles.

When data with finer granularities are gathered, the amount of information involved is high and requires data compression algorithms. These algorithms are usually time consuming; thus, a delay between the measurement and the data availability in the cloud appears. To enable a real-time response, this research does not apply compression algorithms, and the time of data-to-cloud upload is set to 0.25 s (see Figure 4), the same as the data acquisition time.

According to Figure 3, the wireless communication module of the SM was based on the AD1R1 board, which acted as the interface between the microcontroller and the cloud data storage (i.e., Firebase). The board uses the ESP8266 platform as its operation core, which permits Wired Equivalent Privacy (WEP) or Wi-Fi Protected Access (WPA/WPA2) authentication for secure Wi-Fi communication. In addition, it operates with 802.11 b/g/n wireless systems, which are compatible with the majority of routers and modems on the market. This framework used the Firebase platform [90] to store huge amounts of data from households monitored with IoT technology and cloud computing. Alternatively, wireless communication systems such as 4G and 5G networks can be used, though this implies a more expensive data service contract for data-to-cloud uploading.

#### Local Data Storage

The SM is equipped with an SD card mounted on a data logger shield, which is used as a backup to avoid data loss due to data-to-cloud upload problems. The memory size required per household per year is about 2.2 GB.

#### *2.4. Data Post-Processing*

The collected data can be used asynchronously for different actions or assessments. In our study, the assessment aims to show the influence of data granularity on the description of consumption load profile features. This required post-processing the data, which involved extracting the data from the planned storage solution and adapting them to different granularities.

Cloud data were stored in *json* format and SD card data in *CSV* format; both could be downloaded at any time. Therefore, the data of each house in Figure 2 could first be downloaded and converted to *CSV* format, if required, resulting in a daily *CSV* file of all electrical measurements. This format is recognized by applications used for data processing and analysis, such as MS Excel and MatLab.

Secondly, the adaptation to different granularities from the raw data on a 0.25 s basis was carried out by down-sampling (aggregating) the RMS values, which reduced the data size. This yielded data at thirteen resolutions of data granularity, from half a second to 30 min (0.5, 1, 2, 5, 10, 15, 30 s, and 1, 2, 5, 10, 15, and 30 min).
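This aggregation step can be sketched with pandas resampling on synthetic data; the mean-based aggregation and all names here are illustrative assumptions, not the project's implementation:

```python
import numpy as np
import pandas as pd

# One hypothetical hour of active power (kW) sampled every 0.25 s.
rng = np.random.default_rng(seed=1)
idx = pd.date_range("2019-09-18 12:00", periods=4 * 3600, freq="250ms")
load = pd.Series(0.4 + 0.05 * rng.standard_normal(len(idx)), index=idx)

# Aggregate the raw readings to coarser granularities by averaging.
granularities = ["500ms", "5s", "1min", "30min"]
coarse = {g: load.resample(g).mean() for g in granularities}

for g in granularities:
    print(g, len(coarse[g]))
```

Each coarser series carries proportionally fewer samples (7200, 720, 60, and 2 for this one-hour trace), while equal-width averaging preserves the overall mean.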

#### **3. Results and Discussion**

This section shows the results of the influence of data granularity on the description of household consumption load profile features based on the methodology presented in Section 2. This framework highlighted the information loss regarding the profile features when coarse-grained data were used. We first focus on the temporal results for different time slices, from a sub-hourly to a monthly analysis, including daily and weekly analyses. However, the global influence assessment was based on a yearly analysis. This first involved a statistical analysis. After this, periodograms were combined with autocorrelation and partial autocorrelation analyses to highlight significant outcomes regarding profile features.

#### *3.1. Case Study*

The case study focused on four real-world households in the city of Jaen in southern Spain. As explained, this research is part of the SEREDIS project, which characterized load profiles for household consumption, electric vehicles (EVs), and PV systems in the residential context. Nonetheless, this study is limited to the consumption load in households. The consumption load data came from SM readings as described in Section 2.3, being post-processed as stated in Section 2.4.

The selected households had different consumption features, with varying characteristics in terms of the peak-to-base load ratio and load fluctuations, which made it possible to take the heterogeneity of real-world load profiles into account (see Table 2). In addition, important issues determined the household selection, such as the power contracted from the electric mains and the type of supply at home. Single-phase systems were installed in households #1, #2, and #4, whereas household #3 had a three-phase system. Under Spanish legislation, the contracted power reflects how the household is equipped with different electrical appliances (see Table 3). In this study, the limitation in the number of households was due to a combination of a limited financial budget, the limited number of households that included EVs and PV systems, and the low number of families who voluntarily cooperated on the research project.

In particular, household #1 was a family flat with three children. Household #2 was a semi-detached house with only two inhabitants. Household #3 was a detached house with two children and their parents. Household #4 was a terraced house inhabited by two adults and two teenagers.

The city of Jaén has a Mediterranean climate with hot summers but cool winters. Therefore, the houses in our study were all equipped with climate control systems. More specifically, household #1 had a heating system for the whole building with a gas boiler. The flat also had an air-conditioning system for summer. Household #2 had an individual gas boiler for heating, and a two-split air conditioner system (living room and master bedroom). Household #3 was equipped with a central electrical air-thermal system for heating and cooling. Household #4 had an air-conditioning/heat pump system. Each household had different electrical appliances, depending on the age and behavior of the family members (see Table 3).


**Table 2.** Key features of four households in Jaén (southern Spain). Case study.



All of the households had the usual appliances installed in the kitchen, although household #2 did not have a dishwasher. The computer equipment in each household was quite heterogeneous (desktop computer, laptop, smartphone, tablet), and depended on the profession of the occupants and on whether certain members of the household were still students. Moreover, the ages of the occupants also determined the entertainment equipment (TV, stereo system, video game console, etc.). Thus, households #1, #3, and #4 had video game consoles for the children and adolescents that lived there. In contrast, the occupants of household #2 preferred listening to music on CD or playing movies on DVD/Blu-ray. The lighting system in all households was low-consumption and high-efficiency. Low-consumption lamps were installed in households #1, #2, and #3. Households #1, #2, and #4 also had fluorescent lamps. LED lighting was present in households #1, #3, and #4. Only household #4 used halogen lamps.

#### *3.2. Reliability of the Planned Framework to Provide Data*

Figure 5 shows the data availability for the data-to-cloud upload in each household within the SEREDIS project up to July 2020. The gaps in the figure represent the days when some information was lost because of data-to-cloud upload problems, particularly when the data loss exceeded 25% in one day. Data collection started in July 2018 (household #3) and is still ongoing for all of the households. In September 2018, SMs were added to households #1 and #2. Finally, household #4 was equipped with an SM in October 2018.

**Figure 5.** Data availability for every household.

As data were collected on a 0.25 s basis, 345,600 measurements for each of six electrical variables (voltage, current, active, reactive, and apparent power, and power factor) were stored each day. This provided a total of 2,073,600 values per day, 62,208,000 per month, and 746,496,000 per year.
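The arithmetic behind these totals can be reproduced directly; note that the monthly and yearly figures imply a 30-day month and a 360-day year:

```python
# Storage arithmetic for a 0.25 s acquisition interval and six variables
# (V, I, P, Q, S, PF); the monthly/yearly totals use 30- and 360-day spans.
samples_per_day = int(24 * 3600 / 0.25)   # readings per variable per day
values_per_day = samples_per_day * 6
values_per_month = values_per_day * 30
values_per_year = values_per_day * 360

print(samples_per_day, values_per_day, values_per_month, values_per_year)
```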

The reliability of the data-to-cloud upload (Firebase [90]) during 2019 was greater than 99% because of the high quality of the fiber optic Internet connection in all of the households. Figure 6 depicts the percentages of successful data-to-cloud upload.

**Figure 6.** Percentage of successful data-to-cloud upload in 2019.

As explained, the SM also included a local 8-GB memory card that served as a data backup. This permitted an operational autonomy of 1.88 years. This local storage guaranteed 100% data availability; thus, the assessment shown hereafter is based on data of this storage solution.

#### *3.3. Sub-Hourly Time Slice Analysis*

The influence of the data granularity on the temporal change features of the household consumption load profile at sub-hourly level is analyzed in this section. Figure 7 shows the measurements for all of the households and thirteen resolutions of data granularity from 0.5 s to 30 min on a working day in September. Data with finer granularities, from 0.5 s to 30 s, are shown on a 5-min timespan whereas coarser resolutions, from 0.5 min to 30 min, cover a 2-h timespan.

For household #1, Figure 7a1 highlights the smoothing of peaks and troughs in the consumption load because of the use of coarse-grained data. For example, the 1.77 kW load peak at 22:26:26 h was only achieved at a 0.5-s granularity. The granularity of 1 and 2 s gave a load peak of 1.67 kW and 1.61 kW, which meant a decrease of 5.64% and 9.04%, respectively. However, the flattening of this peak was very pronounced for the 30-min granularity where a decrease of 27.12% was observed.

Regarding household #2, the maximum load peak at 23:42:52 h was 56.44% higher for the finest granularity than for the coarsest resolution. As an example, the comparison of the consumption load in household #3 at 12:07:00 h shows a reading of 1.83 kW for a 0.5-s granularity, 0.53 kW for a 15-min granularity, and 0.81 kW for a 30-min granularity, which meant decreases of 71.03% and 55.73%, respectively. For household #4, the reading of 1.81 kW for a 0.5-s granularity at 14:30:00 h was reduced to 0.88 kW for the 30-min granularity, which meant a decrease of 51.38%.

Coarse data granularities tend to flatten the peaks and troughs in the consumption load. Since there is a considerable loss of information, the consumption thus evaluated does not conform to reality. The coarser the granularity used, the smoother the load peaks and troughs become; as a result, the reduction in the measured power is greater, and the data are a less accurate reflection of reality. One problem with reducing the temporal resolution of consumption data sampling is the loss of variability within the intra-temporal steps, which can strongly affect subsequent actions or assessments. The loss of detail observed in Figure 7 shows that reducing the sampling resolution of the consumption load data is not justified.

#### *3.4. Daily Time Slice Analysis*

This section extends the timeframe of Section 3.3 to one day when analyzing the influence of the data granularity. This provides information pertaining to the daily consumption in the households. Figure 8 shows the six samples of data granularity for all of the households on the previously mentioned day of September. This figure highlights the dual nature of the consumption load profile, namely, the continuity of the rough base-load and the intermittent spikes of the peak-load.

For household #1, the maximum daily load peak occurred at 22:00:00 h with a 0.5-s reading of 4.13 kW. However, the readings for the 10 and 30-min granularities dropped to 2.65 and 1.69 kW, respectively, which signified decreases of 36.07% and 59.09%. As can be observed, the consumption load for household #2 had a sequence of peaks throughout the day for the finest data granularity. These peaks were strongly attenuated for a 5-s data granularity, and much more reduced for the coarse data resolutions. These repeated peaks were caused by the operation of the refrigerator. As an example, the 0.5-s reading at 23:49:53 h was 2.27 kW, after which it dropped to 0.95 kW (a 58.14% decrease) for the 30-min granularity. The central electrical air-thermal system in household #3 generated peaks at regular intervals. The maximum daily load peak occurred at 19:23:51 h, with a reading of 7.47 kW for the 0.5-s granularity. In contrast, the 30-min granularity reading dropped to 5.95 kW, a value that was 20.34% lower. In household #4, the maximum daily load peak occurred at 9:17:23 h with a value of 3.14 kW for a 0.5-s granularity, whereas for a 30-min granularity, it decreased by 57.64% (1.33 kW). Repeated peaks were only identified at the 0.5 and 5-s granularities during hours when the consumption was lower; these were caused by the disconnection of equipment. In summary, the greatest loss of information occurred in household #1 (59.09%), followed by #2 (58.14%), #4 (57.64%), and #3 (20.34%).

**Figure 7.** Consumption load profile for household #1 to #4: (**a1**–**a4**) 0.5 to 30 s granularities (5-min timespan); (**b1**–**b4**) 0.5 to 30-min granularities (2-h timespan).

**Figure 8.** Daily consumption load profile for household #1 to #4: (**a1**–**a4**) 0.5 to 30 s granularities; (**b1**–**b4**) 2 to 30-min granularities.

#### *3.5. Weekly Time Slice Analysis*

This section discusses the influence of three samples of data granularity on weekly consumption features (Figure 9). For household #1, the maximum load peaks occurred on Thursday and Friday, reaching 5.48 kW for the 0.5-s data granularity. These peaks were reduced by 12.77% and 34.67% at the granularities of 30 s and 30 min, respectively. The repeated peaks throughout the day in household #2 were also observed during the weekly analysis. The reduction of peaks with coarse data granularity (up to 75%) was the most pronounced among the households analyzed, while the intermediate data granularity decreased the weekly load peaks by 37%. In household #3, the highest reduction occurred at the end of Wednesday, with reductions of 30.62% and 72.76% at the temporal resolutions of 30 s and 30 min, respectively. The reduction in household #4 was lower, amounting to a 62.07% drop for the 30-min data granularity.

**Figure 9.** Weekly consumption load profile in households #1 to #4 for three data granularities.

#### *3.6. Monthly Time Slice Analysis*

This section examines the daily smoothing of the highest peak and deepest trough when using coarse-grained data to underline their accuracy. The analysis focused on data from January.

Unlike the remaining sections where the temporal framework and the analysis were very short-term, the results of this section are applicable to medium-term analysis where knowledge of day-to-day operation is required. For this purpose, the ratio between the daily peak or trough load and the daily mean load is used as a metric.
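A sketch of this metric on synthetic data (the pandas-based helper and all names are illustrative assumptions, not the study's code):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=7)
idx = pd.date_range("2019-01-01", periods=2 * 24 * 3600 * 4, freq="250ms")
load = pd.Series(0.5 + rng.random(len(idx)), index=idx)  # two synthetic days

def daily_ratios(series: pd.Series, granularity: str) -> pd.DataFrame:
    """Daily peak-mean and trough-mean load ratios at a given granularity."""
    g = series.resample(granularity).mean()
    daily = g.groupby(g.index.date)
    return pd.DataFrame({"peak_mean": daily.max() / daily.mean(),
                         "trough_mean": daily.min() / daily.mean()})

fine = daily_ratios(load, "500ms")
coarse = daily_ratios(load, "30min")
# Averaging flattens extremes: coarse-grained ratios sit closer to one.
print(fine["peak_mean"].max() > coarse["peak_mean"].max())
```

Because each coarse bin is the mean of its fine bins, the coarse peak-mean ratio can never exceed the fine one, and the coarse trough-mean ratio can never fall below it.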

Figure 10 shows the ratio for the peak load and all of the households for a 0.5–30 min data granularity. The load profiles show a widening spread in the daily mean load for households #3, #1, #2, and #4. This reveals an increase in daily variability as shown in Figure 9.

**Figure 10.** Ratio of daily peak load vs. daily average load in households #1 to #4 for 1.66–30 min data granularity in January.

Table 4 summarizes the highest peak-mean and deepest trough-mean ratios for the thirteen granularities and four households. As can be observed, the monthly maximum peak-mean ratios achieved the highest values for a data granularity of 0.5 s. The ratio decrease for the 30-min data granularity was 50.02%, 39.66%, 35.07%, and 36.89% with regard to the 0.5-s granularity, respectively for households #1, #2, #3, and #4. The monthly minimum trough-mean ratio also achieved the highest value for a 0.5-s data granularity. The decrease in percentage with coarser granularity, which was greater than that for the maximum peak-mean ratios, was the following: 64.07%, 75.22%, 27.67%, and 59.64%. This lack of accuracy indicates the need to set the data granularity to 0.5 s for medium-term analysis.


**Table 4.** Monthly maximum/minimum ratio of daily peak-mean load and of daily trough-mean load in January.

#### *3.7. Yearly Time Slice Analysis*

This section highlights the influence of data granularity on the description of household consumption load profile features by means of different complementary analyses. Firstly, the consumption pattern of each household is justified. Then, based on the use of coarse-grained data, a statistical analysis underlines the change in the annual empirical distribution shape. Finally, periodograms and autocorrelation analyses are used to focus on the loss of information pertaining to profile features, caused by the use of coarse-grained data. This was based on the knowledge of the main constituent signals of the load fluctuations.

#### 3.7.1. Temporal Consumption Pattern

The yearly consumption pattern in a household reflects the energy behavior of the occupants during all seasons, and is strongly influenced by temperature, wind speed, relative humidity, etc. [16,34]. It also takes vacation and holiday periods into account. Accordingly, Figure 11 shows the daily average consumption load during the year for five samples of data granularity in all of the households.

**Figure 11.** Daily average consumption load profile in households #1 to #4 for five data granularities.

Consumption pattern #1 was stable during spring, autumn, and winter. The heating system in the household was for the whole building, and thus did not influence electricity consumption. However, the air-conditioning system from May to September strongly increased consumption, except for the month of August when the occupants were away on holiday. During the summer (June to September), the children spent more time at home, which increased consumption.

Consumption pattern #2 was stable throughout the year because the occupants were at work all day, and were only at home at night. Nonetheless, there were days between June and October when consumption peaked because of the use of the split air conditioner system.

The central electrical air-thermal system for heating and cooling in household #3 operated the whole year. When temperatures were lower (i.e., January to April), consumption was greater. Regarding household #4, electricity consumption decreased in January and August because the family moved to their second residence.

#### 3.7.2. Statistical Analysis

This section presents a statistical analysis of the datasets monitored in the course of a year (see Section 3.2 for a detailed explanation of the timespan) for the four households in the case study (see Section 3.1), once these datasets were post-processed for different granularities according to Section 2.4. This analysis shows the impact of data granularity on the description of household consumption load profile features. Accordingly, Figure 12 represents the annual empirical distributions (PMFs of the discrete variables) of the consumption load for all of the households and for four data granularities. Specific zooms were included for a better understanding of results.

**Figure 12.** Annual empirical distribution of the consumption load profile in households #1 to #4 for four data granularities.

As can be observed, the PMFs of the household consumption load data are clearly neither normal nor Gaussian. The most non-Gaussian behavior is evident for household #1, followed by household #3. This result is in consonance with the dual nature of the consumption load profile in Figure 8. In addition, the use of coarser temporal granularity, ranging from 0.5 s to 30 min, substantially affected the PMF shapes, and two opposite behaviors were observed: the PMF shape either moved further away from a Gaussian distribution (households #1 and #3) or began to show a more Gaussian behavior (households #2 and #4). In general, the shape became more skewed near the hours with a lower load, which removed many of the extremes. The extreme ends of the PMF were of potential interest as they represented periods of very low or very high consumption.
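The empirical PMF and a basic shape check can be sketched on synthetic bimodal data standing in for the dual base/peak-load behavior; all parameters are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic bimodal load: a continuous base load plus intermittent peaks,
# standing in for the dual-nature profile (all parameters illustrative).
rng = np.random.default_rng(seed=3)
base = rng.normal(0.3, 0.05, 8000)    # base load around 0.3 kW
peaks = rng.normal(2.0, 0.30, 2000)   # appliance peaks around 2 kW
load = np.clip(np.concatenate([base, peaks]), 0.0, None)

# Discretize to 0.1 kW bins and normalize counts to obtain the PMF.
pmf = pd.Series(np.round(load, 1)).value_counts(normalize=True).sort_index()

print(round(pmf.sum(), 6))            # PMF sums to 1 by construction
print(stats.skew(load) > 0)           # long right tail -> positive skewness
```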

For profile #1, the PMF moved largely within the intervals of 0.0–0.4 kW and 0.6–3.0 kW. A reduction of occurrences in the 0.0–0.4 kW interval was observed, which moved the higher occurrences towards the 0.6–1.5 kW load interval. In household #3, the lower occurrences in the 0.0–2.5 kW interval were compensated by higher ones in the 2.25–3.20 kW interval. For household #4, a displacement of occurrences from the 0.30–0.55 kW interval to the 0.55–1.50 kW interval was observed.

Table 5 summarizes the statistical results of the annual empirical distributions for all of the data granularities. For purposes of comparison, the values at the 0.5-s data granularity were used as a reference. Using a coarser temporal granularity, ranging from 0.5 s to 30 min, led the sample mean of the consumption load to decrease by 17.61%, 17.68%, 17.65%, and 17.74%, respectively for households #1, #2, #3, and #4. In general, the level of variability for all households was significantly reduced, as the maximum load values decreased by 15.55%, 18.28%, 13.74%, and 18.49%, respectively. Nonetheless, larger drops were observed for the relevant minimum values, namely, 20.98%, 19.93%, 17.59%, and 16.02%. The reduction in percentage for the variance of all households was in the 27.54–31.29% interval. This again confirmed the drop in the variability of the consumption load for the households.


**Table 5.** Descriptive statistics of the consumption load profile throughout the year.



The sample skewness [91] was positive in households #1, #2, and #3, which underlines that the right tail of the consumption load distribution was longer than the left. On the contrary, household #4 had a negative sample skewness. For coarser data granularities, households #1 and #3 increased their sample skewness value whereas the behavior in households #2 and #4 was exactly the opposite. This was confirmed by the displacement of PMFs in Figure 12.

The sample kurtosis values indicate that all households were leptokurtic [91], with values greater than 3, which means that the consumption loads were concentrated around the sample mean. Households #1 and #3 had higher sample kurtosis values for coarser granularities, which revealed that the consumption load tended to be closer to the sample mean. This outcome was more pronounced in household #3. However, the behavior in households #2 and #4 was the opposite. Once again, the change in the sample kurtosis values was confirmed by the displacement of the PMFs in Figure 12.

#### 3.7.3. Periodogram, Autocorrelation, and Partial Autocorrelation Analyses

This section presents a set of complementary analyses that were performed to explain the influence of data granularity on the description of load profiles. A periodogram analysis, along with autocorrelation and partial autocorrelation analyses, made it possible to obtain the main periods (or frequencies) of the constituent signals of the consumption load fluctuations. The results highlighted the loss of information when coarse-grained data were used to describe the load profile features. The analyses were split into two time slices, namely, the 1–100 s interval and the 100 s (1.66 min)–30 min interval, which showed the influence of data aggregation on these intervals of data granularity. This section concludes by highlighting the daily, weekly, and monthly seasonality of the household consumption load profiles.

Since household consumption load profiles contain more than one source of seasonality, as previously described, our approach at the very beginning removed the trend and seasonal components altogether through differencing and seasonal differencing [68].
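A minimal sketch of this de-trending step, assuming an hourly toy series (at the actual 0.5 s granularity, the daily seasonal lag would be 172,800 samples):

```python
import numpy as np
import pandas as pd

# Hourly toy series: linear trend + 24-h cycle + noise (illustrative).
rng = np.random.default_rng(seed=5)
steps = np.arange(24 * 60)                   # 60 days of hourly samples
load = pd.Series(0.001 * steps                                 # trend
                 + 1.0 + 0.5 * np.sin(2 * np.pi * steps / 24)  # daily cycle
                 + 0.05 * rng.standard_normal(steps.size))     # noise

# First difference removes the trend; a lag-24 seasonal difference removes
# the daily cycle, leaving an (approximately) stationary residual.
stationary = load.diff().diff(24).dropna()
print(len(stationary), abs(stationary.mean()) < 0.02)
```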

To confirm the stationarity of the datasets used in the periodograms and in the autocorrelation and partial autocorrelation analyses, Table 6 shows the results of the statistical tests described in Section 2.1.1. These tests were applied to the datasets of the annual consumption load profiles for different data granularities. We interpret the results using the *p*-value from each test: a *p*-value below a threshold (such as 5%) suggests that we reject the null hypothesis, whereas a *p*-value above the threshold suggests that we fail to reject it.


**Table 6.** Statistical tests to check the stationarity of the consumption load profiles.



As can be seen in Table 6, the outcome of every test was consistent with stationarity (*h* = 0 for the ADF, variance ratio, LMC, and PP tests, whereas *h* = 1 for the KPSS test). In addition, the *p*-value analysis for the KPSS and LMC tests yielded reliability levels higher than 99% for most granularities and households; for the remaining cases, the reliability levels were higher than 95%. Therefore, the results in Table 6 confirmed that the differenced time series data were stationary.

Figure 13 depicts the periodogram of the four consumption load profiles for the 0.5-s data granularity and a period of load fluctuations from 1 to 100 s. The different curves in the graphs were generated with 4000 observations (*N*) drawn on logarithmic scales for better visualization [67]. The properties of the autocovariance function assured that its accuracy improved in proportion to 1/*N*; the accuracy of the periodogram was thus 2.5 × 10<sup>−4</sup>. The finest data granularity of 0.5 s, as described in Section 2, limited the frequency analysis to 1 Hz (10<sup>0</sup> Hz). The power spectral density for load fluctuations showed striking behavior differences for each household. The most stable power spectrum was that of household #1, followed by households #2, #4, and #3. In general, the power spectrum for the four load profiles showed two different patterns. For households #1 and #2, the power level for the 15–80 s interval remained stable. This meant that the load fluctuations included several constituent signals of equivalent significance. Thus, main constituent signals at periods of 14, 27, 40, 55, and 68 s can be clearly observed. In contrast, although households #3 and #4 showed various peaks in the periodogram, which corresponded to different constituent signals, their relevance strongly increased as the period rose. This was due to the greater relevance of the cycling induced by the climate control system, which masked other minor fluctuating cycles. Nonetheless, some main constituent signals can be observed, such as those at a period of 80 s for household #3, and others at periods of 30, 60, and 90 s for household #4.
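The periodogram computation at the 0.5-s granularity can be sketched as follows, using a synthetic load with a single 14 s constituent cycle (an illustrative assumption, not the measured data):

```python
import numpy as np
from scipy.signal import periodogram

fs = 2.0                                    # 0.5 s granularity -> 2 Hz
t = np.arange(4000) / fs                    # 4000 observations, as in the text
rng = np.random.default_rng(seed=4)
load = (0.4 + 0.2 * np.sin(2 * np.pi * t / 14.0)
        + 0.02 * rng.standard_normal(t.size))

freqs, psd = periodogram(load, fs=fs)       # Nyquist limit: 1 Hz
peak_freq = freqs[np.argmax(psd[1:]) + 1]   # skip the DC (mean) bin
print(round(1.0 / peak_freq, 1))            # dominant period in seconds
```

With *N* = 4000 samples at 2 Hz, the frequency resolution is fs/*N* = 5 × 10⁻⁴ Hz, so the dominant spectral peak lands within one bin of the true 1/14 Hz cycle.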

**Figure 13.** Periodogram of the consumption load profile for households #1 to #4: 1–100 s time slice.

These graphs show that when coarse-grained data were used, from 1 to 100 s, there was a loss of information regarding load profile features. This loss was the greatest for household #1, and in descending order of relevance for households #2, #4, and #3. Furthermore, the main constituent signals that reflect this loss of information for coarse data are evident in these graphs. This analysis confirms the results of the information loss in the daily time-slice analysis (Figure 8), where household #1 also showed the worst behavior, followed by households #2, #4, and #3.

The autocorrelation function, which studies the cross-correlation of a signal with itself, also underlined the constituent signals of load fluctuations. Accordingly, Figure 14 shows the autocorrelation function of the load profiles for four finer levels of data granularity. The curves in the graphs for granularities of 0.5, 2, 5, and 10 s were generated with 200, 50, 20, and 10 observations, respectively. Therefore, the related accuracies were 0.005, 0.02, 0.05, and 0.1, respectively.
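A minimal sample-autocorrelation sketch on synthetic data with a 14 s constituent cycle (illustrative, not the measured profiles):

```python
import numpy as np

def acf(x: np.ndarray, max_lag: int) -> np.ndarray:
    """Biased sample autocorrelation, as used in standard ACF plots."""
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[: -k or None], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(seed=6)
t = np.arange(4000) * 0.5                   # 0.5 s granularity
x = np.sin(2 * np.pi * t / 14.0) + 0.3 * rng.standard_normal(t.size)

r = acf(x, max_lag=60)
lag_14s = 28                                # 14 s / 0.5 s per lag
# The 14 s cycle shows up as a high autocorrelation at the full period
# (lag 28) and a dip at the half period (lag 14).
print(r[0], r[lag_14s] > 0.5, r[14] < 0.0)
```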

**Figure 14.** Autocorrelation function of the consumption load profile in households #1 to #4 for four data granularities.

A high value in the autocorrelation function signifies a repetition pattern, and thus reveals a constituent signal in the load fluctuation. The first two households had markedly lower autocorrelation values over the lags analyzed. For example, in household #1, both the peak autocorrelation values at lags of 14, 27, 40, 52, and 68 s and the very strong dips at 11, 25, 38, 50, and 66 s were quite remarkable. The former lags were in consonance with the periods found for the constituent signals in Figure 13a (the periodogram). This outcome is also striking for the other households in Figure 14 when compared with the results in Figure 13.

Figure 14 also underlines the loss of information on the load profile features when using coarser temporal granularity, ranging from 0.5 to 10 s. As the granularity became coarser, the autocorrelation value at the specific lags moved closer to unity. This unit value meant that the data were fully autocorrelated and that no different information (different constituent signals) was involved. The coarser the granularity, the greater the loss of information. As an example, for household #1 and a lag of 14 s, the autocorrelation value increased by 0.9%, 1.72%, and 2.21% for data granularities of 2, 5, and 10 s, respectively. As can be observed, the shifting of the curves for each granularity was the highest in household #1, followed by households #2, #4, and #3. This was a consequence of a much ampler and more stable power spectral density level for the 2–10 s period in the constituent signals of household #1 (Figure 13).

The partial autocorrelation function can also be used to underline the loss of information when coarse-grained data are used. Thus, Figure 15 shows the partial autocorrelation function of the load profiles for four data granularities. As an example, the constituent signal with a 14 s period in household #2 was analyzed. As the granularity became coarser, the partial autocorrelation value at this specific lag moved closer to unity, meaning that the data were fully autocorrelated and carried no additional information (no different constituent signals). More specifically, the value at this lag moved from 0.0166 at 0.5 s to 0.1031, 0.2794, and 0.5042 at 2, 5, and 10 s, respectively.

**Figure 15.** Partial autocorrelation function of the consumption load profile in households #1 to #4 for four data granularities.

Figure 16 extends the periodogram in Figure 13 to cover the period of load fluctuations from 1.66 min to 30 min. The comparison of the power spectral density for the load profiles shows an increasing spread between the base and peak load for profiles #1, #2, #4, and #3. This outcome is evident in Figures 8 and 9, where a drastic reduction of the peaks at coarser granularities can be observed. Furthermore, in this interval, the power spectrum level is two orders of magnitude higher than that of the 1–100 s interval (Figure 13). Consequently, the contribution to the overall power of the 1–100 s interval was less important, amounting to roughly 1% of that of the 1.66–30 min interval.

The power spectrum clearly shows a single constituent signal for household #3, at a period of 25 min, whereas for the remaining households, several main constituent signals are evident. Therefore, in households #1 and #2, there were signals at the 27.45 and 12 min periods, and in household #4, at the 24 and 12.20 min periods. Figure 16 shows that when coarse-grained data from 1.66 to 30 min were used, load profile features were increasingly inaccurate. This lack of accuracy was the greatest for household #1, and in descending order of relevance, for households #2, #4, and #3. Furthermore, the main constituent signals that represent the loss of information for coarse data are shown in Figure 16.

The autocorrelation analysis in Figure 17 pertaining to the 1.66–30 min time slice confirms the constituent signals found in Figure 16 for the different households. In addition, the shifting of the curves (inaccurate profile characterization) for each granularity was in consonance with the power spectral density levels for this 1.66–30 min interval (Figure 16). Thus, the greatest shifting and thus the most inaccurate profile characterization was found in household #1, followed by households #2, #4, and #3.

**Figure 16.** Periodogram of the consumption load profile for households #1 to #4: 1.66–30 min time slice.

**Figure 17.** Autocorrelation function of the consumption load profile in households #1 to #4 for four data granularities.

This section concludes by highlighting the seasonality of the household consumption load profiles. However, it is important to note that this seasonality, namely daily, weekly, and monthly, lies outside the temporal resolution scope of the data granularity analyzed in this study (up to 30 min). Accordingly, raw datasets without any differencing processes applied were used.

Figure 18 shows the periodograms of the four consumption load profiles for the daily (a) and weekly/monthly (b) time horizons. Regarding the daily seasonality in Figure 18a, except for household #2, the power spectrum over the interval from 0.65 h to 24 h had almost the same order of magnitude. For the different households, the main constituent signals can be clearly identified as follows: (i) #1 (0.65, 3.49, and 6 h); (ii) #2 (0.65, 3, and 8 h); (iii) #3 (0.71, 1.50, 1.71, 3, and 12 h); and (iv) #4 (0.92, 1.41, 2.67, and 4.9 h).

**Figure 18.** Periodogram of the consumption load profile for households #1 to #4: (**a**) daily time horizon (30 min–1 day time slice); (**b**) weekly/monthly time horizon (1–365 day time slice).

Within the time horizon of one year, Figure 18b, the power spectrum shows that cyclic household power levels differ substantially from one month to the next. In addition, the cyclic power in the interval lower than seven days shows striking behavior differences for each household, decreasing in households #1 and #2 and keeping a more stable value in households #3 and #4. For the different households, their main constituent signals during the weekly and monthly time horizons are as follows: (i) #1 (1, 7, 10, 22, 50, 83, and 167 days); (ii) #2 (1, 7, 10, 21, 41, and 73 days); (iii) #3 (1, 7, 15, 52, 91, and 182.5 days); and (iv) #4 (1, 7, 25, 41, 91, 182.5 days).

#### *3.8. Comparative Study of Granularity Impact in the Literature*

The results obtained in this study were compared with other studies in the literature that address granularity impact. It is important to note that this research not only provides temporal results for different time slices from a sub-hourly to a monthly analysis, but it also offers periodograms along with autocorrelation and partial autocorrelation analyses and empirical distribution-based statistical analysis. The temporal analysis offers information regarding the granularity impact on the change in the magnitude of the peak and trough load. In contrast, the second type of analysis reveals additional information such as the constituent signals in the load fluctuation. Since studies in the literature have focused only on temporal analysis, this comparison was limited to the impact on the magnitude change of the peak load because of the lack of data for autocorrelation analyses and empirical distribution-based statistical analysis.

In this study, the values obtained in the temporal analysis (Figures 7–11) and the empirical distribution-based statistical analysis (Figure 12) showed that the load peak decreased when coarse-grained data were compared with the finest data granularity, as follows: (i) sub-hourly analysis (27.12–56.44%); (ii) daily analysis (20.34–59.09%); and (iii) monthly analysis (35.07–50.02%).

Wright [27] compared granularities of 30 and 1 min and revealed a peak load reduction of 16–47% for different households. Murray [38] showed peak load changes in the range of 8.9–16% for an 8 s granularity measured with a meter. Naspolini [39] registered a drop of 18.56–28.36% when a 15 min granularity was compared to a 5 min granularity. Bucher [40] studied granularities between 5 s and 1 h, which resulted in a reduction of 2–38% as compared to a 1 s granularity. Shi [41] carried out an analysis with granularities of 1, 5, 10, 15, 30, and 60 min and obtained reductions in the peak load of up to 20% (5 min granularity) and up to 80% (60 min granularity) as compared with the 1 min granularity. Widen [48] found drops of 19.19–26.29% for the 60 min granularity with respect to the 10 min one. Hoevenaars [64] disclosed a reduction in the peak load, compared to a 1 s granularity, for different granularities as follows: (i) 10 s (1.28–7.45% for different sources); (ii) 1 min (1.78–15.04%); (iii) 10 min (2.46–22.62%); and (iv) 60 min (8.01–20.65%).
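The averaging effect behind all of these peak reductions can be reproduced with a toy series: aggregating fine-grained samples into coarser intervals lowers the apparent peak. A minimal sketch with synthetic data (not the measurements of this study or of the cited works):

```python
import numpy as np

def downsample_mean(load, factor):
    """Average consecutive samples to emulate a coarser granularity."""
    n = len(load) // factor * factor
    return load[:n].reshape(-1, factor).mean(axis=1)

def peak_reduction_pct(load, factor):
    """Percentage drop in the peak load after averaging over `factor` samples."""
    fine = np.asarray(load, dtype=float)
    coarse = downsample_mean(fine, factor)
    return 100.0 * (fine.max() - coarse.max()) / fine.max()

rng = np.random.default_rng(0)
profile = 0.3 + 0.1 * rng.random(7200)   # one hour of 0.5 s samples (kW)
profile[3600:3610] += 2.0                # a short appliance spike (~5 s)
print(peak_reduction_pct(profile, 120))  # 0.5 s -> 1 min: the peak drops sharply
```

Short appliance spikes are diluted over the averaging window, which is exactly the smoothing that the reported percentages quantify for real households.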

#### **4. Conclusions**

Increasing interest in the analysis of household electricity consumption profiles, thanks to the rapid deployment of SMs in the residential context, may significantly change the relevance of such profiles in the near future. To understand profile features and their applicability to any action or assessment, it is necessary to appreciate the full range of consumption load fluctuations.

For this purpose, this paper has presented and discussed a methodology that makes two contributions to the state of the art. Firstly, this research proposed periodograms along with autocorrelation and partial autocorrelation analyses and empirical distribution-based statistical analysis, which were used to describe household consumption load profile features. This type of analysis reveals key issues of the granularity impact on the load fluctuation, such as the accurate description of its constituent signals. Secondly, a framework was developed to collect household consumption data at high sampling frequency (>4 Hz). This methodology allowed us to analyze the influence of data granularity on the description of household consumption load profile features. The effectiveness of this methodology was illustrated in a case study of four households in Spain, using thirteen resolutions of data granularity (0.5, 1, 2, 5, 10, 15, 30 s, and 1, 2, 5, 10, 15, and 30 min). We acknowledge that conducting our analysis with this reduced data sample is a limitation of this study. However, it was adequate for achieving our primary objective of demonstrating the usefulness of the proposed methodology in which the ultimate goal was to highlight the information loss regarding the profile features when using coarse-grained data.

Bearing in mind the limits of applicability of our findings, the main outcomes of the study are detailed below. The influence of data granularity on the results for different time slices from sub-hourly to monthly analysis, including daily and weekly analyses, was discussed. Results from sub-hourly analyses highlight the smoothing of peaks and troughs in the consumption load when based on coarse-grained data from 0.5 s to 30 min. More specifically, peaks decreased by 27.12%, 56.44%, 55.73%, and 51.38%, respectively, for households #1, #2, #3, and #4. The daily analysis showed higher peak reductions of 59.09%, 58.14%, 20.34%, and 57.64%, respectively, for the previously mentioned households. The repeated peaks were only identified in the daily and weekly analyses at granularities from 0.5 s to 5 s. The monthly analysis provided data pertaining to the day-to-day load behavior by using the ratio between the daily peak or trough load and the daily mean load. This ratio decreased for coarse data granularity by 50.02%, 39.66%, 35.07%, and 36.89%, respectively, for households #1, #2, #3, and #4.

However, the overall influence of data granularity on the description of household consumption load profile features was assessed on an annual basis by using a set of complementary analyses. A statistical analysis based on coarse-grained data underlined the significant change in the empirical distribution shape. The analysis of statistical moments up to the fourth order reflected the reduced variability of the consumption load when coarse-grained data were used. Periodograms and autocorrelation analyses also indicated the loss of information regarding the profile features caused by the use of coarse-grained data. These analyses were based on the main constituent signals of the load fluctuations. In conclusion, the analyses for different granularities showed that some important loads (e.g., cooling or heating devices, electric water heaters, etc.) produced fluctuations that were increasingly poorly captured at resolutions coarser than 5 s. This confirms that coarse granularities should not be used to collect consumption data because they do not reflect actual load behavior.
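The loss of variability with coarser granularity can be illustrated numerically: averaging into wider intervals shrinks the higher-order moments of the load distribution. A toy sketch with synthetic data (illustrative only, not the study's measurements):

```python
import numpy as np
from scipy.stats import skew, kurtosis

def four_moments(load):
    """Mean, variance, skewness, and excess kurtosis of a load series."""
    x = np.asarray(load, dtype=float)
    return x.mean(), x.var(), skew(x), kurtosis(x)

rng = np.random.default_rng(1)
fine = 0.3 + rng.exponential(0.2, 86400)    # spiky fine-grained load (kW)
coarse = fine.reshape(-1, 60).mean(axis=1)  # 1-min averages of the same data

# Averaging preserves the mean but damps variance, skewness, and kurtosis
print(four_moments(fine)[1] > four_moments(coarse)[1])  # True
```

This is the mechanism behind the flattened empirical distributions observed for coarse-grained household data.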

The results of our study indicate that it is not necessary to use the finest data granularity, i.e., the 0.5 s resolution. In fact, even for profiles #1 and #2, which showed the greatest fluctuation, a data resolution of 5 s produced a sufficiently accurate characterization of profile features, since the results generated were very close to those of the 0.5 s resolution. Therefore, the 5 s granularity achieves a balance between the computational burden associated with data storage in the cloud and its post-processing, and the loss of information on the consumption profile features.

The results of this research are in line with other studies in the literature that address granularity impact. Since studies in the literature have focused only on temporal analysis, this comparison was limited to the impact on the magnitude change of the peak load because of the lack of data for autocorrelation analyses and empirical distribution-based statistical analysis. The reviewed studies reported peak load reductions of 1.28–80% for granularities ranging from 1 s to 1 h.

Future work in the field should take the current limitations of this study into consideration. Further analysis could be conducted with other households of different characteristics, or the methodology could be applied to a large set of buildings. It is our hope that this study will spur future work and discussion in the research community regarding the accurate description of household load profile features based on an appropriate data granularity and will ultimately lead to similar work on datasets from other multi-family residential buildings.

**Author Contributions:** All the authors contributed substantially to this paper. F.S.-S., A.C.-O. and C.R.B. performed the simulations and experimental work, and also wrote the paper. J.C.H. provided the conceptual approach, commented on all the stages of the simulation and experimental work, and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Agencia Estatal de Investigación (AEI) and the Fondo Europeo de Desarrollo Regional (FEDER) aimed at the Challenges of Society (Grant No. ENE 2017-83860-R "Nuevos servicios de red para microredes renovables inteligentes. Contribución a la generación distribuida residencial").

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Technical Note* **Configurable IoT Open-Source Hardware and Software I-V Curve Tracer for Photovoltaic Generators**

**Isaías González \*, José María Portalo and Antonio José Calderón**

Department of Electrical Engineering, Electronics and Automation, University of Extremadura, Avenida de Elvas, s/n, 06006 Badajoz, Spain; jportalo@alumnos.unex.es (J.M.P.); ajcalde@unex.es (A.J.C.)

**\*** Correspondence: igonzp@unex.es; Tel.: +34-924-289-600

**Abstract:** Photovoltaic (PV) energy is a renewable energy resource which is being widely integrated in intelligent power grids, smart grids, and microgrids. To characterize and monitor the behavior of PV modules, current-voltage (I-V) curves are essential. In this regard, Internet of Things (IoT) technologies provide versatile and powerful tools, constituting a modern trend in the design of sensing and data acquisition systems for I-V curve tracing. This paper presents a novel I-V curve tracer based on IoT open-source hardware and software. Namely, a Raspberry Pi microcomputer composes the hardware level, whilst the applied software comprises mariaDB, Python, and Grafana. All the tasks required for curve tracing are automated: load sweep, data acquisition, data storage, communications, and real-time visualization. Modern and legacy communication protocols are handled for seamless data exchange with a programmable logic controller and a programmable load. The development of the system is expounded, and experimental results are reported to prove the suitability and validity of the proposal. In particular, I-V curve tracing of a monocrystalline PV generator under real operating conditions is successfully conducted.

**Keywords:** IoT; renewable energy sources; photovoltaic energy; I-V curve; monitoring and data acquisition; microgrid; open-source; communication protocols

### **1. Introduction**

Photovoltaic (PV) technology is one of the most widespread renewable energy sources (RES) [1,2] and contributes to reducing greenhouse gas emissions and fighting against climate change [3]. In intelligent energy facilities conceived under the paradigm of smart grids and microgrids, PV generators are commonly the main source of renewable energy [4]. In these facilities, PV can be combined with other equipment for energy production and consumption such as wind turbines, batteries, and hydrogen-related devices (fuel cells and electrolyzers).

In PV-based grids, it is required to monitor the state and operation of the PV devices. In this regard, the efficiency of PV cells under natural conditions is measured using current-voltage (I-V) characteristic curves [5]. Consequently, to evaluate the performance of PV modules, it is necessary to measure their I-V output characteristics [6].

I-V curves are obtained by performing a voltage sweep on the PV module, while measuring the output current which is delivered to a connected load [5]. Such curves display maximum voltage and current values of a module in a given setting [5]. This way, the analysis of the curves provides direct information on the electrical state of the module, allowing the researcher to obtain data on the expected performance under different conditions of irradiance and load [7]. The measurement system used to acquire data of the PV modules and to visualize the I-V curve is commonly known as the I-V curve tracer.

**Citation:** González, I.; Portalo, J.M.; Calderón, A.J. Configurable IoT Open-Source Hardware and Software I-V Curve Tracer for Photovoltaic Generators. *Sensors* **2021**, *21*, 7650. https://doi.org/10.3390/s21227650

Academic Editors: Antonio Cano-Ortega and Juan M. Corchado

Received: 25 September 2021; Accepted: 16 November 2021; Published: 18 November 2021

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

In the context of RES, monitoring and data acquisition are essential to recognize the resources available on-site, evaluate electrical conversion efficiency, detect failures, and optimize electrical production [8]. In particular, the characterization of PV modules through I-V curves is required for different purposes, and the applicability of such curves has been widely reported in the literature. For instance, I-V curve tracing is the most commonly applied technique for the electrical characterization of PV modules [9]. An in situ measurement system of PV characteristics can provide valuable information for optimized power generation [10]. I-V curves are widely used to evaluate power generation performance and detect fault conditions of PV generators [11].

Aging effects of PV cells affect the I-V curve [10], and the consequent degradation is usually assessed by means of such I-V characteristics [6]. It must be noted that failures in PV modules may be caused by several reasons such as corrosion failures, cell cracks, hotspots, encapsulation failures, electrical or mechanical connection failures, potential induced degradation, accumulation of dust or soiling, or partial shading, among others [12].

In the context of degradation and failure analyses, detailed parameter fitting can be carried out using the I-V characteristics of the strings or modules for a deeper understanding of degradation mechanisms prior to failure [13]. According to [14], the I-V curve measurement method is time-consuming, but it is very reliable and considered a paramount step in fault detection. For diagnosis, electrical parameters are extracted from the measured I-V curves such as the short circuit current, the open-circuit voltage, the maximum power-point, and the fill factor [6,14]. Indeed, decisions on the replacement of faulty or degraded devices are better taken based on a direct measurement than based on estimation [15].

In addition, another application of I-V curves is related to the modeling of PV modules, which requires estimating or measuring certain parameters. In this regard, a data-acquisition system is essential to collect and store I-V curves so that simulated I-V curves can be plotted based on different models [16]. Namely, the well-known single diode model (SDM) requires estimating the values of series and parallel resistances, which can be calculated from the I-V curves [17].

However, the manufacturer provides curves measured at laboratory conditions, obtained under standard conditions of temperature and irradiance (standard test conditions, STC), which do not correspond to real operation in physical facilities. These differences affect the I-V characteristic and, thus, the module production, so it is crucial to use an I-V curve tracer [6].

The relevance of I-V tracing can be witnessed in the literature; the instrumentation and monitoring equipment for such a task have received important research efforts. For example, in-depth reviews of curve tracers according to their topologies can be found in [7,11]. Additionally, diverse equipment is available in the market and in the scientific literature. On the one hand, commercial curve tracers can be found in the market, such as in [18–20]. Their main advantage is the reliable measurements that are performed, guaranteed by the manufacturer of the device. On the contrary, the most noticeable drawbacks are that commercial systems are generally expensive and closed for modifications [8]. Furthermore, another important disadvantage is that the control software of commercial tracers is not prepared for an automatic experimental campaign of measurements [21].

On the other hand, custom-designed I-V curve tracers constitute an important trend in recent years. Diverse works have designed curve tracers using general purpose electronic boards such as microcontrollers and digital signal processors (DSP). For example, in [22,23], a portable I-V curve tracer is based on a DSP, being connected to a personal computer (PC) through serial communication. Vega et al. [6] combine a peripheral interface controller (PIC) and an electronic load (MOSFET) to implement an I-V curve tracer. Acquired data are stored on a PC or a smartphone. A low-cost PIC is used by Ortega et al. [12] to develop a prototype of an I-V curve tracer for individual modules in large photovoltaic systems. A curve tracer developed around a commercial low-cost embedded microcontroller (TivaC) is presented in [24]. Additionally, a low-cost microcontroller is the core of the curve tracer presented in [15], where measurements are stored in local memory and downloaded through a universal synchronous asynchronous receiver transmitter (USART) connected to a Bluetooth device.

Moreover, within custom-designed curve tracers, new developments are progressively including Internet of Things (IoT) open-source technology. Namely, open-source hardware platforms such as Arduino and Raspberry Pi (RPi) are being introduced in research projects and facilities. In the scope of PV energy, these devices are applied for data acquisition and monitoring tasks. For example, Arduino is used in [25,26] to sense the temperature of PV modules. With higher computation capabilities, the RPi microprocessor is used to implement monitoring systems for PV-based microgrids in [4,27,28] and for PV plants in [5,29,30].

In particular, developments of I-V curve tracers involving such IoT open-source technology are still scarce, so the most recent research works will now be discussed. Within the open-source community, there is information publicly available about I-V curve tracers using Python and the Arduino microcontroller [31]. In the curve tracer proposed in [21], Arduino is responsible for managing a capacitive load, while data storage and visualization are performed by a PC. Arduino is used together with a commercial data logger in [32] to handle a MOSFET load in order to trace I-V curves of PV modules. An Arduino together with a PC is used in [33] to deploy an I-V curve tracer, the PC acting as a storage means for the measured data. The work reported in [34] applies an Arduino board with data storage on an SD card to collect the data of PV modules under shading conditions. Papageorgas et al. [10] develop a low-cost curve tracer involving an open-source platform with an embedded microcontroller called Polytropon and message queueing telemetry transport (MQTT) as a communication protocol.

Regarding the use of RPi, the following works are found. In [35], an IoT-based remote I-V tracing system is developed using an RPi and a cloud-based server aimed at analyzing soiling losses in distributed solar facilities. An RPi is used in [36] to implement a plug and play I-V curve tracer oriented toward the diagnosis of PV modules. A power MOSFET transistor is used as the electronic load during characterization, the data being recorded on the RPi and on an intermediate file transfer protocol (FTP) server. In [37], an RPi is used as the main component in a so-called outdoor test facility (OTF) with IoT capabilities employed to capture I-V and P-V curves of PV modules. Python scripts are used, and experimental validation is reported.

Some relevant requirements and trends of curve tracers have been identified in the previous literature. For example, in [33], it is pointed out that the measure of the entire I-V curve in a short time requires a suitable data acquisition device. Reference [7] identifies various trends in the advancement of curve tracers, among which low-cost measurement systems and low-cost communications are highlighted. In the same sense, the important role that reliable low-cost communications play is emphasized in [15].

The utilization of open-source and IoT technologies for curve tracing and monitoring constitutes another new trend [16,38]. Furthermore, such technologies encourage the previously mentioned trends of low-cost measurements and communication systems. These technologies provide rapid development and cost-effective solutions for smart monitoring systems [16]. Related to costs, as pointed out in [24], the low-cost characteristic of open-source platforms provides greater accessibility to I-V curve tracing equipment for any research or academic center.

An issue with the existing literature is pointed out in [38]: some papers do not offer information either about the measurement performance or about the equipment used, showing only the results. Moreover, as asserted in [7], most of the curve tracers found in the literature are complex and difficult to integrate into real scenarios.

Aiming at overcoming the aforesaid drawbacks and integrating the identified trends, this paper presents the development of a novel IoT open-source hardware and software I-V curve tracer to characterize PV generators. The RPi microcomputer composes the hardware level; concerning software, Python, MariaDB, and Grafana are applied for data acquisition, storage, and visualization. Open communication protocols, such as Modbus TCP, enable seamless data exchange with proprietary equipment. The program coded in Python is responsible for automating the curve tracing of the PV modules through the modification of the current demanded by an electronic programmable load. The developed system is oriented towards I-V curve tracing for already existent functioning facilities that require diagnostics, analyses, and/or modeling. In such a situation, this curve tracer is coupled to the facility by means of open protocols and shares data without altering the installation.

For the sake of clarity, a table summarizing the aforementioned literature as well as the present proposal has been elaborated (Table 1). The considered categories are the following: Device, the equipment used to collect data from PV modules; Load, the type of load; Data storage means, whether local (within the device) or remote storage is applied; Language, the programming language used to gather data; and Communication, the protocols used for data sharing. Where the information could not be found, Unspecified is written.

**Table 1.** Comparison of previous literature dealing with custom-made curve tracers.


In view of the previous table, the presented curve tracer is novel in that it is the only proposal that combines open-source components (hardware and software) and open communication protocols, all managed by the RPi. In most of the literature, an IoT open-source device gathers data from sensors and handles the load, while data storage and visualization are performed by external equipment or services such as PCs or cloud servers. The present work is the only proposal where the RPi is responsible for automating all the tasks involved in I-V tracing: load sweep, data acquisition, data storage, communications, and real-time visualization.

Moreover, as can be observed in Table 1, some works do not report information about certain aspects such as the programming language or data storage means (software or hardware), as previously indicated in [38]. Furthermore, none of the surveyed references apply a programmable electronic load to perform the I-V tracing. Resistive, capacitive, or electronic loads (power MOSFET) are among the common methods [7], but they require designing specific electronic circuitry for the curve tracing process and only serve that purpose. On the contrary, programmable loads are commonly used in microgrids and PV facilities [4,20,39–41] to emulate the behavior of DC or AC loads in order to test control algorithms and energy management strategies under different load profiles. In this regard, the target groups of this paper are scientists and practitioners in the scope of PV-based microgrids and facilities involved in research and development (R&D) activities.

In addition, the validation of the proposal is performed with a medium-scale PV generator under real conditions, which constitutes a requirement to demonstrate the suitability of open-source technologies [4].

It must be remarked that the presented curve tracer is used to characterize and diagnose a PV generator integrated in a smart microgrid (SMG) which combines renewable sources with hydrogen. Such a facility is framed in an R&D project envisioned to develop a digital replica of the subsystems of the microgrid.

The main contributions of the work are now summarized:


The structure of the rest of the paper is as follows. Section 2 describes the developed I-V curve tracer concerning hardware, software, and communications. Section 3 deals with the results achieved with a PV generator of 1100 W, whereas the associated discussion is carried out in Section 4. Finally, the main conclusions of the reported work and further research guidelines are addressed.

#### **2. Developed I-V Curve Tracer**

The developed curve tracer consists of a software application written in Python and executed on an RPi, together with a database and a data visualization interface. The version of the microcomputer is the RPi 3 model B+. As commented in the previous section, the proposed curve tracer is applied to an existing SMG equipped with an automation system based on a Programmable Logic Controller (PLC), model S7-1516, which is in charge of the energy management of the SMG. Figure 1 shows the interconnection of all the devices used in the I-V curve tracer. This figure shows the sensors involved (irradiance, voltage, current) together with the PLC, the RPi, and a programmable electronic load. The proposed platform takes advantage of a programmable electronic load, model Prodigit 32612A (New Taipei City, Taiwan). This legacy device communicates through an RS232 interface in order to exchange commands and is used to configure the current profiles demanded from the photovoltaic panels.

The communications diagram of the deployed system can be seen in Figure 2. The RPi acts as a Grafana server, so the user/operator can visualize and download the data processed by the curve tracer through a web browser running on a computer or smartphone connected to the Internet. Namely, the Grafana software provides a user-friendly graphical user interface (GUI) for real-time access to numerical and graphical information about the measurements of the PV system during the tracing.

Among other elements, the RPi includes serial communication through universal serial bus (USB) ports, so a protocol converter from USB to RS232 has been required to establish communication between the load and the RPi.

Concerning sensors, Table 2 summarizes the magnitudes that are measured and the corresponding sensor. It must be noted that the required sensors could be connected to the RPi in a direct manner or through proper electronic boards. In this sense, the presented solution is applicable to already existing automation and monitoring systems or for new facilities without such systems. Using open protocols, such as Modbus TCP, enables easy communication given the widespread availability of this protocol in automation and energy-related equipment [28]. In addition, Modbus TCP has been pointed out as an industrial IoT communication protocol [42] and is supported by both open-source and proprietary equipment.
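As an aside, a Modbus TCP request is simple enough to compose by hand, which illustrates why the protocol is so widely supported. The sketch below builds a standard "Read Holding Registers" request (function 0x03) with its MBAP header using only the standard library; the register addresses are arbitrary placeholders, since the actual register map depends on the PLC program.

```python
import struct

def read_holding_registers_request(transaction_id, unit_id, address, count):
    """Build a Modbus TCP 'Read Holding Registers' (function 0x03) request.
    MBAP header: transaction id, protocol id (0), remaining byte length
    (unit + function + address + count = 6), then unit id."""
    return struct.pack(">HHHBBHH",
                       transaction_id, 0, 6, unit_id, 0x03, address, count)

# Ask a device at unit id 1 for two registers starting at address 0
# (placeholder addresses; the real map depends on the PLC program)
frame = read_holding_registers_request(1, 1, 0, 2)
print(frame.hex())  # 000100000006010300000002
```

In practice, a client library (e.g., pymodbus) would handle framing and the TCP session; the point here is only the simplicity of the wire format.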

**Figure 1.** Interconnection of the I-V curve tracer components together with equipment of the existing microgrid.

**Figure 2.** Communications diagram of the I-V curve tracer.



A block diagram with the functionalities associated with each component is illustrated in Figure 3. From a functional viewpoint, it is interesting to note that all the tasks required for the process of I-V curve tracing rely on the RPi. On the one hand, this microprocessor acquires data from the PLC and stores them in a database (mariaDB). Data backup is also handled by the RPi together with other system tasks. On the other hand, the programmable load is managed through Python commands. Lastly, the Grafana server is hosted to establish Internet-enabled data visualization in real-time.

**Figure 3.** Functionalities implemented by the curve tracer.

Figure 4 shows the flow diagram of the algorithm that implements the I-V tracer. First of all, the irradiance existing at a given time is measured. It is verified whether the measured irradiance exceeds the preset threshold of 100 W/m2, which is required to initiate the current profile generation and data acquisition processes. Once the minimum irradiance condition is met, a profile of the current demanded by the electronic load is created, which progressively increases from zero to the maximum current that the PV module(s) can deliver for the sensed irradiance. The maximum value is calculated from the existing module configuration and the irradiance at each moment. This ensures that the PV generator will not be required to provide currents that cannot be achieved for this irradiance value. In particular, the following equation has been used to determine the maximum current, Imax:

$$\text{Imax} = \text{np} \times (0.0055 \times \text{G} + 0.1),\tag{1}$$

where np is the number of paired modules, and G is the incident irradiance on the plane of the modules.
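Equation (1) translates directly into code. The sketch below also derives the resulting number of 0.1 A load steps, which matches the 12-sample figure quoted later in the text for a single module at 200 W/m2; the function names are ours, not the authors'.

```python
def i_max(n_p, irradiance):
    """Maximum current (A) deliverable by the PV generator, Eq. (1)."""
    return n_p * (0.0055 * irradiance + 0.1)

def current_profile(n_p, irradiance, step=0.1):
    """Current demands sent to the electronic load, rising from `step` to Imax."""
    n_steps = round(i_max(n_p, irradiance) / step)
    return [round(k * step, 3) for k in range(1, n_steps + 1)]

print(round(i_max(1, 200), 3))       # 1.2 A for a single module at 200 W/m2
print(len(current_profile(1, 200)))  # 12 load steps at 0.1 A each
```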

**Figure 4.** Flowchart of operations performed by the curve tracer.

The following step consists of establishing communication via RS232 from the RPi to the programmable electronic load, so the current value corresponding to each instant is sent. Next, the RPi takes the sensor data obtained by the PLC through a Modbus TCP channel, as well as from the electronic load itself through the RS232 connection. After this, the retrieved data are stored in a mariaDB database specifically designed for this purpose (Figure 5). This process is carried out continuously for each PV module current until the maximum current set is reached.
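The storage step of this loop can be sketched as follows. Here sqlite3 stands in for the mariaDB database so that the example is self-contained, and the table layout is our assumption for illustration, not the actual schema of Figure 5.

```python
import sqlite3
import time

# In-memory stand-in for the mariaDB database; the table layout below
# is an assumption for illustration, not the schema used by the authors.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE iv_samples (
    ts REAL, irradiance REAL, voltage REAL, current REAL)""")

def store_sample(conn, irradiance, voltage, current):
    """Persist one (V, I) point acquired during the load sweep."""
    conn.execute("INSERT INTO iv_samples VALUES (?, ?, ?, ?)",
                 (time.time(), irradiance, voltage, current))
    conn.commit()

store_sample(db, 650.0, 21.4, 3.7)
count, = db.execute("SELECT COUNT(*) FROM iv_samples").fetchone()
print(count)  # 1
```

With mariaDB, the same INSERT would be issued through a MySQL-compatible connector, and Grafana would then read the table directly for visualization.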


**Figure 5.** Database created with mariaDB for characterization of PV modules.

In this way, all the necessary data for the characterization of the photovoltaic panels are acquired and stored. As a sample, Figure 6 shows the data taken for the characterization of the PV modules.


**Figure 6.** Data taken during characterization of PV modules.

To achieve representative data, the irradiance should change as little as possible during the characterization. Each acquisition cycle can vary from 24 s, for an irradiance of 200 W/m2 and a single-panel configuration (12 samples), to 320 s, considering an irradiance of 1000 W/m2 for the whole group of panels (320 samples), keeping the current step at 0.1 A. During these short intervals, the irradiance is scarcely altered.

On the other hand, Figure 7 shows a screenshot of part of the Python code running to automate the labeled stages and, hence, the I-V curve tracing.

Aiming to illustrate the described sequence, the main Python code concerning the load management is shown in Algorithm 1. To begin with, an instance for communication is created, specifying parameters such as the port, transmission bit rate, parity bit, etc. After that, the connection is opened, and the commands that set the load current are sent. Namely, the load is activated, the operation mode is selected, and the current demanded from the PV module(s) is established. Moreover, the reached voltage and current values are retrieved. Finally, the connection is closed.
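The sequence just described can be sketched with the pyserial package as follows. The command mnemonics are hypothetical placeholders (the actual Prodigit 32612A command set must be taken from its manual), as is the serial port name; this is an illustration of the structure, not the authors' listing.

```python
def make_commands(current_a):
    """Commands for one load step: activate the load, select constant-current
    mode, set the demanded current, then query voltage and current.
    The mnemonics below are hypothetical placeholders, not the real set."""
    return ["LOAD ON", "MODE:CC", f"CURR {current_a:.1f}",
            "MEAS:VOLT?", "MEAS:CURR?"]

def run_step(port, current_a):
    """Open the RS232 link, send one load step, and read back V and I."""
    import serial  # pyserial; the USB-RS232 converter exposes e.g. /dev/ttyUSB0
    link = serial.Serial(port, baudrate=9600, timeout=1)
    try:
        for cmd in make_commands(current_a):
            link.write((cmd + "\r\n").encode())
        voltage = link.readline().decode().strip()
        current = link.readline().decode().strip()
        return voltage, current
    finally:
        link.close()
```

Calling `run_step` once per value of the current profile reproduces the loop of Figure 4 up to the storage stage.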

**Figure 7.** Python code for I-V curve tracing.

**Algorithm 1** RS232 Communication through Python


#### **3. Results**

In this section, the experimental results of applying the I-V curve tracer are reported to demonstrate its successful operation. Namely, a PV generator hybridized with hydrogen in a stand-alone SMG, placed at the University of Extremadura (Spain), is fully characterized.

#### *3.1. Experimental Setup*

The PV generator (Figure 8) consists of six monocrystalline modules, each with a maximum output of 185 W, providing a total power of 1110 W. The modules have a fixed inclination angle, with the irradiance measured in the same plane. The main parameters of the PV modules are listed in Table 3. Note that the electrical characteristics are those given by the manufacturer for STC.

**Figure 8.** PV generator for experimentation.

**Table 3.** Main parameters of PV modules.


The curve tracer is coupled to the PLC and the load of the SMG in the laboratory, as can be observed in Figure 9a. Note that an Ethernet switch allows data exchange between the RPi and the PLC. The programmable load, also part of the laboratory setup, can be seen in Figure 9b. The block diagram of the SMG is depicted in Figure 10. As can be observed, the PV array is linked to a DC voltage bus through a solar charger. A battery acts as electrochemical energy storage, whilst the programmable load plays the role of energy consumer. Regarding hydrogen generation and consumption, an electrolyzer (EL) produces hydrogen by harnessing the surplus of PV energy, and a fuel cell (FC) performs the opposite process, converting hydrogen into electricity when no renewable energy is available. A more detailed description of the SMG components can be found in [4,28].

#### *3.2. Data Visualization*

The measurement process was carried out over several days due to the variability of the weather conditions (cloudy and rainy days, etc.). More than 194,000 samples were recorded during the whole measurement campaign. The stored data are represented through the GUI created in Grafana, which displays the involved magnitudes as time series.

As a proof of the visualization capabilities, Figure 11 depicts the GUI showing the measurements during a day of the PV generator characterization, in particular, 16 March 2021 from 8:00 to 19:00. The typical daily curve of solar irradiance can be appreciated in the lower graph, reaching a maximum value of 1031.48 W/m<sup>2</sup> at 13:17. On this day, the procedure begins at 8:18, once the irradiance threshold of 100 W/m<sup>2</sup> is exceeded, and lasts until 18:47. The top chart represents the current delivered by the PV generator, which fluctuates according to the management performed by the Python program of the curve tracer.

**Figure 9.** Setup in laboratory: (**a**) detailed view of curve tracer; (**b**) entire view including the programmable load.

**Figure 10.** Block diagram of SMG where the PV modules are installed.

**Figure 11.** Grafana GUI displaying time-series of PV current and irradiance during a day of characterization.

Figure 12 contains a detailed view of the GUI during the same day for a better observation of the magnitude evolution. In the top graph, it can be seen that the current delivered by the PV generator (blue) and the load current (red) coincide, and both exhibit a sawtooth-shaped evolution, coherent with the implemented algorithm. The sensed irradiance during the viewed interval is 1027 W/m<sup>2</sup>.

**Figure 12.** Detailed view of Grafana GUI to observe PV current and irradiance during characterization.

In order to verify the capabilities of the curve tracer, the computational resources of the RPi are also monitored by means of Grafana. To this end, the GUI includes a dashboard based on Telegraf [43] to visualize the central processing unit (CPU) temperature and load, memory usage, and network statistics. Figure 13 shows this dashboard during the characterization experiments on the same day as in the previous figures. Several aspects are worth discussing. The CPU usage is shown in the top-left graph, and its nominal value is around 4%. There are certain intervals during which the usage rises up to 17% for the system (yellow) and up to 54% for the user (green). These increments are due to Grafana operations, e.g., access for online monitoring and requests to the database. Another parameter is the memory usage (bottom-left graph), where less than 1 GB is used (yellow line) and around 1 GB is cached (blue line), leaving 2 GB free (orange line), showing stable behavior. The CPU temperature has a stable value of around 35 °C, an appropriate level to avoid overheating issues.
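As a rough, portable stand-in for such resource metrics, CPU usage can be approximated from the operating system's load average; this is only a sketch, as the Telegraf agent used in the dashboard samples much finer-grained counters:

```python
import os

def cpu_usage_percent():
    """Rough CPU usage estimate from the 1-minute load average.
    A lightweight illustration of the kind of metric Telegraf reports;
    falls back to 0.0 where load averages are unavailable."""
    load1 = os.getloadavg()[0] if hasattr(os, "getloadavg") else 0.0
    return 100.0 * load1 / (os.cpu_count() or 1)

usage = cpu_usage_percent()
print(f"CPU usage ~ {usage:.1f}%")
```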

**Figure 13.** Dashboard devoted to monitoring the resources of RPi during experiments.

#### *3.3. I-V Curves of PV Generator*

To achieve a proper validation, I-V curves have been obtained under real operating conditions for the PV generator. In addition, three configurations of the modules have been applied: a single module, a pair of modules connected in series, and the whole generator, consisting of the parallel connection of three pairs.

For the curve tracing, data were selected for irradiances close to the values commonly provided by manufacturers and reported in the literature, namely 200 W/m<sup>2</sup>, 400 W/m<sup>2</sup>, 600 W/m<sup>2</sup>, 800 W/m<sup>2</sup>, and 1000 W/m<sup>2</sup>. Due to the short duration of the data acquisition intervals, the initial and final values of irradiance are averaged. Table 4 shows the measurements of the incident irradiance and the temperature of the modules during the characterization campaign for each of the described electrical configurations. Moreover, electrical parameters of the generator, such as the short-circuit current, the open-circuit voltage, and the fill factor, can be measured from the I-V curves; hence, these parameters are also included in Table 4.

Figure 14 shows the I-V curves obtained for a single module. The shape and trend of the curves correspond to those expected, matching the information provided by the manufacturer. As can be observed, the open-circuit voltage (Voc) decreases as the irradiance increases. This effect is due to the associated temperature increase, which causes the curves to move to the left. In particular, the open-circuit voltage strongly depends on temperature, while its dependence on irradiance is modest [17]. This relationship can be expressed through Equation (2) [17]:

$$V\_{oc}(T) = V\_{oc,STC} + \mu\_{V\_{oc}} \left( T - T\_{STC} \right), \tag{2}$$

where Voc,STC is the open-circuit voltage at STC, TSTC is the STC temperature, and *μ*Voc is the voltage temperature coefficient, found in the PV module datasheet. For the LDK Solar 185D-24S, this coefficient has a value of −0.34%/°C, so it is easy to check that temperature increments lead to decrements of Voc.
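As a worked example of Equation (2), the sketch below applies the −0.34%/°C coefficient; the STC open-circuit voltage of 45 V is an assumed illustrative figure, not the datasheet value:

```python
def voc_at_temperature(voc_stc, mu_voc_pct, t_cell, t_stc=25.0):
    """Equation (2): Voc(T) = Voc_STC + mu_Voc * (T - T_STC),
    with the datasheet coefficient mu_voc_pct given in %/degC of Voc_STC."""
    return voc_stc * (1.0 + mu_voc_pct / 100.0 * (t_cell - t_stc))

# Voc_STC = 45.0 V is an assumed figure; the -0.34 %/degC coefficient
# is the one quoted for the LDK Solar 185D-24S.
voc = voc_at_temperature(45.0, -0.34, t_cell=45.1)
print(round(voc, 2))  # a drop of about 3.1 V relative to the assumed STC value
```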

**Table 4.** Irradiance, temperature, and electrical parameters measured during characterization for different configurations of the PV modules.


**Figure 14.** I-V curves for a single PV module.

In a similar sense, the power-voltage (P-V) curve can also be plotted from the acquired data; for instance, Figure 15 shows such a curve for the single PV module. Valuable information such as the maximum power point values (power, current, and voltage) for sensed irradiances can be studied through these curves.

The maximum power produced by the module (165 W) is lower than that reported by the manufacturer (185 W) because the existing conditions differ from the STC. Moreover, the degradation of the module also contributes to reducing the peak power that can be delivered.

Following the validation procedure reported in [23,33,34,36,37], the experimental measurements are reproduced by means of a simulator of PV modules based on the SDM. This model is based on the equivalent circuit and is the most widely used method to provide an estimation of the current generated by a PV cell. The circuit consists of a single diode connected in parallel with a photo-generated current source (*IPH*), a series resistance (*RS*) to represent voltage drops and internal losses, and a shunt resistance (*RSH*) to take into account the leakage currents. Equation (3) describes the model for a module of *NS* cells in series:

$$I = I\_{PH} - I\_o \left[ \exp\left(\frac{V + IR\_S}{nN\_S V\_{TH}}\right) - 1 \right] - \frac{V + IR\_S}{R\_{SH}} \tag{3}$$

where *Io* is the saturation current of the diode, *V* is the output voltage, and *VTH* is the thermal equivalent voltage. The last variable is given in terms of the electron charge, *q*; the Boltzmann constant, *K*; the cell temperature, *T*; and the diode ideality factor, *n*, according to Equation (4):

$$V\_{TH} = KT/q, \tag{4}$$
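Equations (3) and (4) define I only implicitly, since I appears inside the exponential, so the model is commonly solved numerically. The sketch below uses Newton's method with illustrative SDM parameter values that are not fitted to the LDK Solar 185D-24S:

```python
import math

def sdm_current(v, i_ph, i_o, r_s, r_sh, n, n_s, t_kelvin):
    """Solve Equation (3) for I at a given voltage V by Newton's method."""
    k, q = 1.380649e-23, 1.602176634e-19
    v_th = k * t_kelvin / q                      # Equation (4)
    def f(i):
        x = (v + i * r_s) / (n * n_s * v_th)
        return i_ph - i_o * (math.exp(x) - 1.0) - (v + i * r_s) / r_sh - i
    def fprime(i):
        x = (v + i * r_s) / (n * n_s * v_th)
        return -i_o * math.exp(x) * r_s / (n * n_s * v_th) - r_s / r_sh - 1.0
    i = i_ph  # start from the photo-generated current
    for _ in range(50):
        step = f(i) / fprime(i)
        i -= step
        if abs(step) < 1e-12:
            break
    return i

# Short-circuit current (V = 0) is close to the photo-generated current:
i_sc = sdm_current(0.0, i_ph=5.4, i_o=1e-9, r_s=0.3, r_sh=300.0, n=1.3,
                   n_s=72, t_kelvin=318.0)
print(round(i_sc, 3))  # close to the 5.4 A photo-current
```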

The I-V curve experimentally measured with the curve tracer at an irradiance of 1019 W/m<sup>2</sup> and a temperature of 45.1 °C is plotted in Figure 16 (black) together with the curve provided by the SDM simulator (orange). As can be observed, the curves show the same trend with very small differences. Namely, the ideality factor of the SDM explains the difference appreciated in the knee of the curve [36].

For a better appreciation, the difference between the simulated and measured currents can be used to illustrate the achieved fitting [44,45]. In this regard, Figure 17 shows the current difference for the characterized module versus the voltage at the reported irradiance levels. The errors are small, reaching a maximum value of 0.21 A for 870 W/m<sup>2</sup>. In Figures 14 and 17, it can be seen that the maximum values of these differences are located in a reduced range between the maximum power point (voltage higher than 29 V) and Voc. These results exhibit proper agreement with the well-known SDM.
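The fitting metric used here, the maximum absolute difference between measured and simulated currents, can be computed as follows (the sample values below are illustrative, not the measured data):

```python
def max_current_error(i_measured, i_simulated):
    """Maximum absolute current difference between a traced I-V curve
    and its SDM-simulated counterpart."""
    return max(abs(m - s) for m, s in zip(i_measured, i_simulated))

# Illustrative point-by-point currents (A) along a hypothetical curve:
measured  = [5.40, 5.35, 5.10, 4.20, 2.10, 0.00]
simulated = [5.38, 5.33, 5.05, 4.05, 2.25, 0.00]
print(round(max_current_error(measured, simulated), 2))  # 0.15
```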

The traced I-V curves for a pair of modules connected in series are depicted in Figure 18. The shape observed in the traced curves allows diverse effects in the PV modules to be diagnosed or detected, as pointed out in previous works [7,36]. In this case, the curves show a certain alteration in the inflection point and slopes, which indicates the degradation of one of the modules. Therefore, these curves serve for fault detection and diagnostics; namely, aging effects, cell cracking, hot spots, potential-induced degradation, and other deterioration situations can be detected. In fact, the modules have been working for 10 years, so aging effects can be expected. Nonetheless, in-depth diagnosis and fault analyses are out of the scope of this paper. Figure 19 contains the traced P-V curves for the pair of modules.

**Figure 16.** Experimental and simulated I-V curves for a single PV module at 871 W/m<sup>2</sup>.

**Figure 17.** Difference of measured and simulated currents for the single PV module.

**Figure 18.** I-V curves for two PV modules connected in series.

**Figure 19.** P-V curves of a pair of PV modules.

Finally, the I-V curves captured for the whole PV generator are shown in Figure 20. The corresponding P-V curves are depicted in Figure 21. As in the previous figures, the curves display the expected operation of the generator and can be applied for diagnostics purposes.

**Figure 20.** I-V curves of PV generator.

**Figure 21.** P-V curves of PV generator.

#### **4. Discussion**

Experimental results provide I-V curves for different electrical configurations and environmental conditions, emphasizing the suitability of the designed curve tracer.

The main strength is that the developed system is not limited to data acquisition for I-V curves of PV modules; data recording and visualization in real time during the characterization are also fully addressed. Indeed, once the desired conditions are programmed, fully autonomous operation of the curve tracer is achieved without requiring the intervention of the operator.

The deployed curve tracer consists of the RPi and the associated software, whilst a PLC and a programmable load of an experimental SMG are used to validate its operation.

The computational capabilities of the microprocessor are proven to be adequate for data acquisition, storage, and visualization. It must be emphasized that none of the previous literature provides information in this regard.

Using an in-house database (MariaDB) and a web-enabled user interface (Grafana) avoids dependencies on external servers and the associated hosting or licensing costs. Hosting one's own database also gives total control over administration aspects [4].

As a proof of concept, in the reported application case, Modbus TCP and RS232 have been used. However, the curve tracer can manage virtually any communication protocol given the wide availability of libraries on the Internet. Furthermore, this ability to support many other protocols provides features such as configurability and modularity, facilitating interoperability [46].

In particular, the use of open communication protocols such as Modbus TCP, together with the capabilities of the open-source equipment, allows seamless data exchange to be established. This way, proprietary equipment (PLC) is combined with the curve tracer without interoperability issues. In fact, logical connections through communication protocols enable measurement information sharing and facilitate integration in real scenarios, a capability that most curve tracers in the literature lack [5]. In this regard, the deployed system targets existing PV facilities, so the coupling is made through the aforementioned open protocol. The curve tracer even makes use of already existing sensors, which is a benefit since the PV generator can be re-characterized when required without essential alterations to the electrical and communications schemes.

Instead of using a variable resistor, capacitive load, or power MOSFET, the proposal employs a real electronic programmable load to perform the I-V tracing. In addition, the load used is legacy equipment that does not support modern communication interfaces, so being able to manage such valuable equipment is an important advantage. In fact, IoT technologies must contribute to solving compatibility and interoperability issues with legacy devices [47,48].

Regarding economic assessments, the cost of the curve tracer is very low given the inexpensive nature of IoT open-source equipment, which constitutes an advantage of scientific equipment based on this type of technology [49]. Namely, taking into account that all the software is free (Python, MariaDB, Grafana), only the RPi involves expenses; the overall cost is around EUR 70, including auxiliary elements such as a memory card, a heatsink and fan for cooling, and a power adapter.

Analyses of the retrieved I-V curves allow decision making with respect to operation and maintenance of the PV modules as well as implementing accurate models. Moreover, further experiments will include partial shading of the PV modules in order to obtain and analyze the measured I-V curves.

Thanks to the flexibility and availability of open-source equipment, the system can be customized to fulfill particular requirements in research or academic contexts. The RPi provides a large number of analogue and digital inputs, allowing the connection of additional sensors or instruments. Indeed, advances in IoT technology, both hardware and software, can be integrated in the presented system.

Despite the obtained results, the presented system has some limitations, which are now briefly described. To begin with, using open-source technology does not imply ease of configuration when advanced functions are required; for example, programming skills and a certain expertise in communication protocols and networks are needed. In addition, the proposal does not allow online measurements of the PV modules; it is only devoted to offline characterization. For proper data exchange, the automation unit (PLC or similar device) and the programmable load must provide communication interfaces that the RPi can handle. This is unlikely to be a limitation with modern devices, but for legacy equipment it must be carefully addressed. Finally, the representation of the I-V curves requires manual data extraction from the files that Grafana stores and provides, which can be a time-consuming task when a large number of measurements have been conducted.

#### **5. Conclusions**

RES are key enablers of the evolution towards a more sustainable global energy scenario, with PV technology being one of the most widely applied RES in microgrids. In order to characterize and study the behavior of PV modules, an I-V curve tracer based on IoT open-source technologies has been presented. Namely, software such as Python, MariaDB, and Grafana running on an RPi is responsible for automating all the required tasks: load sweep, data acquisition, data storage, communications, and visualization in real time. An open communication protocol (Modbus TCP) has been applied to exchange information with a PLC, whilst an RS232 link allows a legacy programmable load to be managed. Both proprietary devices belong to a research-oriented microgrid facility and serve as a proof of concept of the suitability of the curve tracer.

It must be emphasized that this development is a novelty in the existing literature; it addresses trends and overcomes limitations identified in previous works, among which short measurement times, low-cost measurement systems, low-cost communications, and IoT open-source technology can be highlighted.

Experimental results under real operating conditions are used to validate the proposal. Namely, a PV generator of 1110 W integrated into an SMG is characterized by means of the developed curve tracer.

Future research includes diagnostics and fault detection of the PV modules. Furthermore, another interesting topic is the development of an online characterization procedure using the presented system.

**Author Contributions:** Conceptualization, J.M.P., I.G. and A.J.C.; Methodology, I.G. and A.J.C.; Validation, J.M.P. and A.J.C.; Investigation, I.G. and A.J.C.; Data Curation, J.M.P. and A.J.C.; Writing—Original Draft Preparation, J.M.P. and I.G.; Writing—Review and Editing, I.G. and A.J.C.; Supervision, A.J.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This project was co-financed by European Regional Development Funds FEDER and by the Junta de Extremadura (IB18041).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:



#### **References**


### *Communication* **A Customized Energy Management System for Distributed PV, Energy Storage Units, and Charging Stations on Kinmen Island of Taiwan**

**Hsi-Chieh Lee 1,\*, Hua-Yueh Liu 2, Tsung-Chieh Lin <sup>1</sup> and Chih-Ying Lee <sup>3</sup>**


**Abstract:** Kinmen, the famous Cold War island also known as Quemoy, is a typical island with isolated power grids. It considers the promotion of renewable energy and electric vehicle charging to be two essential strategies for achieving the goal of a low-carbon island and a smart grid. With this motivation in mind, the main objective of this study is to design and deploy an energy management system for the hundreds of existing PV sites distributed across the island, as well as its energy storage systems and charging stations. In addition, the real-time acquisition of data from the power generation, power storage, and power consumption systems will be used for future demand and response analysis. Moreover, the accumulated dataset will also be utilized for forecasting the renewable energy generated by the PV systems and the power consumed by the battery units or charging stations. The results of this study are promising, since a practical, robust, and workable system and database are developed and implemented with a variety of Internet of Things (IoT) and data transmission technologies and a hybrid of on-premises and cloud servers. Users of the proposed system can remotely access the visualized data seamlessly through user-friendly web-based and Line bot interfaces.

**Keywords:** distributed PV; energy management system; energy storage units; charging piles; smart grid; redundancy; IoT; Home Assistant; low-carbon island; Kinmen

#### **1. Introduction**

#### *1.1. Climate Responsibility and Energy Generation of Kinmen, Taiwan*

From Taiwan's perspective, three forces firmly push the renewable energy strategy forward. First, as a member of the global society, Taiwan submitted its Intended Nationally Determined Contribution (INDC) on 17 September 2015, including targets to achieve a 50% reduction below the business-as-usual (BAU) GHG emission level by 2030 [1]. Furthermore, Taiwan has demonstrated its commitment to achieving net zero by 2050 [2] through concrete actions, including implementing the Climate Change Response Act [3] in response to the 2021 26th Session of the Conference of the Parties (COP26), the U.N. climate conference held in Glasgow. Second, from an energy source viewpoint, Taiwan's dependency on imported energy was 97.5% in 2020 [4] and even higher over the past 10 years. Looking into the composition of net power generated and purchased in 2021 [5], thermal energy dominated at 81.6%, as shown in Figure 1. This is clearly a risk in terms of energy dependency and diversity. Third, Taiwan still strives for the vision of a nuclear-free homeland by 2025, with a clear energy target: 50% natural gas, 30% coal, and 20% renewable energy. Given these goals, it is clear that promoting low-carbon renewable energy plays an essential role in achieving the INDC and the nuclear-free vision and in further balancing energy generation dependency and diversity.

**Citation:** Lee, H.-C.; Liu, H.-Y.; Lin, T.-C.; Lee, C.-Y. A Customized Energy Management System for Distributed PV, Energy Storage Units, and Charging Stations on Kinmen Island of Taiwan. *Sensors* **2023**, *23*, 5286. https://doi.org/10.3390/s23115286

Academic Editors: Antonio Cano-Ortega and Francisco Sánchez-Sutil

Received: 13 March 2023 Revised: 29 May 2023 Accepted: 30 May 2023 Published: 2 June 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Figure 1.** The net power generated and purchased by Taiwan's Taipower Company, a state-owned exclusive enterprise.

Kinmen is an outlying island of Taiwan with an area of 150 km<sup>2</sup>, which has an isolated power grid for its electricity supply due to its distance of 248 km from western Taiwan. Kinmen spent 43 years as a military front line until the abolishment of the military administration in 1992 [6]. With the gradually improving relationship between Taiwan and China and direct transportation across the border, more and more tourists came to the Kinmen islands, resulting in ever higher energy demand. Since 2013, Kinmen has been selected as a demonstration low-carbon island by Taiwan's Executive Yuan, and visions and strategies were set to reach zero carbon by 2030 [7]. The installation of renewable energy stations and low-carbon transportation have become two of the main strategies to achieve these goals.

During the 43 years of military administration, many distributed military facilities were constructed across the whole of the Kinmen islands, which were gradually released or abandoned after the troops left. From the analysis in [8–11], a distributed renewable energy power grid integrated with those facilities is a suggested and suitable strategy to fulfill Kinmen's low-carbon vision.

Compared with traditional power networks, the smart grid is an advanced electricity platform that emphasizes two-way communication based on digital information technology. Key elements of an advanced smart grid include bulk electricity generation, demand response, distribution, utility companies, customers, transmission, service providers, and renewable energies [12]. Among these, the sustainable analysis and management of the data and information generated along with all activities is one of the most critical and valuable measures for achieving system efficiency and continuous improvement.

In September 2021, the total installed PV capacity in Kinmen reached about 10.7 MW. While a distributed PV and energy storage system has become an essential approach for the Kinmen local government to move the low-carbon island vision forward, a reliable monitoring and data acquisition system that can constantly work for future data analysis of energy generation and efficiency under different circumstances is needed for this goal of a low-carbon island.

#### *1.2. Remote Real-Time Monitoring and Controlling System for Distributed PV and Energy Storage Stations*

Rahman et al. [13] conducted a very detailed review of different monitoring systems for PV since 1994, including RTAI, ZigBee, DAQ, SCXI, PIC, PLC, etc., in terms of their fundamental features, architecture, performance, and budget. Some of these remote systems further embrace the Internet of Things, web applications, and cloud platforms. The value of gathered data includes the sustainable status of the PV system, failure or error detection, and warning notification.

Al-Fuqaha et al. [14] reviewed the most relevant architectures and protocol standards for IoT. This study identified the five-layer IoT model as the most functional architecture for developing an IoT system: Objects, Object Abstraction, Service Management, the Application Layer, and the Business Layer. The Application Layer relies on machines with high computational resources. IoT functionality includes Identification, Sensing, Communication, Computation, Service, and Semantics. Challenges of Availability, Reliability, Mobility, Performance, Management, Scalability, Interoperability, and Security/Privacy should be carefully considered when developing an IoT system.

In [15], a comparison of IoT sensor modules among Arduino, Raspberry Pi, PLC, and BeagleBone covers the perspectives of data handling, cost and module size, and coding language. Ansari et al. concluded that the Raspberry Pi is the most recommended due to its extension capability.

Plenty of research has explored this field regarding local IoT networks and remote system connections. The popular wireless communication technologies used by IoT are shown in Figure 2.

**Figure 2.** Comparison of different wireless technologies in terms of data rate and transmission distance.

Belghith et al. [16] designed a remote monitoring system featuring a star architecture of sensors, GSM communication, and a human–machine interface. Zego et al. [17] developed a wireless network to send sensed data to a local Raspberry Pi server via ZigBee. Li et al. [18] proposed a local ZigBee network and a GSM connection for PV monitoring and fault diagnosis, consisting of data acquisition, data gateways, and a monitoring website based on the PHP Laravel framework. For Low-Power Wide-Area Network (LPWAN) applications, LTE-M, Sigfox, LoRa, and NB-IoT were developed; among them, LoRa and NB-IoT are the most promising. LoRa is used in [19] for long-range, low-power-consumption requirements. In another implementation [20], an Arduino-based data logger was designed with integrated 3G communication to serve stand-alone PV sites; Ascensión et al. described detailed data logger specifications corresponding to the IEC 61724 standard. In [21], ZigBee was used as a local sensor network, after which a 4G gateway connected the local network to the internet for remote real-time monitoring. Melo et al. [22] proposed LoRa and Wi-Fi as local wireless networks; the structure comprises three key parts: data loggers, a local IoT system, and a web application for monitoring.

Key research and comparisons are summarized in Table 1.


**Table 1.** Comparisons with cited research.

Redundancy refers to backing up the system to prevent service disruption due to single-point failure; it is the measure by which a robust and reliable service system is achieved. To ensure system redundancy, extra replicated servers are created with the same functions, applications, and other important service components. Failover means seamlessly and automatically switching to prepared backup servers when the primary system is down; its purpose is to reduce the impact when a system failure happens. To the best of the authors' knowledge, no previous studies have explored this mechanism in the renewable energy field or established reliable systems with this approach.
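The primary-then-backup behavior described above can be sketched as follows; the server objects and their `query()` interface are hypothetical illustrations, not the deployed implementation:

```python
def query_with_failover(servers, request):
    """Try servers in order (primary first) and fall back on failure.
    A minimal sketch of the redundancy/failover idea."""
    last_error = None
    for server in servers:
        try:
            return server.query(request)
        except ConnectionError as exc:
            last_error = exc  # this server is down: fail over to the next one
    raise RuntimeError("all servers failed") from last_error

class FakeServer:
    """Stand-in for an on-premises or cloud database/web server."""
    def __init__(self, name, healthy):
        self.name, self.healthy = name, healthy
    def query(self, request):
        if not self.healthy:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name}: {request}"

primary = FakeServer("on-premises", healthy=False)   # simulate an outage
replica = FakeServer("cloud-replica", healthy=True)
result = query_with_failover([primary, replica], "latest PV readings")
print(result)  # served by the cloud replica after the primary fails
```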

Moniruzzaman et al. [23] proposed a reliable web system that supports continuous service even if a system component fails. This high-availability system features computer clustering and load balancing deployed via a three-tier architecture consisting of a Linux virtual server, virtualization, and shared storage.

Nguyen et al. [24] analyzed a hospital MIS system and suggested integrating different load balancing and failover strategies to sustain hospital services under heavy system workloads. This edge/fog-based system design evaluated three load balancing techniques: probability-, random-, and shortest-queue-based approaches, with or without failover functions at different layers.

The main objective of this study is to develop and deploy a robust, reliable, workable, and suitable IoT-based PV monitoring system specific to Kinmen as a significant approach to achieving the zero-carbon and smart-grid visions. This monitoring system is implemented in Kinmen with coverage of more than 40 sites, about half of the whole installed PV capacity in Kinmen. It is capable of collecting and archiving real-time data into on-premises and cloud database servers, with IoT subsystem support that leverages Home Assistant, an open-source IoT hub, to monitor the status and electricity usage of appliances, the power generation of PV panels, and charging stations for electric motorcycles.

The main contributions and novelty of this study are as follows.


#### **2. System and Methods**

This study designed and deployed a set of information systems for data acquisition and monitoring, applied to many distributed energy storage and renewable energy sites on a medium-sized island with an independent power grid, as a basis for system security, performance, maintenance, and data technology development.

#### *2.1. System Overview and General Description*

The proposed system, which aims to contribute to a smart grid in Kinmen, is composed of five layers of critical functions, as illustrated in Figure 3. The first layer is the distributed facility sites, including PV, battery, and charging stations; it is the core of the green energy facilities of the whole project. The second layer is IoT, the front tier of the proposed monitoring information system, which is deployed to sense real-time data from the daily running facilities. The third layer is data acquisition, which is designed to bring all real-time data back to the on-premises servers. The fourth layer is the hybrid of cloud and on-premises deployment, which is capable of handling ample data flow and is designed from the perspective of redundancy and failover. The last layer is a custom SCADA system designed in this project with various user-friendly interfaces.

The most valuable element of the proposed system is the data. The facilities operate daily and generate real-time big data, which can be further analyzed and transformed into periodic reports or serve as critical datasets for predictions in future uses. The four-layer architecture of data processing is explained in Figure 4, namely, data sensing, data transmission, data storage and processing, and data display and access. For the planned long-term demand response analysis, the system is deployed mainly to collect three types of data: power generation, power consumption, and power storage, as depicted in Figure 5.

In addition to the overview above, Figure 6 presents the low-level intra-system interactions, dataflow, network, interfaces, and user GUI, depicted on the basis of the on-premises deployment and explained in the following sections.


**Figure 3.** The high-level and five-layer architecture of the proposed system.

**Figure 4.** The four-layer architecture of the data processing flow of the proposed system.

**Figure 5.** Purpose of dataset collection.

**Figure 6.** Low-level block diagram of on-premises deployment of the proposed system.

Redundancy and failover design are basic requirements for a sustainable and robust system. AWS cloud services are leveraged in the redundancy plan of this study. Figure 7 depicts the cooperation and backup among the on-premises and cloud servers.

**Figure 7.** Cluster redundancy design based on a hybrid of cloud and on-premises deployment.

Figure 8 shows the key software and hardware technologies that serve the system according to each site's conditions and connection flexibility. Open-source software is leveraged as much as possible for better code extensibility as the hardware and facilities are developed and deployed.

**Figure 8.** Key technologies of the software and hardware used.

#### *2.2. Detailed Design*

As described in Figures 3 and 4, the system operates within a five-layer system architecture, in which each subsystem interacts, and a four-layer data processing architecture, in which data are generated, transmitted, stored, processed, and displayed. This section explains the low-level activities and critical designs.

2.2.1. On-Premises Remote Central Monitoring and Archiving Database System

• Web server

The SCADA, a custom web application serving as the monitoring and controlling core, can be remotely accessed from anywhere at any time. It is designed with the Python-based Django framework and mainly leverages the Google Maps and Google Charts APIs for site localization and statistics visualization. For remote control, Python-based APIs were developed to serve front-end requests over HTTP. This server runs on Windows with an Apache web server in the production phase.
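As a rough illustration of the kind of Python-based API the web server exposes to front-end HTTP requests, the sketch below validates a JSON control command and returns a response dict. The action names, field names, and overall shape are assumptions for illustration, not the project's actual code; in deployment this logic would sit inside a Django view.

```python
import json

# Hypothetical command set; the real endpoint names and payloads
# are not given in the paper.
VALID_ACTIONS = {"open", "close", "status"}

def handle_control_request(body: str) -> dict:
    """Validate a JSON control request and return an API response dict."""
    try:
        req = json.loads(body)
    except json.JSONDecodeError:
        return {"ok": False, "error": "malformed JSON"}
    action = req.get("action")
    site = req.get("site_id")
    if action not in VALID_ACTIONS or site is None:
        return {"ok": False, "error": "unknown action or missing site_id"}
    # Here the real server would forward the command to the facility
    # (e.g., via the data collector or IoT hub) and await confirmation.
    return {"ok": True, "site_id": site, "action": action}
```

A Django view would simply wrap this handler, deserializing `request.body` and returning the dict as a JSON response.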

• MySQL master server

The MySQL server supports back-end data archiving and retrieval. The master is installed on the same host as the web server for better transmission speed. An example of the database application GUI is shown in Figure 9. The earliest archived data date from 2015.

• Line bot GUI

The Apache server uses an SSL certificate for the HTTPS channel, and the Line bot webhook runs in Django with HTTPS support. Users can actively query from a smart device or passively receive daily reports via the bot's automatic publication functionality.
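A daily-report push of this kind would follow the public LINE Messaging API format. The sketch below builds (but does not send) such a push request; the report text, user ID, and token handling are illustrative, not the project's code.

```python
import json
import urllib.request

# Public LINE Messaging API push endpoint.
LINE_PUSH_URL = "https://api.line.me/v2/bot/message/push"

def build_daily_report_push(user_id: str, report_text: str,
                            channel_token: str) -> urllib.request.Request:
    """Build a LINE push-message request carrying a text report."""
    payload = {
        "to": user_id,
        "messages": [{"type": "text", "text": report_text}],
    }
    return urllib.request.Request(
        LINE_PUSH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {channel_token}",
        },
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` (or an equivalent HTTP client) delivers the report to the user's Line chat; the webhook side handles active queries in the reverse direction.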


**Figure 9.** MySQL Workbench.

• Data Collector

In Figure 6, the group 1 PV consists mainly of state-owned facilities. A C# API was designed as a data collector to retrieve data from the data servers of this group. Meanwhile, a C# TCP/IP application was designed for direct connection to group 3's PV inverters, which were deployed without intermediate data servers. As for group 2's PV and charging station, a crawler collects data from a third party's API on intermediate servers. These groups were built for different purposes at different times, so different data acquisition approaches are used to retrieve and observe their real-time data. Nevertheless, all data are finally archived in the same database with the same data format.
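Unifying heterogeneous sources into one archive format amounts to a normalization step per source. The paper does not give the actual schema, so the field names, units, and source labels below are purely illustrative of the pattern:

```python
def normalize(source: str, raw: dict) -> dict:
    """Map a source-specific PV reading to a common archive record.

    The three branches mirror the three acquisition paths described in
    the text; all field names are assumptions for illustration.
    """
    if source == "group1_server":      # state-owned PV data servers
        ts, kw = raw["timestamp"], raw["power_kw"]
    elif source == "group3_inverter":  # direct TCP/IP inverter readout
        ts, kw = raw["time"], raw["P"] / 1000.0  # assume inverter reports watts
    elif source == "group2_api":       # third-party middle-server API
        ts, kw = raw["datetime"], raw["output_kW"]
    else:
        raise ValueError(f"unknown source: {source}")
    # Single record shape for the shared MySQL archive.
    return {"source": source, "timestamp": ts, "power_kw": round(kw, 3)}
```

Whatever collector fetched the data (C# API, TCP/IP reader, or crawler), the rows inserted into the central database then share one shape.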

2.2.2. IoT Hub, Local Database, and IoT Network

• IoT Hub

Home Assistant (HA), a popular hub tool for most IoT devices, was introduced as the IoT hub; it is a Python-based open-source platform specific to smart-home applications. Dataflow between HA and the IoT devices can be direct and local via the LAN, or indirect via an external third-party API. The former is preferred for privacy considerations. Users can access the HA GUI via a web browser or a smart-device app. In the LAN case, a VLAN may be needed to link HA to a different subnet, while a VPN is required for access over the internet. This GUI is mainly for developer or system administrator access, not for regular users.

• Local Database

SQLite is used locally to work with HA. It also serves as a data logger for the IoT devices and as a local backup for the central MySQL database in case the internet connection is lost.
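This local-backup role is a classic store-and-forward pattern: readings are always written to SQLite first, and rows not yet mirrored to the central MySQL server are flagged and flushed once the link is back. A minimal sketch (table and column names assumed):

```python
import sqlite3

def open_buffer(path: str = ":memory:") -> sqlite3.Connection:
    """Open the local buffer and ensure the readings table exists."""
    con = sqlite3.connect(path)
    con.execute("""CREATE TABLE IF NOT EXISTS readings (
        ts TEXT, sensor TEXT, value REAL, synced INTEGER DEFAULT 0)""")
    return con

def log_reading(con, ts, sensor, value):
    """Log a reading locally regardless of internet connectivity."""
    con.execute("INSERT INTO readings (ts, sensor, value) VALUES (?, ?, ?)",
                (ts, sensor, value))

def flush_unsynced(con, upload):
    """Send unsynced rows via `upload` (the MySQL writer) and mark them."""
    rows = con.execute(
        "SELECT rowid, ts, sensor, value FROM readings WHERE synced = 0"
    ).fetchall()
    for rowid, ts, sensor, value in rows:
        upload(ts, sensor, value)  # raises if the central link is still down
        con.execute("UPDATE readings SET synced = 1 WHERE rowid = ?", (rowid,))
    return len(rows)
```

If `upload` fails mid-flush, the remaining rows keep `synced = 0` and are retried on the next pass, so no data are lost during an outage.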

• IoT Network

The wireless IoT controllers and sensors are connected to the LAN via Wi-Fi, BLE, IR, or sub-1 GHz links. Where the facility site conditions are too complex for the mentioned wireless links or wired internet access, 4G LTE is used for the internet connection, such as for stations in rural areas.

2.2.3. PV, Battery, and Charging Stations

• PV station

The ongoing project continues to expand data collection to newly built PV stations in Kinmen. So far, the relevant data include information from state-owned stations, privately owned residential stations, Taipower project stations, and lab stations, among others. Some sites operate under FIT contracts, and some are for private use or research. The earliest sites have been running since 2015. The total installed and monitored capacity in this system is about 5 MW, and more than 5 years of data from the state-owned sites are incorporated. Figure 10a shows all monitored sites in the system via Google Maps, and Figure 10b shows one site with clearly visible PV panels on the roof in satellite picture mode.


**Figure 10.** (**a**) All monitored PV sites in Kinmen are shown on Google Maps; (**b**) a clear PV-panel image on the roof of the monitored site.

• Battery station

Distributed battery stations were added to the project in 2021, mainly for demand response research. So far, one site with a 10 kWh storage capacity has been running stably for over a year. The key components inside this facility are a Windows PC, inverters, meters, and batteries.

• Charging stations

Kinmen has 65 state-owned free charging sites for electric motorcycles. The project started monitoring charging data from some newly built charging piles for vehicles and motorcycles in 2021. Figure 11 shows one newly built site in Kinmen National Park.

**Figure 11.** (**a**) Field picture with charging piles, (**b**) power distribution box with an IoT clamp-meter marked in red.

2.2.4. Redundancy and Failover

• Cloud redundancy

In Figure 6, a cloud AWS VM hosts the replicated web server and the load balancers, and an SaaS database service is used for the MySQL database. The load balancer provides workload balancing and automatic web service failover.

• On-premises redundancy

The load balancer is also used for database failover with a shared NAS drive. All MySQL servers are synchronized in real time by configuring one master and two slaves.

2.2.5. Software and Language

• Python 3.9

Python and open-source Python-based applications were mainly used for better integration, extensibility, and sustainability; examples include Django, Flask, HA, the HTTP APIs, the Line bot, the data collector, and the battery charging scheduling application.

• C# 10

For site groups 1 and 2, a C# application was developed as the data collector. Its GUI is shown in Figure 12.

• Vendors' APP for IoT device

This is a backup alternative to the web GUI and HA GUI for the deployed IoT devices. The disadvantage, however, is the privacy concern of data being uploaded to third-party servers.

• Labview for battery module

The battery control console was designed in LabVIEW.

• HA

A VM running the Linux-based HA OS is hosted inside the Windows server. The HA GUI is served as a web service and is easy to access from anywhere over the internet.

**Figure 12.** C# GUI for PV stations.

2.2.6. Hardware

• PC

Regular PCs running Windows are used as servers.

• IoT devices and network devices

Installed devices include smoke sensors, temperature and humidity sensors, motion sensors, clamp-on meters for electricity measurement, switches, curtain controllers, air-conditioning controllers, and smart light bulbs, as well as network devices including routers, Wi-Fi APs, Wi-Fi/BLE gateways, and infrared (IR) remote controllers.

• Facility Stations

The facility stations mainly include PV panels, inverters, batteries, and charging piles.

#### **3. Results**

#### *3.1. PV Stations*

Users can monitor PV sites in real time and review historical data via the web GUI on a desktop or the Line bot on smart devices, as shown in Figure 13. Users can actively query or passively receive detailed daily data from the Line bot application (Figure 13b).

#### *3.2. IoT Devices*

Users can monitor and control the IoT hub via the HA GUI or the web GUI, as shown in Figures 14 and 15. Due to the higher risk of battery operation, a temperature/humidity sensor and a smoke sensor were placed inside the battery cabinet to monitor environmental safety.

#### *3.3. Battery Station*

Battery monitoring and schedule control can be accomplished via the HA GUI or the web GUI, as shown in Figure 16.

#### *3.4. Redundancy and Failover Based on a Hybrid of On-Premises and Cloud Servers*

The on-premises central hosts are located in a lab at Quemoy University, where several power failures or internet disconnections typically occur each year. In the initial stage of the project, these events would bring the web service down or interrupt the collection of real-time data in the database servers. Since the introduction of the redundancy and failover mechanisms, the supporting servers deployed globally on AWS have a much smaller chance of being down at the same time.

**Figure 13.** (**a**) Desktop web GUI, (**b**) smartphone line bot GUI. Both show PV daily data.

**Figure 14.** HA browser GUI for IoT device controlling and monitoring.

**Figure 15.** (**a**) HA temperature daily charting of IoT sensor, (**b**) Web browser GUI for IoT device control.

**Figure 16.** Web GUI for battery.

#### **4. Discussion**

The SCADA mainly uses the Django web application and Home Assistant to monitor and control the facilities and IoT devices. Both are browser-based, so users can easily access them anywhere from any computer or smart device. The system also provides a Line bot, whose "reply message" and "push message" functions serve as monitoring alternatives. The front-end service servers are deployed both on-premises and in the cloud as a redundancy design, and the back-end database servers are deployed similarly. From the perspective of service accessibility, reliability, flexibility, and availability, the proposed system is much more comprehensive and functional than the cited research.

The facility stations in this study are spread across the whole main island of Kinmen. Some downtown stations can be connected to the internet via wired or wireless methods, but stations in some rural areas must use 4G LTE for wireless internet access. 4G LTE offers higher quality and transmission rates than the other technologies shown in Figure 2, which matters when more extensive data transmission is needed, such as for video surveillance.

Via the scheduling setup, the energy storage system helps balance PV generation fluctuations due to sunlight intensity and supports the time-of-use rate mechanism. Electric transportation is a clear trend under low-carbon policy, and a good understanding of vehicle users' charging behavior could contribute to stabilizing the power grid. None of the cited research has worked on integrating power generation, power storage, and power consumption.

The coverage of this work is more versatile than that of the cited research. Moreover, all of the collected data, system facilities, and approaches are beneficial for future demand response plans based on distributed virtual power plants. Good utilization of the accumulated raw big data will make this system valuable and in line with the future smart grid vision.

Current limitations and future work:


#### **5. Conclusions**

Kinmen is a resource-limited island with good solar and wind energy potential. Following the low-carbon trend is a necessity for fulfilling its responsibility as a member of the world community. Smart grid and low-carbon requirements can move forward well together with the support of a well-designed information system. Technically, only a system that can dynamically adjust the demand response balance can make a smart grid possible.

In this work, a comprehensive monitoring and data collection system was developed and deployed with versatile technologies corresponding to different environments and service requirements. With redundant deployment on a hybrid of on-premises and cloud systems, this robust, reliable, and workable IoT-based PV monitoring system specific to Kinmen is a practical approach to achieving the zero-carbon and smart grid visions. Users can remotely access visualized data through the developed user-friendly web browser interface and Line bot. The implemented system has collected and archived real-time data on power generation, power storage, and power consumption since 2015, with IoT subsystem support to monitor the status and electricity usage of each site. The established dataset is essential for future power generation and consumption research in the Kinmen area.

The proposed system is on the way to integrating the dataflow of distributed energy generation and storage, charging stations, and home electricity usage via IoT, making Kinmen a benchmark smart grid city.

**Author Contributions:** Conceptualization, H.-C.L.; Software, T.-C.L. and C.-Y.L.; Formal analysis, H.-C.L.; Investigation, H.-Y.L.; Resources, H.-Y.L.; Writing—original draft, H.-C.L. and T.-C.L.; Writing—review & editing, H.-C.L. and C.-Y.L.; Visualization, C.-Y.L.; Supervision, H.-C.L.; Project administration, H.-C.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Acknowledgments:** The authors sincerely acknowledge the significant support of HA system setup and C#, Python, and Web programs by Ming-Chih Liao, Tzu-Hao Lin, and Chung-En Chang from the Department of Computer Science and Information Engineering, National Quemoy University.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**



#### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **High-Performance Breaking and Intelligent of Miniature Circuit Breakers**

**Jianning Yin 1,2,\*, Xiaojian Lang 1, Haotian Xu <sup>1</sup> and Jiandong Duan <sup>1</sup>**


**Abstract:** The exploitation and utilization of clean energy such as wind and photovoltaic power plays an important role in reducing carbon emissions to achieve the goal of "emission peak and carbon neutrality", but connecting such a quantity of clean energy to the electric system will foster the transition of the electric power system structure. The intelligentization of power equipment will be an inevitable trend of development. High breaking performance, remote control, and a digital detection platform for the miniature circuit breaker, a protective device of the power distribution system, have also become inevitable requirements of the power IoT system. On this basis, this paper studies three aspects: high-performance AC and DC universal switching technology, remote control technology, and digital monitoring of the operating status. A new DC non-polar breaking technology is proposed, which improves the short-circuit breaking ability. An experimental prototype using the above techniques was fabricated and passed the DC 1000 V/10 kA short-circuit breaking test. On this basis, an intelligent circuit breaker was developed with multiple functions: remote switching, real-time temperature detection, energy metering, and fault warning. Moreover, software for digital condition monitoring and remote control was developed. This work has certain theoretical and practical significance for the development of the power Internet of Things.

**Keywords:** DC interrupting; digitization; remote control; electric energy measurement; miniature circuit breaker

### **1. Introduction**

Today, the regulation of global pollution has become urgent, especially the massive quantities of carbon dioxide emissions that lead to a more serious greenhouse effect and higher sea levels, threatening the global environment. Reducing carbon emissions has therefore become a common global mission. China has also put forward the long-term vision of "emission peak and carbon neutrality", and reaching this goal will necessitate the exploitation and utilization of clean energy. Connecting such substantial quantities of distributed clean energy to the electric system will foster the transition of the power system framework and equipment. To adequately absorb clean energy, constructing a new power system with clean energy as the main body has become a trend of electric system reform, and the envisioned power system will be more intelligent, shared, and controllable [1,2]. At the same time, smart grid and distributed power technologies will also boom. The digitalization and intellectualization of the core power equipment in the new power system will improve visualization and regulation capacity and promote the consumption of new energy generation, so as to speed up the transformation from power grid to energy internet; the intellectualization and digitalization of electrical equipment make its status visible, supporting a more flexible and moderate energy rationing platform [3,4]. The digitalization of power distribution allows facility managers and maintainers to efficiently solve problems with less energy, reducing operating and maintenance expenses [5].

**Citation:** Yin, J.; Lang, X.; Xu, H.; Duan, J. High-Performance Breaking and Intelligent of Miniature Circuit Breakers. *Sensors* **2022**, *22*, 5990. https://doi.org/10.3390/s22165990

Academic Editors: Antonio Cano-Ortega and Francisco Sánchez-Sutil

Received: 11 July 2022 Accepted: 9 August 2022 Published: 11 August 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

As the core equipment for terminal protection and regulation in the power distribution system, the miniature circuit breaker (MCB) and its intellectualization promote the digitalization of power distribution [6]. At the same time, the exploitation and utilization of new energy generation, especially the boom of photovoltaic power, place higher requirements on the DC current breaking capacity of MCBs. With the development of the AC–DC hybrid distribution network, the high-voltage (1000 V), AC/DC universal MCB faces great challenges.

Research on the breaking ability of MCBs and the invention of new products has been going on for many years, and a large number of scholars have conducted thorough research on their arc extinguishing ability [7,8]. Most of the research focuses on arc characteristics, studied through two main schemes: simulation and experiment [9–14]. For simulation, magneto-hydrodynamics (MHD) has become an effective auxiliary means to study the characteristics of arc motion and extinction [15–18]; for experiment, the arc diagnostic means mainly include optical fiber testing, high-speed photography, and laser light filling, and the dynamic characteristics and breaking performance of the arc are studied by recording its motion [19–21]. Most DC MCB products on the market are rated below 1000 V, and their breaking capacity is insufficient (6 kA) and unstable. Regarding remote control and digitalization, as Internet of Things technology has continuously developed in recent years, remote monitoring technology based on WIFI/4G has been proposed [22]. Some companies in the industry have conducted preliminary studies on intelligent circuit breakers and developed some products. However, few circuit breakers can realize electric energy measurement and real-time online monitoring of the operating status [23,24]. Therefore, it is of great theoretical and practical significance to study remote control, digitalization, and AC/DC universal high-performance breaking technology. On this basis, the DC non-polar breaking technology of the MCB is studied first, and an arc extinguishing strategy with coordinated control of magnetic blowing and air blowing is proposed. An experimental prototype using the above techniques was fabricated and passed the DC 1000 V/10 kA short-circuit breaking test. Secondly, the intelligent technology of the MCB is studied. By integrating an electric operating mechanism to realize remote control, a visualization system, a digital monitoring platform, and a mobile application (APP) based on a cloud platform are developed. Remote control, temperature monitoring, power management, automatic alarms, and real-time monitoring of the MCB are realized, enhancing the safe and efficient operation of the load and the terminal network. The above research contributes to the development of the new power system by providing a theoretical and technical reference for the improvement and optimization of system protection appliances after a high proportion of new energy access.

This paper is organized as follows: Section 2 presents the high-performance (1000 V/10 kA) DC/AC breaking technology and introduces its principle and experimental results. Section 3 presents the intelligent technologies of the MCB and introduces the prototype realization principle and product performance. Section 4 is the conclusion.

#### **2. High-Performance DC/AC Breaking Technology**

For the breaking characteristics of a miniature circuit breaker, whether the breaker can open successfully is directly determined by whether the air arc can be extinguished smoothly and quickly. Because a DC arc has no natural current zero, it is more difficult to extinguish than an AC arc, and its breaking ability cannot be improved by increasing the contact gap, the number of splitter plates, or other conventional measures due to the small volume of the MCB; furthermore, a single arc extinguishing measure can no longer meet the breaking performance requirements of the higher-voltage (1000 V) DC circuit breaker. At the same time, power distribution equipment requirements call for the DC circuit breaker to realize non-polar breaking, so the structural design of its arc extinguishing chamber faces tougher challenges.

To enhance the energy dissipation of the arc, this paper puts forward an arc extinguishing strategy coordinated between air blowing and magnetic blowing, so as to break a large DC current within a small volume by increasing the air-blowing and magnetic-blowing effects in the arc extinguishing chamber.

#### *2.1. Theoretical Analysis and Practical Scheme of DC Non-Polar Arc Extinguishing*

The traditional arc extinguishing strategy has difficulty matching high-voltage DC interruption. Therefore, an arc extinguishing scheme with coordinated control of magnetic blowing and air blowing is proposed. Specifically, permanent magnets and gas-producing materials are added to the arc extinguishing chamber: on the one hand, the permanent magnets enhance the magnetic blowing effect; on the other hand, the gas-producing material enhances the air blowing effect. The overall layout of the arc extinguishing chamber is shown in Figure 1. Permanent magnets are placed on both sides of the contact and arc-running area, and the gas-producing material is wrapped around the outside of the permanent magnets. This both enhances the air blowing and prevents the permanent magnets from being demagnetized by direct contact with the high-temperature arc. Because permanent magnets are polar, their positions must be arranged carefully to realize the non-polar breaking of the DC arc. The schematic diagram of the non-polar permanent magnet layout is shown in Figure 2, where the S poles of the two permanent magnets face each other. To realize DC non-polar breaking, the permanent magnets are arranged on the two side walls of the arc extinguishing chamber with the same magnetic poles facing each other.

**Figure 1.** Layout scheme of arc extinguishing chamber.

**Figure 2.** Arrangement schematic diagram of permanent magnet. (**a**) The current goes into the paper. (**b**) The current goes out of the paper.

The extinguishing of the DC arc mainly depends on the current-limiting effect of the arc voltage, which forces the current to cross zero and extinguish the arc. Therefore, increasing the arc voltage is the fundamental measure for arc extinguishing. Raising the arc voltage mainly depends on the splitter plates cutting the arc into segments, forming multiple near-electrode voltage drops. The main goal of the arc extinguishing design is therefore to make the arc enter the splitter plate area quickly. For DC non-polar breaking, this means that reversing the current direction must not affect the smooth entry of the arc into the arc extinguishing chamber.

Figure 2a shows the direction of the Lorentz force on the arc column at eight different locations when the current direction is into the page. It can be seen from the figure that when the arc column is located at point 2 or 5, the Lorentz force moves the arc to the upper left; when the arc column is located at point 3 or 8, the Lorentz force moves the arc to the upper right, which benefits the arc being blown into the grid, cooled, and cut so as to raise the arc voltage.

When the arc column is located at point 1, it moves to point 2 under the action of the Lorentz force, and the changed Lorentz force then moves the arc up and to the right, pushing it into the grid region; when the arc column is located at point 6, it moves to point 3 and is forced toward the upper right of the grid; when the arc column is located at point 7, it moves to point 8, and the Lorentz force pushes it up and to the right into the grid region. When the arc column is located at point 4, the Lorentz force moves the arc toward the contact area on the one hand, while on the other hand the gas-producing material forces the arc toward the grid area; due to this complex interaction, the arc eventually moves toward the grid area.

From the above analysis, the arc can enter the grid area and be cut quickly regardless of the current direction. At the same time, the permanent magnet arrangement shown in Figure 2 can reduce the pinch force of the magnetic field generated by the arc itself and weaken the field's hindrance of the arc column's movement, further hastening its entry into the grid area, promoting the rapid rise of the arc voltage, and improving the DC arc breaking ability.
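The point-by-point direction analysis above is an application of the Lorentz force on a current-carrying arc segment, F = I (l × B). A small sketch of that computation (the coordinate axes and magnitudes are illustrative choices, not values from the paper: x to the right, y up toward the splitter plates, z out of the page, matching the Figure 2 view):

```python
def cross(a, b):
    """3-D cross product, used for the Lorentz force F = I (l x B)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def lorentz_force(current_A, l_vec, b_vec):
    """Force on a straight arc segment of directed length l in field B."""
    fx, fy, fz = cross(l_vec, b_vec)
    return (current_A * fx, current_A * fy, current_A * fz)

# Example: current into the page (l along -z) in a region where the
# permanent-magnet field points along -x. The resulting force is along
# +y, i.e., the arc is pushed up toward the splitter-plate (grid) area.
F = lorentz_force(100.0, (0, 0, -0.01), (-0.05, 0, 0))
```

Repeating the computation with the local field direction at each of the eight positions in Figure 2a reproduces the direction arrows discussed in the text.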

#### *2.2. DC Breaking Test and Result Analysis*

(1) Test prototype and conditions.

Based on the above theory, an MCB test prototype was made, as shown in Figure 3: the permanent magnets are arranged on both sides of the contact and arc-running area and wrapped by the gas-producing material. To verify the breaking ability of this scheme, the test was carried out under the conditions of 1000 V DC, 10 kA short-circuit current, and a 5 ms time constant at a standard circuit breaker test station.

**Figure 3.** Experimental prototype.

(2) Analysis of test results.

To verify the non-polar breaking capacity of this scheme, prototypes in forward connection and reverse connection were tested in the short-circuit experiment. According to the short-circuit breaking capacity test standard for circuit breakers, an O (open)–CO (close–open) standard sequence must be completed under short-circuit current. That is, in the first test the breaker is closed before power-on and opened directly after power-on, and in the second it is closed and then opened after power-on. The DC breaking waveforms (including the arc current and voltage curves) are shown in Figure 4.

**Figure 4.** DC breaking test waveform (third party test). (**a**) The waveform of forward connection. (**b**) The waveform of reverse connection.

It can be seen from the arc voltage waveform that the arc voltage rises rapidly, with a peak of more than 1800 V, which greatly exceeds the 1000 V system voltage; at the same time, the arcing time is about 5 ms. This is mainly because the arc quickly enters the splitter plate area and is cut by the splitter plates under the combined action of magnetic blowing and air blowing, while the dissipation of arc energy is enhanced. Under this joint action, the arc voltage rises rapidly and the current-limiting effect is obvious, so the arc current quickly crosses zero and extinguishes, shortening the arcing time.
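This current-limiting mechanism can be checked numerically from the stated test values. The loop equation L di/dt = U_sys − iR − u_arc, with R and L derived from the 1000 V / 10 kA / 5 ms test circuit, and the measured 1800 V peak taken as a constant arc voltage (a deliberate simplification for illustration, since the real arc voltage is time-varying), drives the current to zero on a millisecond scale consistent with the reported arcing time:

```python
# Test-circuit parameters stated in the text.
U_SYS, I_PROSPECTIVE, TAU = 1000.0, 10_000.0, 5e-3
R = U_SYS / I_PROSPECTIVE   # 0.1 ohm
L = TAU * R                 # 0.5 mH, since tau = L / R
U_ARC = 1800.0              # measured peak arc voltage, held constant here

def time_to_current_zero(i0=I_PROSPECTIVE, dt=1e-6):
    """Euler-integrate L di/dt = U_sys - i*R - u_arc until i crosses zero."""
    i, t = i0, 0.0
    while i > 0.0:
        i += (U_SYS - i * R - U_ARC) / L * dt
        t += dt
        if t > 0.1:  # safety cap against non-convergence
            break
    return t
```

Starting from the full prospective current, the analytic solution gives a zero crossing at τ·ln(1800/800) ≈ 4.1 ms, so the simple model lands in the same range as the ~5 ms arcing time observed in Figure 4.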

As can be seen from the above test waveforms, in both forward and reverse connection the circuit breaker prototype successfully broke the 10 kA short-circuit current with a short arcing time, which fully verifies that the above permanent magnet layout can achieve non-polar breaking and improves the breaking capacity compared with conventional circuit breakers on the market (6 kA). The results show that the coordinated magnetic blowing and air blowing control strategy is effective in improving the breaking capacity.

The coordinated arc control strategy above does not require changing the size or main structure of the original circuit breaker; it only requires placing the permanent magnets and gas-producing material on both sides of the contact and arc-running area. Therefore, the scheme can not only be used in the development of high-performance DC circuit breakers but can also be applied directly to the optimal design of existing AC–DC miniature circuit breakers to enhance their short-circuit breaking capacity.

#### **3. Intellectualization of MCB**

With the development of Internet of Things technology, intelligent and digital requirements are being put forward for the MCBs used at distribution system terminals. To realize remote opening and closing control and online status monitoring of the MCB, hardware and software systems were researched and developed.

#### *3.1. Intelligent Platform Architecture*

The intelligent platform architecture of the MCB is shown in Figure 5 (APP is an abbreviation for "mobile phone application"). Remote control is mainly based on the cloud platform, and communication between the cloud platform and the circuit breaker is realized through the gateway. The gateway is connected to the circuit breaker module by a Type-C data cable and can be configured with Wi-Fi and data networks. At the same time, a web version and a portable mobile phone APP were developed to realize digital monitoring of the circuit breaker's running state, remote opening and closing through the APP, power management, temperature monitoring, over-temperature alarms, automatic tripping, and other functions.

**Figure 5.** Intelligent overall architecture of circuit breaker.

To achieve remote control, the circuit breaker hardware is configured as follows: an operating mechanism performs the opening and closing operations, voltage and current sensors provide data acquisition and power management, and a temperature sensor provides real-time monitoring of the circuit breaker's temperature.

#### *3.2. Intelligent and Digital Circuit Breaker System*

The complete intelligent circuit breaker is shown in Figure 6; it mainly consists of three modules: a power module, a gateway and a circuit breaker module, connected by Type-C data cables. The power module's input is AC 220 V and its output is DC 12 V; its main function is to supply power to the gateway and to the single-chip operating mechanism of the circuit breaker module. The gateway handles network communication; the Type-C cable not only provides power but also carries communication between the gateway and the circuit breaker module.

**Figure 6.** Composition of intelligent MCB.

Compared with a conventional circuit breaker, the new intelligent miniature circuit breaker adds an extra pole to the circuit breaker module, used to install the operating mechanism, control board, sensors and other devices that build the digital monitoring platform. The hardware system mainly includes a data acquisition system, a central processor, an actuator, a display unit and the circuit breaker itself. The hardware system diagram is shown in Figure 7.

**Figure 7.** Diagram of hardware system.

(1) Remote control opening and closing technology.

The "conditioning circuit" mainly filters the arc voltage and current signals. The MCU is the Single Chip Microcomputer MKE02Z. The main function is to process and display the collected data and issue opening and closing instructions to the motor control chip.

To realize remote software control of circuit breaker opening and closing, a motor with a control chip was added to the hardware system. For a remote switching operation, the user first selects the circuit breaker to be operated in the display interface or the phone APP interface and clicks the corresponding button; the single-chip microcomputer then issues an instruction to the chip that controls the motor to turn forward or in reverse; the motor then drives the gears, which are coaxially connected to the circuit breaker handle, thereby realizing the remote opening and closing operation. The complete operating mechanism and data acquisition hardware layout are shown in Figure 8.
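The command flow described above (APP button press, MCU instruction, motor direction, geared handle) can be sketched as follows. This is only an illustrative model: the class and method names (`Breaker`, `MotorDriver`, `execute`) are invented for the example and do not come from the actual firmware.

```python
class MotorDriver:
    """Stands in for the motor control chip commanded by the MCU."""
    def __init__(self):
        self.last_command = None

    def rotate(self, direction):
        # "forward" closes the breaker handle via the coaxial gears,
        # "reverse" opens it (assumed mapping for this sketch).
        assert direction in ("forward", "reverse")
        self.last_command = direction


class Breaker:
    def __init__(self, breaker_id, motor):
        self.breaker_id = breaker_id
        self.motor = motor
        self.closed = False

    def execute(self, command):
        """Map an APP-level command to a motor rotation and update state."""
        if command == "close" and not self.closed:
            self.motor.rotate("forward")
            self.closed = True
        elif command == "open" and self.closed:
            self.motor.rotate("reverse")
            self.closed = False
        return self.closed


breaker = Breaker("Q1", MotorDriver())
breaker.execute("close")   # APP button -> MCU -> motor chip -> gears
print(breaker.closed)      # True
breaker.execute("open")
print(breaker.closed)      # False
```

The state check in `execute` mirrors the fact that a mechanical handle cannot be closed twice; a real controller would additionally verify the operation via the breaker's auxiliary contacts.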

**Figure 8.** Hardware of operating mechanism and data processing and acquisition.

(2) Data acquisition and digital display interface.

To realize digital monitoring of the circuit breaker's running status, the hardware system uses an NXP single-chip microcomputer as the main control chip, with a voltage sensor, current transformer and temperature sensor installed to collect the voltage, current and temperature data. To realize the electric energy metering function, the voltage and current signals are fed through the conditioning circuit into the electric energy metering chip, which computes the consumption; the results are then sent through the MCU main control chip to the digital display interface for real-time digital display of voltage, current and electric energy.

Moreover, the single-chip microcomputer stores and processes the data. When the circuit breaker's real-time temperature exceeds its rated temperature, the main control chip issues alarm instructions: it displays alarm information in pop-up windows in the system and conveys a fault signal to the circuit breaker's failure indicator, making it flash. This realizes real-time fault alarming and facilitates the work of maintenance personnel.
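The over-temperature alarm logic above reduces to a simple threshold check with two alarm paths. The sketch below is illustrative only: the rated temperature value and function name are assumptions, not taken from the actual firmware.

```python
RATED_TEMP_C = 70.0  # assumed rated operating temperature for this sketch

def check_temperature(temp_c, rated=RATED_TEMP_C):
    """Return the two alarm actions the main control chip would trigger:
    a pop-up alarm in the monitoring system and a flashing fault indicator."""
    over = temp_c > rated
    return {"popup_alarm": over, "indicator_flashing": over}

print(check_temperature(85.0))  # both alarm paths active
print(check_temperature(40.0))  # normal operation, no alarm
```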

Based on the above principles, the corresponding monitoring system and APP were developed. The interface of the monitoring system is shown in Figure 9. The number of circuit breakers and their specific information in different states (good, alarm, fault and offline) are displayed in the interface, along with the fault recognition rate, the information push method, etc. Opening and closing commands can be issued through the circuit breaker monitoring interface, which also displays the operating status of each circuit breaker. The temperature and power of each circuit breaker can likewise be viewed in real time through the interface.


Through this monitoring system, all circuit breaker layout points and the operation data of the whole process can be digitally monitored. The system can also perform, for a specific circuit breaker, remote opening and closing, electric energy measurement, current and voltage monitoring, operating temperature display, real-time over-temperature warning and so on. In opening and closing operations, one circuit breaker can be operated alone, or multiple channels can be operated at the same time. Voice control is also provided in the mobile phone APP.

The digital control system above endows the MCB, the terminal protection equipment of the power distribution system, with intelligent and digital monitoring, as well as with remote opening and closing, electric energy measurement, over-temperature alarming and other functions, which facilitates operation by users and improves the reliability of power equipment. At the same time, current digital monitoring of the circuit breaker still relies on a computer or mobile phone APP with limited functions; further research and development of a digital control panel is required in order to realize the integrated design of the circuit breaker and the digital monitoring platform, providing a theoretical foundation for research on new generations of digital circuit breakers.

Combining the high-performance arc interruption technology and the intelligent technology described above, a prototype was built and successfully passed third-party testing, providing a technical reference for the intellectualization and digitalization of power equipment in China and for the research and development of high-performance DC intelligent circuit breakers for photovoltaic power generation, intelligent parks and energy storage systems.

#### **4. Conclusions**


**Author Contributions:** Data curation, J.Y.; Formal analysis, X.L.; Funding acquisition, J.Y.; Investigation, X.L. and H.X.; Project administration, J.Y.; Software, X.L. and J.D.; Visualization, J.D.; Writing—original draft, J.Y. and H.X.; Writing—review and editing, H.X. and J.D. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Natural Science Foundation of China (NSFC) (52107167), China Postdoctoral Science Foundation (2021M692877), Basic Research Program of Natural Science in Shaanxi Province (2021JQ-473), Scientific Research Projects of Education Department of Shaanxi Provincial Government (21JK0788), and Research Fund of Xi'an University of Technology (104–451119032).

**Acknowledgments:** This work was completed with the support of the funds listed above. In addition, many thanks to Ge Shiwei for his support of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## **Knowledge-Based Sensors for Controlling A High-Concentration Photovoltaic Tracker**

**Joaquin Canada-Bago 1,\*, Jose-Angel Fernandez-Prieto 1, Manuel-Angel Gadeo-Martos <sup>1</sup> and Pedro Perez-Higueras <sup>2</sup>**


Received: 5 February 2020; Accepted: 26 February 2020; Published: 28 February 2020

**Abstract:** To reduce the cost of generated electrical energy, high-concentration photovoltaic systems have been proposed, which reduce the amount of semiconductor material needed by concentrating sunlight using lenses and mirrors. Due to the concentration of energy, a tracker or pointing system is necessary in order to obtain the desired amount of electrical energy. However, a high degree of inaccuracy and imprecision is observed in real installations of concentration photovoltaic systems. The main objective of this work is to design a knowledge-based controller for a high-concentration photovoltaic (HCPV) system tracker. The proposed methodology consists of using fuzzy rule-based systems (FRBSs) and implementing the controller in a real system by means of Internet of Things (IoT) technologies. FRBSs have demonstrated correct adaptation to problems with a high degree of inaccuracy and uncertainty, and IoT technology allows the use of constrained resource devices, a cloud computing architecture, and a platform to store and monitor the data obtained. As a result, two knowledge-based controllers are presented in this paper: the first based on a pointing device and the second based on the measurement of the electrical current generated, which showed the best performance in the experiments carried out. New factors that increase imprecision and uncertainty in HCPV solar tracker installations are identified in the experiments carried out in the real installation.

**Keywords:** knowledge-based sensor; Internet of Things; high-concentration photovoltaic systems; sun tracker

#### **1. Introduction**

The European Commission has recently published the photovoltaic (PV) status report [1] in which PV market, electricity costs, and the economics of PV systems are analyzed. Within its conclusions, the following stand out: (a) the new installed capacity of solar PV power and the number and volume of PV markets are increasing; (b) a rapid decarbonization is necessary; (c) a rapid cost reduction exists in PV manufacturing; (d) different studies about subsidies for combustibles, fuels, and electricity have been presented; (e) solar energy will continue to grow at high rates; and (f) electricity from PV systems could be cheaper than residential consumer prices in a wide range of countries.

To analyze the PV system profitability, it is convenient to take into account additional factors such as subsidies and forecasting of PV power generation. According to [1], while fossil fuel subsidies could indirectly increase noxious and greenhouse gases, renewable energies and energy efficient technologies subsidies may help to reduce emissions. A new scheme of subsidies based on the price of CO2 is presented in the literature [2]. A review of forecasting of PV generation is presented in the literature [3].

High-concentration photovoltaic systems (HCPVs) [4–7] concentrate the sunlight received between 300 and 2000 times onto photovoltaic cells by means of optical concentration devices. The main objective of these systems is to replace semiconductor materials (photovoltaic cells) with more economical optical materials (lenses and mirrors), reducing the cost of power plants and generated energy.

Although HCPV is a young technology, it has already demonstrated a great capacity for growth in recent years. In this sense, the number of companies that develop HCPV systems has grown rapidly, and the installed power has gone from a few kWs in laboratories to several megawatts.

According to [8], concentration photovoltaic (CPV) technology has the potential to reduce the levelized cost of electricity. In this sense, if installations continue growing, CPV could reach a cost ranging between €0.045/kWh and €0.075/kWh. System prices, including installation, for CPV power plants would then be between €700 and €1100/kWp. On the other hand, HCPV could be competitive in some locations in 2020 [9].

Due to the concentration of energy, tracker or pointing systems [10] are necessary in CPV and HCPV systems, which represents one of the differences with respect to conventional photovoltaic (PV) systems [11,12]. In these systems, power generation decreases dramatically with a sun pointing error greater than 0.5°, becoming practically zero if the error exceeds even a few degrees.

Frequently, a high degree of inaccuracy and uncertainty or imprecision are observed in HCPV tracker installations due to factors such as HCPV module manufacturing errors, module alignment errors, and the precision and accuracy of the tracker control system [13].

Fuzzy rule-based systems (FRBSs) [14] have demonstrated correct adaptation to problems having a high degree of inaccuracy and uncertainty. Based on fuzzy logic (FL) [15], these systems express knowledge by means of a set of linguistic rules grouped in a knowledge base (KB). FRBSs can be used in control systems, e.g., fuzzy logic controllers (FLCs), in which the control algorithm is expressed as a set of actuation linguistic rules.

Currently, there is a persistent trend to integrate knowledge-based systems (e.g., FRBSs) and FLCs into resource-constrained devices and into the paradigm of the Internet of Things (IoT) [16].

The IoT concept was introduced by Kevin Ashton in 1999, in which the physical world is connected to the Internet through ubiquitous sensors [17,18]. The IoT refers to the use of constrained resource devices, data acquisition, actuation, data communication with fog and cloud servers, data storage, and subsequent analysis. The range of applications of the IoT is very wide and includes environmental monitoring systems, fire detection, intelligent buildings, smart cities (traffic, lighting, parking location, garbage containers, etc.), intelligent agriculture, industrial control and monitoring, logistics, health monitoring, etc.

The main objective of this work is to design a knowledge-based controller for an HCPV tracker and to implement it in a real system by means of IoT technology (i.e., constrained resource devices, data communication, and a cloud computing server for data storage and analysis).

The remainder of this paper is organized as follows. The following section shows related work. Section 3 addresses the proposed controller and knowledge-based FRBS sensors. Section 4 presents the real HCPV tracker, the experiment that was carried out, and the results obtained. Finally, conclusions and future work are presented in Section 5.

#### **2. Related Work and Background**

To achieve this objective, this work proposes to use FRBSs, due to the high degree of inaccuracy and uncertainty present in real installations of PV trackers, and IoT technologies, to integrate the tracker controller into a resource-constrained device and monitor the data obtained. The following sections review related work on PV trackers and introduce FRBSs and IoT technologies.

#### *2.1. PV Trackers*

To point at the sun, PV installations use solar tracking systems, or trackers, composed of a metal structure that can be moved (on a dual or single axis) by motors, together with PV modules, position sensors, and a control system.

A review of different conventional PV tracker systems was presented in the literature [19], in which they are classified as active or passive control systems. The most common are active control systems, which may be differentiated into five types:


The main objective of controllers is to generate the maximum energy by stabilizing the PV system at the maximum power point (MPP) using the maximum power point tracking (MPPT) technique [28,29], which is widely used in PV systems. An MPPT technique based on learning is presented in [30]. To verify the proper operation of PV systems, it is necessary to monitor the evolution of the significant magnitudes involved in the system [31–33].

Despite the wide interest in PV tracker controllers, little attention has been paid to tracker controllers for HCPV systems.

When HCPV systems are used, the sun pointing error has a maximum admitted value. The electrical current generation surface presents a maximum if the azimuth and elevation errors are zero. If these errors are greater, the electrical current generated by the module decreases dramatically [34]. For example, the HCPV modules used in the experiments in this work require azimuth and elevation errors lower than ±0.6°.

Although the algorithms and pointing devices are able to calculate the solar position with sufficient accuracy for HCPV systems, there is a high degree of inaccuracy and uncertainty or imprecision in HCPV tracker installations due to multiple factors. According to [13], there are three factors that cause system mismatches and power losses: manufacture error in HCPV modules, alignment error in the installation of the modules, and imprecision and inaccuracy in the tracker control system.

Due to these errors, the maximum power generation does not coincide with the zero error pointing of the tracker [35] in the installation of HCPV modules; therefore, controllers based on pointing algorithms (e.g., ephemeris) and pointing devices that minimize pointing error may present unacceptable errors in HCPV systems. However, the precision used in CPV installations is lower than that required in HCPV systems.

As a consequence, current trackers are not properly adapted to HCPV systems. These systems require more complex controls in order to obtain the maximum energy. Due to the optical concentration, the complexity of these systems, and the high degree of inaccuracy and uncertainty observed in the installation of these systems, a greater degree of precision and complexity in the control of tracker systems is necessary.

#### *2.2. Fuzzy Rule-Based Systems*

A technology that has been demonstrated to adapt correctly to environments with inaccuracy and uncertainty is the FRBS [8], which uses FL and expresses knowledge through IF-THEN-type linguistic rules. These systems (Figure 1) are composed of a fuzzification interface, a KB, an inference engine, and a defuzzification interface. The fuzzification interface adapts the actual input values to the fuzzy system. The KB contains the definition of input and output variables, the fuzzy sets defined in the variables, and a set of IF-THEN-type linguistic rules that correlate these variables. The inference engine is responsible for inferring the fuzzy output of the system from the input variables and the KB. Finally, the defuzzification interface adapts the value of the fuzzy output to a real output value.

**Figure 1.** Fuzzy rule-based system.

Two approaches have been proposed within FRBSs: those of Mamdani [36,37] and Takagi–Sugeno–Kang (TSK) [38]. The main difference between the two approaches lies in the consequent of the knowledge rules. In the Mamdani approach, the consequent is expressed as a linguistic variable:

IF X<sub>1</sub> is A<sub>1</sub> and … and X<sub>n</sub> is A<sub>n</sub> THEN Y is B,

where X<sub>i</sub> represents the input variables, A<sub>i</sub> are the fuzzy sets associated with the input variables, Y is the output variable, and B is a fuzzy set associated with the output variable.

In the TSK approach, the consequent is an analytical function of the input variables:

IF X<sub>1</sub> is A<sub>1</sub> and … and X<sub>n</sub> is A<sub>n</sub> THEN Y = f(X<sub>1</sub>, ..., X<sub>n</sub>),

where X<sub>i</sub> represents the input variables, A<sub>i</sub> are the fuzzy sets associated with the input variables, Y is the output variable, and f(X<sub>1</sub>, ..., X<sub>n</sub>) is the output function, in most cases a linear function.
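As a small numeric illustration of the difference between the two consequent types, the sketch below evaluates one Mamdani-style and one TSK-style rule for the same pair of inputs. All membership functions and coefficients here are invented for the example.

```python
def tri(x, a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

x1, x2 = 0.3, 0.7
# Firing strength of "IF X1 is A1 and X2 is A2" using the min t-norm
w = min(tri(x1, 0.0, 0.5, 1.0), tri(x2, 0.0, 0.5, 1.0))

# Mamdani consequent: a fuzzy set "Y is B", clipped at strength w and
# defuzzified later together with the contributions of the other rules
mamdani_clip = (w, (0.6, 0.8, 1.0))  # (strength, fuzzy set B)

# TSK consequent: a crisp linear function of the inputs, weighted by w
tsk_output = w * (0.5 * x1 + 0.2 * x2 + 0.1)

print(round(w, 3))  # 0.6
```

The Mamdani rule yields a fuzzy object that still needs defuzzification, while the TSK rule already produces a crisp number; that is the practical consequence of the difference described above.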

Controllers that use FRBS systems incorporating control knowledge are called FLCs (Figure 1).

In the literature [14], several FRBS applications are presented, such as classification systems, modeling systems, control systems, and robotics. A model of an HCPV module is presented using an FRBS system [39].

Currently, there is a trend to integrate knowledge-based systems into resource-constrained devices. Reference [40] presents a collaborative FRBS system for integration into wireless sensor networks (WSNs). In the literature [41], an optimization for smart spaces is proposed. Mariscal-Ramirez et al. [42] designed a sensor to monitor noise pollution adapted to resource-constrained devices.

#### *2.3. Internet of Things*

One of the objectives of this work is the use of IoT technologies to integrate the controller of an HCPV tracker into a resource-constrained device and monitor the data obtained using an IoT cloud platform.

Basically, IoT technologies [17,43] consist of constrained resource devices, data networks, communication protocols, and cloud platforms as follows:


The use of IoT in smart spaces [18] and smart devices is widely referenced in European Commission documents [47,48] concerning the Internet of Things and the Internet of the Future. These documents present devices called smart things in which several algorithms can be executed for intelligent decisions based on real-time measurements of the sensors.

#### **3. HCPV Tracker Knowledge-Based Controller**

The main objective of this work is to design a knowledge-based controller for an HCPV tracker. Due to the uncertainty and inaccuracy of the positioning or tracker systems, this work proposes the use of FRBS systems because these knowledge-based systems have demonstrated their effectiveness in these conditions.

In addition, the proposed system will be implemented in a real system using IoT technology with constrained resource devices, data communication, and a platform for storing and analyzing the data obtained. Therefore, all the algorithms used in the control system will be designed to be executed in a low-cost microcontroller with low information processing capacity.

The novelty of the proposed controller thus lies in (a) the design of a knowledge-based controller by means of an FRBS, (b) control knowledge that is easy to understand, and (c) a design that can be executed on a resource-constrained device.

The following sections show the structure of the proposed controller and two knowledge-based FRBS sensors: the first uses a positioning device, and the second is based on the electrical current generated by the photovoltaic concentration modules.

#### *3.1. Controller Structure*

Figure 2 shows the basic structure of the control system, which is composed of a pointing system, an error inference system, and a solar tracking system. In this way, the positioning of the tracker will be calculated as the sum of the positioning algorithm and the error inferred by the FRBS systems.

**Figure 2.** Controller structure.

The calculation of the solar position can be performed by different algorithms (such as ephemeris) or by a solar position sensor. If an ephemeris algorithm is used, it calculates the position of the sun using the date, time, and global position of the solar tracker. After that step, the controller compares the position of the sun with the position to which the tracker is pointing and calculates the azimuth and elevation angles that the solar tracker has to perform in the next movement.

To calculate the pointing error, a knowledge-based FRBS is used. To execute the FRBS in a constrained resource device, we introduced several modifications to the classical structure of the Mamdani FRBS to minimize the computational burden: the device executes a small but complete FRBS; only triangular fuzzy sets are available; the fuzzification and defuzzification interfaces only admit linear conversions; a First Infer Then Aggregate (FITA) inference approach is used; the inference engine operates with numerical values instead of linguistic labels; and the number of fuzzy sets defined in each variable and of rules in the KB is small.
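A minimal sketch of such a stripped-down engine is shown below, assuming triangular sets, linear conversions, and a FITA-style pass over a tiny numerically encoded rule base. The set parameters, rules, and ranges are illustrative, not the authors' actual KB (although the [0, 1024] input and [−3°, 3°] output ranges match those stated later for the pointing-device sensor).

```python
def tri(x, a, b, c):
    """Triangular membership with support [a, c] and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Three labels per variable (LOW, MEDIUM, HIGH), stored numerically --
# the engine works with indices and numbers, not linguistic labels.
SETS = [(-0.5, 0.0, 0.5), (0.0, 0.5, 1.0), (0.5, 1.0, 1.5)]

# Rule base: (set index of input 1, set index of input 2, output set index),
# e.g. (0, 2, 0) reads "IF PR1 is LOW and PR2 is HIGH THEN error is LOW".
RULES = [(0, 2, 0), (1, 1, 1), (2, 0, 2)]

def fuzzify(raw, lo=0, hi=1024):
    """Linear conversion of a raw PR reading into the normalized [0, 1]."""
    return (raw - lo) / (hi - lo)

def infer(x1, x2):
    """FITA: infer each rule first, then aggregate (center average)."""
    num = den = 0.0
    for i1, i2, io in RULES:
        w = min(tri(x1, *SETS[i1]), tri(x2, *SETS[i2]))  # min t-norm
        num += w * SETS[io][1]   # weight the center of the output set
        den += w
    return num / den if den else 0.5  # neutral output if no rule fires

def defuzzify(y, lo=-3.0, hi=3.0):
    """Linear conversion of the normalized output to degrees of error."""
    return lo + y * (hi - lo)

# Equal luminosity on both photoresistors -> zero inferred pointing error
print(defuzzify(infer(fuzzify(512), fuzzify(512))))  # 0.0
```

Everything here is integer indexing, comparisons, and a handful of multiplications per rule, which is the kind of workload a low-cost microcontroller can execute at each tracking step.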

This work presents two knowledge-based FRBSs to infer the pointing error. The first is a smart sensor that is composed of a pointing device and an FRBS that infers the error. The second one is a different smart sensor composed of a probe that obtains the electric current generated by the HCPV module and another FRBS to infer the error.

#### *3.2. FRBS Sensor Based on a Pointing Device*

This smart sensor is composed of a hardware pointing device that allows measuring luminosity by means of four photoresistors (PRs) and an FRBS that infers the pointing error using a specific KB.

The pointing device is composed of an optical light/shadow device, a set of four PRs, and a signal adaptation stage. The luminosity values measured by the PRs are the inputs to the FRBS that infers the error. If the sensor points correctly at the sun, each of the PRs receives the same solar radiation. In the case of a pointing error, the optical device amplifies the difference in radiation received by some PRs compared to others, allowing small pointing errors to be detected.

Figures 3 and 4 show the pointing device used by the sensor. The difference between the luminosities measured by PR1 and PR2 allows the sensor to infer the elevation error. The same error can also be estimated using PR4 and PR3, so the sensor infers it with two different systems. Similarly, the difference in luminosity measured by PR4 and PR1 (as well as between PR3 and PR2) allows the sensor to infer the azimuth error.

**Figure 3.** Pointing device. Top view.

**Figure 4.** Pointing device.

This work proposes to use two KBs, one to infer the elevation error and another to infer the azimuth error. Since the KBs will be executed in a resource-constrained system, a small number of fuzzy sets per variable and of action rules will be used so that the error can be inferred as quickly as possible.

The elevation error KB consists of two input variables (the luminosity measured by two PRs), one output variable (the elevation error), and a set of action rules. Figure 5 shows the fuzzy sets defined for all input variables (PRs) and the output variable (elevation error).

**Figure 5.** Fuzzy sets defined in input (PRs) and output variables (elevation error).

Table 1 shows the KB elevation error action rules for the sensor using PR1 and PR2.


**Table 1.** Action rules for the elevation error (PR1 and PR2).

Table 2 shows the KB elevation error action rules for the sensor using PR4 and PR3.

**Table 2.** Action rules for the elevation error (PR4 and PR3).


The azimuth error KB uses the same definition of input and output variables as the elevation error KB (Figure 5). However, the set of action rules is different. Table 3 shows the KB azimuth error action rules for the sensor using PR4 and PR1.

**Table 3.** Action rules for the azimuth error (PR4 and PR1).


Table 4 shows the KB azimuth error action rules for the sensor using PR3 and PR2.


**Table 4.** Action rules for the azimuth error (PR3 and PR2).

The fuzzification interface linearly converts the range [0, 1024] measured by the PRs to the normalized range [0, 1], and the defuzzification interface converts the output range [0, 1] to [−3°, 3°].

#### *3.3. FRBS Sensor Based on the Electrical Current Generated*

This sensor is composed of a hardware probe that measures the electrical current generated by HCPV modules and an FRBS, which infers the elevation and azimuth errors.

The controller uses the usual sun tracking movements to measure the electrical current generation before and after each movement. After each movement, these two measurements are used to infer the elevation and azimuth errors. The inferred errors are taken into account in the next tracker movement to correct the errors and follow the maximum electrical current generated. Therefore, no extraordinary movements are made.
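The measure-move-correct loop can be sketched as follows. Here `infer_error` is only a crude numeric stand-in for the FRBS: its rule of thumb (if the current dropped after a movement, assume overshoot and correct against it on the next step), its scaling, and all function names are assumptions made for the example.

```python
def infer_error(i_before, i_after, step_deg):
    """Toy surrogate for the FRBS: return a correction, in degrees, of
    opposite sign to the last step, scaled by the relative current drop."""
    drop = i_before - i_after
    if drop <= 0:
        return 0.0  # current held or improved: no correction inferred
    return -step_deg * min(drop / max(i_before, 1e-9), 1.0)

def next_movement(planned_step_deg, i_before, i_after):
    # The inferred error is simply added to the next scheduled tracking
    # step, so no extraordinary movements are needed.
    return planned_step_deg + infer_error(i_before, i_after, planned_step_deg)

print(next_movement(0.5, 5.0, 5.2))  # current rose: keep the planned step
print(next_movement(0.5, 5.0, 4.0))  # current fell: shrink the next step
```

The key property, matching the description above, is that the correction rides on the movements the tracker was going to make anyway.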

The KB of this FRBS is used to infer both the elevation and azimuth errors and is composed of two input variables (the electrical current before and after the movement), an output variable (the error committed), and a set of action rules. Figure 6 shows the fuzzy sets defined in the input variables (I<sub>t</sub> and I<sub>t+1</sub>) and the output variable (error), for an HCPV module with I<sub>max</sub> = 6 A and a maximum error of ±3°.

**Figure 6.** Fuzzy sets defined in the input (I<sub>t</sub> and I<sub>t+1</sub>) and output (error) variables.

Table 5 shows the KB error action rules used by the sensor.

**Table 5.** Action rules in knowledge base (KB) error.


The fuzzification interface linearly converts the range [0, 6] measured by the electrical current probe to the normalized range [0, 1], and the defuzzification interface converts the output range [0, 1] to [−3°, 3°].

#### **4. Experimental Results**

To evaluate the controller and the FRBS sensors proposed in the previous section, a real two-axis tracker with HCPV modules controlled by a low-cost microcontroller was designed and implemented. The elevation and azimuth errors of the tracker pointing with respect to the sun are measured by a precision instrument. In addition, the data obtained (the most significant tracker variables and the elevation and azimuth errors) were sent to an IoT platform in order to analyze their evolution and compare the results of the different FRBS systems proposed.

This section describes the HCPV tracker used in the experiments carried out and the results obtained with the following controllers based on (a) an ephemeris algorithm only; (b) an FRBS sensor based on a pointing device; and (c) an FRBS sensor based on the electrical current generated.

#### *4.1. HCPV Tracker*

The two-axis solar tracker (Figures 7 and 8) is composed of a metal structure that can move in elevation and azimuth by means of gearboxes, several HCPV modules, a calibrated solar cell, DC azimuth and elevation motors, a measurement system for the angular movement of each motor (encoders), a pointing error sensor, electrical current generation sensors, and the control system.

**Figure 7.** The designed high-concentration photovoltaic system (HCPV) tracker.

**Figure 8.** The designed high-concentration photovoltaic system (HCPV) tracker.

The control system (Figure 9) is based on a low-cost 32-bit microcontroller and several signal adaptation interfaces to the following inputs and outputs:


**Figure 9.** Controller.

The controller calculates the elevation and azimuth angles to be performed at each moment using the state of the system (date, time, position of the sun, position of the tracker, etc.). The error inferred by the FRBS system is then added to the calculated angles. The angular movements in elevation or azimuth are carried out by an algorithm that calculates the activation and braking times of the motor, as well as a maximum safety time. The movement made at each angle is verified by means of the encoders.
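That movement routine can be sketched as below; the angular speed, safety cap, and tolerance constants are assumptions made purely for illustration, not values from the real controller.

```python
DEG_PER_SECOND = 2.0   # assumed angular speed of the geared motor
MAX_SAFETY_TIME = 5.0  # assumed hard cap on a single motor activation (s)
TOLERANCE_DEG = 0.05   # assumed acceptable angular discrepancy

def activation_time(target_deg):
    """Motor drive time for the requested angle, bounded by the safety time."""
    return min(abs(target_deg) / DEG_PER_SECOND, MAX_SAFETY_TIME)

def verify_move(target_deg, encoder_delta_deg):
    """Compare the encoder-reported movement against the commanded angle."""
    return abs(abs(target_deg) - abs(encoder_delta_deg)) <= TOLERANCE_DEG

print(activation_time(1.0))    # 0.5
print(verify_move(1.0, 0.98))  # True
```

The safety time bound prevents a stuck motor from being driven indefinitely, and the encoder check closes the loop on each commanded angle.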

To measure the real elevation and azimuth errors of the tracker with respect to the sun, a Black Photon Tracking Accuracy Sensor measuring instrument (Figure 10) is available. The instrument is able to measure elevation and azimuth errors in the range ±1.2° with a resolution of 0.0005°. The data obtained from the instrument allow us to check the correct tracker pointing and are not used in tracker control.

**Figure 10.** Tracking accuracy sensor.

Data generated by the system (sensor measurements, solar radiation, temperature, solar position, tracker position, electrical current generated, etc.) are sent to an Internet IoT cloud platform that stores the data and allows users to monitor the temporal evolution of all variables using a web browser.

The main characteristics of the HCPV modules used in the tracker are the following (at 1000 W/m², 25 °C, AM1.5D): short-circuit current 6.35 A, open-circuit voltage 18.45 V, DC power 95 W, and required pointing error < ±0.6°.

To characterize the HCPV module, a complete exploration was carried out by measuring the short-circuit electrical current generated while varying its position with respect to the sun in an angular sector of ±3° in elevation and azimuth. Figure 11 shows the obtained surface, where it can be observed that the maximum electrical current generation is not at the 0° elevation and azimuth point. The maximum current (5.54 A) occurs at an elevation error of +0.2° and an azimuth error of −0.8°. The surface also shows that the generated current falls drastically with small variations of the elevation and azimuth angles.

**Figure 11.** Electrical current generated in an elevation and azimuth sector of ±3°.
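The characterization procedure above can be sketched as a simple grid scan. The current model `toy_isc` below is invented (a smooth bump peaked at the off-center offsets reported above) purely to make the example runnable; it stands in for the real short-circuit current probe.

```python
import math

def scan(measure_isc, step=0.2, span=3.0):
    """Step through an angular grid of +/-span degrees in elevation and
    azimuth, measure the short-circuit current at each offset pair, and
    return the offsets with the maximum current."""
    n = int(round(2 * span / step)) + 1
    offsets = [-span + k * step for k in range(n)]
    best, best_i = None, float("-inf")
    for el in offsets:
        for az in offsets:
            i = measure_isc(el, az)
            if i > best_i:
                best, best_i = (el, az), i
    return best, best_i

def toy_isc(el, az, peak=5.54, el0=0.2, az0=-0.8):
    """Stand-in for the real probe: a smooth bump peaked off-center."""
    return peak * math.exp(-((el - el0) ** 2 + (az - az0) ** 2))

(best_el, best_az), i_max = scan(toy_isc)
print(round(best_el, 1), round(best_az, 1), round(i_max, 2))
```

With the toy model, the scan recovers a maximum near (+0.2°, −0.8°), mirroring how the real exploration reveals that the peak does not coincide with zero pointing error.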

#### *4.2. Controller Based on an Ephemeris Algorithm*

The first part of the experiments aims to measure the effectiveness of an ephemeris algorithm applied to a real tracker, in which different imprecision factors can exist for the reasons stated in Section 2.

Figure 12 shows a simulation in which the position of the sun (azimuth and elevation angles) is calculated by the ephemeris algorithm, together with the tracker pointing during a day, without taking the elevation and azimuth errors into account. In the graph, it can be observed that the tracker would point at the sun practically without error during the period in which the sun's elevation was greater than 15°. For the rest of the day, the tracker would remain in its resting position (45° elevation, 180° azimuth).

Figure 13 shows the results obtained when the real tracker is controlled with the ephemeris algorithm. Figure 13a shows the DNI (direct normal irradiance) of March 15, 2019. It was a sunny day with a maximum of 1000 W/m². Figure 13b shows the Isc (short-circuit current) obtained by the HCPV module. Figure 13c shows the evolution of the elevation and azimuth errors (measured by the precision instrument) with respect to the sun. Finally, Figure 13d presents the Isc/DNI ratio, a normalization that allows comparison of the currents generated on different days. Figure 13c,d shows the results from approximately 9:00 a.m. to 6:00 p.m., which corresponds to a sun elevation greater than 15°.

**Figure 12.** Sun position and tracker pointing (with both elevation and azimuth angles).

**Figure 13.** Results of the ephemeris controller. March 15, 2019.

Figure 13c shows that at the beginning of the experiment (9 a.m.), both the elevation and azimuth errors start at 0° and then grow, eventually exceeding the values recommended by the manufacturer (in the case of the elevation error). Because of these errors, the generated current decreases during the first part of the day, although it stabilizes at the end. As a result of the elevation and azimuth pointing errors, the concentration generator does not operate at its maximum, and the generated current is much lower than expected.

#### *4.3. Controller Based on an FRBS Sensor with a Pointing Device*

To improve the performance of the photovoltaic generator and correct the evolution of the elevation and azimuth errors, this work proposes the use of a low-cost sensor based on an FRBS and a pointing device that infers the elevation and azimuth errors.

This controller is composed of an optical light/shadow pointing device, a set of four PRs, a signal adaptation stage, and an FRBS. When a pointing error occurs, the pointing device amplifies the difference in radiation received by the PRs, so that small pointing errors can be detected. The pointing sensor is based on the device shown in Figure 14, and the FRBS is the one shown in Figure 5 and Tables 1–4.

**Figure 14.** Pointing device.

Each PR of the pointing sensor measures the luminosity it receives through a change in its electrical resistance. Through an adaptation stage, the microcontroller measures a proportional voltage on a 10-bit analog input, such that a value of 1023 is obtained at maximum luminosity and 0 in the dark.
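The signal path can be sketched as follows. The top/bottom/left/right arrangement of the four PRs and the differential combination are assumptions for illustration; the actual error inference is performed by the FRBS.

```python
# Sketch of the PR signal path: ADC counts (0 = dark, 1023 = maximum
# luminosity) normalized to [0, 1], then combined into differential
# error signals. The top/bottom/left/right PR arrangement is an
# assumption, not taken from the paper.

ADC_MAX = 1023

def normalize(count):
    return count / ADC_MAX

def error_signals(top, bottom, left, right):
    """Positive values mean more light on the top/right PRs."""
    el_signal = normalize(top) - normalize(bottom)
    az_signal = normalize(right) - normalize(left)
    return el_signal, az_signal

el, az = error_signals(top=900, bottom=860, left=880, right=880)
print(el, az)   # shading imbalance in elevation only
```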

Figure 15 shows the evolution of the luminosity values obtained from the four PRs. The data in the figure were obtained with the tracker stopped while pointing at the sun with an approximate error of 0° in elevation and azimuth at a given time (2:00 p.m.). In this way, the evolution of the measured luminosity can be observed over a full sunny day.

**Figure 15.** Evolution of the luminosity value obtained in the four PRs.

Figure 16 shows the error inferred by the FRBS between 13:00 and 15:00. In the initial and final parts, the sensor infers a constant error: a positive error in both elevation and azimuth in the initial part, and a positive error in elevation with a negative error in azimuth in the final part. The central part of the figure shows the error inferred while the device is pointing at the sun, indicating that the sensor had not been perfectly oriented (minimum inferred error: +0.5° elevation, 0° azimuth).

**Figure 16.** Elevation and azimuth errors inferred by the FRBS.

Figure 17 shows the results obtained when the tracker is controlled by the ephemeris algorithm modified with the FRBS sensor and the pointing device. Figure 17a shows the DNI of March 12, 2019. It was a sunny day with a maximum of 850 W/m². Figure 17b shows the Isc current obtained by the HCPV module. Figure 17c shows the evolution of the elevation and azimuth errors (measured by the precision instrument) with respect to the sun. Finally, Figure 17d presents the Isc/DNI ratio. Figure 17c,d shows the results from approximately 9:00 a.m. to 6:00 p.m., which corresponds to a sun elevation greater than 15°.


**Figure 17.** Results of the controller based on an FRBS sensor with the pointing device. March 12, 2019.

Figure 17 shows the following:


#### *4.4. Controller Based on an FRBS Sensor Using the Electrical Current Generated*

Although the inference of the pointing error improves the electrical current generated by the HCPV module, it does not reach the maximum current the module can generate.

In this section, an FRBS based on the generated electrical current is proposed to modify the tracker position obtained by the ephemeris algorithm. The proposed controller is based on calculating the pointing error by means of the FRBS described in Section 3.3. In this controller, each time the tracker moves in elevation or azimuth (as scheduled by the ephemeris), the pointing error is inferred. The error is corrected in the next movement, so that no extra movements are made solely to correct the error.
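The idea of folding a current-based correction into the next scheduled move can be sketched as follows. The real controller infers the offset with the FRBS of Section 3.3; the greedy probe below is only a simplified stand-in.

```python
# Illustrative stand-in for the current-based correction: at each
# scheduled ephemeris move, probe the generated current at small
# angular offsets and fold the best offset into the next movement.
# The actual paper infers this offset with an FRBS, not a greedy probe.

def best_offset(measure_isc, step=0.2):
    offsets = [(de, da) for de in (-step, 0.0, step)
                        for da in (-step, 0.0, step)]
    return max(offsets, key=lambda o: measure_isc(*o))

# toy current model peaking at an elevation error of +0.2 deg
isc = lambda de, da: 5.0 - (de - 0.2) ** 2 - da ** 2
print(best_offset(isc))   # (0.2, 0.0): correction applied on the next move
```

Because the correction rides along with an already-scheduled movement, no extra motor activations are spent purely on error correction, matching the behavior described above.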

Figure 18a shows the DNI of March 14, 2019. It was a sunny day with a maximum of 1000 W/m². Figure 18b shows the Isc current obtained by the HCPV module. Figure 18c shows the evolution of the elevation and azimuth errors (measured by the precision instrument) with respect to the sun. Finally, Figure 18d presents the Isc/DNI ratio, a normalization that allows comparison of the currents generated on different days. Figure 18c,d shows the results from approximately 9:00 a.m. to 6:00 p.m., which corresponds to a sun elevation greater than 15°.


**Figure 18.** Results of the controller based on the FRBS sensor using the electrical current generated. March 14, 2019.

Figure 18 shows the following:


The main benefits of the proposed controller are as follows:

• The controller showed the best performance, near the maximum current measured in the module characterization;


#### **5. Conclusions**

The IoT technologies were able to execute the HCPV controllers and monitor the evolution of the variables satisfactorily. The resource-constrained microcontroller executed the knowledge-based controllers with response times shorter than required by the application. Although there was some occasional loss of data due to the unavailability of the Internet connection, the cloud computing architecture used in the project was more than sufficient. The data obtained by the system were correctly stored in the platform and monitored by users.

Factors additional to those reported in the literature [13] were presented, which increase imprecision and uncertainty in HCPV solar tracker installations. The factors presented in the project are the following:


The characterization of the HCPV module installed in the tracker verifies that, due to the imprecision and uncertainty factors, the maximum energy may not occur at zero tracker pointing error. In addition, to generate the maximum electrical current, it must be taken into account that a pointing error smaller than 0.6° is necessary.

The controller based exclusively on the ephemeris algorithm achieves very low performance due to the accumulation of azimuth and elevation errors. In this case, the tracker leveling error was significant. When using this kind of controller, it would be necessary to estimate the error made and take it into account in the control algorithm.

The controller based on the FRBS sensor with a pointing device infers the azimuth and elevation errors and increases the generated electrical current, improving on the performance of the exclusively ephemeris-based controller. This controller requires that the point of maximum electrical current generation of the HCPV module installed in the tracker and the pointing device be perfectly calibrated (pointing to the exact same angle). In this case, periodic calibration would be necessary.

The controller based on the FRBS sensor and an electrical current probe showed the best performance, obtaining values similar to those obtained in the module characterization. In this case, calibration is not necessary since the algorithm dynamically locates the maximum current generation.

Regarding future work, we propose the following actions: to use an IoT fog computing architecture in order to avoid punctual data loss; to characterize different HCPV modules in the real tracker; to compare other controllers; and to characterize HCPV systems composed of several modules in order to locate their maximum current generation.

**Author Contributions:** Conceptualization, J.C.-B. and P.P.-H.; Funding acquisition, J.C.-B. and P.P.-H.; Investigation, J.-A.F.-P. and M.-A.G.-M.; Methodology, J.C.-B., J.-A.F.-P., and M.-A.G.-M.; Project administration, J.C.-B. and P.P.-H.; Resources, J.-A.F.-P. and M.-A.G.-M.; Software, J.C.-B. and M.-A.G.-M.; Supervision, J.C.-B. and P.P.-H.; Validation, J.-A.F.-P. and M.-A.G.-M.; Visualization, J.C.-B., J.-A.F.-P., and M.-A.G.-M.; Writing—original draft preparation, J.C.-B.; Writing—review and editing, J.-A.F.-P. and M.-A.G.-M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work forms part of the project "Nuevos conceptos basados en tecnología de concentración fotovoltaica: desarrollo de sistemas de muy alta concentración fotovoltaica" (ENE2013-45242-R) supported by the Spanish Economy Ministry and the European Regional Development Fund/Fondo Europeo de Desarrollo Regional (ERDF/FEDER).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Machine Learning-Based Ensemble Classifiers for Anomaly Handling in Smart Home Energy Consumption Data**

**Purna Prakash Kasaraneni 1, Yellapragada Venkata Pavan Kumar 2, Ganesh Lakshmana Kumar Moganti 2,\* and Ramani Kannan <sup>3</sup>**

	- <sup>3</sup> Department of Electrical and Electronics Engineering, Universiti Teknologi Petronas (UTP), Seri Iskandar 32610, Malaysia

**Abstract:** Addressing data anomalies (e.g., garbage data, outliers, redundant data, and missing data) plays a vital role in performing accurate analytics (billing, forecasting, load profiling, etc.) on smart home energy consumption data. From the literature, it has been identified that data imputation with machine learning (ML)-based single-classifier approaches is used to address data quality issues. However, these approaches are not effective in addressing the hidden issues of smart home energy consumption data due to the presence of a variety of anomalies. Hence, this paper proposes ML-based ensemble classifiers using random forest (RF), support vector machine (SVM), decision tree (DT), naive Bayes, K-nearest neighbor, and neural networks to handle all the possible anomalies in smart home energy consumption data. The proposed approach initially identifies all anomalies and removes them, and then imputes the removed/missing information. The entire implementation consists of four parts. Part 1 presents anomaly detection and removal, Part 2 presents data imputation, Part 3 presents single-classifier approaches, and Part 4 presents ensemble classifier approaches. To assess the classifiers' performance, various metrics, namely, accuracy, precision, recall/sensitivity, specificity, and F1 score, are computed. From these metrics, it is identified that the ensemble classifier "RF+SVM+DT" shows superior performance over the conventional single classifiers as well as the other ensemble classifiers for anomaly handling.

**Keywords:** classification; data anomalies; data imputation; energy consumption data; ensemble classifiers; machine learning; smart home data; smart meter data; tracebase dataset

**Citation:** Kasaraneni, P.P.; Venkata Pavan Kumar, Y.; Moganti, G.L.K.; Kannan, R. Machine Learning-Based Ensemble Classifiers for Anomaly Handling in Smart Home Energy Consumption Data. *Sensors* **2022**, *22*, 9323. https://doi.org/10.3390/s22239323

Academic Editor: Davide Brunelli

Received: 3 November 2022; Accepted: 25 November 2022; Published: 30 November 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

Considering the global thrust towards the development of grid-independent and green energy systems to address the unrelenting growth of loads as well as environmental pollution, smart home and renewable energy-based microgrid culture has been increasing worldwide. Smart cities are new-era establishments in which all the smart homes are jointly operated to consolidate and optimize electricity utilization. As these establishments are realized with a combination of electrical, communication, and information technology, gathering quality data is a challenging task. Smart homes connected to the power network continuously generate huge volumes of energy consumption data, normally a combination of timestamps and readings. The reading information in this data is a key value that helps in understanding energy consumption behavior, billing, load profiling, forecasting, contingency analysis, device health condition analysis, etc. All these operations rely on the quality of the captured data. However, this data may contain different anomalies, viz., garbage data, outliers, redundant data, and missing data, due to malfunctioning of the advanced metering infrastructure, failure of communication channels, unanticipated issues in power networks, etc. If these anomalies are left unhandled in the dataset, there will be an adverse effect on system operations, and the analytics of the energy consumption data will be misled. Handling these anomalies is therefore highly essential to enable analysts to perform accurate energy data analytics. Thus, the multifaceted nature of smart home data, compared to the datasets of other applications, makes it an important research focus for data analysts. This is the prime motivation for the proposed work of this paper.

Big data refers to a huge quantity of data. However, in research, data quality is a more complex and significant aspect than quantity [1]. Moreover, issues related to data quality have gained much importance and attention in energy big data analytics [2]. The increased use of intelligent devices in power system applications has become a major source of big data, which reflects on data storage, data processing, and data quality [3–5]. The failure of these intelligent devices makes the data incomplete during the acquisition of energy consumption data. Such incomplete data is commonly referred to as missing data [6–8]. Handling, rather than ignoring, this missing data leads toward better data analytics on energy consumption [9]. Hence, it is essential to analyze and impute the missing data in smart home energy consumption data. Following this, several state-of-the-art works on missing data imputation and ensemble methods are discussed.

The researchers suggested several ML-based imputation methods as well as thorough benchmarks for comparing conventional and modern methods [10]. An imputation algorithm, "opt.impute", was introduced in [11] to achieve the best solutions for missing data. Further, an extensive review was conducted on the imputation of missing data using ML, which helps in understanding the limitations of ML imputation methods [12]. A framework was implemented to improve multivariate imputation by chained equations (MICE) for imputing missing sensor data [13]. A graph-based method was discussed in [14] to impute missing sensor data. A copy-paste imputation method was introduced in [15] to impute energy time-series data. A mixture factor analysis method was discussed to estimate the missing data in a building's energy load [16]. Different imputation methods, viz., MICE, KNN, and RF-based imputation, were implemented to impute missing data in internet of things sensor data [17]. A data splitting-based imputation method named "nullify the missing values before the imputation" was proposed to impute missing data [18].

A new statistical and ML-based imputation method was implemented in [19] to impute missing data in power grid applications. A fuzzy inductive reasoning method was discussed to deal with missing data during the forecasting process in smart grids [20]. A six-stage particle swarm optimization imputation method was implemented for smart meter data collected from an Indian institution [21]. An imputation method based on a denoising autoencoder was presented in [22]. An imputation model named "bagged averaging of multiple linear regression" was discussed in [23] for imputing missing data in phasor measurement units. A two-stage deep autoencoder-based data imputation method was discussed in [24] for imputing missing data in wind farms. A bagging algorithm was implemented to impute missing values in time-series data [25]. An autoencoder neural network was presented to impute missing data for classification [26]. The appropriate selection of the best imputation method and classification was discussed in [27]. An extensive study of the packages available in "R" for data imputation was presented in [28]. Electricity theft detection in smart grids using various ML algorithms and deep learning techniques was discussed in [29,30]. An AdaBoost ensemble model was implemented to detect electricity theft [31]. An improved ensemble model of a general regression neural network and a successive geometric transformations model was presented in [32] to recover partially or fully missing data.

In summary, the abovementioned literature discusses the concepts of big data, sources of big data, and energy data analytics, as well as the importance of handling anomalies in big data. To handle anomalies in energy consumption datasets, a few imputation methods, such as data splitting, fuzzy inductive reasoning, denoising autoencoders, and bagging, are used. Further, to evaluate their performance, various single classifiers, namely SVM, neural networks, etc., are used. However, these approaches are found to be ineffective in addressing the hidden issues of smart home energy consumption data due to the presence of a variety of anomalies, such as garbage data, outlier data, redundant data, and missing data.

On the other hand, in recent years, the ensemble classification approach has enabled effective classification for data imputation in different applications, but it has not been tried on smart home energy consumption data. With this motivation, this paper proposes ML-based ensemble classifiers to handle all the possible anomalies in smart home energy consumption data. The major contributions of this paper are summarized as follows:

	- Part 1 (anomaly detection and removal) considers the original dataset and refines it by removing all the identified anomalies.
	- Part 2 (data imputation) considers this refined dataset and performs the missing data imputation using median, KNN, and bagging imputation methods, thereby producing an anomaly-free dataset.
	- Part 3 (single-classifier approaches) performs the classification of the dataset using the conventional single-classifier approaches, namely RF, SVM, DT, naive Bayes (NB), KNN, and neural network (NNET).
	- Part 4 (ensemble classifiers approaches) performs the classification of the dataset using the proposed ensemble classifier approaches such as RF+SVM+DT, RF+SVM+NB, RF+SVM+KNN, RF+SVM+NNET, RF+DT+NB, RF+DT+KNN, RF+DT+NNET, RF+NB+KNN, RF+NB+NNET, and RF+KNN+NNET.

All these contributions are structured in the paper as follows. Section 2 presents the description of the dataset. Section 3 presents the description and implementation of the proposed approach. Section 4 presents simulation results and their discussion. Finally, Section 5 concludes the outcomes of the paper in a synopsized way.

#### **2. Description of Dataset**

To implement the proposed approach, the data of an appliance (refrigerator) from the Tracebase dataset [33] is considered. This dataset consists of 43 different appliances with 158 device IDs connected to various smart homes/buildings. Each appliance consists of CSV files, each representing the energy consumption data of one day. A detailed description of this dataset can be obtained from [34]. Further, this dataset has been used in various works in the literature. The Tracebase dataset was used in the extensive studies of different non-intrusive load monitoring (NILM) power consumption datasets described in [35–37]. The present and future directions for energy management techniques using NILM datasets are discussed in [38].

The CSV file (dev\_98C08A\_2011.09.17.csv) of the refrigerator appliance is prepared with the columns CAPTURED\_DATE, CAPTURED\_HOUR, CAPTURED\_MINUTE, CAPTURED\_SECOND, and CAPTURED\_READING for implementing the proposed ensemble classifier approach.
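Deriving the CAPTURED\_* columns from a raw row can be sketched as follows. The "timestamp;reading" row layout and the field names in this sketch are assumptions for illustration, not the verified Tracebase file format.

```python
# Hypothetical sketch of deriving the CAPTURED_* columns from one
# Tracebase-style row. The "timestamp;reading" layout is an assumed
# format for illustration only.

from datetime import datetime

def to_columns(row):
    stamp, reading = row.split(";")
    t = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    return {
        "CAPTURED_DATE": t.strftime("%Y-%m-%d"),
        "CAPTURED_HOUR": t.hour,
        "CAPTURED_MINUTE": t.minute,
        "CAPTURED_SECOND": t.second,
        "CAPTURED_READING": float(reading),
    }

print(to_columns("17/09/2011 06:30:05;92.0"))
```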

#### **3. Description and Implementation of the Proposed Approach**

The conceptual model of the proposed approach is shown in Figure 1. It consists of four parts, viz., Part 1, Part 2, Part 3, and Part 4. The smart home energy consumption dataset will be given as input to Part 1. In Part 1, an analysis of the missing data will be carried out for understanding the missingness in the original dataset. Further, the identification and removal of different anomalies (viz., garbage data, outliers in the data, and redundant data) will be performed. From this, a dataset with the abovementioned anomalies removed will be produced and given as input to Part 2. In Part 2, the imputation of missing data will be completed. In Part 3, a single-classifier approach will be applied. This will provide a recommendation of the best single-classifier approach as the output. By taking this best single-classifier as the basis, the ensemble classifiers approach will be applied in Part 4. This will provide a recommendation of the best ensemble classifier to perform the imputation.

**Figure 1.** Conceptual model for the proposed approach.

The implementation flow of the proposed ensemble classifiers approach through all the proposed parts is shown in Figure 2. The detailed description and implementation processes are discussed in Sections 3.1–3.4 respectively for Part 1, Part 2, Part 3, and Part 4.

#### *3.1. Implementation of Part 1 (Anomaly Detection and Removal)*

The process starts in Part 1 by reading the smart home energy consumption dataset and saving it in an object "shec\_dat". Initially, the missing data information in "shec\_dat" is analyzed [6]. The process then continues with the identification of garbage data, i.e., data other than numerical data. To identify garbage data, the function *grepl()* with the pattern *"[[:digit:]]"* is applied to each column of the dataset. If garbage data exist, those records are removed and the remaining data are passed on to outlier identification; otherwise, the existing dataset is used as it is.

Outlier data are data that do not lie within the expected range. To identify outliers, a boxplot analysis is applied to the data obtained after removing the garbage data. The boxplot is a standardized approach for showing the data distribution through a five-number summary (minimum, first quartile, median, third quartile, and maximum). Data lying between the "minimum" and "maximum" values are considered within range and useful for the analysis. Data lying below the "minimum" or above the "maximum" are considered outliers and need to be removed to achieve better analytics. The function boxplot() is applied to the readings column as *boxplot(shec\_dat\$CAPTURED\_READING, plot=F)\$out*. If outliers exist in the readings column, those records are removed and the remaining data are passed on to redundant-data identification; otherwise, the existing dataset is used as it is.

In general, redundant data refers to the duplication of an entire record in the dataset. However, in this case, there are two types of redundant data: records with the same timestamp and the same reading information, and records with the same timestamp and different reading information. The detailed process of identifying these types of redundant data is discussed in [39]. If such redundant data exist, those records are removed; otherwise, the existing data are used as they are.

At the end of Part 1, a dataset is obtained with all the anomalies (garbage data, outlier data, and redundant data) removed. Since several records were removed due to these anomalies, this dataset contains missing timestamps. Hence, the missing timestamps are filled in, and the respective reading information is set to "NA (Not Available)" [8] before proceeding to the implementation of Part 2.
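The Part 1 pipeline can be sketched end to end in a few lines. This is a simplified stand-in for the R workflow described above: it collapses both redundancy types into one rule (drop every record whose timestamp is duplicated) and uses `None` in place of R's NA.

```python
# Compact sketch of the Part 1 pipeline on (timestamp, reading) pairs:
# drop garbage rows (non-numeric readings), drop boxplot outliers
# (outside Q1 - 1.5*IQR .. Q3 + 1.5*IQR), drop all records sharing a
# duplicated timestamp, then reindex so that removed timestamps
# reappear with a reading of None (the "NA" refill step).

from collections import Counter
from statistics import quantiles

def clean(records, all_timestamps):
    # 1. garbage: keep rows whose reading parses as a number
    numeric = []
    for ts, r in records:
        try:
            numeric.append((ts, float(r)))
        except (TypeError, ValueError):
            pass
    # 2. outliers: boxplot (IQR) fences on the reading values
    values = sorted(v for _, v in numeric)
    q1, _, q3 = quantiles(values, n=4)
    lo, hi = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    in_range = [(ts, v) for ts, v in numeric if lo <= v <= hi]
    # 3. redundant: drop every record whose timestamp occurs twice
    counts = Counter(ts for ts, _ in in_range)
    unique = {ts: v for ts, v in in_range if counts[ts] == 1}
    # 4. refill: every expected timestamp; missing readings become None
    return [(ts, unique.get(ts)) for ts in all_timestamps]

rows = [(0, "5"), (1, "x"), (2, "5"), (3, "5"), (3, "6"), (4, "500"), (5, "5")]
print(clean(rows, range(6)))
```

In this toy run, timestamp 1 is dropped as garbage, 4 as an outlier, and 3 as redundant; all three reappear in the refilled output with `None` readings.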

**Figure 2.** Implementation flow of the proposed approach.

#### *3.2. Implementation of Part 2 (Data Imputation)*

Once all the missing records are finalized in the dataset obtained after removing all the anomalies, the imputation methods such as median imputation, KNN imputation, and bagging imputation are applied. The implementation of these imputation methods produces datasets with imputed reading values. Further, the single-classifier approach is applied to these imputed datasets. The implementation of the median, KNN, and bagging imputation methods is discussed in Sections 3.2.1–3.2.3 respectively.

#### 3.2.1. Implementation of the Median Imputation Method

In the median imputation method, the median of the readings in the CAPTURED\_READING column is calculated, and that value is used to impute the missing reading information. This imputation method is simple and fast. The calculation starts by ordering the readings in ascending order. Then, the number of values (odd or even) in CAPTURED\_READING is considered, since it determines how the median is obtained. The formula for calculating the median value is given in Equation (1).

$$Median(D) = \begin{cases} D\left(\frac{s+1}{2}\right) & \text{if } s \text{ is odd} \\ \dfrac{D\left(\frac{s}{2}\right) + D\left(\frac{s}{2}+1\right)}{2} & \text{if } s \text{ is even} \end{cases} \tag{1}$$

where *D* = list of values ordered in the CAPTURED\_READING column, and *s* = number of values in the CAPTURED\_READING column.

If the number of values in the CAPTURED\_READING column is odd, the middle value is taken as the median. If the number of values is even, the average of the two middle values is taken as the median.
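Equation (1) matches the behavior of Python's standard `statistics.median`, shown here on one odd-length and one even-length list:

```python
# Equation (1) in action: the middle value for an odd count of
# readings, the average of the two middle values for an even count.
# statistics.median applies exactly this rule.

from statistics import median

odd = [3.0, 1.0, 2.0]             # sorted: 1, 2, 3 -> middle value 2
even = [4.0, 1.0, 2.0, 3.0]       # sorted: 1, 2, 3, 4 -> (2 + 3) / 2
print(median(odd), median(even))  # 2.0 2.5
```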

#### 3.2.2. Implementation of the KNN Imputation Method

In the KNN imputation method, the distance between neighboring values is calculated using the Euclidean distance metric. In the CAPTURED\_READING column, the k closest samples of the readings are identified using this distance, and their values are used to impute the missing reading information. The formula for calculating the Euclidean distance is given in Equation (2).

$$dist(p, q) = \sqrt{\sum_{i=1}^{m} \left(p_i - q_i\right)^2} \tag{2}$$

where *dist* = Euclidean distance, *m* = number of dimensions, and *p<sub>i</sub>* and *q<sub>i</sub>* are the *i*-th components of the points *p* and *q*.
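Equation (2) and the neighbor-based imputation idea can be sketched as follows. This is a deliberate simplification (1-D timestamps, mean of the k nearest observed readings), not the exact method used in the paper.

```python
# Equation (2) plus a minimal KNN-style imputation: a missing reading
# is replaced by the mean of the k observed readings whose timestamps
# are nearest in Euclidean distance. A simplification for illustration.

import math

def dist(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_impute(t_missing, observed, k=3):
    """observed: list of (timestamp, reading) pairs with known readings."""
    nearest = sorted(observed, key=lambda o: dist((t_missing,), (o[0],)))[:k]
    return sum(r for _, r in nearest) / k

obs = [(0, 10.0), (1, 12.0), (3, 14.0), (4, 30.0)]
print(knn_impute(2, obs))   # 12.0: mean of the readings at t = 1, 3 and 0
```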

#### 3.2.3. Implementation of the Bagging Imputation Method

In the bagging imputation method, the term "bagging" refers to bootstrap aggregation. The bootstrap is a statistical technique that iteratively resamples the data with replacement. To perform it, the number of bootstrap samples is fixed first, and then the sample size. For each bootstrap sample, the following steps are performed: draw the sample with replacement, fit the model, evaluate the model performance on the out-of-bag sample, and average the estimates across samples. The multiple iterations of sampling improve the prediction performance of the model. The bagging method fits a bagged tree. This method is simple, powerful, and accurate for imputing the missing values in the reading information; however, it is computationally expensive.
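The resample-estimate-average loop can be sketched as follows. Real bagging imputation fits a bagged tree per sample; the per-sample mean below is only a stand-in for the fitted model, kept to illustrate the bootstrap mechanics.

```python
# Bare-bones bootstrap aggregation for a single missing reading:
# draw B resamples with replacement, compute an estimate on each,
# and average the estimates. The per-sample mean stands in for the
# bagged-tree model fitted in the real method.

import random

def bagging_estimate(observed, n_samples=200, seed=42):
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_samples):
        sample = rng.choices(observed, k=len(observed))  # with replacement
        estimates.append(sum(sample) / len(sample))      # "fit" one model
    return sum(estimates) / n_samples                    # aggregate

obs = [10.0, 12.0, 11.0, 13.0, 12.5]
print(round(bagging_estimate(obs), 2))   # close to the plain mean (11.7)
```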

#### *3.3. Implementation of Part 3 (Single-Classifier Approach)*

In this section, the single-classifier approach is performed using various classifiers, viz., RF, SVM, DT, NB, KNN, and NNET. All these classifiers are implemented individually on the dataset. To implement them, the dataset is divided into train\_set and test\_set. The classifiers are trained on the train\_set using k-fold cross-validation with k = 10. They are then applied to the test\_set to predict the classes Yes (Y) or No (N), where class 'Y' represents missing data and class 'N' represents non-missing data. After the implementation, the performance metrics, namely accuracy, precision, recall/sensitivity, specificity, and F1 score, are computed using a confusion matrix to evaluate each classifier's performance. The confusion matrix is shown in Figure 3, and the formulae for computing the performance metrics are given in Equations (3)–(7).

$$Accuracy = \frac{T.Pos. + T.Neg.}{T.Pos. + T.Neg. + F.Pos. + F.Neg.} \tag{3}$$

$$Precision = \frac{T.Pos.}{T.Pos. + F.Pos.} \tag{4}$$

$$Recall/Sensitivity = \frac{T.Pos.}{T.Pos. + F.Neg.} \tag{5}$$

$$Specificity = \frac{T.Neg.}{T.Neg. + F.Pos.} \tag{6}$$

$$F1Score = \frac{2 \ast (Precision \ast Recall)}{(Precision + Recall)} \tag{7}$$
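Equations (3)–(7) follow directly from the four confusion-matrix counts, as in this short sketch (the example counts are made up for illustration):

```python
# Equations (3)-(7) computed from confusion-matrix counts:
# tp/tn = true positives/negatives, fp/fn = false positives/negatives.

def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, specificity, f1

print(metrics(tp=80, tn=90, fp=10, fn=20))
# accuracy 0.85, precision ~0.889, recall 0.8, specificity 0.9, F1 ~0.842
```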


**Figure 3.** Confusion matrix.

Once all the single classifiers have been implemented and their performance verified, the best single classifier is recommended; otherwise, the performance metrics are re-verified.

#### *3.4. Implementation of Part 4 (Ensemble Classifiers Approach)*

This section uses the best single classifier recommended in Part 3 as the basis for developing ensemble classifiers. The ensembling of classifiers is performed using the "stacking" method. In stacking, there are two layers: the top layer and the bottom layer. The top layer consists of a classifier, referred to as the base classifier, and the bottom layer consists of the other classifiers. The output of the bottom layer is given as input to the top layer, so the top-layer classifier is ensembled with the outputs of the bottom-layer classifiers, producing an ensemble classifier.

The stacking of classifiers is shown in Figure 4. From this figure, it can be seen that the single classifiers in the bottom layer are ensembled with the recommended best classifier in the top layer. For example, the single classifiers SVM and DT are ensembled with the recommended best classifier RF. Similarly, all the other single classifiers are ensembled with RF to produce ensemble classifiers.

To implement these ensemble classifiers, the imputed datasets are given as input. Each imputed dataset is divided into train\_set and test\_set. The ensemble classifiers are trained on the train\_set using k-fold cross-validation with k = 10 and are then applied to the test\_set to predict the classes Y or N. After the implementation, the performance metrics, namely accuracy, precision, recall/sensitivity, specificity, and F1 score, are computed using a confusion matrix to evaluate each ensemble classifier's performance. Once all the ensemble classifiers have been implemented and their performance verified, the best ensemble classifier for the imputation is recommended; otherwise, the performance metrics are re-verified.

**Figure 4.** Stacking of classifiers.

#### **4. Simulation Results and Discussion**

In keeping with the aims of the paper, the simulation results of the implementation are presented in three subsections. Sections 4.1–4.3 present the results corresponding to anomaly detection and removal, single-classifier approach, and ensemble classifiers approach, respectively.

#### *4.1. Results Corresponding to Anomaly Detection and Removal*

This section presents the details of the missing data in the original CSV file (original dataset) and in the dataset obtained after eliminating the anomalies. The number of records in this original dataset is 155,374. During the analysis of missing data, 700 records are found to be missing in the original dataset [7]. During the identification of garbage data, no garbage data (other than numerical data) are identified in the original CSV file; hence, no records are removed and the same number of records (155,374) remains available. During the identification of outliers, 25 readings are identified as outliers and the respective records are removed from the dataset, leaving 155,349 records. During the identification of redundant data, records with the same timestamp and the same reading are identified and removed, leaving the dataset with 98,779 records. Further, records with the same timestamp but different readings are identified and removed, leaving the dataset with 72,597 records. Once the redundant data are removed, the missing data are filled in with the respective timestamps and the reading set to NA, as shown in Figure 5 (all the highlighted rows). After this filling, there are 86,400 records in the dataset; of these, 13,803 records contain missing readings.
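The cleaning pipeline above (outlier removal, de-duplication, then filling missing timestamps with NA) can be sketched with pandas; the column names follow the text, while the toy readings and the fixed outlier bound are assumptions for illustration:

```python
import pandas as pd

# Toy stand-in for the original CSV: an exact duplicate, a conflicting
# duplicate, an outlier reading and a gap in the 1 s timestamp grid.
raw = pd.DataFrame({
    "TIMESTAMP": pd.to_datetime([
        "2022-01-01 00:00:00", "2022-01-01 00:00:00",  # same stamp, same reading
        "2022-01-01 00:00:01", "2022-01-01 00:00:01",  # same stamp, different readings
        "2022-01-01 00:00:03",                          # outlier reading
        "2022-01-01 00:00:04",
    ]),
    "CAPTURED_READING": [10.0, 10.0, 11.0, 13.0, 999.0, 12.0],
})

# Outliers: a simple fixed bound stands in for the detection rule here.
clean = raw[raw["CAPTURED_READING"] < 500]

# Redundant data: drop rows with the same timestamp and reading, then drop
# all rows whose timestamp still occurs more than once (conflicting readings).
clean = clean.drop_duplicates(subset=["TIMESTAMP", "CAPTURED_READING"])
clean = clean.drop_duplicates(subset=["TIMESTAMP"], keep=False)

# Missing data: reindex onto a full 1 s grid; absent stamps get NA readings.
grid = pd.date_range(raw["TIMESTAMP"].min(), raw["TIMESTAMP"].max(), freq="1s")
filled = clean.set_index("TIMESTAMP").reindex(grid)
n_missing = int(filled["CAPTURED_READING"].isna().sum())
```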

The proportions of the available data and missing data in the original dataset and the dataset available after removing anomalies are shown in Figure 6a–c. These figures show the proportion of the missing data and available data in the considered dataset in three different scenarios, namely, (i) consideration of the original dataset, (ii) consideration of the dataset that is obtained after removing the anomalies, and (iii) consideration of the dataset after filling the missing timestamps, ready for the imputation.

From Figure 6a, it is understood that the proportion of available data is 99.55% and missing data is 0.45% in the original dataset. From Figure 6b, it is seen that the proportion of available data is 84% and missing data is 16% in all columns of the dataset obtained after removing anomalies. From Figure 6c, it is evident that there are no missing data in the columns CAPTURED\_DATE, CAPTURED\_HOUR, CAPTURED\_MINUTE, CAPTURED\_SECOND and the proportion of data availability is 84%. Further, there are missing readings in the column CAPTURED\_READING with a proportion of 16%.
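The quoted percentages follow directly from the record counts reported above:

```python
# Proportions in the original dataset (Figure 6a).
orig_total, orig_missing = 155_374, 700
orig_missing_pct = 100 * orig_missing / orig_total   # ~0.45%

# Proportions after removing anomalies and filling timestamps (Figure 6b,c).
total, missing = 86_400, 13_803
missing_pct = 100 * missing / total                  # ~16%
available_pct = 100 - missing_pct                    # ~84%
```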


**Figure 5.** Data after filling missing timestamps and placing NA.

**Figure 6.** Proportion of missing data in the dataset. (**a**) Original dataset. (**b**) After removing the anomalies. (**c**) After filling missing timestamps.

#### *4.2. Results Corresponding to the Single-Classifier Approach*

This section presents the performance of the single-classifier approach on the imputed datasets. The performance of the classifiers with the median, KNN, and bagging imputation methods is discussed in Sections 4.2.1–4.2.3, respectively.

#### 4.2.1. Performance of the Single-Classifier Approach in the Median Imputation Method

The performance metrics of each classifier are shown in Figure 7, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 98.1% is observed in RF, while the lowest accuracy value of 76.3% is observed in KNN, as shown in Figure 7a. The highest precision value of 99% is observed in RF, while the lowest precision value of 80.5% is observed in SVM and NB, as shown in Figure 7b. The highest recall value of 100% is observed in SVM and NB, while the lowest recall value of 87.9% is observed in KNN, as shown in Figure 7c. The highest specificity value of 95.9% is observed in RF, while the lowest specificity value of 0% is observed in SVM and NB, as shown in Figure 7d. The highest F1 Score value of 98.8% is observed in RF, while the lowest F1 Score value of 85.7% is observed in KNN, as shown in Figure 7e. From the subplots in Figure 7a–e, it is understood that the classifier RF has outperformed the others. Further, the performance summary of all the single classifiers is given in Table 1.
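All five metrics quoted in this section derive from the four cells of a binary confusion matrix; a minimal sketch with made-up counts:

```python
# Illustrative confusion-matrix cells (not the paper's actual counts):
# true positives, false negatives, false positives, true negatives.
tp, fn, fp, tn = 90, 5, 3, 20

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)           # also called sensitivity
specificity = tn / (tn + fp)
f1_score    = 2 * precision * recall / (precision + recall)
```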

**Figure 7.** Performance metrics for the single-classifier approach on the median imputed dataset. (**a**) Accuracy. (**b**) Precision. (**c**) Recall. (**d**) Specificity. (**e**) F1 Score.


**Table 1.** Performance comparison of single-classifier approach on the median imputed dataset.

#### 4.2.2. Performance of the Single-Classifier Approach in the KNN Imputation Method

The performance metrics of each classifier are shown in Figure 8, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 87.7% is observed in RF, while the lowest accuracy value of 68% is observed in NNET, as shown in Figure 8a. The highest precision value of 86.6% is observed in RF, while the lowest precision value of 80.3% is observed in NNET, as shown in Figure 8b. The highest recall value of 100% is observed in RF, SVM, DT, NB, and KNN, while the lowest recall value of 79.9% is observed in NNET, as shown in Figure 8c. The highest specificity value of 36.3% is observed in RF, while the lowest specificity value of 0% is observed in SVM and KNN, as shown in Figure 8d. The highest F1 Score value of 92.8% is observed in RF, while the lowest F1 Score value of 80.1% is observed in NNET, as shown in Figure 8e. From the subplots in Figure 8a–e, it is understood that the classifier RF has outperformed the others. Further, the percentage summary of all classifiers is given in Table 2.

**Figure 8.** Performance metrics for the single-classifier approach on the KNN imputed dataset. (**a**) Accuracy. (**b**) Precision. (**c**) Recall. (**d**) Specificity. (**e**) F1 Score.


**Table 2.** Performance comparison of the single-classifier approach on the KNN imputed dataset.

#### 4.2.3. Performance of the Single-Classifier Approach in the Bagging Imputation Method

The performance metrics of each classifier are shown in Figure 9, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 95.2% is observed in RF, while the lowest accuracy value of 75.7% is observed in NNET, as shown in Figure 9a. The highest precision value of 100% is observed in RF and DT, while the lowest precision value of 79.5% is observed in NNET, as shown in Figure 9b. The highest recall value of 100% is observed in SVM and NB, while the lowest recall value of 84.3% is observed in DT, as shown in Figure 9c. The highest specificity value of 100% is observed in RF and DT, while the lowest specificity value of 0% is observed in SVM, NB, and NNET, as shown in Figure 9d. The highest F1 Score value of 96.9% is observed in RF, while the lowest F1 Score value of 86.1% is observed in NNET, as shown in Figure 9e. From the subplots in Figure 9a–e, it is understood that the classifier RF has outperformed the others. Further, the percentage summary of all classifiers is given in Table 3.

**Figure 9.** Performance metrics for the single-classifier approach on the bagging imputed dataset. (**a**) Accuracy. (**b**) Precision. (**c**) Recall. (**d**) Specificity. (**e**) F1 Score.


**Table 3.** Performance comparison of the single-classifier approach on the bagging imputed dataset.

#### *4.3. Results Corresponding to the Ensemble Classifiers Approach*

This section presents the performance of the ensemble classifiers approach on the imputed datasets. The performance of the ensemble classifiers with the median, KNN, and bagging imputation methods is discussed in Sections 4.3.1–4.3.3, respectively.

#### 4.3.1. Performance of the Ensemble Classifiers Approach in the Median Imputation Method

The performance metrics of each ensemble classifier are shown in Figure 10, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 98.9% is observed in RF+SVM+DT and RF+DT+NNET, while the lowest accuracy value of 72.5% is observed in RF+NB+KNN, as shown in Figure 10a. The highest precision value of 99.4% is observed in RF+SVM+NB, while the lowest precision value of 77.5% is observed in RF+SVM+KNN, as shown in Figure 10b.

The highest recall value of 100% is observed in RF+SVM+DT, RF+DT+NB, RF+DT+NNET, RF+NB+NNET, and RF+KNN+NNET, while the lowest recall value of 81.6% is observed in RF+SVM+NB, as shown in Figure 10c. The highest specificity value of 94.5% is observed in RF+SVM+DT, RF+DT+NB, RF+DT+NNET, and RF+KNN+NNET, while the lowest specificity value of 34.7% is observed in RF+NB+KNN, as shown in Figure 10d.

The highest F1 Score value of 99.3% is observed in RF+SVM+DT, RF+DT+NB, RF+DT+NNET, and RF+KNN+NNET, while the lowest F1 Score value of 82.2% is observed in RF+SVM+KNN and RF+NB+KNN, as shown in Figure 10e. From the subplots in Figure 10a–e, it is understood that the ensemble classifiers RF+SVM+DT and RF+DT+NNET have outperformed the others.

Further, the performance summary of all ensemble classifiers with respect to various parameters is given in Table 4.


**Table 4.** Performance comparison of the ensemble classifiers approach on the median imputed dataset.


**Figure 10.** Performance metrics for the ensemble classifiers approach on the median imputed dataset. (**a**) Accuracy. (**b**) Precision. (**c**) Recall. (**d**) Specificity. (**e**) F1 Score.

#### 4.3.2. Performance of the Ensemble Classifiers Approach in the KNN Imputation Method

The performance metrics of each ensemble classifier are shown in Figure 11, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 80.2% is observed in RF+DT+KNN, while the lowest accuracy value of 70.9% is observed in RF+SVM+KNN, as shown in Figure 11a. The highest precision value of 99.3% is observed in RF+DT+KNN, while the lowest precision value of 81.3% is observed in RF+NB+NNET, as shown in Figure 11b.

The highest recall value of 82.6% is observed in RF+NB+NNET, while the lowest recall value of 80.5% is observed in RF+SVM+DT and RF+SVM+NNET, as shown in Figure 11c. The highest specificity value of 43.6% is observed in RF+DT+NNET, while the lowest specificity value of 19.2% is observed in RF+SVM+NNET, as shown in Figure 11d. The highest F1 Score value of 89% is observed in RF+DT+KNN, while the lowest F1 Score value of 81.9% is observed in RF+NB+NNET, as shown in Figure 11e. From the subplots in Figure 11a–e, it is understood that the ensemble classifier RF+DT+KNN has outperformed the others. Further, the performance summary of all ensemble classifiers with respect to various parameters is given in Table 5.


**Table 5.** Performance comparison of the ensemble classifiers approach on the KNN imputed dataset.

**Figure 11.** Performance metrics for the ensemble classifiers approach on the KNN imputed dataset. (**a**) Accuracy. (**b**) Precision. (**c**) Recall. (**d**) Specificity. (**e**) F1 Score.

#### 4.3.3. Performance of the Ensemble Classifiers Approach in the Bagging Imputation Method

The performance metrics of each ensemble classifier are shown in Figure 12, where the red colored bar(s) indicate the highest value achieved corresponding to that particular metric. From this, the highest accuracy value of 89.6% is observed in RF+SVM+DT, while the lowest accuracy value of 71.2% is observed in RF+SVM+KNN, as shown in Figure 12a. The highest precision value of 98.8% is observed in RF+SVM+DT, while the lowest precision value of 86.5% is observed in RF+SVM+KNN, as shown in Figure 12b. The highest recall value of 89.4% is observed in RF+SVM+DT, while the lowest recall value of 78.8% is observed in RF+SVM+NNET, as shown in Figure 12c.

The highest specificity value of 91.1% is observed in RF+SVM+DT, while the lowest specificity value of 0.2% is observed in RF+SVM+NNET, as shown in Figure 12d. The highest F1 Score value of 93.9% is observed in RF+SVM+DT, while the lowest F1 Score value of 82.8% is observed in RF+SVM+KNN, as shown in Figure 12e. From the subplots in Figure 12a–e, it is understood that the ensemble classifier RF+SVM+DT has outperformed the others.

Further, the performance summary of all ensemble classifiers with respect to various parameters is given in Table 6.

**Table 6.** Performance comparison of the ensemble classifiers approach on the bagging imputed dataset.


**Figure 12.** Performance metrics for the ensemble classifiers approach on the bagging imputed dataset. (**a**) Accuracy. (**b**) Precision. (**c**) Recall. (**d**) Specificity. (**e**) F1 Score.

#### **5. Conclusions**

This paper proposes a machine learning-based ensemble classifiers approach to address the anomalies present in smart homes' energy consumption data. This proposed approach has proven to be more effective than the conventional single-classifier approach that is presented in the literature. The salient observations from this work are summarized as follows:



Thus, the proposed ensemble classifiers approach has successfully handled anomalies that exist in the smart home energy consumption data.

#### *Impacts and Implications of the Work*

The work proposed in this paper supports data preprocessing through the cleansing of data, which is typically essential for precise analytics and, thereby, for better energy-management decisions in smart buildings. Furthermore, the outcome of this work serves as a ready reference for understanding the irregularities of live data captured in smart building/home/grid applications, enabling better data analytics. This contributes to one of the important objectives of the United Nations Sustainable Development Goals (UN SDGs), SDG 7: Energy, by producing anomaly-free datasets for providing several customer services.

In addition, the different data anomalies identified here, viz., missing data, outliers data, garbage data, and redundant data in the energy consumption dataset, may be attributed to causes such as malfunctioning metering infrastructure, failures/glitches of communication channels, cyber-attacks, energy theft, unanticipated situations in power networks, etc.

**Author Contributions:** Conceptualization, P.P.K.; formal analysis, G.L.K.M.; funding acquisition, G.L.K.M.; investigation, P.P.K.; project administration, R.K.; software, P.P.K. and G.L.K.M.; supervision, Y.V.P.K.; Validation, Y.V.P.K.; writing—original draft, P.P.K.; writing—review & editing, Y.V.P.K. and R.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** The authors would like to thank VIT-AP University, Amaravati, Andhra Pradesh, India for funding the open-access publication fee for this research work.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**


**Francisco Sánchez-Sutil and Antonio Cano-Ortega \***

Department of Electrical Engineering, University of Jaen, Campus Lagunillas s/n, Edificio A3, 23071 Jaén, Spain; fssutil@ujaen.es

**\*** Correspondence: acano@ujaen.es; Tel.: +34-953-212343

**Abstract:** Irrigation installations in cities and agricultural operations use large amounts of water and electrical energy, so optimising these resources is essential nowadays. Wireless networks offer ideal support for such applications. The long-range wide-area network (LoRaWAN) used in this research offers a large coverage of up to 5 km, has low power consumption and does not need additional hardware such as repeaters or signal amplifiers. This research develops a control and monitoring system for irrigation systems. For this purpose, an irrigation algorithm is designed that uses rainfall probability data to regulate the irrigation of the installation. The algorithm is complemented by checking the sending and receiving of information in the LoRa network to reduce the loss of information packets. In addition, two temperature and humidity measurement devices for LoRaWAN (THMDLs) and an electrovalve control device for LoRaWAN (ECDLs) were developed. The hardware and software were designed, and prototypes were built with the development of the electronic board. The wide coverage of the LoRaWAN allows the system to cover irrigation areas from small to large.

**Keywords:** LoRaWAN; smart irrigation systems; smart energy

#### **1. Introduction**

In modern irrigated agricultural facilities, the competitiveness of the sector, combined with rising global temperatures, has made it necessary to develop new, more sustainable agricultural techniques and crops that reduce water consumption, together with optimal water and energy management strategies. An efficient farming system delivers the right amount of water at the right time, improving crop yields through efficient energy consumption. Innovative irrigation technologies are necessary to ensure an optimal amount of irrigation water. Optimising an irrigation system means improving crop development conditions by planning the installation: optimal water and energy quantities and management. This requires variable monitoring and decision-making systems that allow current irrigation installations to be optimised.

The need for optimisation in agriculture began in the last century. Initially, wired electronic solutions were used, but they presented numerous problems. Since then, the development and optimisation of irrigation systems have been linked to the rise and evolution of ICT (Information and Communication Technology). It is important to design sustainable models capable of supplying energy through renewable sources based on solar photovoltaic (PV) energy. Another fundamental part is the communication network, which is currently realised through wireless networks with low energy consumption, such as Low-Power Wide-Area Networks (LPWANs).

This article describes the design of an intelligent system to implement the irrigation control of a facility located on the university campus of the University of Jaén through wireless communication and low energy consumption powered by solar PV panels. This system consists of a wireless network with sensors and actuators that send the collected data, which are subsequently analysed in the cloud. This research focuses on optimising

**Citation:** Sánchez-Sutil, F.; Cano-Ortega, A. Smart Control and Energy Efficiency in Irrigation Systems Using LoRaWAN. *Sensors* **2021**, *21*, 7041. https://doi.org/ 10.3390/s21217041

Academic Editor: Ivan Andonovic

Received: 20 September 2021 Accepted: 21 October 2021 Published: 24 October 2021



an irrigation system and reducing its energy consumption with an LPWAN supplied by a PV system.

#### **2. Related Work**

In the last decade, there has been a tendency to implement intelligent irrigation management systems based on wireless sensor networks, which have also been used in other areas such as industry, cities and housing. The advantages of these wireless networks in the agricultural sector have been analysed by several authors, such as Goumopoulos et al. [1], who described the design of an intelligent system based on a wireless sensor and actuator network used for irrigation control in greenhouses. Doko Bandur et al. [2] analysed the energy consumption of the different components of the wireless sensor network, indicating the main energy consumers as well as how energy efficiency should be improved. A study of greenhouse crops using wireless technology for sensor communication, including the transmission rate, was presented by Kochhar et al. [3]. Hamami et al. [4] reviewed the wireless sensor networks used in irrigation systems. This type of technology is ideal for system management and for reducing water consumption.

Nowadays, devices integrated with long-range wide-area networks (LoRaWANs) store their data in the cloud, where the data are processed, analysed through Big Data techniques and exchanged with other networks. These technologies enable the design of Internet of Things (IoT) and cloud computing systems applicable to agriculture. Froiz-Míguez et al. [5] detailed an IoT system that implements a smart irrigation system covering large areas through an LPWAN with soil temperature, soil humidity and air temperature sensors. Valente et al. [6] presented the development of a low-cost system and analysed its energy consumption, with a maximum of 400 μA, using a LoRaWAN network. Ameloot et al. [7] developed and analysed a wireless network with six nodes to characterise the temperature and relative humidity of suburban areas using a long-range (LoRa) network at various locations in the city of Ghent (Belgium). LoRa has also been used by Cano-Ortega et al. [8], who developed an optimal LoRa network using ABC algorithms to reduce the Packet Loss Rate (PLR) and dispatch time in order to determine the load profiles of a dwelling. Smart street lighting systems using LPWAN control were realised by Sánchez-Sutil et al. [9,10]. Cruz et al. [11] monitored the filling level of urban waste containers in Lisbon (Portugal) using LPWAN technology. Finally, Ritesh-Kumar et al. [12] applied LoRaWANs to implement a greenhouse control system that enables energy and water savings through continuous monitoring of the installation.

As can be seen, the advances of ICTs in irrigation systems have been quite important. Nam et al. [13] discussed the use of ICTs in water management in agriculture and irrigation facilities. Goap et al. [14] presented an intelligent system that obtains soil moisture data through sensors, together with current meteorological information, to optimise the irrigation of an agricultural facility. To minimise water losses, Canales-Ide et al. [15] analysed a set of techniques and criteria aimed at optimal irrigation management that determines the water needs of plants and the optimal efficiency of the irrigation systems. By studying each plant from existing databases, Munir et al. [16] proposed an optimal irrigation system based on daily needs, considering the time of day, soil moisture and humidity. Migliaccio et al. [17] developed a smartphone application for scheduling urban lawn irrigation using evapotranspiration data from weather stations.

Among the most important advantages of using IoT systems in agriculture is automated irrigation, as measurements can be taken by sensors (humidity, temperature, irradiation, etc.) and actions (solenoid valves, pumps) through the different devices that make up this system. Methodologies have also been developed for the analysis and development of scientific networks based on the evaluation of the needs of different crops, soil attributes, climate, etc. Some authors have developed different IoT devices, as Fernández-Ahumada et al. [18] presented a low-cost device for automatic irrigation based on an ESP32-LoRa microcontroller and Internet connection through the Sigfox network. Fraga-Lamas et al. [19] proposed an IoT smart irrigation system specifically designed for remote

urban areas. Chazarra-Zapata et al. [20] presented an IoT device that optimises battery power consumption, using GPRS (General Packet Radio Service) with an NB-IoT (Narrowband Internet of Things) system for communication and sending information every two hours to reduce energy consumption. López-Morales et al. [21] proposed an IoT system that enables decision making on pumping efficiency in an irrigation community by easily integrating heterogeneous data sources, which improves the energy efficiency of pumping with higher economic, environmental and social returns in a sustainable way. Additionally, Glória et al. [22] developed a sustainable irrigation system that improves the use of natural resources, both water and energy, and reduces the economic cost through a battery-powered IoT network with communication intervals of two hours. The monitoring of climatic parameters, soil moisture, vegetation health, plant diseases and crop yields using IoT systems with wireless networks was developed by Khan et al. [23]. Additionally, Mohammed et al. [24] developed an IoT system for the control of date palms in arid regions using an underground irrigation system that remotely controlled climatic parameters and the water volume in the soil. Tiglao et al. [25] presented a low-cost system that has a soil moisture sensor, a temperature sensor, a humidity sensor and a valve actuator within a mesh configuration that regulates drip irrigation. Finally, Sánchez-Sutil et al. [26] designed an intelligent system for measuring electrical variables to obtain load profiles in households.

Different control systems applied to irrigation have been developed. Al-Ali et al. [27] presented a microcontroller based on fuzzy logic algorithms for drip irrigation control, and Sudharshan et al. [28] studied a solenoid valve control system using fuzzy logic fed with data from temperature, humidity and soil moisture sensors. Nawandar et al. [29] proposed a control system for greenhouses, gardens and farms and an automatic irrigation system capable of tracking the water needs of the crop, providing real-time and historical data of the farm. Liao et al. [30] designed an automatic irrigation system with real-time soil moisture data to estimate the depth of water absorption. Finally, Eltohamy et al. [31] analysed how phosphorus release from the soil surface in paddy fields is influenced under different irrigation scenarios for different soil moistures.
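As a simple illustration of the threshold-style decision logic such controllers implement (the function, thresholds and inputs below are hypothetical, not taken from any of the cited systems):

```python
# Hedged sketch: open the electrovalve only when the soil is dry and rain
# is unlikely, in the spirit of the rainfall-probability-based algorithm
# described in this research. All thresholds are illustrative assumptions.
def should_irrigate(soil_moisture_pct: float, rain_probability_pct: float,
                    moisture_threshold: float = 30.0,
                    rain_threshold: float = 60.0) -> bool:
    """Return True when irrigation should run for this cycle."""
    dry = soil_moisture_pct < moisture_threshold
    rain_unlikely = rain_probability_pct < rain_threshold
    return dry and rain_unlikely

# Dry soil and a low rain probability trigger irrigation; a forecast of
# likely rain or already-moist soil holds the valve closed.
```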

The literature review revealed the following technological aspects:


Based on the weaknesses and opportunities identified, the main contributions of this research are:


The rest of the document is organised as follows: Section 3 provides an overview of the developed system, including the modular architecture of the system, the developed LPWAN network and the supported sensors and their interconnection with the platform. Section 4 details the technological results obtained. An evaluation of the developed system in terms of a prototype, the analysis of the performance of the LPWAN wireless network, the agronomic impact of the system and an evaluation of the energy consumption of the system are provided. Finally, Section 5 presents the conclusions drawn from this work.

#### **3. Methodology and Design**

#### *3.1. Network Scheme*

The proposed network scheme has two distinct parts. The first corresponds to the LoRaWAN and the second to the Wide-Area Network (WAN), which can be either wired using the Ethernet protocol or wireless using the Wi-Fi (Wireless Fidelity) protocol. The PG1301 concentrator manufactured by Dragino Technology Co., Ltd., Shenzhen, China, is used as the link between the two networks. The PG1301 is mounted on a Raspberry Pi 3 or higher, which provides support for the WAN. In addition, it can host up to 1000 LoRaWAN devices, which is sufficient for most applications; if a larger number is required, it is sufficient to install more concentrators to cover the required needs. Figure 1 shows the network scheme.

**Figure 1.** Proposed network scheme.

Within the LoRaWAN, communication is bidirectional between the Temperature and Humidity Measurement Device for LoRaWAN (THMDLs) and the Electrovalve Control Device for LoRaWAN (ECDLs) and the PG1301, since both data messages (upstream) and command messages (downstream) are needed. The information is concentrated on The Things Network (TTN) server [32]. This service is specially designed to work with LoRaWANs and supports upstream and downstream messages to LoRaWAN devices such as the ones developed in this research (THMDL and ECDL). TTN has recently released the new v3 version, which is much more powerful than the previous one. From TTN, it is possible to send and receive information to different IoT services through the available integrations. These include (i) AWS IoT [33], (ii) Akenza core [34], (iii) Datacake [35], (iv) deZem [36], (v) InfluxDB Cloud 2.0 [37], (vi) Microsoft Azure [38], (vii) Qubitro [39], (viii) TagoIO [40], (ix) thethings.iO [41], (x) ThingsBoard [42], (xi) ThingSpeak [43], (xii) Ubidots [44] and (xiii) UIB [45].

In addition to the above integrations, it is possible to use other options such as (i) Message Queue Telemetry Transport (MQTT) [46], (ii) LoRa cloud [47], (iii) Node-RED [48] and (iv) If This, Then That (IFTTT) [49]. Finally, TTN offers the HTTP Webhooks integration, which allows sending data to any server using POST and GET requests; from there, IoT services such as Google Sheets [50], Google Firebase [51], etc., can be accessed. As can be seen, there is a wide range of possibilities that allows the developer or user to find the service that best suits each situation.
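Whatever integration is used, a LoRaWAN uplink carries a small binary payload that the application side must decode; a hedged sketch of one possible temperature/humidity frame (the 4-byte scaled layout below is an assumption for illustration, not the actual THMDL frame format):

```python
import struct

# Hypothetical frame layout: big-endian signed 16-bit temperature and
# unsigned 16-bit humidity, both scaled by 100 for 0.01 resolution.
def encode_reading(temp_c: float, humidity_pct: float) -> bytes:
    return struct.pack(">hH", round(temp_c * 100), round(humidity_pct * 100))

def decode_reading(payload: bytes) -> tuple:
    temp_raw, hum_raw = struct.unpack(">hH", payload)
    return temp_raw / 100, hum_raw / 100

frame = encode_reading(23.57, 61.2)  # a 4-byte uplink payload
```

Keeping payloads this small matters on LoRaWAN, where duty-cycle limits and low data rates make every byte of airtime expensive.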

#### *3.2. Hardware Design*

#### 3.2.1. Design Challenges and Objectives

In order to obtain fully functional devices that fulfil their assigned tasks, it is necessary to clearly define the performance objectives to be met by the devices. These objectives have a decisive influence on the choice of components and technologies implemented in the THMDL and ECDL, and also an important bearing on the software that runs inside these devices. The hardware design objectives are listed below:


The aforementioned objectives entail overcoming a series of challenges and difficulties, the resolution of which will result in the development of fully functional devices. The following is the list of elements to be considered:


#### 3.2.2. Components

An appropriate selection of the components to be implemented in the devices will have a decisive influence on the optimisation of the devices. The design objectives of low power consumption, small size, modularity and operational safety must be addressed in the approach to component selection.

#### Microcontroller

In order for a device to function properly and perform all assigned tasks, it is necessary to have a core. This core is the microcontroller that must drive the interaction between components within the device. Thus, the microcontroller must have different elements such as the microprocessor, memory, ports for communications with other components, digital and analogue inputs and outputs, etc.

Once the design objectives have been studied, it can be concluded that the Arduino family of microcontrollers is an ideal platform for use in the construction of the devices created in this research. The Arduino platform is endorsed by its use in a multitude of projects with industrial and domestic applications.

Table 1 shows some of the most essential features of the various microcontrollers in the Arduino family. This table is the basis for choosing the microprocessor applied to the devices developed in this research.

The Arduinos shown have sufficient memory capacity for program code and data. Therefore, the decision to use one or the other microcontroller depends on other design objectives, which are mainly power consumption and size. In this sense, the microcontroller that best meets the requirements defined in the design is the Arduino Nano (AN), whose specifications can be found in [52].


**Table 1.** Comparison of the Arduino family.

#### LoRa Wireless System

Regarding the LoRa communication system, it should be noted that two components are required: (i) the end device (to be installed in the THMDL and ECDL) and (ii) the gateway responsible for communication with the cloud.

LoRa communication chips are diverse, including (i) Semtech (Semtech Corporation, Camarillo, CA, USA) SX1308 [56], SX1301 [57], SX1276 [58], SX1278 [58] and SX1257 [59]; (ii) HOPERF chip RFM95/96/97/98 [60] (HOPERF, Shenzhen, China); and (iii) Murata CMWX1ZZABZ (Murata Manufacturing, Nagaokakyo, Japan) [61].

Commercially available LoRaWAN-compatible models are built from these chips. Five models were analysed, and from this analysis, the model to be implemented in the THMDL and ECDL was chosen. The models analysed were the following: (i) Arduino MKR WAN 1310 (Arduino AG, Ivrea, Italy) [62]; (ii) Moteino (LowPowerLab, Canton, MI, USA) [63]; (iii) Libelium (Libelium, Zaragoza, Spain) [64]; (iv) Lopy4 (Pycom, Bucharest, Romania) [65]; and (v) Dragino LoRa Bee (DLB) (Dragino Technology Co., LTD., Shenzhen, China) [66]. These models use the following chips: (i) Murata CMWX1ZZABZ for the Arduino MKR WAN 1310; (ii) HOPERF RFM95/96/97/98 for the Moteino; (iii) SX1276 and SX1278 for the Lopy4 and DLB, respectively; and (iv) Semtech SX1272 for the Libelium. The characteristics of the analysed models are similar; therefore, the DLB was chosen in this research as the component to be installed in the THMDL and ECDL due to its reduced price. Table 2 illustrates the characteristics of the components analysed.


**Table 2.** Comparison of LoRa end-devices.

Once the LoRa component was selected, it was necessary to choose the gateway. This component handles the upstream and downstream messages exchanged between the THMDL/ECDL and TTN. Although there are different options on the market, the Dragino family was chosen to ensure the best compatibility with the selected LoRa component. Table 3 shows the four gateways tested, from which the LoRa PG1301 [67] concentrator was chosen. The PG1301 can handle up to 1000 devices over 10 communication channels, which is more than enough for most systems; if more than 1000 LoRa devices must be controlled, additional gateways can be added. The PG1301 was mounted on a Raspberry Pi computer that provides Internet access, either via Ethernet cable or Wi-Fi.

**Table 3.** Comparison of LoRa gateways and concentrators.


#### Electrical Variables Meter

For the measurement of DC variables, there are fewer options of sufficient quality. Three possibilities exist: (i) FZ0430 [71]; (ii) ACS712 [72]; and (iii) INA219 [73]. The FZ0430 measures only DC voltage, up to 25 V. The ACS712 measures only current, in ranges of 5, 20 or 30 A depending on the version; to obtain the power consumption, it must be paired with a voltage sensor and the necessary calculations performed. The third option is the INA219 module, which measures voltage and current in the same component and also provides a direct reading of the power consumed. In this research, the INA219 was chosen because it performs all voltage, current and power measurements in a single component. The characteristics of the components analysed are shown in Table 4.

**Table 4.** Comparison of electrical sensors.


#### SHT30 Temperature and Humidity Sensor

After searching for temperature and humidity sensors compatible with the Arduino platform, five families of sensors were found: (i) SHT1x [74]; (ii) SHT2x [75]; (iii) SHT3x [76]; (iv) DHT11 [77]; and (v) DHT22 [78]. Their measurement ranges, accuracy, power consumption, supply voltage and communication interfaces are diverse. Table 5 compares the analysed sensor models.


**Table 5.** Comparison of temperature and humidity sensors.

It can be seen that the family offering the best performance is the SHT3x. Accuracy, power consumption and measurement ranges are outstanding. Moreover, the supply voltage and the I2C (Inter-Integrated Circuit) bus are ideal for use in conjunction with AN. Finally, within the SHT3x family, the SHT30 sensor was chosen for implementation in the THMDL.

#### Charge Regulator

SeeedStudio regulators are widely used for the charge control of batteries with SPs. Of the three models analysed, the Lipo Rider Pro (LiPo) [79] was chosen for implementation in the devices. This model offers ideal characteristics for the 3.7 V battery and the 4.8 V SP used and is also perfectly suited to the supply voltage of the AN board. Table 6 shows the characteristics of the models tested.


**Table 6.** Comparison of charge regulators.

#### Solar Panel

Solar energy is clean, renewable and simple to use, which makes it of great interest as an energy source for equipment working outdoors, such as the devices used in this research. The chosen SP has a high conversion efficiency of around 17%. It is made of monocrystalline material and coated with a thin layer of resin that protects the surface from atmospheric agents and makes it ideal for outdoor use. The dimensions of the SP are 138 × 160 mm. The nominal output voltage is 5.5 V with an output of up to 540 mA, depending on the luminous intensity received. The open-circuit voltage is 8.4 V, and the maximum load voltage is 6.4 V. The main characteristics can be found in [82].

#### Battery

For use in the THMDL and ECDL, a 3.7 V, 7800 mAh, 28.86 Wh lithium-ion battery was chosen. This is more than enough to power the designed devices. The THMDL has an average consumption of around 33 mA, which ensures approximately 166.5 h of operation with a fully charged battery. In the case of the ECDL, the average consumption is around 31 mA, with a battery life of 174.5 h on a full charge.

The battery pack has dimensions of 68 × 55 × 19 mm and consists of three individual batteries, with an operating temperature of between −20 °C and +60 °C. The characteristics of the battery are available for consultation in [83].
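As a quick sanity check on these figures, battery life can be estimated as nominal capacity divided by average draw. This is an idealised sketch: the full-discharge time measured later in the paper (166.5 h) is shorter than this estimate, since the usable capacity of a real pack is below its nominal rating.

```python
def runtime_hours(capacity_mah: float, avg_current_ma: float) -> float:
    """Idealised battery runtime: nominal capacity over average current draw."""
    return capacity_mah / avg_current_ma

# 7800 mAh pack at the ~33 mA average draw measured for the THMDL
print(round(runtime_hours(7800, 33.02), 1))  # → 236.2
```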

#### 3.2.3. Hardware Implementation for the THMDL

In the design of the THMDL, two AN microcontrollers were used. This is because the chosen communication system (LoRaWAN) and the chip that gives access to the DLB are not compatible with the I2C bus used to read the INA219 and SHT30 sensors. To perform the measurements, AN1 sends the reading request via the serial port, and AN2 measures the electrical variables, temperature and humidity and returns them via the serial port. AN1 therefore takes care of the communication with the LoRaWAN network and controls the measurement requests, while AN2 takes care of the necessary measurements.

Because the devices developed in this research work autonomously, with no possible wired connection to the electrical or Ethernet networks, power supply and communication systems that do not depend on wired networks must be implemented. For the power supply, two solutions were considered: (i) a battery and (ii) a battery and SP controlled by a regulator. This results in two different versions of the THMDL. The aim is to monitor the power consumption of the joint system and the contribution of the battery and the SP in order to control the system and replace or charge the battery to keep it running.

Several technologies are applicable for the wireless system, including the following: (i) Bluetooth; (ii) SigFox; (iii) ZigBee; (iv) NB-IoT; and (v) Wi-Fi. The coverage offered by each of these varies, in many cases not exceeding tens of metres without repeaters to extend it. SigFox is a proprietary network, so all services must be contracted with its operator. NB-IoT requires a data contract and a SIM card to send and receive information.

This research uses LoRaWAN because communication ranges of up to 10 km, with an average of 5 km, can be achieved, which is sufficient for application in most landscaped areas in cities. If larger extensions are required, it is only necessary to install more gateways to ensure the necessary coverage. Figure 2 shows the block diagrams of the two versions of the THMDL. These diagrams express the relationships established between the different components of the devices and how they share information and electrical characteristics.

To complement the block diagrams, the wired connections between the different components are included, allowing any researcher to clone the devices presented in this research. The power supply, serial port, I2C bus and LoRaWAN network connections can be seen in the diagrams. Figure 3 shows the wiring diagram for the THMDL in both versions.

The THMDL PCB was also designed in its two versions: battery power supply, and battery power supply plus SP. The board integrates all the components used, avoids wiring as much as possible and gives solidity to the whole. The dimensions are 95 × 58 mm for the battery-only version and 191 × 71 mm for the version with battery and SP. Figures 4 and 5 show the design of the PCBs, and Figure 6 shows the electronic schematic of the THMDL PCB.

**Figure 2.** Block diagram of the THMDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

It is important to perform an economic valuation of the THMDL in its two versions to check whether the reduced-price target is met. In this regard, Tables 7 and 8 show the price of each product and of the final set for the two versions of the THMDL. The product prices were obtained from the manufacturers' official shops. Moreover, since the components are licence free, compatible components available on the market can further reduce the price of the set.

**Table 7.** Cost of components for the THMDL with a battery power supply.

**Table 8.** Cost of components for the THMDL with a battery and solar panel power supply.


**Figure 3.** Wiring diagram of the THMDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

**Figure 4.** PCB of the THMDL with battery power supply: (**a**) front side and (**b**) back side.

**Figure 5.** PCB of the THMDL with battery and solar panel power supply: (**a**) front side and (**b**) back side.

**Figure 6.** Schematic of the THMDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

#### 3.2.4. Hardware Implementation for the ECDL

Similar to the THMDL, the ECDL has two ANs due to the incompatibility of the I2C bus with the LoRaWAN system. In this case, a relay is included to operate the electrovalve. This relay is connected to digital output 3 of AN1, is supplied at 5 V and can switch currents of up to 10 A.

AN2 takes care of the electrical measurements made by the INA219. In the SP versions of the THMDL and ECDL, three INA219 modules are required, so each must be assigned its own address on the I2C bus so that it can be read individually. The INA219 at the output of the charge regulator keeps the default address 0x40. The INA219 measuring the battery was assigned the address 0x41, which requires a solder bridge across its A0 contacts. Finally, the INA219 module of the SP was assigned the address 0x44 by bridging its two A1 contacts. This allows access to the individual measurements without any interference between the device addresses.
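The addressing scheme above follows the TI INA219 datasheet: the base address is 0x40, strapping A0 adds 1 and strapping A1 adds 4. A minimal sketch:

```python
def ina219_address(a0: int, a1: int) -> int:
    """I2C address of an INA219 given its A0/A1 straps (0 = open, 1 = bridged).
    Base address 0x40; A0 adds 1 and A1 adds 4, per the TI datasheet."""
    return 0x40 | (a0 & 1) | ((a1 & 1) << 2)

assert ina219_address(0, 0) == 0x40  # regulator output (default address)
assert ina219_address(1, 0) == 0x41  # battery (A0 bridged)
assert ina219_address(0, 1) == 0x44  # solar panel (A1 bridged)
```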

The system for access to the LoRaWAN network via the DLB is the same as explained for the THMDL. The difference is the messages exchanged with the network because the ECDL performs different functions. To understand the functional relationships between the components used in the two versions of the ECDL, see Figure 7.

**Figure 7.** Block diagram of the ECDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

For the ECDL, it is also necessary to include the wiring diagrams that show the electrical connections to be made in order to build this device. The wiring diagrams are complemented with the block diagrams for a complete definition of the ECDL. With all the information provided, the ECDL is reproducible for any interested researcher. Figure 8 shows the wiring diagrams for the ECDL in its two versions.

**Figure 8.** Wiring diagram of the ECDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

Finally, Figures 9 and 10 show the electronic boards developed for the ECDL in its two versions. In this case, the dimensions of the boards are 125 × 59 mm and 220 × 70 mm for the ECDL versions without and with an SP, respectively. The electronic schematic of the ECDL PCB is shown in Figure 11.

**Figure 9.** PCB of the ECDL with battery power supply: (**a**) front side and (**b**) back side.

**Figure 10.** PCB of the ECDL with battery and solar panel power supply: (**a**) front side and (**b**) back side.

**Figure 11.** Schematic of the ECDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

An economic valuation of the ECDL versions was also performed to verify that the reduced-price target has been met. Tables 9 and 10 give the approximate cost of the two ECDL versions.


**Table 9.** Cost of components for the ECDL with a battery power supply.

**Table 10.** Cost of components for the ECDL with a battery and solar panel power supply.


#### *3.3. Software Design*

The system designed in this research is intended to operate continuously in 24/7 mode, keeping the automated system permanently under control. In addition, the devices must resume all their functions after any interruption, e.g., a battery change.

Several functionalities have been implemented in the system: (i) battery charge level control; (ii) a watering routine based on the weather forecast and humidity level; (iii) complete electrical (*v*, *i*, *p*) and environmental (temperature, humidity) measurements; and (iv) parameter changes.

#### 3.3.1. THMDL Software

The THMDL program is structured in two main sections: (i) initialisation and (ii) command control. The initialisation tasks must prepare the components used and the communication ports to start the continuous process.

The command control routine continuously scans the network for messages sent from the system. These messages are of two types: (i) measurement messages and (ii) irrigation messages. When a measurement message is received, the routine that measures the electrical and environmental parameters is executed, followed by the battery check routine, which checks the battery's state of charge. If the message is for irrigation, the need to irrigate is checked, and the necessary order is sent to the system.

Once the required task has been performed, a confirmation message is sent to notify the system of its completion, and the device returns to the initial step of scanning the LoRaWAN network for new messages. This command control process runs continuously as long as the THMDL is connected. Figure 12 shows the flowchart for the main THMDL program.

**Figure 12.** Flow chart of the main program for the THMDL.
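The command control loop described above amounts to a simple message dispatcher. The sketch below illustrates the structure; the message names and callback signatures are illustrative placeholders, not the actual LoRaWAN message format used by the authors:

```python
def handle_message(msg: str, measure, irrigate) -> str:
    """Run the routine matching an incoming message and return the
    confirmation to be sent back over the LoRaWAN network."""
    if msg == "MEASURE":
        measure()            # electrical/environmental readings + battery check
        return "MEASURE_DONE"
    if msg == "IRRIGATE":
        irrigate()           # forecast- and humidity-based watering decision
        return "IRRIGATE_DONE"
    return "IGNORED"         # unknown messages are skipped

# the main loop would poll the network and call handle_message on each arrival
print(handle_message("MEASURE", measure=lambda: None, irrigate=lambda: None))
```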

Figure 13 shows the flowcharts for the measurement routines in the two versions of the THMDL. The routine is divided into two parts: (i) measurement of electrical variables and (ii) measurement of environmental variables. First, the electrical measurement phase is performed. In this phase, the corresponding INA219 module is called, which returns the variables *v*, *i* and *p*. In the battery-only version, only one measurement is taken. The version with an SP performs three measurements in this order: (i) regulator, (ii) battery and (iii) SP.

**Figure 13.** Flow chart of the measurement routines for the THMDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

Once the measurement of electrical variables has been completed, the measurement of environmental variables is performed. To do this, SHT30 is called and returns the temperature and humidity data recorded at that moment. Finally, all the measured data are sent to the LoRaWAN network. In this sense, a message security system has been implemented in order to minimise the loss of information in the system.

Figure 14 shows the flow chart of the irrigation routine. It is important to note that this routine has been implemented on the basis of weather forecasts so that water expenditure is minimised. When the routine starts, a request for rainfall forecast data is sent; the forecast horizon can be adjusted as required by each application. Once the probability of precipitation is received, it is compared with the assigned minimum probability. If the received probability is higher, a message that irrigation is not necessary is sent. If the probability of precipitation is lower than the minimum, the routine proceeds to the comparison with the measured humidity level.

**Figure 14.** Flow chart of the irrigation routine for THMDL.

If the humidity level is lower than the minimum level set for watering, the watering message is sent. Otherwise, the message that watering is not necessary is sent. The levels of precipitation probability and minimum humidity can be changed by the user at any time, making the system much more efficient and dynamic.
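The watering decision therefore reduces to two threshold comparisons. A minimal sketch, in which the default threshold values are illustrative placeholders, since the paper leaves both user-configurable:

```python
def should_irrigate(rain_prob: float, humidity: float,
                    min_rain_prob: float = 30.0, min_humidity: float = 40.0) -> bool:
    """Water only if rain is unlikely AND the soil is drier than the minimum.
    Threshold defaults are illustrative placeholders."""
    if rain_prob >= min_rain_prob:   # rain likely enough: skip watering
        return False
    return humidity < min_humidity   # dry soil: send the watering message

assert should_irrigate(10, 20) is True    # dry soil, little rain expected
assert should_irrigate(80, 20) is False   # rain expected
assert should_irrigate(10, 60) is False   # soil already humid enough
```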

The battery check routine is shown in Figure 15a. The check is performed on the battery voltage: if this voltage falls below the defined minimum, a low battery message is sent to the system for action by the maintenance staff.

To make the system more dynamic, the action limits must be changeable at any time. This allows the system to adapt to new situations or approaches in system policies. It also allows sectorisation of the system, making it possible to have different parameters in each zone reflecting its particular characteristics. For this purpose, Figure 15b shows the THMDL parameter change routine. Parameters can be changed individually or in groups, as the routine is prepared for this.

**Figure 15.** Flow chart of the (**a**) check battery routine and (**b**) change parameters routine for the THMDL.

#### 3.3.2. ECDL Software

The design of the main ECDL program follows a similar philosophy to that of the THMDL. It starts with the initialisation phase of all components. Subsequently, in continuous mode, it executes the following tasks: (i) scanning the LoRaWAN network for new messages; (ii) if a measurement message arrives, performing the measurement, checking the battery state of charge and sending the data; (iii) if a parameter change message arrives, executing the parameter change routine and sending confirmation; and (iv) if the message is an electrovalve on-or-off message, activating or deactivating the relay. Figure 16 shows the flowchart for the main ECDL program.

**Figure 16.** Flow chart of the main program for the ECDL.

Figure 17 shows the flowcharts of the measurement routines for the two versions of the ECDL. The routines work in the same way as those shown above for the THMDL, except that the temperature and humidity measurements are omitted, as this device does not need to take them. The device also resends messages until it receives confirmation that the data have arrived.

**Figure 17.** Flow chart of the measurement routines for the ECDL: (**a**) battery power supply and (**b**) battery and solar panel power supply.

As with other routines, the data modification routine is based on the one described for the THMDL. In this case, only two variables are needed: (i) timeout for receiving and sending messages and (ii) minimum voltage for sending low battery warnings. Figure 18 shows the parameter change routine for the ECDL.

**Figure 18.** Flow chart of the change parameters routine for the ECDL.

#### **4. Results and Discussion**

This section shows the tests conducted to check the devices created in this research. Data were collected in Jaén, Andalusia, Spain, during different times of the year in order to validate the design and implementation.

#### *4.1. Case Study*

Figure 19 shows the distribution of irrigation zones on the campus. There are 22 zones, and a THMDL device for temperature and humidity measurement has been installed in each one. An ECDL actuator has also been installed in each zone to control the irrigation electrovalve. This allows the areas to be treated separately and only those that really need water to be irrigated, avoiding unnecessary water wastage in zones with sufficient humidity.

**Figure 19.** Distribution of irrigation zones on the campus.

For applications in cities, the necessary zones will be distributed according to the characteristics of the area to be irrigated automatically. In each zone, a THMDL device must be installed to monitor it. The THMDL will communicate with the LoRaWAN system, sending all the recorded data and the required irrigation orders. The ECDL devices will be installed in each of the electrovalves that irrigate the automated zones. Several zones monitored with a THMDL may be irrigated by the same electrovalve, so the number of THMDL devices does not always coincide with the number of ECDL devices. Aerial photographs, photographs taken with drones, maps of the area, etc., can be used to carry out a detailed study of the area and implement the best possible system.

#### *4.2. LoRaWAN Configuration*

In this research, we chose to send messages every minute, which is sufficient for an installation of this type. The initial configuration chosen in this case is BW125 (bandwidth 125 kHz), SF7 (spreading factor 7) and CR4/5 (code rate 4/5). The payload is 16 bytes long, plus 13 bytes for the header. The 16 payload bytes are distributed as 2 bytes per variable: 2 bytes for temperature, 2 bytes for humidity, 2 bytes for battery voltage, 2 bytes for battery current, 2 bytes for PV module voltage, 2 bytes for PV module current, 2 bytes for bus voltage and 2 bytes for bus current. The header layout is 1 byte for the MAC header (MHDR), 4 bytes for the LoRaWAN device address, 1 byte for FCtrl (frame control), 2 bytes for FCnt (frame counter), 4 bytes for the message integrity code (MIC) and 1 byte for FPort (port number).
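The 16-byte payload can be assembled by fixed-point packing, 2 bytes per variable. In the sketch below, the big-endian order and the ×100 scaling are assumptions for illustration, as the paper does not specify the exact encoding:

```python
import struct

def encode_payload(temp, hum, v_bat, i_bat, v_pv, i_pv, v_bus, i_bus):
    """Pack the eight measurements into 16 bytes (2 bytes per variable).
    Values are scaled by 100 into unsigned 16-bit integers (assumed encoding)."""
    values = (temp, hum, v_bat, i_bat, v_pv, i_pv, v_bus, i_bus)
    return struct.pack(">8H", *(int(round(v * 100)) for v in values))

payload = encode_payload(21.5, 56.1, 3.7, 33.0, 5.5, 200.0, 5.0, 210.0)
print(len(payload))  # → 16
```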

Due to the 60 s send interval and the chosen payload length, only SFs between 7 and 10 can be used; under the 1% duty cycle, the minimum send interval is 41.2 s for SF10 and less for the other SFs. SF7 was chosen as it has the shortest airtime. The installation is located in Europe, so only BW125 and BW250 are possible. BW250 is faster, but the transmission distance is shorter; to ensure a lower packet loss ratio (PLR) over the existing distances, BW125 was chosen.
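These duty-cycle figures follow from the standard LoRa time-on-air formula in the Semtech SX127x datasheet. The sketch below reproduces the 41.2 s minimum interval quoted for SF10, under the assumption that the full 29-byte PHYPayload (13-byte header plus 16-byte payload) enters the formula with explicit header and CRC enabled:

```python
from math import ceil

def lora_airtime_s(payload_bytes: int, sf: int, bw_hz: int = 125_000,
                   cr: int = 1, n_preamble: int = 8) -> float:
    """LoRa time-on-air for an explicit-header, CRC-on uplink (SX127x formula)."""
    de = 1 if (bw_hz == 125_000 and sf >= 11) else 0   # low-data-rate optimisation
    t_sym = (2 ** sf) / bw_hz                           # symbol duration in seconds
    n_payload = 8 + max(
        ceil((8 * payload_bytes - 4 * sf + 28 + 16) / (4 * (sf - 2 * de))) * (cr + 4),
        0)
    return (n_preamble + 4.25 + n_payload) * t_sym

toa = lora_airtime_s(29, sf=10)   # 13-byte header + 16-byte payload
print(round(toa / 0.01, 1))       # min send interval at 1% duty cycle → 41.2 s
```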

Table 11 shows the calculation of the on-air times for the header and payload used, together with the minimum message interval imposed by the 1% duty cycle; with the chosen configuration, it is 6.7 s. Table 11 covers all possible combinations for the EU868 zone in Europe. In other regions of the world with different frequency plans, other results can easily be obtained.


**Table 11.** Airtime parameters for the LoRaWAN in EU868 zone.

#### *4.3. Measurement of Soil Temperature and Humidity*

This section shows the temperature and humidity measurements taken in one of the zones on all days of the four seasons of 2020. It can be seen that, from winter to summer, temperature increases and humidity decreases. Summer has the highest average temperature, 29.31 °C; the lowest seasonal average is 12.14 °C; and the annual average is 19.03 °C.

The highest average humidity occurs in winter, with a value of 67.81%; the lowest average occurs in summer, with 35.40%; and the annual average is 56.09%. Figure 20 shows the data obtained during the four seasons of 2020.

The location of the campus is defined by its UTM coordinates referenced to zone 30: X = 431,582 and Y = 4,182,595. Jaen has a continental Mediterranean climate, and its location near the Guadalquivir river valley has a decisive influence on the climatic conditions. The temperature variation that can occur is around 20 °C.

**Figure 20.** Temperature and humidity graphs for the year 2020: (**a**) temperature in winter; (**b**) humidity in winter; (**c**) temperature in spring; (**d**) humidity in spring; (**e**) temperature in summer; (**f**) humidity in summer; (**g**) temperature in autumn; and (**h**) humidity in autumn.

#### *4.4. Battery Charge*

It is possible to charge the battery in two different ways: (i) via the USB port and (ii) via the SP. The first option is possible via the mini USB port of the LiPo.

This port can be connected to various devices, such as a computer USB port, a mobile phone charger or any other charger that provides 5 V and 500 mA DC. Figure 21a,b shows the voltage and current of a complete charging process through the USB port, in this case of a laptop. The voltage curve shows that the voltage increases as the accumulated battery charge increases, evolving from 2.2 V to 2.6 V at the end of the charging process. The charging current averages 350 mA until the seventh hour of charging; thereafter, it decreases to 100 mA at the end of the process and drops abruptly to zero when the battery is fully charged. The complete charging process takes 10 h and 50 min.

**Figure 21.** Electrical variables measurement in battery charge: (**a**) battery voltage in USB charge; (**b**) battery current in USB charge; (**c**) battery voltage in PV charge; (**d**) battery current in PV charge; (**e**) PV voltage in PV charge; and (**f**) PV current in PV charge.

Figure 21c–f shows the battery charging process using the SP. It can be seen that, although the SP is live, the LiPo only switches on battery charging when the SP voltage is close to 4 V. The battery voltage remains roughly constant at 3 V, with some periods at 2 V. The charging current depends on the radiation that the SP is receiving; a voltage drop is also observable around 10 a.m. due to passing clouds.

As can be seen, charging the battery via USB is much faster and more advisable when the battery is discharged; with a full day of charging with the SP, the battery was not fully charged. This confirms that the function assigned to the SP is to extend the battery charge by providing charging during sunshine hours. The SP thus replenishes the energy consumed during the night and makes the equipment autonomous for long periods of time without the need to charge the battery.

#### *4.5. Battery Discharge*

The full discharge test was performed on a fully charged battery. Figure 22 shows the results for the voltages and currents of the battery and at the output of the LiPo regulator board. The time required for full discharge was 166.5 h, which ensures a week of operation on a single battery charge without using an SP.

The battery voltage remains constant at 3 V until the point of full discharge, while the regulator board maintains an output voltage of between 4 and 5 V. The average current consumption is around 33 mA. The regulator board maintains an output current of between 204 and 210 mA until the end of the discharge.

**Figure 22.** Discharge measurement: (**a**) battery voltage; (**b**) battery current; (**c**) LiPo out voltage; and (**d**) LiPo out current.

#### *4.6. Energy Consumption Comparative*

In addition to the wide coverage provided by LoRaWAN equipment, these devices offer the advantage of reduced power consumption. Accordingly, the consumption of the developed equipment was tested against other wireless technologies.

The comparison was conducted during January 2020 with data taken from the THMDL and a Wi-Fi device. The Wi-Fi device tested was an Arduino Wemos D1 mini [84] without a connection to any of the electrical sensors, temperature and humidity sensors or the drive relay; using these components would increase the power consumption of the device. It should be noted that the Wemos D1 mini is one of the boards with the lowest power consumption among those with Internet access via Wi-Fi and the ESP8266 chip. Other boards with this chip have higher power consumption, such as the NodeMCU [85], Wemos D1 mini pro [86], Wemos D1 R1 [87], etc.

Figure 23 shows the results of the current measurements on the battery and at the output of the LiPo board. It can be seen that the power consumption of the Wemos D1 mini is approximately three times that of the THMDL, and it would increase further once the sensors were added. Moreover, the necessary Wi-Fi repeaters or routers would have to be added, as Wi-Fi coverage is much lower than that offered by the LoRaWAN network, which would further increase the consumption of the whole.

The average THMDL consumption is 33.02 mA, with a standard deviation of 1.76 mA; the Wemos D1 mini has a mean of 98.01 mA and a standard deviation of 2.28 mA. The energy consumed in January 2020 was 73.6329 Wh for the THMDL and 218.61 Wh for the Wemos D1 mini. The total energy consumed in 2020 by the THMDL was 869.0219 Wh, corresponding to 2.3762 Wh/day.
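The January figure is consistent with a simple average-power calculation, assuming the roughly 3 V battery voltage observed in the discharge test (an approximation, since the actual voltage varies slightly):

```python
def energy_wh(avg_current_ma: float, voltage_v: float, hours: float) -> float:
    """Energy drawn over a period from the average current and supply voltage."""
    return avg_current_ma / 1000 * voltage_v * hours

# 33.02 mA over January's 744 h at ~3 V ≈ the 73.63 Wh reported above
print(round(energy_wh(33.02, 3.0, 31 * 24), 2))  # → 73.7
```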

As for the output of the LiPo board, it can be seen that, as mentioned above, it maintains an output of between 204 and 210 mA regardless of the battery consumption. These consumption results, together with the wireless coverage, support the use of LoRaWAN technology in systems such as the one developed in this research.

**Figure 23.** Consumption comparative in January 2020: (**a**) battery current with THMDL connected; (**b**) battery current with Arduino Wemos D1 mini connected; (**c**) LiPo current out with THMDL connected; and (**d**) LiPo current out with Arduino Wemos D1 mini connected.

Descriptive statistics for the annual THMDL consumption were also computed. Since the average consumption of 33.02 mA is almost constant, the monthly mean is nearly equal in all months, at around 2.3750 Wh/day, with an almost-zero standard deviation of 0.0026. The skewness is practically zero, with some positive and some negative values in the region of zero, indicating a symmetrical distribution. The kurtosis reflects a mesokurtic distribution, with values uniformly distributed around the symmetrical curve. Table 12 shows the results obtained.


**Table 12.** Descriptive statistics of the power THMDL consumption in year 2020.

#### *4.7. Solar Energy Generated*

This section shows the power and energy generated by the 3 W SP used in the THMDL and ECDL. The data were collected during 2020 from one of the THMDLs with an SP. Figure 24 shows the energy and power for January 2020, which together with December is the month of lowest generation, as well as for the whole year.

**Figure 24.** Photovoltaic generation: (**a**) power obtained in January; (**b**) energy obtained in January; (**c**) power obtained in the year 2020; and (**d**) energy obtained in the year 2020.

The energy generated in January is 202.19 Wh, with a daily average of 6.5266 Wh. The annual photovoltaic generation amounts to 4594.73 Wh, with a daily average of 12.5639 Wh. Table 13 summarises the statistical results of the annual empirical distributions for SP power generation. To compare the results, the monthly and annual averages were taken. The month of maximum generation is June, with a mean of 18.9210 Wh/day; December has the lowest generation, with a mean of 6.2873 Wh/day, which is 33.22% of June's generation and 50.04% of the annual average.


**Table 13.** Descriptive statistics of the power SP generation in year 2020.

Positive skewness values indicate that the tail of the distribution is longer on the right, above the mean, with the values concentrated to the left of the mean; only January, November and December show values concentrated to the right of the mean.

The months of January, November and December present leptokurtic distributions, since their coefficient is greater than 3, indicating that the values are concentrated around the mean. The remaining months have coefficients below 3, i.e., flatter distributions whose values lie further from the mean.

#### *4.8. Analysis of Consumption, Photovoltaic Generation and Battery Life*

Using the data in Tables 12 and 13, the energy consumed by the THMDL can be compared with the energy generated by the SP. In the month of lowest generation, December, the SP produces 194.77 Wh (6.2873 Wh/day), while the THMDL consumes 73.7580 Wh (2.3808 Wh/day). Even in this most unfavourable month, generation therefore covers the full daily consumption, exceeding it by a factor of 2.64.

Considering the results obtained for the full discharge of the battery at an average current of 33 mA, 166.5 h were needed. This is much longer than the time required for the next day's sunlight to replace the energy consumed during the hours without photovoltaic generation.

Accordingly, fully discharging the battery would take 6.9375 days, so one day without sunlight consumes 14.41% of the total battery charge. As the most unfavourable month provides 6.2873 Wh/day of generation while only 2.3808 Wh/day are consumed, the PV generation easily covers this maximum daily drain of 14.41%. The battery lifetime is thereby considerably extended, as complete charge and discharge cycles are not necessary.
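The autonomy and coverage figures above follow directly from the measured values; a minimal check, using only numbers stated in the text:

```python
# Battery-autonomy and PV-coverage figures from Section 4.8.
DISCHARGE_HOURS = 166.5          # measured full-discharge time at ~33 mA
days = DISCHARGE_HOURS / 24      # days of autonomy without sunlight
daily_drain_pct = 100 / days     # share of total charge used per sunless day

GEN_DEC = 6.2873                 # Wh/day, worst-month (December) PV generation
CONS = 2.3808                    # Wh/day, THMDL consumption
ratio = GEN_DEC / CONS           # generation-to-consumption ratio

# days ≈ 6.9375, daily_drain_pct ≈ 14.41 %, ratio ≈ 2.64
```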

#### *4.9. LoRaWAN Measurements*

The LoRaWAN was implemented with the network optimisation algorithm developed by Cano-Ortega et al. [8]. The algorithm adapts the network parameters in real time to obtain the lowest possible packet loss ratio and thus minimise the loss of information. It runs on the Raspberry Pi that hosts the LoRa concentrator.

Figure 25 shows two hours of measurements made on the network. The points in the graphs mark the changes made by the algorithm, showing the parameter adjustments and the resulting reduction in the rate of lost information. It should be noted that the location of a device with respect to the concentrator has a decisive influence on the packet loss rate: the THMDL device is closer to the concentrator than the ECDL device shown, which clearly affects the rate of lost data.

**Figure 25.** LoRaWAN PLR: (**a**) THMDL measurement and (**b**) ECDL measurement.

#### *4.10. ThingSpeak Integration*

As discussed in Section 3.1, data are sent to TTN, from where they can be forwarded to multiple cloud services. TTN supports a large number of integrations; among them, the integration with the MathWorks ThingSpeak service was chosen for this research. In its free version, ThingSpeak accepts up to four channels of eight fields each, with a minimum update interval of 15 s. If the needs of the system are greater, the paid version reduces this interval to 1 s.
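Channel updates in ThingSpeak are plain HTTP requests against its REST API. The sketch below only builds the update URL (no network call); the API key is a placeholder and the field values are hypothetical sensor readings, not the authors' channel configuration.

```python
from urllib.parse import urlencode

def thingspeak_update_url(api_key, **fields):
    """Build a ThingSpeak channel-update request URL.
    Free tier: up to 8 fields per channel, one update every 15 s."""
    params = {"api_key": api_key, **fields}
    return "https://api.thingspeak.com/update?" + urlencode(params)

# Placeholder key and illustrative temperature/humidity readings
url = thingspeak_update_url("XXXXXXXXXXXXXXXX", field1=21.4, field2=55.0)
```

Issuing an HTTP GET on this URL (e.g., with `urllib.request.urlopen`) would write one row to the channel, subject to the free-tier rate limit.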

The configuration chosen for the LoRaWAN has the following parameters: BW125, SF7, CR4/5, a 13-byte message header and a 16-byte payload. As can be seen in Table 11, the minimum interval between transmissions is 6.7 s, calculated from the European Telecommunications Standards Institute (ETSI) standard [88] under the 1% maximum duty-cycle rule. As a sending interval of 60 s was chosen, the system complies with the current regulations. Figure 26 shows three examples of integration: (a) temperature and humidity data collection; (b) electrical variables of battery charging in USB mode; and (c) data collection of the electrical variables of the THMDL in the battery-only version.
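The 6.7 s figure can be verified with the standard Semtech SX127x time-on-air formula: for a 29-byte frame (13-byte header + 16-byte payload) at SF7/BW125/CR4/5 the airtime is about 66.8 ms, and dividing by the 1% duty cycle gives the minimum transmission interval. A sketch under those parameters:

```python
import math

def lora_airtime_ms(payload_bytes, sf=7, bw_hz=125_000, cr=1,
                    preamble=8, explicit_header=True, low_dr_opt=False):
    """Time on air (ms) of one LoRa frame, per the Semtech SX127x
    datasheet formula. cr=1 corresponds to coding rate 4/5."""
    t_sym = (2 ** sf) / bw_hz * 1000          # symbol duration, ms
    h = 0 if explicit_header else 1
    de = 1 if low_dr_opt else 0
    n_payload = 8 + max(
        math.ceil((8 * payload_bytes - 4 * sf + 28 + 16 - 20 * h)
                  / (4 * (sf - 2 * de))) * (cr + 4), 0)
    t_preamble = (preamble + 4.25) * t_sym
    return t_preamble + n_payload * t_sym

toa = lora_airtime_ms(13 + 16)        # 13-byte header + 16-byte payload
min_interval_s = toa / 1000 / 0.01    # 1 % duty cycle → ≈ 6.7 s
```

This reproduces the minimum sending time stated in the text; a larger payload (more sensors) lengthens the airtime and hence the minimum interval.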

**Figure 26.** ThingSpeak Integration: (**a**) temperature and humidity data; (**b**) battery USB charge; and (**c**) THMDL battery power supply operation.

#### *4.11. Future Work*

The capacity of LoRaWAN is constrained by the 1% duty-cycle limit on transmissions, which for the payload used in this research corresponds to a minimum interval of 6.7 s; together with ThingSpeak's 15 s limit in the free version, this may restrict message sending. If the payload grows as more sensors are added to the THMDL or ECDL, the minimum transmission interval would increase. A future line of research would therefore be to develop a LoRa concentrator that operates outside the LoRaWAN specification, using technology that avoids these delivery limits for systems requiring shorter intervals than the 1% duty cycle allows. Complementing this line, lower-latency cloud services could be used, such as Google's Firebase, which allows data upload intervals as short as 0.2 s.

It would also be interesting to create a web page and a mobile application where the monitored and controlled installations can be viewed in real time. Finally, further studies should provide machine learning algorithms that improve system performance based on the experience collected during the operation of the installations.

#### **5. Conclusions**

This research develops a complete irrigation system based on wireless communication over a LoRaWAN. It meets the objectives of low power consumption, small size, integration, modular design, fault response, operational safety and low price. These objectives have been achieved by overcoming a number of technical challenges, including component selection, modular design, evaluation of alternatives and PCB design to integrate the components used in the THMDL and ECDL.

LoRaWAN has a wide coverage of up to 10 km, reaching up to 5 km in urban environments. In addition, up to 1000 devices can be integrated with a single gateway or hub, reducing the infrastructure to be installed. If Wi-Fi, Bluetooth or similar devices were used, a multitude of repeaters would be required, greatly increasing the complexity of the installation. Moreover, the power consumption of LoRaWAN devices is extremely low compared to Wi-Fi and similar technologies; this reduced consumption increases battery life and extends system uptime.

The system incorporates battery management with low-battery warnings at an adjustable warning level. The irrigation routine allows the minimum moisture level for watering to be set. It also incorporates a rainfall forecast query, which skips watering when the probability of rainfall is higher than the set value, in which case a no-watering message is sent. The system is equipped with redundant message sending, which minimises the loss of information. Furthermore, the minimum battery voltage level, minimum rain probability, minimum humidity and the waiting times for receiving and sending messages can all be configured. This makes for a dynamic, robust and fault-tolerant system that can be installed in a multitude of locations.
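The watering decision described above can be sketched as a small predicate. The threshold values and function name below are illustrative assumptions, not the authors' defaults:

```python
def should_water(moisture_pct, rain_prob_pct,
                 min_moisture=30.0, max_rain_prob=60.0):
    """Irrigation decision as described in the Conclusions: water only
    when soil moisture is below the configured minimum AND the forecast
    rain probability does not exceed the configured threshold.
    All threshold values here are hypothetical."""
    if moisture_pct >= min_moisture:
        return False     # soil still moist enough; no watering needed
    if rain_prob_pct > max_rain_prob:
        return False     # rain expected; skip watering, notify instead
    return True          # dry soil and low rain probability: water
```

In the real system this decision would also be gated by the battery-level and message-redundancy logic the paragraph describes.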

A comparison of the prototypes with other wireless technologies was performed. The average consumption of the THMDL is 2.3762 Wh/day, while the Wi-Fi device studied consumes 7.0519 Wh/day without sensors or other components, i.e., 2.96 times more. The SP generates 6.2873 Wh/day in the month of lowest generation, which comfortably covers the THMDL average consumption of 2.3808 Wh/day, equivalent to 14.41% of the battery capacity per day. This reduces the number of charge and discharge cycles and extends the battery life.

The use of TTN opens up a wide range of possibilities for the development of system functionalities and adaptation to the needs of each implementation. TTN integrates a large set of cloud services, such as the one presented at the end of the Results section (ThingSpeak). In each implementation of the system, the needs of each problem can be studied, and the most suitable service can be used to offer the best solution. On the other hand, the system can also be reprogrammed by adding new functionalities to improve its performance characteristics. By using Arduino as the basis for the devices, the system benefits from the advantages of the open-source platform of this family.

**Author Contributions:** Conceptualization, F.S.-S. and A.C.-O.; methodology, F.S.-S. and A.C.-O.; software, F.S.-S. and A.C.-O.; validation, F.S.-S. and A.C.-O.; formal analysis, F.S.-S. and A.C.-O.; investigation, F.S.-S. and A.C.-O.; resources, F.S.-S. and A.C.-O.; data curation, F.S.-S. and A.C.-O.; writing—original draft preparation, F.S.-S. and A.C.-O.; writing—review and editing, F.S.-S. and A.C.-O.; visualization, F.S.-S. and A.C.-O.; supervision, F.S.-S. and A.C.-O.; project administration, F.S.-S. and A.C.-O.; funding acquisition, F.S.-S. and A.C.-O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors would like to thank the Department of Electrical Engineering of the University of Jaen for allowing the use of their laboratories and material in the development of this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**


#### **References**

