Article

Bayesian-Optimization-Based Peak Searching Algorithm for Clustering in Wireless Sensor Networks

Graduate School of Applied Informatics, University of Hyogo, Computational Science Center Building 5-7F, 7-1-28 Minatojima-minamimachi, Chuo-ku, Kobe, Hyogo 657-0047, Japan
* Author to whom correspondence should be addressed.
J. Sens. Actuator Netw. 2018, 7(1), 2; https://doi.org/10.3390/jsan7010002
Submission received: 31 October 2017 / Revised: 25 December 2017 / Accepted: 29 December 2017 / Published: 2 January 2018
(This article belongs to the Special Issue Sensors and Actuators in Smart Cities)

Abstract

We propose a new peak searching algorithm (PSA) that uses Bayesian optimization to find probability peaks in a dataset, thereby increasing the speed and accuracy of clustering algorithms. Wireless sensor networks (WSNs) are becoming increasingly common in a wide variety of applications that analyze and use collected sensing data. Typically, the collected data cannot be used directly in modern data analysis problems that adopt machine learning techniques, because such data lack additional information (such as labels) that specifies their intended use. Clustering algorithms, which divide the data in a dataset into clusters, are often used when such additional information is not provided. However, traditional clustering algorithms such as the expectation–maximization (EM) and k-means algorithms require massive numbers of iterations to form clusters. Processing speeds are therefore slow, and clustering results become less accurate because of the way such algorithms form clusters. The PSA addresses these problems, and we adapt it for use with the EM and k-means algorithms, creating the modified PSEM and PSk-means algorithms. Our simulation results show that the proposed PSEM and PSk-means algorithms significantly decrease the required number of clustering iterations (by 1.99 to 6.3 times) and, on synthetic datasets, produce clustering that is 1.69 to 1.71 times more accurate than that of the traditional EM and enhanced k-means (k-means++) algorithms. Moreover, in a simulation of WSN applications aimed at detecting outliers, PSEM correctly identified the outliers in a real dataset, decreasing iterations by approximately 1.88 times and achieving up to 1.29 times higher accuracy than EM.

1. Introduction

Over the past decade, wireless sensor networks (WSNs) have been widely deployed in applications that analyze collected data to improve quality of life or secure property. For example, sensor nodes are present in homes, vehicle systems, natural environments, and even satellites and outer space. These sensors collect data for many different purposes, such as health monitoring, industrial safety and control, environmental monitoring, and disaster prediction [1,2,3,4]. In such WSN applications, sensing data can be manually or automatically analyzed for specific purposes. However, in the age of big data, an increasing amount of sensing data is required for precise analysis in WSN applications. Consequently, it is difficult or, in some cases, even impossible to manually analyze all of the collected data.
There are several conventional ways to automatically manage the collected data. The most typical and easiest method is to set threshold values that correspond to sensing events; events are triggered once the data exceed these thresholds. However, the thresholds in large-scale WSNs vary from node to node and change as the environment changes. Moreover, precise analysis results cannot be obtained through the use of thresholds alone.
A complementary approach uses supervised machine learning. In this approach, a model is trained that can categorize sensing data into the different states required by an application. However, because sensing data labels are required in the training phase, extra work is required to manage the data. This process is particularly difficult when the dataset is large. Moreover, if the sensing environment changes, certain labels must also change. It is difficult to maintain a functional model under conditions where labels change frequently; this affects the analysis results.
Unsupervised machine learning methods are feasible and well-studied, and are not associated with the data labeling problems described above. Clustering is an important and common method in such approaches. In clustering, the overall features of the dataset are extracted. Then, the data are divided into clusters according to their features. As a result, data labeling is not required, and the data-labeling difficulties that occur in supervised approaches can be avoided. However, in state-of-the-art clustering methods such as the expectation–maximization (EM) [5] and k-means [6] algorithms, a massive number of iterations must be performed in order to form clusters, and a significant amount of computation time is required. Furthermore, because these algorithms use random starting data points as initial center points to form clusters, and because the number of clusters is not precisely determined, the clustering results become less accurate. To address these problems, in this paper, we propose a peak searching algorithm (PSA) for improving clustering algorithm capabilities.
Our approach should be applicable to different dataset distributions. Therefore, the collected sensing dataset is considered to be generated by a Gaussian mixture model composed of several different Gaussian distributions. If the number of Gaussian distributions and appropriate initial center points are known, clustering algorithms can appropriately divide the dataset into different clusters because each Gaussian distribution corresponds to a cluster. The proposed PSA employs a Bayesian optimization (BO) strategy that uses a Gaussian process [7]. Bayesian optimization is typically used for hyperparameter optimization; to the best of our knowledge, our approach is the first to use BO to improve clustering. Moreover, other Bayesian-theorem-based algorithms, such as those in [8,9,10,11], are also appropriate optimization strategies for training online and offline machine learning algorithms.
Given a collected dataset, the PSA searches for the data points with the highest probability values (i.e., peaks in the dataset). A Gaussian distribution peak is the point corresponding to the mean. By searching for the peaks, we can obtain appropriate initial center points of the Gaussian distributions and, hence, of the corresponding clusters. This overcomes the difficulty of determining good starting data points in traditional clustering algorithms, thereby reducing the number of iterations. By using the PSA, clustering algorithms can form clusters from peak points instead of random starting points, which improves clustering accuracy.
We used simulations to investigate the potential of the proposed PSA for improving algorithm performance. To measure performance improvements, we applied the PSA to the EM and k-means algorithms. We refer to these modified algorithms as PSEM and PSk-means, respectively. The simulation results showed that, for PSEM and PSk-means, the required numbers of clustering iterations were significantly reduced, by 1.99 to 6.3 times. Additionally, for synthetic datasets, clustering accuracy was improved by 1.69 to 1.71 times relative to the traditional EM and the enhanced version of k-means, i.e., k-means++ [12].
The proposed method can accurately group data into clusters. Therefore, any outliers in a dataset can be clustered together, making them possible to identify. Because outliers obviously reduce the capabilities of WSN applications, we also conducted a simulation using a real WSN dataset from the Intel Berkeley Research Lab (Berkeley, CA, USA). This allowed us to compare the outlier-detection capabilities of PSEM and EM. Our simulation results showed that PSEM correctly identified outliers, decreased iterations by approximately 1.88 times, and improved accuracy by up to 1.29 times.
The remainder of this paper is organized as follows. Section 2 outlines related works, while Section 3 introduces BO. Section 4 describes the proposed P S A and Section 5 presents the simulation results. Section 6 presents a discussion of this work. Section 7 summarizes key findings, presents conclusions, and describes potential future work.

2. Related Works

This section describes the techniques used in clustering algorithms, which automatically divide a collected dataset into different clusters. There are two main types of clustering approaches. The first is based on parametric techniques: to cluster a dataset, the parameters of a statistical model for the dataset must be calculated. The EM algorithm is a parametric technique. The second type of clustering approach uses non-parametric techniques, in which the calculated parameters of a statistical model are not required for clustering a dataset. k-means and k-means++ are non-parametric techniques.
We describe the two clustering approaches in the following subsections. Moreover, because outlier detection is critical in WSN applications, we also describe some relevant outlier detection approaches.

2.1. Parametric Techniques

Parametric techniques assume that a dataset is generated from several parametric models, such as Gaussian mixture models. Clustering is conducted by calculating the parameters of each Gaussian model and assuming that data points in the same cluster can be represented by the same Gaussian model. Usually, a Gaussian model is chosen as the default model because of the central limit theorem. References [13,14] used parametric techniques: from the collected dataset, they calculated detailed a priori estimates of the statistical parameters of the assumed statistical model (for example, the mean, median, and variance), which allowed them to fit statistical models.
EM [5] is a famous and widely used algorithm for clustering datasets using parametric techniques. The EM algorithm first calculates responsibilities with respect to given parameters (means and variances). This is referred to as the E-step. Then, the EM algorithm uses the responsibilities to update the given parameters. This is referred to as the M-step. These two steps are iteratively executed until the parameters approach the true parameters of the dataset. When those parameters are determined, the Gaussian models in the Gaussian mixture model are fixed. Therefore, clustering can be accomplished using the Gaussian models.
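To make the E-step/M-step loop concrete, the following is a minimal Python sketch of EM for a one-dimensional, two-component Gaussian mixture. The component count, the random initialization, and the convergence threshold are illustrative choices of ours, not settings taken from the paper.

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, n_iter=100, tol=1e-6):
    """Minimal EM for a 1-D, 2-component Gaussian mixture (illustrative sketch)."""
    # Random initialization, as in the original EM described above.
    mu = np.random.choice(x, 2)            # component means
    var = np.array([x.var(), x.var()])     # component variances
    pi = np.array([0.5, 0.5])              # mixing weights
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point.
        dens = np.vstack([pi[k] * norm.pdf(x, mu[k], np.sqrt(var[k])) for k in range(2)])
        resp = dens / dens.sum(axis=0)
        # M-step: update the parameters from the responsibilities.
        nk = resp.sum(axis=1)
        mu = (resp * x).sum(axis=1) / nk
        var = (resp * (x - mu[:, None]) ** 2).sum(axis=1) / nk
        pi = nk / len(x)
        # Stop when the log-likelihood no longer improves.
        ll = np.log(dens.sum(axis=0)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return mu, var, pi
```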
There are many benefits associated with parametric techniques: (i) such techniques assign a probability criterion to every data point to determine whether or not it belongs to a cluster; and (ii) such techniques do not require additional information (for example, labels on data points that indicate their statuses). On the other hand, parametric techniques cannot be deployed in a distributed way because a significant number of data points are required to estimate the mean and variance. Thus, methods that use parametric techniques are deployed in a centralized way.

2.2. Non-Parametric Techniques

Some algorithms use non-parametric techniques, which cluster datasets without using statistical models. Non-parametric techniques make certain assumptions, such as density smoothness. Typical methods use histograms, as in [15,16,17]. Histogram-based approaches are appropriate for datasets in low-dimensional spaces because their computational cost grows exponentially with the dimensionality of the dataset. Therefore, this type of approach scales poorly to problems with larger numbers of data points and higher-dimensional spaces.
One typical non-parametric clustering algorithm is k-means [6]. In k-means, candidate cluster centers are first provided to the algorithm; the number of centers is equal to the number of clusters. k-means then calculates the sum of the distances from the center of each cluster to every data point assigned to it. These two steps are executed iteratively, and k-means updates the given cluster centers by minimizing the calculated sum. When the cluster centers are determined, the clusters are formed. However, k-means cannot guarantee that the candidate centers will be close to the true cluster centers, so neither the number of iterations nor the clustering accuracy is satisfactory.
To overcome the disadvantages of k-means, Arthur and Vassilvitskii [12] proposed k-means++, which is based on the k-means algorithm. k-means++ differs from k-means in that it performs a dedicated seeding calculation to identify k appropriate data points to use as the initial centers. In contrast, the original k-means algorithm selects the initial centers at random, which increases the number of clustering iterations. Therefore, k-means++ requires fewer iterations than k-means.
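As an illustration of this seeding idea, the sketch below follows the standard k-means++ recipe (the first center is chosen uniformly at random, and each subsequent center is sampled with probability proportional to the squared distance to the nearest center already chosen); it is our own sketch, not the authors' implementation.

```python
import numpy as np

def kmeans_pp_init(X, k, rng=np.random.default_rng()):
    """k-means++ seeding: pick k initial centers, favoring points far from existing centers."""
    centers = [X[rng.integers(len(X))]]           # first center: uniformly at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest chosen center.
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        probs = d2 / d2.sum()                     # sample proportionally to D(x)^2
        centers.append(X[rng.choice(len(X), p=probs)])
    return np.array(centers)
```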
In conclusion, there are disadvantages associated with the use of both parametric and non-parametric techniques in WSNs. Parametric techniques can only estimate a model when sufficient data are available, and they are therefore difficult to use in a distributed way. While non-parametric techniques can be executed in a distributed way in WSNs, they cannot provide a probability criterion for detection. Moreover, both techniques require a massive number of iterations to form clusters and use random starting data points, which demands significant computing power and yields low accuracy.

2.3. Outlier Detection in WSN Applications

Outliers are very common in collected datasets for two reasons. First, sensor nodes are vulnerable to failure because WSNs are often deployed in harsh environments [18,19,20,21]; outliers are commonly found in datasets collected by WSNs installed in such environments [22,23]. Second, noise in wireless signals and malicious attacks both create outliers [24,25], which obviously reduce WSN capabilities.
Clustering methods are also used for outlier detection in WSN applications. For instance, to robustly estimate the positions of sensor nodes, Reference [26] used the EM algorithm to iteratively detect outlier measurements; the EM algorithm was used to calculate variables that could indicate whether or not a particular measurement was an outlier. Reference [27] conducted similar work using EM algorithms to detect outliers. Additionally, Reference [28] proposed a novel flow-based outlier detection scheme based on the k-means clustering algorithm; this method separated a dataset containing unlabeled flow records into normal and anomalous clusters. Similar research by [29] used k-means to detect heart disease. However, approaches using EM and k-means to detect outliers suffer from the previously mentioned problems of clustering iterations and accuracy. The approach that we introduce later in this paper can solve such problems.

3. Bayesian Optimization

Before a dataset can be divided into clusters, the starting data points of the clusters in the dataset must be determined. In particular, the number of peak points (a peak point is a data point corresponding to a maximum of the probability) in a dataset corresponds to the number of clusters. In this study, we use BO to identify peak points. Typically, we do not know the form of the probability density function $p(x)$. Nevertheless, we can obtain an approximate value $f(x)$ of $p(x)$ at data point $x$, with some noise. For example, we can approximately compute the density of a certain volume; this density is an approximate value of the probability density (see Section 4). However, obtaining the maximum density can be computationally expensive because of the large number of data points. To reduce computation costs, we use BO [30,31], a very powerful strategy that fully utilizes prior observations to choose, at each step, the most promising point under the posterior. This allows the maximum density to be approached with fewer evaluations; thus, fewer data points are required to obtain the maximum density. In the following subsection, we introduce the Gaussian process used in BO.

3.1. Gaussian Process

In BO, a Gaussian process (GP) is used to build a Gaussian model from the provided information. The model is then updated with each new data point. Assume that a set of data points contains $t$ elements: $\{x_1, x_2, \ldots, x_t\}$. We use the notation $x_{1:t}$ to represent this set of data points. Each of these points lies in a $D$-dimensional space; an example data point is $x_i = (x_i^1, \ldots, x_i^D)$.
There is an intuitive analogy between a Gaussian distribution and a GP. A Gaussian distribution is a distribution over a random variable, whereas the random variables of a GP are functions, and its mean and covariance are also functions. Hence, a function $f(x)$ that follows a GP is written as

$$f(x) \sim \mathcal{GP}\left(m(x), k(x, x')\right),$$

where $m(x)$ is the mean function and $k(x, x')$ is the kernel (covariance) function.
Suppose that we have a set of data points $x_{1:t}$ and their corresponding approximate probability densities $\{f(x_1), f(x_2), \ldots, f(x_t)\}$. We assume that the function $f(x_i)$ maps a data point $x_i$ to its probability density $p(x_i)$ with some noise. For concision, we write $f_{1:t}$ for $\{f(x_1), f(x_2), \ldots, f(x_t)\}$. For the collected dataset, $\mathcal{D}_{1:t} = \{(x_1, f_1), (x_2, f_2), \ldots, (x_t, f_t)\}$ is the given information. For convenience, we assume that $\mathcal{D}_{1:t}$ follows the GP model given by an isotropic Gaussian $\mathcal{N}(0, K)$, whose initial mean function is zero and whose covariance is calculated using the kernel matrix $K$ (whose entries $k(x_i, x_j)$ are the kernel functions):

$$K = \begin{bmatrix} k(x_1, x_1) & \cdots & k(x_1, x_t) \\ \vdots & \ddots & \vdots \\ k(x_t, x_1) & \cdots & k(x_t, x_t) \end{bmatrix}.$$
Once we have calculated $K$, we can build a GP model from the information provided.
A new data point $x_{t+1}$ also follows $f_{t+1} = f(x_{t+1})$. According to the GP properties, $f_{1:t}$ and $f_{t+1}$ are jointly Gaussian:

$$\begin{bmatrix} f_{1:t} \\ f_{t+1} \end{bmatrix} \sim \mathcal{N}\left(0, \begin{bmatrix} K & \mathbf{k} \\ \mathbf{k}^T & k(x_{t+1}, x_{t+1}) \end{bmatrix}\right),$$

where

$$\mathbf{k} = \begin{bmatrix} k(x_{t+1}, x_1) & k(x_{t+1}, x_2) & \cdots & k(x_{t+1}, x_t) \end{bmatrix}^T.$$
Moreover, we want to predict the approximate probability density $f_{t+1}$ of the new data point $x_{t+1}$. Using Bayes' theorem and $\mathcal{D}_{1:t}$, we obtain an expression for the prediction:

$$P\left(f_{t+1} \mid \mathcal{D}_{1:t}, x_{t+1}\right) = \mathcal{N}\left(\mu_t(x_{t+1}), \sigma_t^2(x_{t+1})\right),$$

where

$$\mu_t(x_{t+1}) = \mathbf{k}^T K^{-1} f_{1:t}, \qquad \sigma_t^2(x_{t+1}) = k(x_{t+1}, x_{t+1}) - \mathbf{k}^T K^{-1} \mathbf{k}.$$
We can observe that $\mu_t$ and $\sigma_t^2$ do not depend on $f_{t+1}$, so $f_{t+1}$ can be predicted using only the given information.
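As a minimal numerical sketch of these posterior expressions, assuming a squared-exponential kernel and a small noise jitter for numerical stability (both of which are our own illustrative choices), one might write:

```python
import numpy as np

def rbf(a, b, length=1.0):
    """Squared-exponential kernel k(a, b) between two sets of points."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length ** 2)

def gp_posterior(X, f, x_new, noise=1e-6):
    """Posterior mean and variance at x_new, following the expressions above."""
    K = rbf(X, X) + noise * np.eye(len(X))   # kernel matrix of the observed points
    k = rbf(X, x_new)                        # covariances between observed and new points
    K_inv = np.linalg.inv(K)
    mu = k.T @ K_inv @ f                     # posterior mean: k^T K^{-1} f_{1:t}
    var = rbf(x_new, x_new) - k.T @ K_inv @ k  # posterior covariance: k(x*,x*) - k^T K^{-1} k
    return mu, np.diag(var)
```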

3.2. Acquisition Functions for Bayesian Optimization

Above, we briefly described how to use the given information to fit a GP and how to update the GP by incorporating a new data point. At this point, we must select an appropriate new data point $x_{i+1}$ with which to update the GP, so that we can obtain the maximum value of $f(x_{i+1})$. To achieve this, we use BO to balance exploitation and exploration. Here, exploitation means that we should use the data point with the maximum mean in the GP, because that point fully uses the given information; however, such a point cannot provide additional information about the unknown space. Exploration means that a point with a larger variance in the GP can provide additional information about the unknown area. The acquisition functions used to find an appropriate data point are designed on the basis of exploitation and exploration. There are three popular acquisition functions: the probability of improvement, the expectation of improvement, and the upper confidence bound criterion.
The probability of improvement (PI) function is designed to maximize the probability of improvement over $f(x^+)$, where $x^+ = \operatorname{argmax}_{x_i \in x_{1:t}} f(x_i)$. The resulting cumulative distribution function is

$$PI(x) = P\left(f(x) \geq f(x^+) + \xi\right) = \Phi\left(\frac{\mu(x) - f(x^+) - \xi}{\sigma(x)}\right),$$

where $\xi$ is the exploration strength, which is provided by the user.
The expectation of improvement (EI) is designed to account for not only the probability of improvement, but also the potential magnitude of the improvement that a point could yield. The EI is expressed as

$$EI(x) = \begin{cases} \left(\mu(x) - f(x^+) - \xi\right)\Phi(Z) + \sigma(x)\,\phi(Z) & \text{if } \sigma(x) > 0, \\ 0 & \text{if } \sigma(x) = 0, \end{cases}$$

$$Z = \begin{cases} \dfrac{\mu(x) - f(x^+) - \xi}{\sigma(x)} & \text{if } \sigma(x) > 0, \\ 0 & \text{if } \sigma(x) = 0. \end{cases}$$
The upper confidence bound (UCB) criterion uses the confidence bound, i.e., the region of uncertainty between the mean function and the variance function in Equation (3). Compared with the other two acquisition functions, the UCB is relatively simple and intuitive: it directly uses the mean and standard deviation functions obtained from the given information. A potential new data point is scored by the sum of (i) the mean function and (ii) a constant $\nu$ times the standard deviation function; given several potential new data points, the one with the largest UCB is selected as the next new data point. Moreover, $\nu$, which is greater than 0, indicates how much exploration is expected. The UCB formula is
$$UCB(x) = \mu(x) + \nu\,\sigma(x).$$
These three acquisition functions are suited to different datasets, and allow us to obtain an appropriate new data point. The BO algorithm (Algorithm 1) is shown below.
Algorithm 1: BO
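Algorithm 1 is reproduced as an image in the original article. As a rough, hedged illustration of the loop it describes (fit a GP to the points evaluated so far, maximize an acquisition function over a candidate set, evaluate the chosen point, and repeat), a sketch using the UCB criterion might look as follows. The candidate set, the number of initial points, and the value of ν are our own illustrative choices; gp_posterior refers to the GP sketch above.

```python
import numpy as np

def bayes_opt(f_approx, candidates, n_init=3, n_steps=20, nu=2.0,
              rng=np.random.default_rng()):
    """Generic BO loop: repeatedly evaluate the candidate with the largest UCB."""
    idx = list(rng.choice(len(candidates), n_init, replace=False))
    X = candidates[idx]                      # points evaluated so far
    y = np.array([f_approx(x) for x in X])   # their (noisy) approximate densities
    for _ in range(n_steps):
        mu, var = gp_posterior(X, y, candidates)   # GP fit, from the sketch above
        ucb = mu + nu * np.sqrt(np.maximum(var, 0.0))
        x_next = candidates[int(np.argmax(ucb))]   # exploit + explore
        X = np.vstack([X, x_next])
        y = np.append(y, f_approx(x_next))
    best = int(np.argmax(y))
    return X[best], y[best]                  # point with the largest observed value
```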

4. Peak Searching Algorithm

In this section, we first introduce some preliminary information related to our proposed algorithm. Then, we explain the algorithm.

4.1. Preliminary Investigations

In most cases, the environment can be represented as a collection of statuses that indicate whether or not certain events have occurred. Such events include fires, earthquakes, and invasions. The data points collected by the sensor nodes contain measurements that describe the statuses of these events. One can assume that the collected dataset is generated by a Gaussian mixture model (GMM) because the data points contained in the dataset are collected from the normal environment or from natural events. Thus, before fitting a GMM, it is necessary to identify the peaks of the GMM, because each peak is the point with the largest probability of its corresponding Gaussian distribution. Therefore, we need to know the probability of each data point when we search for the dataset peaks. Although the probability density function is unknown, it can be approximated using the alternative methods described below.
One type of method assumes that the set of data points lies in a $D$-dimensional space. The probability of data point $x$ can then be approximated as follows: (i) set $x$ as the center of a volume with side $h$ (Figure 1 shows an example of a volume in 3D space, where the length of each side is $h$); and (ii) the density of the volume with center $x$, calculated using Equation (8) [31], is approximately equal to the probability at data point $x$. The density $p(x)$ in this formula depends on the length of side $h$ of the volume and the number $T$ (that is, the number of neighbors of data point $x$ in the volume). $N$ is the total number of data points in the dataset and $h^D$ is the size of the volume. Thus, to search for the peaks, we must calculate the densities of all of the different data points using Equation (8). However, this is computationally expensive:
$$p(x) = \frac{T}{N h^D}.$$
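A direct implementation of Equation (8) might look like the following sketch, in which the number of neighbors T is counted inside a hypercube of side h centered at x; the choice of h is left to the user, which is exactly the difficulty discussed below.

```python
import numpy as np

def volume_density(x, data, h):
    """Approximate p(x) = T / (N * h^D) by counting neighbors in a hypercube of side h."""
    N, D = data.shape
    inside = np.all(np.abs(data - x) <= h / 2.0, axis=1)   # points within the volume
    T = int(inside.sum())
    return T / (N * h ** D)
```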
Another method fixes h and applies a kernel density estimator [31]. In this case, the probability of data point x can be calculated as
$$p(x) = \frac{1}{N h^D} \sum_{i=1}^{T} K\!\left(\frac{x - x_i}{h}\right),$$
where $K(\cdot)$ is the kernel function and $T$ is the number of data points in a volume with side $h$. Then, the largest value of $p(x)$ occurs along the gradient of Equation (9), which is

$$\nabla p(x) = \frac{1}{N h^D} \sum_{i=1}^{T} \nabla K\!\left(\frac{x - x_i}{h}\right).$$
By setting Equation (10) equal to zero, we can calculate the point along the gradient that has the largest $p(x)$. With this method, we do not need to search through the unimportant data points, which reduces the time required to identify peaks. However, Equations (9) and (10) are difficult to solve. Moreover, the length of side $h$ affects the peak search results. First, the method supposes that all of the volumes have the same size because they share the same $h$. Second, an inappropriate $h$ value will lead to an incorrect result: $h$ values that are too large cause over-smoothing in high-density areas, while $h$ values that are too small cause significant noise in low-density areas. To overcome these shortcomings, we introduce the PSA, which we describe in the following subsection.

4.2. The Algorithm

We propose a peak searching algorithm (PSA) that does not consider the parameter $h$. We later use simulations to investigate the details of the PSA, which can be used to improve the speed and accuracy of clustering algorithms such as EM and k-means.
In Equation (10), $\frac{x - x_i}{h}$ is a vector that starts at point $x$ and ends at neighboring point $x_i$. Because a kernel function is used to calculate the inner product of the vector, the inner product is in this case equal to the vector's magnitude. Moreover, Equation (10) locates the largest $p(x)$ at a data point $x$ lying on this vector, at $\frac{1}{N h^D}$ times the vector's magnitude. Therefore, the peak with the largest probability lies on this vector, which allows us to concentrate only on the vector, without considering the constants $\frac{1}{N h^D}$ and $h$. Hence, in the PSA, we use $V_x$ to represent this vector:
$$V_x = \frac{\sum_{i=1}^{T} (x - x_i)}{\left\lVert \sum_{i=1}^{T} (x - x_i) \right\rVert}.$$
With Equation (11), only the points along $V_x$ are searched, so a significant amount of unimportant space is excluded from the search. However, many probabilities must still be calculated along $V_x$; because there are too many data points on the vector $V_x$, it is impossible to exhaustively find the data point with the largest probability in a limited amount of time. Hence, we apply BO when searching for the largest probability along $V_x$. BO optimizes the search for the maximum probability value, as described in Algorithm 1. However, as mentioned in Section 3, the form of the probability function $p(x)$ is not known; it is instead represented by an approximate probability function, which is $f(x)$ in line 4 of Algorithm 1. In this paper, we use Equation (8) to calculate the approximate probability function used in the proposed algorithm, because Equation (8) is simpler and more practical for finding dataset peaks. The following describes the details of the proposed PSA.
Next, we explain how the PSA works, in accordance with Algorithm 2. The initialization step requires a number of starting data points from which to begin the search, because the dataset may contain multiple peaks. Therefore, the PSA randomly selects $M$ starting points, $\{x^{(1)}, x^{(2)}, \ldots, x^{(M)}\}$. For convenience, we use starting point $x^{(j)}$ to describe the details of the method. The vector $V_{x^{(j)}}$ is calculated using Equation (11) in line 1. The peak searching process shown in Figure 2 contains four steps. In Step 1, the PSA uses Algorithm 1 to search for the peak, i.e., the data point $x_i^{(j)}$ that has the maximum probability along $V_{x^{(j)}}$; this probability, denoted $p(x_i^{(j)})$, is calculated using Equation (8), as shown in line 4. In Step 2 (line 5), a new vector $V_{x_i^{(j)}}$ is calculated on the basis of $x_i^{(j)}$ and its $T$ neighboring data points. In Step 3, the method searches for the peak $x_{i+1}^{(j)}$ along $V_{x_i^{(j)}}$ (line 6). Note that data points $x_i^{(j)}$ and $x_{i+1}^{(j)}$ are candidate dataset peaks. Step 4 spans lines 7 to 14: the method repeats these steps until the difference between $p(x_i^{(j)})$ and $p(x_{i+1}^{(j)})$ is sufficiently close to zero. At this point, data point $x_{i+1}^{(j)}$ is selected as a dataset peak. The same four steps are applied to the other starting data points to identify all peaks in the dataset.
Algorithm 2: PSA
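Algorithm 2 is likewise reproduced as an image in the original article. The following sketch is our own reading of the four steps described above (compute V_x from the T nearest neighbors, use BO to search along the vector, recompute the vector at the new point, and stop when the density change is small). The function names, the candidate discretization along V_x, and the stopping threshold are illustrative assumptions, not the authors' exact pseudocode; volume_density and bayes_opt refer to the sketches given earlier.

```python
import numpy as np

def search_direction(x, data, T):
    """V_x from Equation (11): normalized sum of vectors from the T nearest neighbors to x."""
    d = np.linalg.norm(data - x, axis=1)
    nbrs = data[np.argsort(d)[:T]]
    v = (x - nbrs).sum(axis=0)
    return v / np.linalg.norm(v)

def psa_single_start(x0, data, T=20, h=0.5, step=0.05, n_cand=40, tol=1e-4, max_iter=50):
    """Follow one starting point to a density peak (Steps 1-4 of the PSA, sketched)."""
    x, p_prev = x0, volume_density(x0, data, h)
    for _ in range(max_iter):
        v = search_direction(x, data, T)
        # Candidate points along V_x; BO (bayes_opt above) picks the one with the largest density.
        cands = np.array([x + step * i * v for i in range(1, n_cand + 1)])
        x_new, p_new = bayes_opt(lambda z: volume_density(z, data, h), cands)
        if abs(p_new - p_prev) < tol:          # Step 4: stop when the density change is tiny
            return x_new
        x, p_prev = x_new, p_new
    return x
```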

5. Simulation and Analysis

In this section, we investigate the efficiency of the proposed PSA. Because the PSA is a method for improving clustering algorithms, we must use it in state-of-the-art clustering algorithms to evaluate the extent to which it can improve those algorithms. As mentioned in Section 2, EM and k-means are common clustering algorithms. Here, the variations of those algorithms that use the PSA are referred to as PSEM and PSk-means, respectively. In PSEM and PSk-means, the PSA first searches for the peaks of the collected dataset; then, EM and k-means use the obtained peaks as the initial starting points for clustering. In the simulations, we assume that the collected datasets follow GMMs and that the number of peaks found by the PSA is equal to the number of Gaussian distributions.
We conducted simulations using synthetic datasets and a real dataset. In the simulations with synthetic datasets, we compared the accuracies and iteration counts of PSEM and PSk-means with those of the original EM (OEM), k-means, and k-means++ algorithms. Moreover, because recall and precision are important evaluation indicators, we also used the simulations to compare recall and precision. In the simulation using a real dataset, we applied our methods to detect outliers. Because a real dataset can be either isotropic or anisotropic, and because k-means performs poorly on anisotropic datasets, we only compared PSEM with OEM on the real dataset.

5.1. Simulation on Synthetic Datasets

5.1.1. Synthetic Dataset

We generated two synthetic datasets whose data points contained two features. Each dataset was generated using a GMM that contained two different Gaussian distributions. The Gaussian distributions in the first dataset were isotropically distributed; their true peaks (means) were (1, 1) and (2, 2) and their variances were 0.6 and 0.5, respectively. The Gaussian distributions in the second synthetic dataset were transformed using the following matrix to create anisotropically distributed datasets:
$$\begin{bmatrix} 0.6 & 0.6 \\ 0.4 & 0.8 \end{bmatrix}.$$
The two synthetic datasets are shown in Figure 3. The two synthetic datasets are appropriate for these types of simulations because they can represent both easy and difficult clustering situations. This allows us to evaluate the effects of our algorithm.
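As a hedged illustration, datasets of this kind can be generated as in the sketch below; the per-cluster sample count is arbitrary, the transformation matrix entries are taken exactly as printed above, and applying the transform as a right-multiplication is our own assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Isotropic mixture: two Gaussian clusters with means (1, 1) and (2, 2)
# and variances 0.6 and 0.5, respectively.
n = 500                                               # points per cluster (illustrative)
c1 = rng.normal(loc=[1.0, 1.0], scale=np.sqrt(0.6), size=(n, 2))
c2 = rng.normal(loc=[2.0, 2.0], scale=np.sqrt(0.5), size=(n, 2))
isotropic = np.vstack([c1, c2])

# Anisotropic mixture: the same data transformed by the matrix given above.
A = np.array([[0.6, 0.6],
              [0.4, 0.8]])
anisotropic = isotropic @ A.T
```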

5.1.2. Simulations and Results

To estimate the extent to which the PSA can improve clustering capabilities, we compared PSEM with the original EM (OEM) algorithm. Both PSEM and OEM use EM to fit a GMM and have a time complexity of $O(N^3)$, where $N$ is the number of data points. Hence, we cannot use time complexity to compare PSEM and OEM. Computational efficiency can, however, also be measured by the number of iterations. The EM algorithm contains two steps, the E-step and the M-step, which are iteratively executed to fit a GMM and which form the core calculations of the algorithm. Hence, we compared the number of iterations in PSEM (i.e., how many E-steps and M-steps were executed) with the number of iterations in OEM. Note that the OEM algorithm does not use the PSA, so its calculations start at randomly selected initial starting points.
PSEM and OEM were executed 200 times for the two different datasets. Figure 3 shows 200 peak searching results for the PSA; the dark crosses indicate the peaks identified by the PSA. We can see that, in the isotropically distributed dataset, the identified peaks are very close to the true peaks. In the anisotropically distributed dataset, the identified peaks are also close to the true peaks. Figure 4 illustrates the number of iterations (y-axis) for each size of dataset (x-axis). In the peak searching step, three different acquisition functions are used (UCB, EI, and PI), and their calculation efficiencies are compared. According to the results shown in Figure 4, there were 3.06 to 6.3 times fewer iterations for PSEM than for OEM. In other words, the PSA improved the calculation efficiency of OEM by 73.9% to 86.3%. Moreover, we can see that there is no obvious difference between the three acquisition functions.
Because we wanted to fairly estimate the extent to which the proposed PSA improves clustering capabilities, we compared PSk-means to k-means++ in another simulation. k-means++ uses a special method to calculate its initial points, and its clustering method increases the speed of convergence. Note that both PSk-means and k-means++ are based on k-means, which has a time complexity of $O(N^2 T)$, where $N$ is the number of data points and $T$ is the number of iterations. Similarly, we cannot use time complexity to compare calculation efficiencies, but we can compare the number of iterations required for PSk-means with that required for k-means++. Both algorithms were executed 200 times with the two different datasets.
The simulation results are shown in Figure 5. The average number of iterations for PSk-means is 1.04 to 1.99 times lower than the number of iterations for k-means++. In other words, the PSA improved the calculation efficiency of k-means++ by 51% to 67%. Additionally, there was no obvious difference between the three acquisition functions.

5.1.3. Performance Estimation of Clustering

Accuracy, precision, and recall are three commonly used measurements for estimating machine learning algorithm performance; therefore, we adopt these measurements to quantify the performance of our proposed algorithm. In the simulations, a dataset containing two clusters is generated by a GMM. To explain these measurements, we assume that the two clusters are cluster A and cluster B. Data points belonging to cluster A are considered to be positive instances, while those belonging to cluster B are considered to be negative instances. If a data point from cluster A is correctly clustered into cluster A, it is a true positive (TP); otherwise, it is a false negative (FN). Similarly, if a data point from cluster B is correctly clustered into cluster B, it is a true negative (TN); otherwise, it is a false positive (FP). Overall accuracy can be calculated as follows:
$$accuracy = \frac{TP + TN}{TP + FP + TN + FN}.$$

Recall is the ratio of TP to the total number of positive instances; it shows how many of the positive instances the algorithm can detect. It is calculated as

$$recall = \frac{TP}{TP + FN}.$$

From a prediction standpoint, precision indicates how many TPs occur among the instances detected as positive; it is the proportion of TP to the total number of data points detected as positive, which is equal to TP + FP. Precision is calculated as

$$precision = \frac{TP}{TP + FP}.$$
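For completeness, a small sketch that computes these three measures from predicted and true cluster labels (with cluster A encoded as the positive class) might be written as follows; the function and label encoding are illustrative choices of ours.

```python
import numpy as np

def clustering_metrics(y_true, y_pred, positive=0):
    """Accuracy, recall, and precision with cluster `positive` treated as the positive class."""
    tp = np.sum((y_true == positive) & (y_pred == positive))
    fn = np.sum((y_true == positive) & (y_pred != positive))
    tn = np.sum((y_true != positive) & (y_pred != positive))
    fp = np.sum((y_true != positive) & (y_pred == positive))
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return accuracy, recall, precision
```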
We estimated the accuracy, precision, and recall of the PSk-means and PSEM clustering algorithms, and compared the values with those for k-means, k-means++, and OEM. We repeated this estimation 200 times for each dataset; the average accuracy of each algorithm is shown in Figure 6 and Figure 7. The isotropic datasets shown in Figure 3 are difficult to cluster because the two clusters partially overlap and their centers are very close together. We can see from the simulation results shown in Figure 6 that the estimations of k-means, k-means++, and OEM are similar. However, PSk-means and PSEM show a great improvement over their original algorithms. The accuracy of PSk-means is 1.69 times higher than that of k-means++, while that of PSEM is 1.71 times higher than that of OEM. The recall of PSk-means is 1.66 times higher than that of k-means++, and the recall of PSEM is 1.83 times higher than that of OEM. Moreover, the precision of PSk-means is 1.64 times higher than that of k-means++, and the precision of PSEM is 1.84 times higher than that of OEM.
The results for the anisotropic datasets are shown in Figure 7. Because the anisotropic datasets are elliptical, as shown in Figure 3, and the two clusters are very close together, these datasets are very difficult to cluster. As a result, k-means and k-means++ exhibit low estimation performance, and PSk-means yields little improvement. However, the accuracy of PSEM was 1.48 times higher than that of OEM, and its recall and precision were 1.44 and 1.48 times higher, respectively, than those of OEM. Accordingly, we can see that the PSA can improve clustering accuracy.

5.2. Simulation on a Real Dataset from Intel Berkeley Research Laboratory

We used a real sensor dataset from the Intel Berkeley Research Laboratory [32] to assess outlier detection performance. In the simulation, we only considered two features for each data point: temperature and humidity. Each sensor node contained 5000 data points, which are shown in Figure 8.
Because the original dataset did not provide any outlier information or labels, we manually cleaned the data by removing values that fell outside a normal data range. All of the remaining data points were considered to be normal. Table 1 lists the normal data ranges.
After completing this step, a uniform distribution was used to generate artificial outliers. Temperature outliers were generated within a range of 27–30 °C, and humidity outliers were generated within a range of 42–46%. Because the outliers are drawn uniformly, some of them can fall inside the normal range. The outliers were then inserted into the normal dataset. We produced four different cases, in which the outliers accounted for 5%, 15%, 20%, and 25% of the total number of normal data points.
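A sketch of this data-preparation step (filtering to the normal ranges in Table 1 and injecting uniformly distributed outliers at a chosen rate) is given below; the function name, label encoding, and the 15% rate used as the default are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def prepare_dataset(data, outlier_ratio=0.15):
    """Keep points inside the normal ranges (Table 1), then append uniform outliers."""
    temp, hum = data[:, 0], data[:, 1]
    normal = data[(temp >= 21.32) & (temp <= 28.14) &
                  (hum >= 26.39) & (hum <= 44.02)]
    n_out = int(len(normal) * outlier_ratio)
    outliers = np.column_stack([rng.uniform(27.0, 30.0, n_out),    # temperature outliers
                                rng.uniform(42.0, 46.0, n_out)])   # humidity outliers
    labels = np.concatenate([np.zeros(len(normal)), np.ones(n_out)])  # 1 = injected outlier
    return np.vstack([normal, outliers]), labels
```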

5.2.1. Setting of WSNs

PSEM and OEM were run on a real dataset from the Intel Berkeley Research Laboratory. The deployment of the WSN is shown in Figure 9. There were 54 sensor nodes, each of which had a Mica2Dot sensor for collecting humidity, temperature, light, and voltage values. Temperatures were provided in degrees Celsius. Humidity was provided as temperature-corrected relative humidity, ranging from 0% to 100%. Light was expressed in Lux (1 Lux corresponds to moonlight, 400 Lux to a bright office, and 100,000 Lux to full sunlight), and voltage was expressed in volts, ranging from 2 to 3 V. The batteries were lithium ion cells, which maintain a fairly constant voltage over their lifetime; note that variations in voltage are highly correlated with temperature. We selected data from 10 sensor nodes (nodes 1 to 10) to test our method, and used only the humidity and temperature values.
In this simulation, we assumed that the WSN was hierarchical and consisted of classes. (“Cluster” is used in the WSNs to describe a group of sensor nodes. However, “cluster” can also refer to a group of similar data points in data mining. In this paper, we use “class” instead of cluster to describe a group of sensor nodes). Each class contained one class head (CH) and other member sensor nodes (MSNs). The MSNs sent the data points collected over a certain time period to the CH, which used the proposed method to monitor whether the dataset collected from its members contained outliers. The configuration of the WSNs is shown in Figure 10.

5.2.2. Results

Using the real dataset, we tested the proposed PSEM and compared it with OEM. The CH executed PSEM or OEM to detect outliers and sent outlier reports to the base station. We generated four different datasets, containing 5%, 15%, 20%, and 25% outliers (Figure 11 and Figure 12).
It was relatively easy to detect outliers in the test dataset containing only 5% outliers because the proportion of outliers was so low; thus, the accuracy of our method approached 100% for 5% outliers, whereas the accuracy of OEM was only approximately 85%. In the other datasets, more outliers fell within the normal data range, making them difficult to detect, and the accuracies of both methods decreased as the proportion of outliers increased. However, PSEM remained more accurate than OEM. In the worst case, with 25% outliers in the test dataset, the accuracy of PSEM was approximately 80%, while the accuracy of OEM was only approximately 60%. That is, PSEM was about 1.09 to 1.29 times more accurate than OEM. Moreover, Figure 12 shows that the number of iterations was 1.52 to 1.88 times lower for PSEM, meaning that PSEM improved the calculation efficiency of OEM by 60% to 65.2%. Because accuracy and iteration counts are very important metrics for assessing clustering algorithm efficiency, this simulation demonstrates the practical significance of PSEM and, therefore, of the PSA.

6. Discussion

In this section, we describe other important aspects of WSNs, such as power consumption and lifetime. We also discuss the advantages and disadvantages of the proposed method.
Because most sensor nodes in WSNs are powered by batteries, sensor node power consumption, WSN lifetime, and energy efficiency are also important problems affecting the quality of a WSN. Mostafaei et al. [33] proposed an algorithm, PCLA, that schedules sensors into active or sleep states, utilizing learning automata to extend network lifetime. Our previous work attempted to extend battery life by reducing peak power consumption: we scheduled sensor execution times [34] and used optimized wireless communication routes to reduce energy consumption, with the goal of prolonging network lifetimes [35,36]. If the proposed PSA can be applied in such approaches to analyze data using clustering methods, then energy consumption can be further reduced: because the PSA reduces clustering iterations, the required computational power decreases, leading to energy savings.
The proposed algorithm has advantages and disadvantages. In conventional clustering methods such as EM and k-means, cluster-forming procedures are started at random data points. There are two disadvantages associated with this. First, correct clusters may not be able to form from random starting points. Second, because random starting points may not be near the cluster centers, a massive number of iterations may be needed to move the random points toward the cluster centers. However, because the PSA can identify the peak points near the cluster centers, it provides a better basis for forming clusters than starting from a random point. Therefore, clustering algorithms using the PSA can form clusters more accurately. Moreover, using peak points as the starting points significantly reduces clustering iterations because the peak points are already the desired points.
There are also some disadvantages associated with the PSA. The PSA uses BO and is, therefore, affected by the problems associated with BO. A particular issue is that the prior design is critical to efficient BO: as mentioned in Section 3, BO uses GPs to build models based on Gaussian distributions, so the Gaussian assumption acts as a prior on the dataset. If a dataset does not follow a Gaussian distribution, the PSA may be less efficient. Another weak point of the PSA is that it is centralized; it is not suited for highly distributed WSNs in which data analyses are conducted at each sensor node.

7. Conclusions

In this paper, we proposed a new PSA for improving the performance of clustering algorithms (i.e., for improving accuracy and reducing clustering iterations). In the PSA, BO is used to search for the peaks of a collected dataset. To investigate the efficiency of the PSA, we used it to modify the EM and k-means algorithms; the new algorithms were named PSEM and PSk-means, respectively.
Using simulations, we investigated the performance of PSEM and PSk-means relative to that of OEM and k-means++. We conducted simulations using both synthetic datasets and a real dataset. For the synthetic datasets, PSEM and PSk-means reduced iterations by up to approximately 6.3 and 1.99 times, respectively, and improved clustering accuracy by up to 1.71 and 1.69 times, respectively. On a real dataset used for outlier detection, PSEM reduced iterations by about 1.88 times and improved clustering accuracy by up to 1.29 times. These results show that our proposed algorithm significantly improves performance. We reached the same conclusions by examining the recall and precision improvements of PSEM and PSk-means.
In the future, we will improve this method so that it can be used with high-dimensional data, such as images collected by a camera. Moreover, we would like to deploy the peak searching algorithm with sensor nodes, in order to allow CHs to obtain peak searching results from their neighbors; this will reduce the calculation time required for the peak search. Thus, clustering can be implemented in the sensor node and communication costs can be reduced.

Acknowledgments

This work was supported by the Japan Society for the Promotion of Science (JSPS KAKENHI Grant Numbers 16H02800, 17K00105, and 17H00762).

Author Contributions

Tianyu Zhang, Qian Zhao, and Kilho Shin conceived and designed the algorithm and experiments. Tianyu Zhang performed the experiments, analyzed the data, and was the primary author of this paper. Yukikazu Nakamoto gave advice on this work and helped revise the paper. Additionally, Qian Zhao and Kilho Shin helped revise the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Sung, W.T. Multi-sensors data fusion system for wireless sensors networks of factory monitoring via BPN technology. Expert Syst. Appl. 2010, 37, 2124–2131.
2. Hackmann, G.; Guo, W.; Yan, G.; Sun, Z.; Lu, C.; Dyke, S. Cyber-physical codesign of distributed structural health monitoring with wireless sensor networks. IEEE Trans. Parallel Distrib. Syst. 2014, 25, 63–72.
3. Oliveira, L.M.; Rodrigues, J.J. Wireless Sensor Networks: A Survey on Environmental Monitoring. J. Commun. 2011, 6, 143–151.
4. Wu, C.I.; Kung, H.Y.; Chen, C.H.; Kuo, L.C. An intelligent slope disaster prediction and monitoring system based on WSN and ANP. Expert Syst. Appl. 2014, 41, 4554–4562.
5. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B 1977, 39, 1–38.
6. Huang, Z. Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Min. Knowl. Discov. 1998, 2, 283–304.
7. Rasmussen, C.E.; Williams, C.K. Gaussian Processes for Machine Learning; MIT Press: Cambridge, MA, USA, 2006; Volume 1.
8. Papaioannou, I.; Papadimitriou, C.; Straub, D. Sequential importance sampling for structural reliability analysis. Struct. Saf. 2016, 62, 66–75.
9. Behmanesh, I.; Moaveni, B.; Lombaert, G.; Papadimitriou, C. Hierarchical Bayesian model updating for structural identification. Mech. Syst. Signal Proc. 2015, 64, 360–376.
10. Azam, S.E.; Bagherinia, M.; Mariani, S. Stochastic system identification via particle and sigma-point Kalman filtering. Sci. Iran. 2012, 19, 982–991.
11. Azam, S.E.; Mariani, S. Dual estimation of partially observed nonlinear structural systems: A particle filter approach. Mech. Res. Commun. 2012, 46, 54–61.
12. Arthur, D.; Vassilvitskii, S. k-means++: The advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), New Orleans, LA, USA, 7–9 January 2007; pp. 1027–1035.
13. Wu, W.; Cheng, X.; Ding, M.; Xing, K.; Liu, F.; Deng, P. Localized outlying and boundary data detection in sensor networks. IEEE Trans. Knowl. Data Eng. 2007, 19, 1145–1157.
14. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 15–18 May 2000; Volume 29, pp. 93–104.
15. Sheela, B.V.; Dasarathy, B.V. OPAL: A new algorithm for optimal partitioning and learning in non parametric unsupervised environments. Int. J. Parallel Program. 1979, 8, 239–253.
16. Eskin, E. Anomaly detection over noisy data using learned probability distributions. In Proceedings of the International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000.
17. Eskin, E.; Stolfo, S. Modeling system call for intrusion detection using dynamic window sizes. In Proceedings of the DARPA Information Survivability Conference and Exposition, Anaheim, CA, USA, 12–14 June 2001.
18. Dereszynski, E.W.; Dietterich, T.G. Spatiotemporal models for data-anomaly detection in dynamic environmental monitoring campaigns. ACM Trans. Sens. Netw. 2011, 8, 3.
19. Bahrepour, M.; van der Zwaag, B.J.; Meratnia, N.; Havinga, P. Fire data analysis and feature reduction using computational intelligence methods. In Advances in Intelligent Decision Technologies; Springer: Berlin/Heidelberg, Germany, 2010; pp. 289–298.
20. Phua, C.; Lee, V.; Smith, K.; Gayler, R. A comprehensive survey of data mining-based fraud detection research. arXiv 2010. Available online: https://arxiv.org/ftp/arxiv/papers/1009/1009.6119.pdf (accessed on 29 July 2017).
21. Aqeel-ur-Rehman; Abbasi, A.Z.; Islam, N.; Shaikh, Z.A. A review of wireless sensors and networks' applications in agriculture. Comput. Stand. Interfaces 2014, 36, 263–270.
22. Misra, P.; Kanhere, S.; Ostry, D. Safety assurance and rescue communication systems in high-stress environments: A mining case study. IEEE Commun. Mag. 2010, 48, 11206229.
23. García-Hernández, C.F.; Ibarguengoytia-Gonzalez, P.H.; García-Hernández, J.; Pérez-Díaz, J.A. Wireless sensor networks and applications: A survey. Int. J. Comput. Sci. Netw. Secur. 2007, 7, 264–273.
24. John, G.H. Robust Decision Trees: Removing Outliers from Databases. In Proceedings of the First International Conference on Knowledge Discovery and Data Mining (KDD'95), Montreal, QC, Canada, 20–21 August 1995; pp. 174–179.
25. Han, J.; Pei, J.; Kamber, M. Data Mining: Concepts and Techniques; Elsevier: Amsterdam, The Netherlands, 2011.
26. Ash, J.N.; Moses, R.L. Outlier compensation in sensor network self-localization via the EM algorithm. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Philadelphia, PA, USA, 23 March 2005; Volume 4, pp. iv–749.
27. Yin, F.; Zoubir, A.M.; Fritsche, C.; Gustafsson, F. Robust cooperative sensor network localization via the EM criterion in LOS/NLOS environments. In Proceedings of the 2013 IEEE 14th Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Darmstadt, Germany, 16–19 June 2013; pp. 505–509.
28. Münz, G.; Li, S.; Carle, G. Traffic anomaly detection using k-means clustering. In Proceedings of the GI/ITG Workshop MMBnet, Hamburg, Germany, September 2007.
29. Devi, T.; Saravanan, N. Development of a data clustering algorithm for predicting heart. Int. J. Comput. Appl. 2012, 48.
30. Brochu, E.; Cora, V.M.; De Freitas, N. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv 2010. Available online: https://arxiv.org/pdf/1012.2599.pdf (accessed on 30 June 2017).
31. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006.
32. Intel Lab Data. Available online: http://db.csail.mit.edu/labdata/labdata.html (accessed on 30 August 2017).
33. Mostafaei, H.; Montieri, A.; Persico, V.; Pescapé, A. A sleep scheduling approach based on learning automata for WSN partial coverage. J. Netw. Comput. Appl. 2017, 80, 67–78.
34. Zhao, Q.; Nakamoto, Y.; Yamada, S.; Yamamura, K.; Iwata, M.; Kai, M. Sensor Scheduling Algorithms for Extending Battery Life in a Sensor Node. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2013, E96-A, 1236–1244.
35. Zhao, Q.; Nakamoto, Y. Algorithms for Reducing Communication Energy and Avoiding Energy Holes to Extend Lifetime of WSNs. IEICE Trans. Inf. Syst. 2014, E97-D, 2995–3006.
36. Zhao, Q.; Nakamoto, Y. Topology Management for Reducing Energy Consumption and Tolerating Failures in Wireless Sensor Networks. Int. J. Netw. Comput. 2016, 6, 107–123.
Figure 1. A volume in three-dimensional space.
Figure 2. Peak searching.
Figure 3. Synthetic dataset.
Figure 4. Comparison of iterations: OEM.
Figure 5. Comparison of iterations: k-means++.
Figure 6. Measurements for isotropic dataset.
Figure 7. Measurements for anisotropic dataset.
Figure 8. Dataset from Intel Berkeley Research Laboratory.
Figure 9. Floor plan of Intel Berkeley Research Laboratory.
Figure 10. WSN configuration.
Figure 11. Accuracy of real dataset.
Figure 12. No. of iterations for dataset.
Table 1. Normal data ranges.

                      Range          Average
Temperature (°C)      21.32–28.14    23.14
Humidity (%)          26.39–44.02    37.69
