1. Introduction
Wireless sensor networks (WSNs) are composed of large numbers of self-organized sensor nodes capable of sensing, data storage, and communication. WSNs have many applications, such as remote environment sensing, industrial automation, smart cities, and military monitoring. In practical applications, a large number of ordinary nodes are deployed unattended. These ordinary nodes collect data individually and transmit the raw data to the sink node over multi-hop paths. Since the limited power supplies of ordinary nodes are difficult to recharge or replace, developing energy-efficient data gathering methods is crucial.
A large number of data collection methods have been proposed in the literature [1,2,3] to reduce energy consumption while achieving different levels of data reconstruction precision. The data obtained in WSNs exhibit spatial and temporal correlations, which are intrinsic characteristics of the physical environment. Reference [1] proposed a clustered aggregation (CAG) algorithm for data collection, which exploits the spatial and temporal correlations of the sensed data. Pham et al. [2] presented a divide and conquer approximating (DCA) algorithm to reduce power consumption. Since the sensed data must be transmitted to the sink node via multi-hop communication, Rosana et al. [3] proposed an algorithm that constructs spanning trees for efficient data gathering in wireless sensor networks. Unfortunately, traditional data gathering methods have two limitations. Firstly, clustering methods (and spanning tree construction methods) incur high computational cost, as does the dynamic maintenance of the clusters (or trees). Secondly, energy consumption is unbalanced, since the nodes close to the sink consume more energy.
Compressive sensing (CS) [4,5,6,7] theory has brought about a new approach to efficient data gathering in WSNs. Since the sensed data have temporal and spatial correlations, they can be sparsely represented in an appropriate transform basis. CS theory states that sparse signals can be accurately reconstructed from a small number of linear measurements when the sensing matrix satisfies the restricted isometry property (RIP). Thus, the number of transmissions required for data gathering is greatly reduced. Since the energy-intensive reconstruction algorithm runs at the sink node, the computational cost at the ordinary nodes is quite low. Owing to these merits of CS, the energy consumption of data gathering in WSNs is both reduced and balanced. Accordingly, many data gathering methods based on CS theory [8,9,10,11] have been published in recent years. Luo et al. [8] first proposed applying compressive sensing to data gathering in WSNs. In their compressive data gathering (CDG) scheme, each intermediate node adds its own weighted reading to the weighted sums received from its child nodes and forwards the result. In [9], a distributed compressive sampling method was presented. The method is efficient because it employs in-network compression, and each node individually determines its transmission scheme to minimize the total number of transmissions. Liu et al. [10] presented the compressed data aggregation (CDA) method, which reconstructs the original signals with high precision while reducing energy consumption compared with CDG. In [11], the authors proposed the compressive data collection (CDC) method for wireless sensor networks, which reduces the required number of measurements and thus prolongs the network lifetime.
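The weighted-sum relay of CDG can be illustrated with a minimal NumPy sketch, assuming a chain of N nodes in which node 0 is farthest from the sink and node j holds the j-th column of a random M × N measurement matrix Φ. The helper names `cdg_chain` and `raw_forwarding_cost` are hypothetical; the sketch covers only the in-network aggregation, not the sparse reconstruction performed at the sink.

```python
import numpy as np

def cdg_chain(readings, phi):
    """Relay CDG-style weighted sums along a chain toward the sink.

    readings: length-N array, readings[j] is the value sensed by node j.
    phi:      M x N measurement matrix; column j belongs to node j.
    Returns the M measurements reaching the sink and the total number
    of scalar values transmitted.
    """
    m, n = phi.shape
    partial = np.zeros(m)        # weighted sum received from downstream nodes
    sent = 0
    for j in range(n):           # node 0 is the farthest from the sink
        partial = partial + phi[:, j] * readings[j]  # add own weighted reading
        sent += m                # every node forwards exactly M values
    return partial, sent

def raw_forwarding_cost(n):
    """Scalars sent when node j relays its own and all downstream raw readings."""
    return sum(j + 1 for j in range(n))

rng = np.random.default_rng(0)
N, M = 100, 20
x = rng.normal(size=N)
Phi = rng.normal(size=(M, N))
y, cdg_cost = cdg_chain(x, Phi)
print(np.allclose(y, Phi @ x))            # the sink receives y = Phi x
print(cdg_cost, raw_forwarding_cost(N))
```

Every node forwards the same M values regardless of its position, which is why the load is balanced; with raw forwarding, the node next to the sink relays all N readings.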
However, applying CS in real WSNs is difficult. Firstly, CS assumes that the data are sparse or can be sparsely represented in a transform basis, yet an appropriate sparsifying basis is not always available. Secondly, the spatial and temporal correlations cannot be exploited together, since the sensed data are expressed in vector form.
As a more efficient data gathering approach, matrix completion (MC) [12] recovers an incomplete data matrix from a small subset of observed entries. MC is essentially an extension of CS theory: in CS, signals are represented as vectors, whereas MC formulates them as matrices. Sensor data are commonly organized as matrices, as are image signals and video samples. Such two-dimensional signals can be processed more efficiently in matrix form, even though they could be reshaped into vectors. In comparison with CS, MC does not require an a priori sparsifying basis, and the required sampling ratio can be even lower. Since the data collected in WSNs have spatial and temporal correlations, the data matrix is approximately low-rank. In [13], the singular value thresholding (SVT) algorithm was proposed to approximate the low-rank matrix via nuclear norm minimization. To handle large-scale traffic datasets, Roughan et al. [14] proposed a spatio-temporal matrix completion algorithm called sparsity regularized matrix factorization (SRMF). In [14], extensive analysis of massive traffic data led to the optimal choice of the spatial and temporal constraint matrices. SRMF can be extended to various matrix completion problems, such as data interpolation and the inference of missing matrix elements. To further exploit the low-rank feature and the short-term stability of the sensed data, Cheng et al. [15] proposed the STCDG method, which improves recovery accuracy and reduces power consumption for data gathering in WSNs.
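For illustration, the SVT iteration of [13] can be sketched in NumPy as below. It alternates singular value shrinkage with a gradient step on the observed entries. The defaults τ = 5√(n1·n2) and δ = 1.2/p follow common practice for SVT and are assumptions here, with δ capped below 2 so the iteration stays within its proven convergent range; `shrink` and `svt` are hypothetical names.

```python
import numpy as np

def shrink(Y, tau):
    """Singular value shrinkage D_tau(Y): soft-threshold every singular value by tau."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def svt(observed, mask, tau=None, delta=None, iters=500, tol=1e-4):
    """Singular value thresholding for matrix completion (sketch).

    observed: data matrix with unobserved entries set to zero.
    mask:     boolean matrix, True where an entry was observed.
    """
    n1, n2 = observed.shape
    p = mask.mean()                                   # observed ratio
    if tau is None:
        tau = 5.0 * np.sqrt(n1 * n2)                  # assumed default threshold
    if delta is None:
        delta = min(1.2 / p, 1.9)                     # assumed step, kept below 2
    Y = np.zeros_like(observed)
    X = np.zeros_like(observed)
    obs_norm = np.linalg.norm(observed)
    for _ in range(iters):
        X = shrink(Y, tau)                            # current low-rank estimate
        residual = np.where(mask, observed - X, 0.0)  # P_Omega(observed - X)
        if np.linalg.norm(residual) <= tol * obs_norm:
            break
        Y = Y + delta * residual                      # gradient step on observed entries
    return X

# Usage: recover a random rank-2 matrix from about 80% of its entries.
rng = np.random.default_rng(1)
truth = rng.normal(size=(40, 2)) @ rng.normal(size=(2, 30))
mask = rng.random(truth.shape) < 0.8
X_hat = svt(np.where(mask, truth, 0.0), mask)
rel_err = np.linalg.norm(X_hat - truth) / np.linalg.norm(truth)
print(f"relative error: {rel_err:.2e}")
```

The relative error is measured on the whole matrix, including the missing entries; its exact value depends on the random draw.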
In practice, sensor nodes are deployed in a finite area, so the features of the sensed data are coupled with the network topology. Our analysis shows that the sensed data are sparse under the graph based transform (GBT) basis, which consists of the eigenvectors of the Laplacian matrix of the graph representing the whole network. To the best of our knowledge, this is the first time GBT sparsity has been applied to a matrix completion problem. Exploiting both the GBT sparsity and the low-rank feature of the sensed data, we propose the GBTR-ADMM and GBTR-A2DM2 algorithms. The time complexity of the proposed algorithms is also analyzed and shown to be low. Simulation results show that the proposed algorithms outperform state-of-the-art data collection algorithms in recovery accuracy, convergence rate, and energy consumption.
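The GBT basis described above can be sketched in NumPy as follows, assuming an unweighted graph in which nodes closer than a communication radius are linked; the edge weighting used in this paper may differ, and `gbt_basis` is a hypothetical helper. A constant field is the smoothest possible signal on a connected graph, so its GBT coefficients concentrate in the single basis vector associated with the zero Laplacian eigenvalue.

```python
import numpy as np

def gbt_basis(positions, radius):
    """Graph based transform basis from node coordinates.

    Nodes within `radius` of each other are linked with unit weight (an
    assumed graph model). Returns the Laplacian eigenvalues in ascending
    order and the orthonormal eigenvector matrix U (one basis vector per column).
    """
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    W = ((dist <= radius) & (dist > 0)).astype(float)   # adjacency, no self-loops
    L = np.diag(W.sum(axis=1)) - W                      # combinatorial Laplacian
    eigvals, U = np.linalg.eigh(L)                      # L is symmetric
    return eigvals, U

# Ten nodes placed on a line, each linked to its neighbours (a path graph).
pos = np.column_stack([np.arange(10.0), np.zeros(10)])
eigvals, U = gbt_basis(pos, radius=1.5)

smooth = np.full(10, 2.0)       # a perfectly smooth field: every node reads 2.0
coeffs = U.T @ smooth           # forward GBT; the inverse is U @ coeffs
print(np.round(np.abs(coeffs), 6))
```

Because `U` is orthogonal, the inverse transform is a multiplication by `U` rather than an explicit matrix inversion.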
The main contributions of the paper are as follows:
- (1) The features of sensor datasets are analyzed in light of their topology information, which reveals that the data matrix is sparse under the graph based transform.
- (2) The graph based transform regularized (GBTR) matrix completion problem is formulated. To reconstruct the missing values efficiently, the GBTR by Alternating Direction Method of Multipliers (GBTR-ADMM) algorithm is proposed. Simulation results reveal that GBTR-ADMM outperforms state-of-the-art algorithms in recovery accuracy and energy consumption.
- (3) To accelerate the convergence of GBTR-ADMM, the GBTR-A2DM2 algorithm is proposed, which benefits from a restart rule and the fusion of multiple constraints.
- (4) The time complexity of the proposed algorithms is analyzed and shown to be low.
The remainder of the paper is organized as follows. Section 2 formulates the matrix completion problem. Section 3 presents the features of the real datasets and the synthesized dataset. The proposed GBTR-ADMM and GBTR-A2DM2 algorithms are described in Section 4 and Section 5, respectively. Section 6 analyzes the time complexity of the proposed algorithms. In Section 7, the performance of the proposed algorithms is evaluated. Section 8 summarizes the conclusions and future work.
2. Problem Formulation
In this section, we introduce the issues related to matrix completion theory. The main notations of the paper are summarized in Table 1.
Suppose there are $N$ sensor nodes in the WSN. Let $X = (x_{ij}) \in \mathbb{R}^{N \times M}$ denote the sensor data, where $x_{ij}$ represents the data collected by node $i$ in time slot $j$. The sampling interval is assumed to be uniform. Thus, the data matrix $X$ represents the sensor data gathered by $N$ sensor nodes in $M$ time slots.
In order to reduce energy consumption in resource-constrained WSNs, only a small portion of the sensor data is transmitted to the sink node. Let $\Omega$ denote the index set of the observed entries of $X$. Similarly, let $\bar{\Omega}$ denote the index set of the omitted entries. Let $\mathcal{P}_{\Omega}(\cdot)$ be the linear projection operator that keeps the entries in $\Omega$ unchanged and sets the entries in $\bar{\Omega}$ to zero, that is:

$$[\mathcal{P}_{\Omega}(X)]_{ij} = \begin{cases} x_{ij}, & (i,j) \in \Omega, \\ 0, & (i,j) \in \bar{\Omega}. \end{cases}$$

The observed data matrix is therefore the incomplete version of $X$ whose entries outside $\Omega$ are zero, namely $\mathcal{P}_{\Omega}(X)$.
Our goal is to reduce the amount of data transmitted to the sink node and to design a matrix completion algorithm that reconstructs the original data matrix $X$ as closely as possible. The observed ratio is defined as:

$$\text{observed ratio} = \frac{|\Omega|}{N \times M},$$

where $|\Omega|$ is the number of observed entries.
Next, the features of the datasets, which are exploited by the proposed algorithms, are studied in detail.
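The sampling model of this section can be sketched in NumPy as follows. The uniform random sampling pattern, the matrix sizes, and the helper names `sample_mask`, `project`, and `observed_ratio` are assumptions for illustration; the boolean `mask` plays the role of Ω, and its complement that of the omitted indices.

```python
import numpy as np

def sample_mask(shape, ratio, rng):
    """Random observation pattern: each entry is kept with probability `ratio`."""
    return rng.random(shape) < ratio

def project(X, mask):
    """P_Omega: keep the entries where mask is True and set the rest to zero."""
    return np.where(mask, X, 0.0)

def observed_ratio(mask):
    """|Omega| / (N * M): fraction of entries delivered to the sink."""
    return mask.sum() / mask.size

rng = np.random.default_rng(42)
N, M = 20, 50                     # hypothetical: 20 nodes, 50 time slots
X = rng.normal(size=(N, M))
mask = sample_mask(X.shape, 0.3, rng)
observed = project(X, mask)
print(observed_ratio(mask))
```

Representing Ω as a boolean mask keeps P_Ω a single element-wise operation, so it can be applied cheaply inside every iteration of a completion algorithm.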
6. Time Complexity Analysis
In this section, the computational complexity of the proposed algorithms is discussed. Computing a matrix inverse is expensive, with a time complexity of $O(n^3)$, where $n$ is the dimension of the invertible matrix. Since the matrix involved is orthogonal, its inverse equals its transpose, so the expensive matrix inversion in our implementation can be replaced by a transposition. Thus, the dominant computational cost of GBTR-ADMM and GBTR-A2DM2 is the SVD performed in each iteration. As pointed out in [25], the time complexity of a full SVD is $O(MN^2)$. In our implementation, the PROPACK package [26] is used to perform a partial SVD in the proposed algorithms. Because the objective matrix is low-rank, computing the full SVD is inefficient: only the singular values exceeding a certain threshold, which capture the dominant energy of the objective matrix, are needed. However, PROPACK cannot determine automatically how many singular values to compute; this number must be predefined. Thus, we estimate the number of singular values and pass it to PROPACK in each iteration.
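The NumPy sketch below illustrates why a partial SVD is cheaper than a full one. PROPACK itself uses Lanczos bidiagonalization; since that package is not assumed to be available here, a randomized range finder (Halko, Martinsson and Tropp) is swapped in, whose dominant cost for k requested singular values is O(kMN). The helper `partial_svd` and the oversampling of 5 are hypothetical choices. The last lines check that an orthogonal matrix is inverted by its transpose.

```python
import numpy as np

def partial_svd(A, k, oversample=5, rng=None):
    """Approximate top-k SVD of A with a randomized range finder.

    Only products with a thin (k + oversample)-column matrix and SVDs of
    small matrices are needed, so the dominant cost is O(k M N) instead
    of the O(M N^2) of a full SVD.
    """
    rng = np.random.default_rng() if rng is None else rng
    n_cols = A.shape[1]
    G = rng.normal(size=(n_cols, k + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ G)                      # orthonormal basis for the range of A
    Ub, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)  # small (k+p) x N SVD
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
M, N, r = 200, 60, 5
A = rng.normal(size=(M, r)) @ rng.normal(size=(r, N))   # exactly rank r

U, s, Vt = partial_svd(A, k=r, rng=rng)
s_full = np.linalg.svd(A, compute_uv=False)             # full SVD for comparison
print(np.allclose(s, s_full[:r]))

# An orthogonal matrix is inverted by transposition, avoiding an O(n^3) inverse.
Q_orth, _ = np.linalg.qr(rng.normal(size=(N, N)))
print(np.allclose(np.linalg.inv(Q_orth), Q_orth.T))
```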
Suppose $svp_k$ is the number of positive singular values of the matrix decomposed at the $k$-th iteration, and $sv_k$ is the number of singular values to be computed at the $k$-th iteration. Then, the following update strategy [27] is used:

$$sv_{k+1} = \begin{cases} svp_k + 1, & svp_k < sv_k, \\ \min\left(svp_k + \mathrm{round}(0.05\,d),\ d\right), & svp_k = sv_k, \end{cases}$$

where $d = \min(M, N)$ and the initial estimate $sv_0$ is 10. With PROPACK, the partial SVD of an $M \times N$ matrix of rank $r$ costs $O(rMN)$. Hence, the time complexity of each iteration of the proposed algorithms is $O(rMN)$. In contrast, the state-of-the-art algorithms for the matrix completion problem [15,17] require $O(r^2MN)$ per iteration.
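The rank-prediction rule can be sketched in Python as follows. It assumes the update rule of the inexact augmented Lagrange multiplier method of Lin, Chen and Ma, which is commonly paired with PROPACK: request one more value than survived when fewer than requested survive, and otherwise grow by round(0.05 d). The function `next_sv` and the simulated singular value counts are hypothetical.

```python
def next_sv(svp, sv, d):
    """Number of singular values to request from the partial SVD next iteration.

    svp: number of singular values that stayed positive after thresholding.
    sv:  number requested in the current iteration.
    d:   min(M, N), the largest possible rank.
    """
    if svp < sv:
        # Fewer survived than were requested: the estimate is large enough,
        # so request one more than survived.
        return min(svp + 1, d)
    # Every requested value survived, so more may exist: grow by about 5% of d
    # (int(x + 0.5) rounds half up, matching MATLAB's round for x >= 0).
    return min(svp + int(0.05 * d + 0.5), d)

# Hypothetical run: the true count of singular values above the threshold
# is 3 for two iterations, then jumps to 25.
d, sv = 100, 10                  # initial estimate of 10
history = []
for true_count in [3, 3, 25, 25, 25]:
    svp = min(true_count, sv)    # only sv values are computed, so at most sv survive
    sv = next_sv(svp, sv, d)
    history.append(sv)
print(history)
```

When the true rank jumps, the requested number grows by a fixed step per iteration instead of jumping to d, which keeps each partial SVD small.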