1. Introduction
Wireless sensor networks (WSNs) are receiving much attention from both the research community and industry. Their main attractions are their low cost and their effective, easy-to-implement sensors, which can be reconfigured for many types of applications to solve real-world problems. However, these small sensors are resource-constrained and battery-limited. In addition, they can be deployed on a large scale in open and remote environments where they may be exposed to several risks. Consequently, they are vulnerable to numerous security attacks. Counteracting these attacks with security techniques can impose a high energy cost on these networks, which can upset the balance of energy consumption in the network and reduce both node connectivity and network lifetime.
One way to preserve battery power during communication is through the choice of network architecture and of the protocol used to route data from a sensor to the base station (BS). A number of methods designed to handle security issues in WSNs exploit network hierarchization and cluster-based routing [1]. Clustering a sensor network typically improves scalability, gives better control over nodes, and yields energy-efficiency gains through partitioning [2,3]. Other ways to preserve battery power are sleep scheduling and energy harvesting. The former aims to minimize the number of sensors that must be active to cover a desired portion of the region of interest while preserving connectivity among sensors [4]. The latter addresses the fact that nodes are often unreachable after deployment by introducing renewable energy harvested from the surrounding environment [5].
As mentioned before, WSNs are vulnerable to several attacks, which can be classified into two types: inside attacks, performed by authorized nodes, and outside attacks, performed by unauthorized nodes.
Previous studies have shown that inside attacks are far more difficult to control than outside ones [6,7]. Selfishness and pollution attacks are two critical forms of inside attack: in the first, nodes can easily launch a type of Denial of Service (DoS) attack; in the second, they pollute data in transit to disrupt the normal functioning of the network. More specifically, in a selfishness attack, selfish nodes refuse to relay packets and use network resources for their own benefit. This is one variety of DoS attack, which can be performed in different ways [8] and takes multiple forms [9]. Indeed, a DoS attack can also be achieved by congesting the network or draining the energy of its components [10]. It is considered one of the most frequent attacks in WSNs [11,12]. In a pollution attack, attacker nodes modify native data packets or inject fake ones into the network. A pollution attack is potentially more damaging in WSN protocols based on network coding (NC), since it may have a devastating impact on data routing [13].
Figure 1 illustrates the attacks considered in this paper.
In a clustered network, if the misbehaving nodes that launch a selfishness or pollution attack are cluster head (CH) nodes, the network becomes more disordered and the consequences are more serious, because the CHs play a critical role in forwarding the cluster data to the BS [14]. DoS and pollution attacks are challenging threats: they aim to disable the service that a WSN is supposed to provide and to cause routing failures.
This paper proposes a secure protocol for WSNs that defends against the misbehavior of selfish and pollution attacker nodes in the network. It extends our previous work [15], which ensured the confidentiality of exchanges in clustered WSNs using an optimized NC technique. This extension protects network availability as well as data integrity. It also proposes an attack detection and punishment mechanism for malicious nodes causing a DoS or pollution attack. The solution takes advantage of the sensor-specific matrix keys used for confidential communication in [15] and implements a new NC scheme that adds data redundancy: it converts a file or message into a set of p pieces in such a way that the file or message can be reconstructed from any subset of t of the p pieces. Consequently, our solution ensures data availability and integrity while preserving the confidentiality already provided.
Our proposal is a secure protocol against selfish and pollution attacker misbehavior in clustered WSNs, named SSP. It aims to protect against a coalition of misbehaving CH nodes that launch a DoS attack by dropping received packets or a pollution attack by altering them. The main contributions of this paper are as follows:
Preventing a DoS attack by a group of CH nodes and ensuring reliable data transfer between source nodes and the BS.
Ensuring data integrity and error correction despite a pollution attack performed by a group of CH nodes.
Locating the misbehaving nodes among CHs and implementing a punishment mechanism against them to enhance network availability and integrity.
Providing simulation results that demonstrate the effectiveness of our protocol compared to a defenseless protocol, in terms of the percentage of correct messages received, including when the number of misbehaving nodes varies.
Demonstrating the scalability of the protocol.
The remainder of this paper is structured as follows. Section 2 reviews the literature on existing secure protocols for DoS and pollution attack prevention and detection. Section 3 presents the system assumptions and the threat model used in the paper. Section 4 gives the design steps of the protocol and a theoretical discussion. Section 5 presents the simulation set-up and the results. Section 6 summarizes the paper and provides conclusions.
2. Related Work
In this section, we review the literature on secure protocols addressing availability and integrity requirements in WSNs.
Many studies have proposed solutions that provide data availability for routing protocols, either by preventing or by detecting DoS attacks in WSNs.
In Ref. [16], the authors proposed a protection against DoS attacks in WSNs based on two phases. The first phase partitions the network into clusters via the Hybrid Energy-Efficient Distributed clustering (HEED) protocol. The second sets up a Key Distribution Server (KDS), which supplies each node with session keys and unique IDs. A mutual authentication process based on a hash function is then executed, first between the server and the CHs and then between the CHs and the cluster members. When a CH detects a malicious node after an unsuccessful authentication process, it requests the KDS to delete that node's secret key and requests all the other CHs to block it from inter-cluster communication. Consequently, the node becomes keyless and all the services related to it are suspended. After a certain period of time, the KDS calculates new session keys for each CH to communicate with the server and with the cluster members. Although the proposed mechanism can prolong the network lifetime and reduce overhead thanks to the HEED clustering protocol, its cryptography-based defense against DoS attacks is computationally expensive and energy-consuming.
Mansouri et al. designed a clustering-based approach to address DoS attacks in WSNs [17], conducted in two phases. The first phase elects a control node (Cnode) for each cluster with a recursive low-energy adaptive clustering hierarchy (LEACH) processing algorithm. The second phase enables the Cnode to detect and block compromised nodes: when a node sends more messages than a threshold, the Cnode identifies it as compromised, and all messages it sends are rejected and ignored by the neighboring nodes. The protocol appears to give significant results in terms of detection time and throughput. However, it only deals with a specific type of DoS attack, namely excessive data transmission to drain the nodes' energy. In addition, it does not consider the case where a CH node executes the attack.
In Ref. [18], the authors adopted recursive clustering based on the LEACH algorithm but enhanced it with a novel algorithm named the Fast and Flexible Unsupervised Clustering Algorithm (FFUCA). In this approach, recursive clustering is used to identify the CH node and the control nodes (Cnodes) in each cluster, taking into account each node's location and consumed energy. The authors compared the two algorithms (FFUCA and LEACH): compared to LEACH, FFUCA generates an optimal solution that minimizes the distances between Cnodes and sensor nodes when the network is deployed, and it achieves better DoS detection in terms of false positive and false negative rates. However, this approach still does not deal with the case where the CHs are compromised.
Sujit et al. proposed a new approach for detecting selfish node behavior in WSNs [19]. They used the average and maximum numbers of retransmissions at several nodes to detect selfish nodes in the network. First, the protocol establishes the shortest routes using Dijkstra's algorithm. Then, using a threshold value and the calculated maximum number of retransmissions, it classifies each node as partially selfish, fully selfish or non-selfish. This system detects but does not prevent selfish behavior when routing data in the network.
Virmani et al. [20] introduced an exponential trust-based mechanism to identify malicious nodes in a clustered network, in which cluster heads are selected based on their higher energy level. When a source node in a cluster sends a packet to a particular node, a streak counter stores the number of consecutive packets dropped. The mechanism calculates a trust factor (TF) from the counter value; the TF falls exponentially as the number of packets dropped by that node increases. When the TF goes below a certain threshold, the node is declared an adversary, and the cluster head asks the source node to retransmit the packet. The proposal is designed to detect the black hole attack, a type of DoS attack in which packets are consecutively dropped. However, it does not protect against the attack, and it relies on retransmitting dropped packets, which incurs an extra energy cost.
Kalkha et al. [21] proposed an approach for preventing black hole attacks in WSNs that uses a Hidden Markov Model (HMM) to model the succession of shortest-path choices made by a source node to reach a destination node. Their approach uses the Viterbi algorithm to determine the path with the highest probability of being malicious. Next, it identifies the likeliest fake nodes in the selected malicious path and applies a new routing algorithm that avoids them. The approach helps avoid selecting malicious paths and nodes for routing data to the destination. However, it relies on a probabilistic rather than a deterministic method. Moreover, it uses flat routing, in which each source node performs multiple-path selection and re-selection after the maliciousness analysis, which quickly exhausts its resources.
The network coding (NC) methodology has been used in many research papers to ensure security against various attacks [22]. It was introduced as an alternative to conventional networking, in which intermediate nodes merely forward incoming packets without alteration; with NC, intermediate nodes perform computations on the received data before forwarding them to the next hop. The various types of NC are detailed in [23]. In a previous study, we proposed a secure network coding-enabled approach for confidential cluster-based routing in wireless sensor networks [15], called SNCR. Without requiring expensive traditional cryptographic methods, it offers an optimized version of the NC methodology to counter eavesdropping attacks on transmitted data. It exploits pre-loaded hidden encoding vectors and transmits only a single digit instead of transferring all the coding coefficients with the coded vector. Therefore, it reduces overhead compared to conventional network coding systems. SNCR has been proved to guarantee confidential data transmission whether the adversary is internal or external to the network. However, it performs data coding at two hierarchical levels of the network, the source nodes and the CHs, which depletes the overall energy of the network.
The specific communication mode of NC, in which intermediate nodes are allowed to deliberately change the received packets, creates opportunities for malicious nodes to disrupt correct data routing. When a packet (possibly encoded) is polluted, decoding the set of encoded packets associated with it will not yield the correct original message at the destination node. Thus, pollution attacks are more likely, and more harmful, in environments where network coding is used, and preventing them becomes more important [24].
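A toy calculation, with values chosen only for illustration (they are not taken from the protocol), shows why a single polluted symbol corrupts the whole decoded message. Take $t = 2$, a coding matrix $A = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$ and an original message $X = (x_1, x_2) = (3, 4)$, so that the coded symbols are $Y = XA = (7, 11)$. Decoding solves $x_1 + x_2 = y_1$ and $x_1 + 2x_2 = y_2$:

$$
x_2 = y_2 - y_1, \qquad x_1 = 2y_1 - y_2, \qquad (y_1, y_2) = (7, 11) \mapsto (3, 4), \qquad (y_1, y_2) = (7, 12) \mapsto (2, 5).
$$

Changing only $y_2$ from 11 to 12 alters both recovered values, and with exactly $t$ symbols the destination has no redundancy with which to notice that anything went wrong.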
In this context, several solutions have been proposed to thwart pollution attacks in NC-enabled networks. Adeli et al. [25] proposed a secure linear network coding scheme built on a cryptographic hash function, which is used to introduce different random noise terms into the information symbols. However, their scheme imposes a restriction on the linear network code design. Kim and Young-Sik [26] proposed a refined scheme that removes this restriction, but such schemes still need to compute the hash value of each packet, which increases the transmission delay and the computational complexity.
Yu et al. proposed an efficient homomorphic signature scheme based on the RSA signature [27]. In their scheme, the source signs its message using its private key, and intermediate nodes verify the received messages using the source's public key. The authors used a novel homomorphic function in which the signature of an encoded message is composed from the signatures of the input messages used to create it. Their scheme achieves source authentication and data verification. However, it increases transmission overhead, since an RSA signature is typically very large. The authors of [28] proposed an identity-based digital signature scheme to detect pollution attacks in intra-session network coding. Their scheme requires no third-party query to a certificate authority and avoids the related key-management issue, but it does not take energy consumption into account in resource-constrained WSNs.
Liu Xiang et al. [29] presented a privacy-preserving signature scheme for linear network coding-based networks. They used a homomorphic signature and applied new signing and verification processes that improve computational efficiency. Their scheme exploits the inherent security potential of random network coding and provides countermeasures against both eavesdropping and pollution attacks. Because messages are encrypted before being signed, intermediate nodes are able to capture and discard fake packets. Although this homomorphic signature NC scheme addresses data integrity and confidentiality, its signature generation and verification processes remain energy-intensive and reduce transmission efficiency.
Another approach, proposed in [30], used an identity-based linearly homomorphic signature scheme for NC-enabled wireless sensor networks to ensure data integrity and authenticity. To reduce the computation cost, the authors used signature generation and verification processes that are both independent of the data packet size. They used the unique serial numbers of source nodes as identity-based public keys, which require no certificate management and allow the destination to trace the source of the data.
In reviewing previous solutions to DoS and pollution attacks, we noted several disadvantages. Among the solutions tackling DoS attacks, some ignore the case where the CHs are the malicious nodes, despite the major role of CHs in controlling data routing and hence network availability. Others focus only on detecting DoS attacks rather than preventing them, and still others use cryptographic mechanisms with high computational costs. Solutions that resist pollution attacks using hash functions or homomorphic signatures are computationally expensive and may lead to traffic overhead.
Our solution tackles both DoS and pollution attacks, as well as eavesdropping attacks. First, in contrast to previous work on network availability, it proposes both a prevention and a detection scheme for misbehaving CH nodes. It ensures that the original messages can be reconstructed at their destination even if a number of packets are dropped, and it explicitly considers the case where the CH nodes are malicious. Moreover, our solution takes advantage of the NC technique used in [15] to guarantee data confidentiality and optimizes it by performing data coding only at the source nodes, thereby minimizing overhead and energy depletion in the network. Second, in contrast to previous work on countering pollution attacks, the solution does not involve an extra, energy-draining cryptographic mechanism to ensure data integrity; it requires only extra computations at the BS, which is assumed to be a device with unlimited resources.
4. Design and Implementation
This section presents the design of our new secure routing protocol, referred to as SSP. As described in Section 3.2, it applies to networks with a hierarchical topology. Its main goal is to defend against DoS and pollution attacks, and to detect the malicious CHs responsible for them, through a modified NC-based routing. The protocol consists of three phases: (1) initialization; (2) DoS and pollution attack prevention; and (3) attack detection and punishment.
4.1. Initialization
The network is composed of randomly deployed sensor nodes and a BS. The proposed solution begins with clustering: the BS divides the network into N clusters, i.e., N groups of source nodes, each headed by a CH. N must be greater than p, since the p CHs assigned to a source node form a subset of the N CH nodes. The value of p is predefined by the user before the network is deployed, taking the network size into consideration.
Next, the BS chooses, for each source node, a set of the nearest p CHs and sends their identities to it. As a result, the paths to be used for routing data elements from each source node are established and shared with the BS.
Data are transmitted from source nodes to CH nodes and then from CH nodes to the BS.
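The path set-up step can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes that the BS knows 2-D coordinates for every node and measures "nearest" by Euclidean distance, and the function name `nearest_chs` and the tie-break on CH identity are hypothetical choices.

```python
import math


def nearest_chs(sources, chs, p):
    """Return, for each source node, the identities of its p nearest CHs.

    sources, chs: dicts mapping a node identity to its (x, y) position.
    """
    if p >= len(chs):
        raise ValueError("the number of CHs N must be greater than p")
    routing = {}
    for sid, pos in sources.items():
        # Sort CHs by distance to the source node; ties are broken by CH identity.
        ranked = sorted(chs, key=lambda ch: (math.dist(pos, chs[ch]), ch))
        routing[sid] = ranked[:p]
    return routing
```

The resulting dictionary plays the role of the per-source routing table that the BS shares: each source node receives the list of its p CH identities.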
4.2. DoS and Pollution Attacks Prevention
The main phase of our solution is prevention against DoS and pollution attacks, ensuring data availability and integrity despite misbehavior in the network. It prevents a coalition of malicious CHs from executing a DoS attack by dropping all the packets they receive, or a pollution attack by modifying packets before transmitting them to the BS, and it guarantees the reconstruction of the original data messages at the BS. For this purpose, our protocol uses the NC techniques described in Section 3.1. It produces p parts from a data message of size t, where $t < p$, so that any t of them suffice to reconstruct the original message at the final destination, the BS. This phase consists of (i) encoding and part creation at the source node and (ii) decoding and part reconstruction at the BS.
4.2.1. Encoding and Parts Creation at Source Node
The encoding process is conducted by each source node in the network. Each source node $S_i$ is pre-loaded with its own unique matrix $M_i = (m_{j,k})$ of large size $l \times n$, where the $m_{j,k}$ ($1 \le j \le l$, $1 \le k \le n$) are integer coefficients, chosen so that every combination of $l$ columns forms an invertible $l \times l$ matrix.
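How such a matrix is generated is not specified at this point. One simple possibility, sketched below under that assumption, is to draw random integer matrices and keep one only if every combination of $l$ columns has full rank; the rank is computed exactly with Python's `fractions.Fraction` to avoid floating-point errors. The helper names (`rank`, `every_l_columns_invertible`, `random_coding_matrix`) and the coefficient range are hypothetical, and because the check enumerates all $\binom{n}{l}$ column subsets it is only practical for toy sizes. Structured constructions, such as a Vandermonde matrix with distinct nodes, satisfy the property by design.

```python
import itertools
import random
from fractions import Fraction


def rank(rows):
    """Exact rank of an integer matrix, by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rk = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(rk, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rk], m[pivot] = m[pivot], m[rk]
        for i in range(len(m)):
            if i != rk and m[i][col] != 0:
                f = m[i][col] / m[rk][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[rk])]
        rk += 1
        if rk == len(m):
            break
    return rk


def every_l_columns_invertible(M):
    """True if every choice of l columns of the l x n matrix M is invertible."""
    l, n = len(M), len(M[0])
    for cols in itertools.combinations(range(n), l):
        sub = [[M[j][k] for k in cols] for j in range(l)]
        if rank(sub) < l:
            return False
    return True


def random_coding_matrix(l, n, low=1, high=1000, rng=random):
    """Draw random integer matrices until every l-column subset is invertible."""
    while True:
        M = [[rng.randint(low, high) for _ in range(n)] for _ in range(l)]
        if every_l_columns_invertible(M):
            return M
```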
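When measuring events, a node $S_i$ stores the measurements in its memory. After t periods, it has collected t measurement parts in a vector of size t, $X_i = (x_{i,1}, \ldots, x_{i,t})$, forming a data message of size t. Then, $S_i$ randomly chooses a number $\rho_i$, within a range that keeps the sub-matrix extracted below inside $M_i$, and pinpoints the coefficient of $M_i$ at position $\rho_i$. Next, it extracts a sub-matrix $A_i = (a_{k,j})$ of size $t \times p$ starting at the indicated position in the matrix $M_i$, where t does not exceed the number of rows of $M_i$ and p does not exceed its number of columns. Figure 4 shows the process of extracting the sub-matrix $A_i$ used to generate the coded packets.
Next, $S_i$ calculates p different parts from the t parts of the vector $X_i$. These parts are the elements of the vector $Y_i = (y_{i,1}, \ldots, y_{i,p})$ of size p, computed as
$$Y_i = X_i A_i .$$
More specifically,
$$y_{i,j} = \sum_{k=1}^{t} x_{i,k}\, a_{k,j}, \qquad 1 \le j \le p .$$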
The value $\rho_i$ (the random position chosen by the source node), required to decode the data at the BS, is attached to each data packet sent through a distinct CH.
After encoding the original data parts, $S_i$ transmits these parts along the p distinct CHs chosen at the beginning of the protocol. More specifically, $S_i$ sends the packets $(ID_i, \rho_i, y_{i,j})$, where $1 \le j \le p$, through the p nearest distinct CHs saved in its routing table, where $ID_i$ is the node identity and $\rho_i$ is the random number chosen by $S_i$ for the network coding phase. Each CH then forwards the packets received from multiple source nodes to the BS.
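The encoding steps above can be sketched in Python as follows. Because the exact indexing convention for the random position is not recoverable here, the sketch assumes that $\rho_i$ is a row-major index into $M_i$ that gives the top-left coefficient of the $t \times p$ block $A_i$; all function names are hypothetical.

```python
import random


def choose_rho(M, t, p, rng=random):
    """Pick a random row-major position rho whose t x p block fits inside M."""
    l, n = len(M), len(M[0])
    row = rng.randrange(l - t + 1)
    col = rng.randrange(n - p + 1)
    return row * n + col


def extract_submatrix(M, rho, t, p):
    """Return the t x p block A_i of M whose top-left coefficient sits at position rho."""
    n = len(M[0])
    row, col = divmod(rho, n)
    if row + t > len(M) or col + p > n:
        raise ValueError("position rho leaves no room for a t x p block")
    return [M[row + k][col:col + p] for k in range(t)]


def encode(X, A):
    """Compute the p coded parts y_j = sum_k x_k * a_{k,j}, i.e. Y = X A."""
    t, p = len(A), len(A[0])
    if len(X) != t:
        raise ValueError("X must contain t measurements")
    return [sum(X[k] * A[k][j] for k in range(t)) for j in range(p)]


def make_packets(node_id, rho, Y, ch_ids):
    """Build one packet (ID_i, rho_i, y_j) per CH in the node's routing table."""
    return {ch: (node_id, rho, y) for ch, y in zip(ch_ids, Y)}
```

For example, with the small Vandermonde matrix `M = [[(k + 1) ** j for k in range(5)] for j in range(3)]`, `extract_submatrix(M, 1, 2, 4)` returns `[[1, 1, 1, 1], [2, 3, 4, 5]]`, and encoding $X_i = (5, 7)$ with it gives $Y_i = (19, 26, 33, 40)$. A Vandermonde matrix is used here only because any $t$ columns of the extracted block are then invertible.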
Each node therefore has p distinct paths to the BS through p different CHs, and the scheme tolerates up to $p - t$ of them being compromised. In other words, the coalition of misbehaving CHs involved in the attack is assumed to have a size of at most K, with $K \le p - t$, to ensure complete reception of all source nodes' messages: at least t correct parts of each source node's message then reach the BS, which is enough to rebuild the original measurements. Beyond the threshold K, the performance of the protocol in terms of availability and integrity decreases.
4.2.2. Decoding and Parts Reconstruction at BS
The decoding process is conducted by the BS for each source node. When the BS has received all the packets from the CH nodes, it groups the packets belonging to the same source node $S_i$ using its identity $ID_i$.
Since the adversary model guarantees that the BS receives at least t correct parts for each node, the BS is able to reconstruct the original data message of each source node. The BS receives the parts $y_{i,j}$ over different paths corresponding to different CHs, with j ranging over a set of $p'$ indices, where $t \le p' \le p$ ($p'$ denotes the number of packets received for a source node). It then triggers the decoding process as follows:
The BS checks the $\rho$ values carried by the different packets, keeps the value that occurs at least t times, and identifies it as the true one ($\rho_i$), i.e., one that has not been modified by an adversary. This holds because the protocol guarantees that at least t packets are correctly received for each source node $S_i$. Next, it retrieves the pre-loaded matrix $M_i$ and extracts from it the sub-matrix $A_i$ of size $t \times p$ at position $\rho_i$. Afterwards, to reconstruct the original measurements, the BS uses each combination c of t data parts out of the $p'$ received parts to solve the system of equations $X A_c = Y_c$, where $A_c$ is the $t \times t$ matrix formed by the columns of $A_i$ that correspond to the parts in c. More specifically, it solves $\binom{p'}{t}$ systems to extract a set of candidate original messages:
$$X^{(c)} = Y_c\, A_c^{-1}, \qquad 1 \le c \le \binom{p'}{t},$$
where the $Y_c$ are the $\binom{p'}{t}$ combinations of size t extracted from the set of received parts $\{y_{i,j}\}$.
Next, the BS finds the most frequent message among the candidates and recognizes it as the original one. The correct message appears $\binom{p'-e}{t}$ times, whereas each wrong message appears only once, where e denotes the number of polluted packets received at the BS.
The system of equations above can be solved by simple Gaussian elimination or by inverting the $t \times t$ matrix $A_c$, formed by the t encoding vectors (columns) of $A_i$ that correspond to the t CH nodes that sent the packets; the identities of these CHs are stored in the routing table at the BS. If the matrix $M_i$ is chosen randomly, and so is the sub-matrix $A_i$, there is no known method to rebuild the original data, even partially, from fewer than t parts.
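The BS-side decoding can be sketched as follows, as an illustration under stated assumptions rather than the authors' code. It assumes that $\rho_i$ is a row-major index into $M_i$ (a hypothetical convention), keys each received packet by the position j of its CH in the source node's routing table, solves each $t \times t$ system exactly by Gauss-Jordan elimination over the rationals (`fractions.Fraction`), and takes the most frequent candidate as the original message. It also returns the routing-table positions whose packets contributed to the winning candidate, which the detection phase can reuse. The function names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def solve(A_c, Y_c):
    """Solve X A_c = Y_c exactly for X, where A_c is t x t (Gauss-Jordan on A_c^T)."""
    t = len(A_c)
    # Row j of the augmented matrix encodes sum_k x_k * A_c[k][j] = y_j.
    aug = [[Fraction(A_c[k][j]) for k in range(t)] + [Fraction(Y_c[j])] for j in range(t)]
    for col in range(t):
        pivot = next((i for i in range(col, t) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(t):
            if i != col and aug[i][col] != 0:
                f = aug[i][col] / aug[col][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return tuple(aug[k][t] / aug[k][k] for k in range(t))


def decode(packets, M, t, p):
    """packets: {routing-table position j: (node_id, rho, y_j)} for one source node."""
    # Step 1: the true rho is the value carried by at least t packets.
    rho, votes = Counter(pkt[1] for pkt in packets.values()).most_common(1)[0]
    if votes < t:
        raise ValueError("no rho value is carried by t packets")
    # Step 2: rebuild A_i from M_i (hypothetical row-major convention for rho).
    row, col = divmod(rho, len(M[0]))
    A = [M[row + k][col:col + p] for k in range(t)]
    # Step 3: one candidate message per combination of t received parts.
    counts, contributors = Counter(), {}
    for combo in combinations(sorted(packets), t):
        A_c = [[A[k][j] for j in combo] for k in range(t)]
        try:
            X = solve(A_c, [packets[j][2] for j in combo])
        except ValueError:
            continue
        counts[X] += 1
        contributors.setdefault(X, set()).update(combo)
    # Step 4: the most frequent candidate is taken as the original message.
    ranked = counts.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        raise ValueError("no unique most frequent candidate")
    X = ranked[0][0]
    return rho, X, contributors[X]
```

For instance, with `M = [[(k + 1) ** j for k in range(5)] for j in range(3)]`, $t = 2$, $p = 4$, $\rho_i = 1$ and received parts $(19, 26, 33, 50)$, where the fourth part of the encoding of $(5, 7)$ has been polluted from 40 to 50, the candidate $(5, 7)$ is produced by $\binom{3}{2} = 3$ combinations while each of the three wrong candidates appears once, so `decode` returns $(5, 7)$ with contributing positions $\{0, 1, 2\}$.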
4.3. Attack Detection and Punishment
Our solution then proceeds with an attack detection and punishment phase, which locates the misbehavior affecting network availability or integrity. It identifies the malicious nodes (MNs) among the CHs executing DoS or pollution attacks and applies a punishment mechanism to exclude them from the routing process. The BS knows the locations of all nodes and the different clusters in the network. In addition, the BS maintains a control counter (cc) for each CH, which is used to decide whether the CH is malicious.
At the end of each routing round, the BS expects to receive, for each source node $S_i$, the set of packets $(ID_i, \rho_i, y_{i,j})$, $1 \le j \le p$, through p different paths coming from the nodes $CH_{i,1}, \ldots, CH_{i,p}$, which are the p CH identities saved in the routing table for $S_i$. The BS's decision depends on the source nodes and the packets received for each one. It first scans for malicious CHs launching a DoS attack and then for malicious CHs launching a pollution attack. Once it has collected the list of packets received for a source node $S_i$, the BS compares it with the list of expected senders $CH_{i,1}, \ldots, CH_{i,p}$.
First, for DoS attack detection, it notes the missing packets and identifies the CHs in charge of transmitting them. It then increments their cc by 1, as they are suspected of being malicious.
Second, for pollution attack detection, after collecting the list of packets received for a source node $S_i$, the BS proceeds in two steps:
Since it has already saved the true value $\rho_i$ in its memory when computing the original routed message, the BS checks for received packets containing a value different from $\rho_i$. It flags the CHs that sent these packets as potentially malicious and increments their cc by 1.
It checks the combinations of packets that generated the original message and notes the identities of the CHs that transmitted these packets. It then classifies the CHs whose identities do not belong to this list as potentially malicious and increments their cc by 1, since a single polluted packet in a combination of t packets yields a wrong computed message.
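The counter updates for one source node in one round can be sketched as follows. This is a hypothetical helper, not the authors' code: `received` maps CH identities to the packets that actually arrived, `true_rho` and `winning_chs` are assumed to come from the decoding step, and, following the text literally, a CH that fails several checks is incremented once per failed check.

```python
def update_counters(cc, expected_chs, received, true_rho, winning_chs):
    """Update the control counters cc (CH identity -> int) after decoding one source node.

    expected_chs: the p CH identities in the source node's routing table.
    received:     CH identity -> received packet (node_id, rho, y).
    true_rho:     the rho value identified during decoding.
    winning_chs:  CHs whose packets appear in combinations yielding the decoded message.
    """
    # DoS detection: every expected packet that never arrived.
    for ch in expected_chs:
        if ch not in received:
            cc[ch] = cc.get(ch, 0) + 1
    # Pollution detection, step 1: packets carrying a rho different from the true one.
    for ch, (_, rho, _) in received.items():
        if rho != true_rho:
            cc[ch] = cc.get(ch, 0) + 1
    # Pollution detection, step 2: received packets absent from every winning combination.
    for ch in received:
        if ch not in winning_chs:
            cc[ch] = cc.get(ch, 0) + 1
    return cc
```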
Finally, if the cc of a CH node reaches the value r, a threshold chosen by the user before the network is deployed, the BS recognizes this CH as a malicious node. In other words, if the node with identity $CH_j$ has carried out r attacks, whether by refusing to forward a data packet or by falsifying it, the BS proceeds as follows:
It excludes the node with identity $CH_j$ from the network and bans it from the routing process by deleting it from the list of network nodes.
It invokes a network re-clustering using the updated list of existing nodes.
It resets the paths used for each source node by choosing new p CH identities for each one.
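The punishment step can be sketched as below. This is a simplified, hypothetical illustration: it bans every CH whose counter has reached the threshold r and reassigns each source node its p nearest surviving CHs by Euclidean distance; the full re-clustering that would elect new CHs is not modelled.

```python
import math


def punish(cc, r, chs, sources, p):
    """Exclude CHs whose control counter reached the threshold r and rebuild the paths.

    cc: CH identity -> control counter; chs, sources: identity -> (x, y) position.
    Returns (banned CH identities, surviving CHs, new routing table).
    Re-clustering, i.e. electing new CHs, is not modelled: surviving CHs are reused.
    """
    banned = {ch for ch, count in cc.items() if count >= r}
    remaining = {ch: xy for ch, xy in chs.items() if ch not in banned}
    if p >= len(remaining):
        raise ValueError("too few CHs left to give each source node p distinct paths")
    routing = {
        sid: sorted(remaining, key=lambda ch: (math.dist(pos, remaining[ch]), ch))[:p]
        for sid, pos in sources.items()
    }
    return banned, remaining, routing
```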
In this way, the protocol increases the security level of the network, not only by preventing DoS and pollution attacks, but also by actively locating their source and punishing the malicious nodes responsible.
4.4. Discussion
In this subsection, we discuss the main points of our solution with regard to the methods used to ensure availability, integrity and confidentiality.
The core algorithm, the NC technique that creates data redundancy, is space-efficient: rebuilding t measurements requires only t parts, whose total size is identical to that of the original measurements. It is, however, possible to use more parts to help protect the network against DoS or pollution attacks. An attacker then needs to compromise at least $p - t + 1$ separate paths, i.e., in the worst case, $p - t + 1$ CH nodes. If the attacker does not know the topology set up to route data in the network, it has no choice but to compromise nodes at random; as a result, it will likely need to capture more than $p - t + 1$ CH nodes.
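The gap between $p - t + 1$ and the effort of a random attacker can be quantified with a short calculation. The sketch below assumes, purely for illustration, that an attacker who does not know the routing topology captures k of the N CHs uniformly at random; the probability that this defeats one given source node, i.e., hits at least $p - t + 1$ of its p CHs, is then a hypergeometric tail. The function names are hypothetical.

```python
from math import comb


def min_paths_to_defeat(p, t):
    """Fewest of a source node's p paths an attacker must control: p - t + 1."""
    return p - t + 1


def prob_random_capture_defeats(N, p, t, k):
    """Probability that k CHs captured uniformly at random among N include at least
    p - t + 1 of the p CHs assigned to one given source node (hypergeometric tail)."""
    need = min_paths_to_defeat(p, t)
    hits = sum(comb(p, x) * comb(N - p, k - x) for x in range(need, min(p, k) + 1))
    return hits / comb(N, k)
```

For instance, with $N = 10$, $p = 5$ and $t = 3$, capturing $k = 3$ random CHs defeats a given source node with probability $\binom{5}{3}/\binom{10}{3} = 10/120 \approx 0.083$.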
This strong availability and integrity comes at the cost of transmitting extra data parts for each source node, i.e., p parts instead of t parts, where $t < p$. However, our approach does not require heavy operations: dispersal and reconstruction are computationally efficient [33], and the systems of equations need to be solved only by the BS.
A small number of compromised nodes cannot reconstruct the measurements: there is no technique for rebuilding them, even partially, from fewer than t parts. This is the major advantage of dispersing the information across the network.
The NC-based solution provides complete confidentiality: no information can be obtained by intercepting the coded symbols or the random values $\rho_i$. A wiretapper, whether an outsider node or a captured CH inside the network, cannot recover the original data packets of a source node $S_i$. The main reason is that the matrix $M_i$ belonging to $S_i$ is secret and is never transmitted when routing data packets. Moreover, the value $\rho_i$ of each node $S_i$ is chosen randomly and changes every round, so no node that does not hold $M_i$ is able to reconstruct the original data. A detailed demonstration is available in our previous paper [15].
Apart from security requirements, our solution reduces overhead compared to conventional network coding systems. First, instead of transferring all the coding vectors, i.e., the coefficients $a_{k,j}$ ($1 \le k \le t$, $1 \le j \le p$), along with the p coded symbols, a source node adds only a single digit, $\rho_i$, to each coded packet. Second, encoding is performed only at the source nodes; there is no re-coding at intermediate nodes, i.e., the CH nodes. This decreases energy consumption and increases network lifetime.
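As a rough illustration of the first point, assume that each coding coefficient occupies a field of the same size as $\rho_i$. A conventional scheme that attaches the t coefficients of its encoding vector to each of the p coded symbols then adds $p \cdot t$ such fields per message, whereas SSP attaches one value $\rho_i$ per packet:

$$
\underbrace{p \cdot t}_{\text{conventional NC}} \quad \text{vs.} \quad \underbrace{p}_{\text{SSP}}, \qquad \text{e.g. } p = 5,\ t = 3:\ 15 \text{ vs. } 5 .
$$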