IoT Network Model with Multimodal Node Distribution and Data-Collecting Mechanism Using Mobile Clustering Nodes

Vorobyova, Darya; Muthanna, Ammar; Paramonov, Alexander; Markelov, Oleg A.; Koucheryavy, Andrey; Ali, Gauhar; ElAffendi, Mohammed; Abd El-Latif, Ahmed A.

doi:10.3390/electronics12061410

by Darya Vorobyova ¹, Ammar Muthanna ¹,², Alexander Paramonov ¹, Oleg A. Markelov ³, Andrey Koucheryavy ¹, Gauhar Ali ⁴, Mohammed ElAffendi ⁴ and Ahmed A. Abd El-Latif ⁴,⁵,*

¹ Department of Communication Networks and Data Transmission, The Bonch-Bruevich Saint-Petersburg State University of Telecommunications, St. Petersburg 193232, Russia

² Department of Applied Probability and Informatics, Peoples’ Friendship University of Russia (RUDN University), 6 Miklukho-Maklaya, Moscow 117198, Russia

³ Centre for Digital Telecommunication Technologies, Saint Petersburg Electrotechnical University “LETI”, 5F Professor Popov Street, St. Petersburg 197022, Russia

⁴ EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh 11586, Saudi Arabia

⁵ Department of Mathematics and Computer Science, Faculty of Science, Menoufia University, Shebin El-Koom 32511, Egypt

* Author to whom correspondence should be addressed.

Electronics 2023, 12(6), 1410; https://doi.org/10.3390/electronics12061410

Submission received: 8 December 2022 / Revised: 31 January 2023 / Accepted: 2 February 2023 / Published: 16 March 2023

Abstract: In this paper, we present a study of an Internet of Things (IoT) network model with multimodal node distribution and a data-collecting mechanism using mobile clustering nodes. The aim of this work is to introduce the problem of organizing an IoT network with mobile cluster heads when the nodes are distributed heterogeneously over the service area, following a multimodal distribution. A new method for clustering a heterogeneous network is proposed, which makes it possible to efficiently identify clusters that differ in node density. This, in turn, makes it possible to choose the speed of the mobile cluster head in accordance with the density in each cluster. The proposed method uses the Density-Based Spatial Clustering of Applications with Noise (DBSCAN) clustering algorithm. One benefit of the proposed model is more efficient use of the mobile cluster head. The new solution can be used to organize data collection in the IoT.

Keywords: IoT network; mobile cluster head; clustering algorithm; data collection; movement speed

1. Introduction

1.1. General Provisions

The penetration of the Internet of Things (IoT) into various areas of human activity has led to the emergence of various forms of implementation of wireless sensor networks (WSNs). These networks may differ in scale (geometric dimensions of the service area), number of nodes, traffic intensity, the location of nodes in space, and the nature of their movement. The choice of radio channel organization technology and control protocols for the logical structure of the network depends on these parameters [1,2,3].

Various WSN applications require networking with fixed or mobile nodes. The logical structure of the WSN is determined by the routing rules used. In the simplest case, when building data collection networks, there is a star-shaped structure, and all network nodes are located in the service area of the cluster head (CH). Often, it also acts as a gateway. In such a case, the service area is limited to the CH area, which, as a rule, is limited to hundreds of meters.

To overcome the limitations of this structure, networks use transit (relay) nodes and a mobile cluster head (MCH) that performs the functions of data collection [4]. The use of an MCH makes it possible to significantly expand the WSN service area, as well as to increase network connectivity [5]. If the trajectory of the MCH is chosen so that it passes through the communication zone of each network node and stays there for a certain time, then, with the right choice of movement speed [6], data can be collected from a given share of the total number of nodes during this time.

In this case, the duration of data collection depends on the speed of the cluster head, the number of network nodes and the size of the service area. The requirements for the duration of data collection depend on the specific application tasks solved by the network. For example, in a network for monitoring ambient temperature and soil moisture, it can be calculated in minutes or tens of minutes, and in a network for monitoring the state of the atmosphere for the presence of hazardous substances, the duration of data collection should be in seconds.

Based on previous studies, we note that reducing the data collection time requires increasing the number of MCHs. This raises the tasks of choosing both the number of such nodes and the nature of their movement in the network service area, since the efficiency of using such nodes depends on these parameters. Having too many MCHs increases the cost of the network, i.e., reduces its effectiveness. The MCH, as a rule, is significantly more expensive than a stationary node, since it requires a vehicle and energy for movement, as well as an interface with the global network. In the extreme case, it is possible to completely abandon stationary nodes, leaving only MCHs equipped with gateways; this reduces the data collection time to a minimum, but the network then has the maximum cost. In an intermediate variant, the number of MCHs should be sufficient to provide the required duration of data collection; however, the duration is determined not only by the number of MCHs, but also by the nature of their movement.

Therefore, as a new contribution in this paper, we develop a method for constructing a network with an MCH that increases the efficiency of using WSN resources. The new method makes it possible to select clusters with different densities of nodes. The identification of such clusters makes it possible to organize data collection with the help of mobile cluster heads, choosing the optimal speed of movement in each of these clusters. The analysis and the simulation show the efficacy of the proposed model. Additionally, it is possible to increase the efficiency of using mobile cluster heads, which depends on the heterogeneity of the IoT network.

1.2. Gap Analysis

In the presence of mobile head nodes, the problem arises of selecting clusters, i.e., of deciding which WSN nodes connect to which head node so that their data can be received. Both the speed (time) of data collection and the required number of mobile head nodes depend on the solution to this problem.

The task of selecting a cluster can be solved by various methods, from the simplest (for example, including in the cluster all nodes with which a connection can be established) to methods that ensure the necessary distribution of nodes among clusters.

In practice, for example, in an urban environment, one has to deal with networks for various purposes, the elements of which are located on various elements of the urban infrastructure. As a rule, the difference between such networks is expressed in geometric dimensions and in the number of nodes, i.e., they tend to have different node densities. This is obvious if we imagine the density of pedestrians on a sidewalk, the density of cars on a carriageway, or the density of passengers in the passenger compartment of a bus or a train car. Thus, it can be assumed that one of the main criteria for selecting a cluster can be the density of devices.

Since infrastructure elements differ in node density, they need to be described by a model. A point process can serve as such a model; in this case, it is a process in which the elements are distributed over the service area not uniformly but multimodally.

It would be reasonable to note that an element with the same density of nodes may turn out to be too large, and several head nodes may be required to service it. This is already a slightly different task, i.e., the task of ensuring the required performance. It, in turn, may also require the clustering of such an element, but already with the aim of highlighting “similar” clusters. To solve this problem, well-known clustering algorithms, for example, FOREL or k-means, can be used. Since the solution to this problem is quite well known, we do not discuss it in this paper.

The main objective of this work is the selection of clusters or areas with a similar structure for their service by mobile head nodes. The purpose of this work is to increase the efficiency of the process of collecting data from sensor network nodes using mobile head nodes. To provide an efficient solution, we propose a method of selecting clusters with a similar structure that can be served by mobile head nodes.

2. Network Model with Mobile Nodes

It was proved in [7] that the speed of movement of an MCH that collects data depends on the density of nodes and the requirements for the quality of data collection (probability of missing a node). In the same work, it was proved that the optimal value of the speed of movement of such a node can be determined, at which the number of polled network nodes is the maximum. Such a model for organizing the movement of a node is effective with a uniform distribution of users in the service area, when they are described by a Poisson field [8]. In most practical cases, the Poisson field model is applicable only in limited spaces, within which the distribution of network nodes can be considered uniform. In the entire service area, nodes are distributed, in most cases, unevenly. Such a distribution is usually described by multimodal laws [9].

Examples include the distribution of people in a city, cars on a roadway, or aquatic plants or fish in a water area. In such cases, the density of objects is uneven in the service area. There is a high density of people in residential, office and public buildings, the density of cars is greater at intersections than in the central part of the road, etc. Most WSNs therefore also have an uneven density of nodes in the service area, and this feature must be taken into account when choosing the number of MCHs and the speed of their movement. Otherwise, the use of an MCH may be ineffective.

When modeling the WSN, we assume that the network nodes are located on a plane, and the service area is a rectangle. This assumption allows us to simplify the reasoning, but it should be noted that the same reasoning can be used for a three-dimensional model as well.

We assume that the network nodes are randomly distributed in the service area. The distribution of node coordinates can be represented by a two-dimensional multimodal distribution, described as a mixed (composite) distribution [10,11,12] with a probability density and a distribution function given by expression (1)

f (x, y) = \sum_{i = 1}^{K} η_{i} f_{i} (x, y), F (x, y) = \sum_{i = 1}^{K} η_{i} F_{i} (x, y)

(1)

where K is the number of modes; η_i is a numerical coefficient such that 0 \leq η_{i} \leq 1 and \sum_{i = 1}^{K} η_{i} = 1; f_i(x, y) is the probability density of the i-th component; F_i(x, y) is the probability distribution function of the i-th component.

In the general case, various distribution functions can be used in (1). Their choice is determined by the degree of closeness of the model to the simulated system. In this model, it is proposed to use continuous random variables and the corresponding distribution functions, since we assume that the x- and y-coordinates can take any value. Both limited and unlimited random variables can be used for modeling. In the latter case, it is necessary to specify the probability of a random variable falling into the simulated service area and to choose the distribution parameters in such a way that this probability is sufficiently large.

Figure 1 shows a possible intersection model obtained using a mixed distribution of the form (1) at K = 4, the components of which are two-dimensional normal distributions [13] with a probability density

f_{i} (x, y) = \frac{1}{2 π σ_{x} σ_{y} \sqrt{1 - c^{2}}} \exp \left( - \frac{1}{2 (1 - c^{2})} \left[ \frac{(x - μ_{x})^{2}}{σ_{x}^{2}} - 2 c \frac{(x - μ_{x}) (y - μ_{y})}{σ_{x} σ_{y}} + \frac{(y - μ_{y})^{2}}{σ_{y}^{2}} \right] \right)

(2)

where μ_x and μ_y are the mathematical expectations of random variables X and Y, respectively; σ_x and σ_y are standard deviations for random variables X and Y, respectively; c is the correlation coefficient between X and Y.

In this case, the simulation considers four road sections adjacent to the intersection, and vehicles located on these sections are considered as objects.
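As an illustration, node coordinates can be drawn from a mixture of the form (1) with bivariate normal components. The sketch below is only one possible way to do this: the function name `sample_mixture`, the dictionary keys, and all numeric parameters of the four road sections are assumptions made for the example and are not taken from the paper.

```python
import math
import random

def sample_mixture(n, components, seed=0):
    """Draw n (x, y) points from a mixture of bivariate normal components.

    Each component is a dict with keys eta, mu_x, mu_y, sigma_x, sigma_y, c,
    named after the symbols used in expressions (1) and (2).
    """
    rng = random.Random(seed)
    weights = [comp["eta"] for comp in components]
    points, labels = [], []
    for _ in range(n):
        # Pick component i with probability eta_i.
        i = rng.choices(range(len(components)), weights=weights)[0]
        comp = components[i]
        # Two independent standard normals; the second is mixed with the
        # first so that the pair has correlation coefficient c.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = comp["mu_x"] + comp["sigma_x"] * z1
        y = comp["mu_y"] + comp["sigma_y"] * (
            comp["c"] * z1 + math.sqrt(1.0 - comp["c"] ** 2) * z2
        )
        points.append((x, y))
        labels.append(i)
    return points, labels

# Hypothetical intersection: four road sections around the origin (K = 4).
ROADS = [
    {"eta": 0.25, "mu_x": -50.0, "mu_y": 0.0, "sigma_x": 20.0, "sigma_y": 2.0, "c": 0.0},
    {"eta": 0.25, "mu_x": 50.0, "mu_y": 0.0, "sigma_x": 20.0, "sigma_y": 2.0, "c": 0.0},
    {"eta": 0.25, "mu_x": 0.0, "mu_y": -50.0, "sigma_x": 2.0, "sigma_y": 20.0, "c": 0.0},
    {"eta": 0.25, "mu_x": 0.0, "mu_y": 50.0, "sigma_x": 2.0, "sigma_y": 20.0, "c": 0.0},
]

points, labels = sample_mixture(1000, ROADS)
```

The returned `labels` record which component generated each node, which is convenient for comparing a clustering result with the known structure.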

The random variables X and Y are the coordinates of network nodes (for example, cars). The mixed distribution shown in the figure makes it possible to judge the probability p of finding a node in a given area of the considered service area

p = \iint_{Ω} f (x, y) d x d y

(3)

where Ω is the selected area.

Expressions (1) and (2) are convenient for modeling and simulation because they can be easily calculated in various simulation systems and other software. Although these distributions may differ from the real ones, they make it easy to check the performance of the model.

Of course, the construction of distributions should be based on statistical data on the number of cars on different sections of the road. The photograph shown in the example is a special case and serves to explain this approach.

The shape (“sharpness” and “height”) of the peaks depends on the standard deviation and coefficients η_i, corresponding to each of the distribution components. The probability value (3) is proportional to the proportion (number) of nodes in a given area. The density of nodes in this model is unevenly distributed in the service area and repeats the probability distribution. The average density of nodes in a given area Ω can be estimated as

\bar{ρ} = \frac{p n}{S (Ω)}

(4)

where S(Ω) is the area of the region Ω; n is the total number of nodes in the service area.
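For a rectangular region Ω and components with zero correlation (c = 0), the integral (3) factorizes into one-dimensional normal probabilities, so (3) and (4) can be evaluated in closed form. The following sketch assumes exactly that special case; the helper names and the sample parameters are hypothetical.

```python
import math

def normal_cdf(t, mu, sigma):
    """Distribution function of a one-dimensional normal variable."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def prob_in_rectangle(components, x0, x1, y0, y1):
    """Expression (3) for the rectangle [x0, x1] x [y0, y1], assuming c = 0."""
    p = 0.0
    for comp in components:
        px = normal_cdf(x1, comp["mu_x"], comp["sigma_x"]) - normal_cdf(x0, comp["mu_x"], comp["sigma_x"])
        py = normal_cdf(y1, comp["mu_y"], comp["sigma_y"]) - normal_cdf(y0, comp["mu_y"], comp["sigma_y"])
        p += comp["eta"] * px * py
    return p

def mean_density(components, n, x0, x1, y0, y1):
    """Expression (4): average node density p * n / S(Omega)."""
    area = (x1 - x0) * (y1 - y0)
    return prob_in_rectangle(components, x0, x1, y0, y1) * n / area

# Hypothetical single-component example: unit normal, 1000 nodes.
single = [{"eta": 1.0, "mu_x": 0.0, "mu_y": 0.0, "sigma_x": 1.0, "sigma_y": 1.0}]
p = prob_in_rectangle(single, -1.0, 1.0, -1.0, 1.0)
rho = mean_density(single, 1000, -1.0, 1.0, -1.0, 1.0)
```

For correlated components or non-rectangular regions, the integral would have to be evaluated numerically instead.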

As a rule, the environment in which the network is built has a number of structural elements, such as buildings, roads, sidewalks, bridges, tunnels, overpasses, parking lots, etc. These elements determine the features, and possibly the nature of the location of the network nodes. Therefore, to describe arbitrary cases, it may be convenient to use different distributions in various combinations within the framework of model (1). At the same time, the density of users can be different in the areas described by different structural elements.

Comparing the proposed modeling method with practical applications, we can make the following assumptions. For example, when describing a network of nodes placed on vehicles, a car parking area can probably be described by a uniform distribution, an intersection area by a normal distribution, squares and roundabouts by a uniform distribution, and so on. Of course, the choice of distribution should be based on statistical observations.

Considering Figure 1, we can assume that in this example, four clusters can be distinguished, which correspond to four sections of roads adjacent to the intersection. This assumption suggests itself due to the fact that there is a greater number of network nodes (cars) in these sections, and between these sections there is a space where there are relatively few network nodes. Such an intuitive solution is based on the apparently different density of network nodes, which is easy to see from the illustration. It is likely that the maintenance of such a network can be implemented by mobile head nodes, having allocated its own head node for each of the clusters, or by a smaller number, moving head nodes between clusters. In any case, there is a problem of cluster selection, in which the main criterion is the density of network nodes.

3. Formulation of the Problem

Figure 2 shows the results of generating random network nodes based on a model built using a mixed distribution, the components of which are two-dimensional normal and uniform distributions [14].

This model demonstrates several typical elements that may be in the network service area. These are areas of various shapes and with a different distribution of users. If we consider the road network, then areas designated as 1 and 2 can correspond to squares, areas 3 and 4 to intersections, and areas 5, 6 and 7 to sections of roads and parking lots. We call such areas structural elements.

To build an efficient data collection system using a mobile cluster head, it is necessary to choose the motion parameters in each of these areas appropriately. To do this, first of all, it is necessary to be able to select these areas in the initial data set, i.e., the selection of user groups and determination of geometric (or geographical) coordinates necessary to select the parameters of the movement of nodes.

As mentioned above, to select the motion parameters, the primary task is to select areas that are homogeneous in terms of the location of network nodes. Analyzing Figure 2, it is quite easy to intuitively identify seven areas (structural elements). However, firstly, the situation is not always so obvious, and secondly, a formal method is needed on the basis of which this problem can be solved.

Formally, this problem can be described as a classification problem or a clustering problem [15,16]. When it comes to matching network nodes to given structural elements, this task is a classification task. In the case when several groups of nodes can be allocated within the framework of one structural element, and the goal is to select groups of nodes and not just their comparison with structural elements, this task is a clustering task. The application of clustering methods to communication problems is described, for example, in [17,18].

Let the n nodes of the network form the set

S = \{s_{1}, \dots, s_{n}\}

in which it is necessary to select subsets

M_{1}, \dots, M_{k}

such that they are pairwise disjoint, while the cardinality of the complement in S of the union of the sets M_i is much less than the cardinality of the set S:

M_{i} \subseteq S, i = 1, \dots, k

(5)

M_{i} \cap M_{j} = \emptyset, i \neq j

(6)

\bigcup_{i = 1}^{k} M_{i} = B

(7)

S \setminus B = a

(8)

| a | \ll | S |

(9)

In practice, expressions (5)–(9) can be interpreted as the selection of k disjoint subsets in the initial set of nodes, such that they include almost all nodes of the network, with the exception of the nodes included in set a. These expressions are a formal mathematical description of the clustering problem. Expression (6) shows that the resulting sets should be disjoint. Expression (7) defines set B as the union of all clusters obtained as a result of clustering. Expression (8) defines set a as the elements of the original set that are not contained in B. Expression (9) says that the overwhelming majority of the elements of the original set are included in the found clusters; only the elements of set a, which are considered as noise, are not included.
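Conditions (5)–(9) can be checked mechanically for any candidate clustering. The sketch below is a minimal illustration; the function name `check_clustering` and the threshold `max_noise_share`, which turns the qualitative condition (9) into a number, are assumptions made for the example.

```python
def check_clustering(nodes, clusters, max_noise_share=0.05):
    """Check conditions (5)-(9) for a candidate clustering.

    nodes    -- iterable with the elements of the set S
    clusters -- list of iterables, the candidate subsets M_1..M_k
    Returns (ok, noise), where noise is the set a = S \\ B.
    """
    S = set(nodes)
    clusters = [set(M) for M in clusters]
    subsets_ok = all(M <= S for M in clusters)               # (5)
    B = set().union(*clusters)                               # (7)
    # (6): the clusters are pairwise disjoint exactly when their sizes
    # add up to the size of their union.
    disjoint_ok = sum(len(M) for M in clusters) == len(B)
    noise = S - B                                            # (8)
    # (9): "much less" is replaced by an assumed numeric threshold.
    small_noise_ok = len(noise) <= max_noise_share * len(S)
    return subsets_ok and disjoint_ok and small_noise_ok, noise

ok, noise = check_clustering(range(100), [range(0, 50), range(50, 97)])
overlap_ok, _ = check_clustering(range(100), [range(0, 50), range(49, 97)])
```

In the first call, three nodes remain outside the clusters and are treated as noise; the second call fails because node 49 belongs to two clusters.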

We will assume that the movement can be described by the speed and trajectory (route) of the node. In [7], a model was obtained that relates the speed of the movement of nodes with the quality of service and the density of network nodes.

Under the conditions of the task set, clustering should satisfy not only conditions (6)–(9), but the cluster should also be, if possible, homogeneous in terms of the distribution of nodes. The most appropriate criterion for this problem is the density of nodes, since it is this parameter that determines the choice of motion parameters for mobile nodes.

In a real situation, it is impossible to create “absolutely impenetrable boundaries” for network nodes (cars, pedestrians, animals, etc.), as well as absolutely reliable devices. Therefore, it should be expected that a certain number of nodes may be located outside the structural elements (roads, buildings, water bodies, etc.) and not perform the functions assigned to them. The coordinates of such nodes can be present in the source data, which creates additional difficulties in solving the problem. To be able to exclude such nodes from the problem, condition (9) is introduced, which admits that not all n nodes should be included in the allocated k clusters. By analogy with other random processes, we call the presence of such nodes noise.

As an additional condition, we require a minimum difference between the node densities in the selected clusters:

| ρ_{i} - ρ_{j} | \geq ε, i, j = 1, \dots, k, i \neq j

(10)

where ρ_i and ρ_j are the densities of network nodes in the corresponding clusters; ε is the minimum allowable difference between the node densities of two clusters.
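Condition (10) is a pairwise check over the cluster densities. A minimal sketch follows; the function name and the density values are hypothetical.

```python
from itertools import combinations

def densities_distinct(densities, eps):
    """Condition (10): every pair of cluster densities differs by at least eps."""
    return all(abs(a - b) >= eps for a, b in combinations(densities, 2))

# Hypothetical densities (nodes per unit area) of three clusters.
distinct = densities_distinct([0.5, 1.2, 3.0], eps=0.5)
too_close = densities_distinct([0.5, 0.7, 3.0], eps=0.5)
```

When the check fails, the two clusters with similar densities can be served with the same motion parameters.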

The solution to the clustering problem makes it possible to select the regions and the speeds of movement of the cluster head v_1, …, v_k within them, for example, using the method described in [19]. The next task is to select the trajectories of the cluster heads within the boundaries of the selected clusters.

4. Method for Selecting the Motion Parameters of the Cluster Head

We assume that the movement can be described by the speed and trajectory (route) of the node. In [19], a model was obtained that makes it possible to select the trajectory of motion based on the use of the FOREL algorithm for clustering network nodes according to the node connection radius criterion and subsequent routing through the centers of the found clusters by approximately solving the traveling salesman problem.

In this case, this method does not lead to the desired results, because the criterion for selecting clusters does not take into account the influence of the distribution of nodes on the features of motion.

Figure 3 shows the result of clustering the example of Figure 2 using the k-means algorithm with k = 7. Objects assigned to different clusters are highlighted in color. The example clearly shows that the areas identified by clustering do not coincide with the structural elements. One structural element can be divided into several clusters, and one cluster can include parts of different elements. This result of the k-means algorithm is expected, because this algorithm does not take into account the peculiarities of node placement.

This algorithm was chosen only as an example of clustering that is unsuccessful for this problem; many other clustering algorithms yield similar results.

To select a suitable clustering algorithm, experiments were carried out with 11 known algorithms: MiniBatch k-means, FOREL, Affinity Propagation, Mean Shift, Spectral Clustering, Ward, Agglomerative Clustering, DBSCAN, OPTICS, BIRCH, and Gaussian Mixture. The algorithms were compared on a number of examples in terms of their ability to select clusters according to the task, i.e., within structural elements, as well as in terms of execution time. The execution time was measured experimentally on a computer with an 8-core 4 GHz processor and 16 GB of memory. The examples were generated, with different parameters, as the initial data sets that the library [20] uses to test and diagnose such algorithms: “nested circles”, “two moons”, “drops”, and a uniform distribution. Such a set of initial data presents certain difficulties for clustering and makes it possible to characterize the capabilities of the algorithms under study. Table 1 shows the results of the study of clustering algorithms.

The estimate of the average execution time \bar{t}_e was obtained by averaging over a set of six test cases, the same for all algorithms:

\bar{t}_{e} = \frac{1}{n} \sum_{i = 1}^{n} t_{i}, n = 6

(11)

The coefficient of variation of the execution time K was calculated as

K = \frac{1}{\bar{t}_{e}} \sqrt{\frac{1}{n - 1} \sum_{i = 1}^{n} (t_{i} - \bar{t}_{e})^{2}}, n = 6

(12)

where t_i is the processing time for the i-th experiment; n is the number of experiments. K characterizes the variability of processing time during a series of n experiments. This variability is expressed relative to the average value of this time, which gives an idea of the relative deviations of the processing time.
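Expressions (11) and (12) are the sample mean and the coefficient of variation based on the sample standard deviation (divisor n - 1). A short sketch with the Python standard library is given below; the six timing values are invented for illustration and are not the measurements behind Table 1.

```python
import statistics

def runtime_stats(times):
    """Return the mean execution time (11) and its coefficient of variation (12)."""
    mean_t = statistics.mean(times)
    # statistics.stdev uses the n - 1 divisor, as in expression (12).
    cv = statistics.stdev(times) / mean_t
    return mean_t, cv

# Hypothetical execution times of one algorithm on six test cases, in seconds.
times = [0.8, 1.0, 1.2, 0.9, 1.1, 1.0]
mean_t, cv = runtime_stats(times)
```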

As the results show, four algorithms yielded results close to expectations: Spectral Clustering, Agglomerative Clustering, OPTICS, and DBSCAN. They make it possible to select structural elements, but only two of them, OPTICS and DBSCAN, take noise into account.

The analysis of the considered algorithms showed that the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm met the requirements of the task to the greatest extent. The name of the algorithm itself quite fully reflects its functionality, clustering spatial data by density in the presence of noise, which is what the problem being solved requires.

The execution time of the DBSCAN algorithm was almost six times less than the time required for the OPTICS algorithm. In addition, it had a relatively small coefficient of variation in execution time compared to other algorithms, which can also be regarded as a positive property (a small spread in execution time for various tasks).

The DBSCAN algorithm was developed in 1996 by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu, and was published in [21]. In 2014, the algorithm received the Test of Time Award at the ACM SIGKDD conference [22]. Today it is one of the most frequently used clustering algorithms.

Figure 4 shows the result of initial data clustering by the DBSCAN algorithm.

The DBSCAN algorithm allows you to select clusters based on the density of nodes. This allows you to select groups of nodes within the boundaries of structural elements. In the above example, groups of nodes were identified in all given structural elements (seven clusters). Additionally, a group of nodes located at a distance from the areas of structural elements, which were defined as “noise”, was identified.

A brief description of this algorithm is as follows. All initial objects (within the framework of the problem, these are nodes; in the description of the algorithm, we call them points) are divided into three types: core points, density-reachable points, and discarded points, or noise (outliers).

A point p is called a core point if its neighborhood of radius r contains at least m points (including the point p itself). We say that all these points are directly reachable from p. The ratio ρ = m / (π r^{2}) expresses the density of objects in the vicinity of a core point.

A point q is density-reachable from a point p if a route can be laid between them on a graph built from the core points. An edge in the graph exists when the distance between two points does not exceed r.

All points that are not reachable from any other point are called outliers or noise points. The algorithm selects clusters in the areas where the initial objects are concentrated, while the conditional boundaries of a cluster are determined by the unreachability of neighboring points, i.e., by the low density of objects between clusters.

If a point is found to be in a dense part of a cluster (i.e., it is a core point), then its ε-neighborhood is also part of that cluster; here ε is the neighborhood radius, denoted r above. Hence, all points that are found within the ε-neighborhood are added, as are their own ε-neighborhoods when they are also dense. This process continues until the density-connected cluster is completely found. Then, a new unvisited point is retrieved and processed, leading to the discovery of a further cluster or noise.

DBSCAN can be used with any distance function [1,4] (as well as similarity functions or other predicates) [7]. The distance function (dist) can therefore be seen as an additional parameter. The algorithm may be described in pseudocode as shown in Algorithm 1 below.

Algorithm 1. Clustering by the DBSCAN

DBSCAN(DB, distFunc, eps, minPts) {
    C := 0                                                  /* Cluster counter */
    for each point P in database DB {
        if label(P) ≠ undefined then continue               /* Previously processed in inner loop */
        Neighbors N := RangeQuery(DB, distFunc, P, eps)     /* Find neighbors */
        if |N| < minPts then {                              /* Density check */
            label(P) := Noise                               /* Label as Noise */
            continue
        }
        C := C + 1                                          /* Next cluster label */
        label(P) := C                                       /* Label initial point */
        SeedSet S := N \ {P}                                /* Neighbors to expand */
        for each point Q in S {                             /* Process every seed point Q */
            if label(Q) = Noise then label(Q) := C          /* Change Noise to border point */
            if label(Q) ≠ undefined then continue           /* Previously processed (e.g., border point) */
            label(Q) := C                                   /* Label neighbor */
            Neighbors N := RangeQuery(DB, distFunc, Q, eps) /* Find neighbors */
            if |N| ≥ minPts then {                          /* Density check (if Q is a core point) */
                S := S ∪ N                                  /* Add new neighbors to seed set */
            }
        }
    }
}
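For experiments, Algorithm 1 can be transcribed almost line by line into Python. The version below is a minimal sketch that uses only the standard library and a brute-force range query, so it is suitable for small examples only; the names `dbscan`, `range_query`, and `NOISE`, as well as the sample points, are assumptions of this illustration.

```python
import math

NOISE = -1

def range_query(points, dist, p_idx, eps):
    """Indices of all points within eps of points[p_idx] (the point itself included)."""
    return [j for j, q in enumerate(points) if dist(points[p_idx], q) <= eps]

def dbscan(points, eps, min_pts, dist=math.dist):
    """Return one label per point: 1..C for clusters, NOISE for outliers."""
    labels = [None] * len(points)          # None plays the role of "undefined"
    c = 0                                  # cluster counter
    for p in range(len(points)):
        if labels[p] is not None:          # already processed in an inner loop
            continue
        neighbors = range_query(points, dist, p, eps)
        if len(neighbors) < min_pts:       # density check
            labels[p] = NOISE
            continue
        c += 1
        labels[p] = c
        seeds = [q for q in neighbors if q != p]
        in_seeds = set(seeds)
        k = 0
        while k < len(seeds):              # the seed list grows while it is scanned
            q = seeds[k]
            k += 1
            if labels[q] == NOISE:
                labels[q] = c              # noise becomes a border point
            if labels[q] is not None:
                continue
            labels[q] = c
            q_neighbors = range_query(points, dist, q, eps)
            if len(q_neighbors) >= min_pts:    # q is a core point: expand
                for j in q_neighbors:
                    if j not in in_seeds:
                        in_seeds.add(j)
                        seeds.append(j)
    return labels

# Hypothetical example: two compact groups of four nodes and one distant node.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (50, 50)]
labels = dbscan(pts, eps=1.5, min_pts=3)
```

The `dist` argument corresponds to distFunc in Algorithm 1; any function of two points that returns a non-negative number can be passed instead of the Euclidean distance.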

To solve the problem, it is necessary to choose a value of density ρ at which the selected clusters satisfy the conditions of the problem, i.e., the identified clusters are homogeneous and are located within the boundaries of the structural elements.

Based on the logic of the algorithm, it can be seen that at the boundary values of the density, i.e., at ρ → 0, one cluster will be selected that includes all elements, and there will be no noise; at ρ → ∞, no clusters will be formed, i.e., all elements will be treated as noise. At intermediate values, the number of clusters will increase as the density increases. To explore the operation of the algorithm, we set the density value through the radius r and keep the minimum number of nodes m constant. For the analysis, we count the number of discarded elements (noise) as r changes.

Figure 5a shows the results obtained with multiple clustering of a random sample with different radius values.

Figure 5b shows the dependence of the proportion of discarded elements on the radius value; the statistical data are marked with circles. To approximate the obtained dependence, a Gaussian function was used (blue curve):

O (r) = \frac{1}{c \sqrt{2 π}} e^{- \frac{{(r - r_{0})}^{2}}{2 c^{2}}}

(13)

The parameters c and r₀ of curve (13) were chosen by the least-squares method.

To choose the value of r, we used the cubital point (elbow point) method [23] (Figure 5b), i.e., we took the inflection point of the curve, at which the angle of inclination of the tangent to it is equal to π/4. This method is often used in clustering problems, for example, to select the number of clusters when using the k-means method. For a Gaussian function, this point lies at a distance c from r₀.

In this case, several trial clustering operations are performed, the number of discarded elements is calculated for them, and then the position of the cubital point is determined.

The next subtask when using DBSCAN is the choice of the density value, for which it is necessary to set the minimum number of nodes m in the vicinity of the core points. The algorithm cannot select clusters with different element densities within a single run; areas with a density significantly lower than the specified one will be defined as noise. Therefore, if different user densities are expected in the service area, it is necessary to organize a series of clustering stages for different density values.

The algorithm of the proposed method is shown in Figure 6.

We describe this algorithm step by step.

The following notation is used in the algorithm: S is the initial set of objects, r_min and r_max are the minimum and maximum node density, n_min is the minimum number of nodes in the cluster, Δ is the step of changing the node density, r₀ is the node density value found by the cubital point method, and i is the cycle counter.

Step 1. Input of initial data, i.e., set of coordinates of network nodes S and initialization of the cycle counter i, as well as an array of boundary values r and n_min.

Step 2. Choice of the i-th set of boundary values r and n_min.

Step 3. The initial value of r is set (minimum, which corresponds to the maximum density).

Step 4. The DBSCAN algorithm is executed.

Step 5. Verification of whether calculations have been performed for the entire range of r values. If not, then go to step 6; if yes, then go to step 7.

Step 6. Increase r by Δ and return to step 4.

Step 7. This step is performed after executing the DBSCAN algorithm for a number of r values. Based on the clustering results, the percentage of dropped nodes is estimated, and r₀ is estimated using the cubital point method.

Step 8. This step is performed after obtaining the value r₀, for which the DBSCAN algorithm is performed. It should be noted that the clustering result for the value r₀ could be obtained in steps 4–6, in which case this step is a formality.

Step 9. Checking whether all sets of parameters r and n_min have been considered. If not, then go to step 10; if yes, then go to step 11.

Step 10. Incrementation of counter i by one, i.e., selection of the next set of parameters, and return to step 2.

Step 11. At this step, the trajectory of the cluster head is selected.
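The control flow of steps 1–10 can be sketched as a small driver function. This is an illustrative outline under stated assumptions rather than the authors' implementation: the clustering routine and the cubital point search are passed in as callables, `toy_cluster` and `first_below_half` are hypothetical stand-ins used only to make the example runnable, and noise is marked with the label -1.

```python
def select_clusters(nodes, parameter_sets, cluster_fn, pick_r0):
    """Run the radius sweep for every parameter set and cluster at the chosen r0.

    parameter_sets: list of (r_min, r_max, delta, n_min) tuples
    cluster_fn(nodes, r, n_min) -> list of labels, -1 marks noise
    pick_r0(radii, noise_shares) -> radius chosen from the sweep
    """
    results = []
    for r_min, r_max, delta, n_min in parameter_sets:      # steps 2, 9 and 10
        n_steps = int(round((r_max - r_min) / delta))
        radii, noise = [], []
        for k in range(n_steps + 1):                        # steps 3, 5 and 6
            r = r_min + k * delta
            labels = cluster_fn(nodes, r, n_min)            # step 4
            radii.append(r)
            noise.append(labels.count(-1) / len(labels))
        r0 = pick_r0(radii, noise)                          # step 7
        results.append((r0, cluster_fn(nodes, r0, n_min)))  # step 8
    return results                                          # input for step 11


def toy_cluster(nodes, r, n_min):
    """Stand-in for DBSCAN on 1-D positions: keep nodes with >= n_min nodes within r."""
    return [0 if sum(abs(x - y) <= r for y in nodes) >= n_min else -1
            for x in nodes]


def first_below_half(radii, noise):
    """Stand-in for the cubital point search: first radius discarding under half the nodes."""
    for r, share in zip(radii, noise):
        if share < 0.5:
            return r
    return radii[-1]


nodes = [0.0, 1.0, 2.0, 10.0]                 # illustrative positions
results = select_clusters(nodes, [(0.5, 2.0, 0.5, 2)],
                          toy_cluster, first_below_half)
print(results)
```

Passing the two routines as arguments keeps the loop structure visible and lets a real DBSCAN implementation and a real cubital point search be substituted without changing the driver.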

The last step includes the task of choosing the trajectory of the node. Note that within the framework of this article, tasks were considered whose application area was urban infrastructure and such structural elements as elements of roads, rears, and structures. The geometric dimensions of such objects, as a rule, do not exceed the communication zone of the cluster head (hundreds of meters). Therefore, the solution to this problem is obvious in most cases, for example, movement along the center line of the road, along the sidewalk border, along the wall of a building, along a corridor, etc.

Solving this problem is worthwhile when the sizes of the obtained clusters and structural elements of the environment significantly exceed the communication zone of the cluster head, for example, in agricultural land, water areas, forests, etc. These problems are the subject of further research.

We only note that a possible method for solving it can be, for example, the method proposed in [19]. In such a case, this method must be applied for each of the resulting clusters.

The algorithm described above was implemented in Python and tested on several network models, in particular, on the model described at the beginning of the article. The testing of the algorithm showed its efficiency in the selection of clusters, including those with different node densities.

5. Method Effectiveness

The efficiency of the proposed method can be estimated using the results obtained in [7]. We proceed based on the fact that the operation of the MCH is characterized by two main parameters: the time of movement and the number of polled network nodes as shown in Figure 7.

We also assume that the mobile cluster head passes through two regions (two clusters) with a high and a low density of nodes, ρ₁ and ρ₂, respectively. We consider two cases: in the first, the speed of the node is selected based on the average value of the density, i.e., for (ρ₁ + ρ₂)/2; in the second, it is chosen individually for each of the regions.

To select the speed, we use the method described in [7]. We evaluate the efficiency by the coefficient of change in the number of polled nodes when using the proposed method.

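e = \frac{η_{1} (ρ_{1}, v_{1}) + η_{2} (ρ_{2}, v_{2}) - η_{1} (ρ_{1}, \tilde{v}) - η_{2} (ρ_{2}, \tilde{v})}{η_{1} (ρ_{1}, \tilde{v}) + η_{2} (ρ_{2}, \tilde{v})}

(14)

Here η₁ and η₂ are the numbers of nodes polled in the first and second regions, v₁ and v₂ are the speeds chosen individually for these regions, and ṽ is the speed chosen for the average density.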

Naturally, if ρ₁ = ρ₂, the efficiency should be equal to zero; the method is effective for a heterogeneous network. Figure 8 shows the dependence of the efficiency estimate according to expression (14) on the heterogeneity of the network, characterized by the ratio of node densities in the first and second regions, dρ = ρ₁/ρ₂. As can be seen from the figure, the efficiency increases as the difference in node densities grows. For example, when the densities in the first and second regions differ by a factor of two, the increase in the number of serviced nodes is about 20%.
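The arithmetic behind expression (14) can be illustrated with a few lines of code. The sketch below reads (14) as the relative gain in polled nodes when per-region speeds replace one common speed; this reading, the function name `efficiency`, and the node counts are assumptions made for illustration, not results from the study.

```python
def efficiency(polled_individual, polled_common):
    """Relative gain in polled nodes, in the spirit of expression (14).

    polled_individual: nodes polled per region with per-region speeds
    polled_common: nodes polled per region with one common speed
    """
    baseline = sum(polled_common)
    return (sum(polled_individual) - baseline) / baseline


# Made-up counts for two regions, for illustration only.
print(efficiency([70, 50], [55, 45]))   # heterogeneous case: some gain
print(efficiency([50, 50], [50, 50]))   # identical results: no gain
```

When both strategies poll the same number of nodes, as in a homogeneous network where the individual and common speeds coincide, the coefficient is zero.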

The proposed performance evaluation model is a special case. The shapes of the resulting clusters, their sizes, and the ratio of the number of nodes in them can be different. However, this model reflects the main property of the proposed method for choosing the speed of the cluster head based on the density of network nodes.

An analysis of the dependence of the method efficiency on the heterogeneity of the distribution of nodes shows that its minimum value is zero, which corresponds to the case of a homogeneous network. In this case, the method gives no gain. The growth of heterogeneity leads to an increase in the number of serviced nodes and, accordingly, in the efficiency of the method. It should also be taken into account that with an unlimited increase in the density of nodes, it will be necessary to reduce the speed of the node movement to almost zero, which will lead to the degeneration of the problem. Therefore, this model is applicable for real densities, which in most cases differ by no more than an order of magnitude.

Experimental setup and environment. All experiments were carried out on a simulation model. A personal computer was used as the hardware platform. The software consisted of the Windows 10 operating system with Matlab, Python 3 and the necessary libraries installed. The network model shown in Figure 1 was built in Matlab. Python was used to analyze clustering algorithms and implement the proposed method. Code was written that implements the models and algorithms described above.

6. Conclusions

The IoT network has a heterogeneous structure, in which it is possible to distinguish areas determined by the structural elements of the environment, such as roads, sidewalks, tunnels, buildings, etc. The density of the network within these elements can be different. The main task of organizing the movement of an MCH is to identify groups of network nodes within the boundaries of structural elements. To solve this problem, a method for selecting clusters based on the DBSCAN clustering algorithm was developed. The method allows you to select groups (clusters) of network nodes with different densities of nodes. The identification of such clusters makes it possible to organize data collection with the help of mobile cluster heads, choosing the optimal speed of movement in each of these clusters. The developed method makes it possible to increase the efficiency of using mobile cluster heads, which depends on the heterogeneity of the Internet of Things network. The efficiency, determined by the increase in the number of nodes served, is higher with greater network unevenness, as determined by the density of nodes in different clusters.

In the future we are planning to find and develop effective methods for choosing the trajectory of movement of the head nodes of clusters and their speed.

Author Contributions

Conceptualization, D.V., O.A.M. and A.A.A.E.-L.; methodology, A.M., A.P. and A.A.A.E.-L.; software, A.P.; validation, M.E.; formal analysis, A.M., A.K., A.P. and A.A.A.E.-L.; investigation, O.A.M., G.A., M.E. and A.A.A.E.-L.; resources, A.M., O.A.M., A.P. and A.A.A.E.-L.; data curation, G.A., M.E. and A.A.A.E.-L.; writing—original draft preparation, D.V., A.M., A.P. and A.A.A.E.-L.; writing—review and editing, A.P. and A.A.A.E.-L.; visualization, A.M., A.P. and A.A.A.E.-L.; supervision, A.K.; project administration, A.K.; funding acquisition, A.A.A.E.-L. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to acknowledge the support of Prince Sultan University for paying the Article Processing Charges (APC) of this publication.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available by the corresponding author upon reasonable request.

Acknowledgments

The authors would like to thank the EIAS Data Science Lab, College of Computer and Information Sciences, Prince Sultan University, Riyadh, Saudi Arabia, for supporting this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Alsbouí, T.; Hammoudeh, M.; Bandar, Z.; Nisbet, A. An overview and classification of approaches to information extraction in wireless sensor networks. In Proceedings of the 5th International Conference on Sensor Technologies and Applications (SENSORCOMM’11), Nice, France, 21–27 August 2011; p. 255.
2. Asim, M.; Mashwani, W.K.; Abd El-Latif, A.A. Energy and task completion time minimization algorithm for UAVs-empowered MEC SYSTEM. Sustain. Comput. Inform. Syst. 2022, 35, 100698.
3. Futahi, A.; Futahi, A.; Koucheryavy, A.; Paramonov, A.; Prokopiev, A. Ubiquitous Sensor Networks in the Heterogeneous LTE Network. In Proceedings of the 17th International Conference on Advanced Communications Technology (ICACT), PyeongChang, Republic of Korea, 1–3 July 2015; pp. 28–32.
4. Muthanna, M.S.A.; Muthanna, A.; Nguyen, T.N.; Alshahrani, A.; El-Latif, A.A.A. Towards optimal positioning and energy-efficient UAV path scheduling in IoT applications. Comput. Commun. 2022, 191, 145–160.
5. Popoola, S.I.; Adebisi, B.; Hammoudeh, M.; Gui, G.; Gacanin, H. Hybrid deep learning for botnet attack detection in the internet-of-things networks. IEEE Internet Things J. 2020, 8, 4944–4956.
6. Kovalenko, V.; Alzaghir, A.; Volkov, A.; Muthanna, A.; Koucheryavy, A. Clustering algorithms for UAV placement in 5G and Beyond Networks. In Proceedings of the 2020 12th International Congress on Ultra Modern Telecommunications and Control Systems and Workshops (ICUMT), Brno, Czech Republic, 5–7 October 2020; pp. 301–307.
7. Paramonov, A.; Khayyat, M.; Chistova, N.; Muthanna, A.; Elgendy, I.A.; Koucheryavy, A.; Abd El-Latif, A.A. An Efficient Method for Choosing Digital Cluster Size in Ultralow Latency Networks. Wirel. Commun. Mob. Comput. 2021, 2021, 9188658.
8. Choudhary, S.; Sugumaran, S.; Belazi, A.; El-Latif, A.A.A. Linearly decreasing inertia weight PSO and improved weight factor-based clustering algorithm for wireless sensor networks. J. Ambient. Intell. Humaniz. Comput. 2021, 1–19.
9. Chaaf, A.; Muthanna, M.S.A.; Muthanna, A.; Alhelaly, S.; Elgendy, I.A.; Iliyasu, A.M.; Abd El-Latif, A.A. Revohpr: Relay-based Void Hole Prevention and Repair by Virtual Routing in Clustered Multi-AUV Underwater Wireless Sensor Network. Preprints 2021, 2021060514.
10. Mixture Distribution. Available online: https://en.wikipedia.org/wiki/Mixture_distribution (accessed on 24 August 2021).
11. Röver, C.; Friede, T. Discrete Approximation of a Mixture Distribution via Restricted Divergence. J. Comput. Graph. Stat. 2017, 26, 217–222.
12. Ghojogh, B.; Ghojogh, A.; Crowley, M.; Karray, F. Fitting A Mixture Distribution to Data: Tutorial. arXiv 2020, arXiv:1901.06708v2.
13. Multivariate Normal Distribution. MathWorks. Available online: https://www.mathworks.com/help/stats/multivariate-normal-distribution.html (accessed on 10 December 2022).
14. Marshall, A.W. A Bivariate Uniform Distribution; Stanford University Technical Report; Springer: New York, NY, USA, 1989.
15. Gimadinov, R.F.; Muthanna, A.S.; Koucheryavy, A.E. Clustering in mobile network 5G based on partial mobility. Telecom. IT 2015, 2, 44–52.
16. Ateya, A.A.; Muthanna, A.; Vybornova, A.; Koucheryavy, A. Multi-level Cluster Based Device-to-Device (D2D) Communication Protocol for the Base Station Failure Situation. Lect. Notes Comput. Sci. 2017, 10531, 755–765.
17. Paramonov, A.; Hussain, O.; Samouylov, K.; Koucheryavy, A.; Kirichek, R.; Koucheryavy, Y. Clustering Optimization for Out-of-Band D2D Communications. Wirel. Commun. Mob. Comput. 2017, 2017, 6747052.
18. Asim, M.; Abd El-Latif, A.A. Intelligent computational methods for multi-unmanned aerial vehicle-enabled autonomous mobile edge computing systems. ISA Trans. 2021, 132, 5–15.
19. Asim, M.; ElAffendi, M.; Abd El-Latif, A.A. Multi-IRS and Multi-UAV-Assisted MEC System for 5G/6G Networks: Efficient Joint Trajectory Optimization and Passive Beamforming Framework. IEEE Trans. Intell. Transp. Syst. 2022, 1–12.
20. Scikit-Learn Machine Learning in Python. Available online: https://scikit-learn.org/stable/index.html (accessed on 24 August 2021).
21. Ester, M.; Kriegel, H.; Sander, J.; Xu, X. A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise. In Proceedings of the KDD’96: The Second International Conference on Knowledge Discovery and Data Mining, Portland, OR, USA, 2–4 August 1996; pp. 226–231.
22. 2014 SIKDD Test of Time Award Winners. Available online: https://www.kdd.org/awards/view/2014-sikdd-test-of-time-award-winners (accessed on 24 August 2021).
23. Syakur, M.A.; Khotimah, B.K.; Rochman EM, S.; Satoto, B.D. Integration K-Means Clustering Method and Elbow Method for Identification of The Best Customer Profile Cluster. IOP Conf. Ser. Mater. Sci. Eng. 2018, 336, 012017.

Figure 1. Network model within the boundaries of the intersection based on two-dimensional normal distributions.

Figure 2. Network model within the boundaries of an intersection based on two-dimensional normal and uniform distributions.

Figure 3. An example of k-means clustering.

Figure 4. The result of clustering by the DBSCAN algorithm.

Figure 5. Determination of the value of r. (a) The results obtained with multiple clustering of a random sample with different radius values. (b) The dependence of the proportion of discarded elements on the radius values.

Figure 6. Algorithm of the method (process) for selecting the motion parameters of the mobile head units.

Figure 7. MCH movement through two areas with different densities of network nodes.

Figure 8. Dependence of efficiency on network heterogeneity.

Table 1. Results of the study of clustering algorithms.

N	Algorithm	Structural Element Selection	Isolation of “Noise”	Average Execution Time, s	Coefficient of Runtime Variation
1	MiniBatch k-means	-	-	0.06	1.21
2	FOREL	-	-	0.04	0.57
3	Affinity Propagation	-	-	8.31	0.42
4	Mean Shift	-	-	0.14	0.29
5	Spectral Clustering	Yes	-	0.54	0.69
6	Ward	-	-	0.31	0.97
7	Agglomerative Clustering	Yes	-	0.26	0.96
8	OPTICS	Yes	Yes	1.16	0.03
9	BIRCH	-	-	0.05	0.26
10	Gaussian Mixture	-	-	0.01	0.35
11	DBSCAN	Yes	Yes	0.02	0.49


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
