1. Introduction
Mobile devices have acquired greater capacity for computing, storage, and connectivity to heterogeneous networks. Nevertheless, a critical aspect of mobility is the physical change of network, together with the problems inherent to the loss of service continuity and the decision of which network to connect to next [1].
The handover or handoff in IP networks is the physical transition from one network to another. There are two types of handover: (A) horizontal handover and (B) vertical handover [2]. When a mobile node (MN) changes to a network of the same technology, it performs a horizontal handover. If the change is to a network of a different technology, it is called a vertical handover.
The general handover procedure is classified into three stages according to [3,4]:
- (1) Measurement of handover and initialization.
- (2) Decision of handover.
- (3) Execution of handover.
In the first stage, the MN measures the metrics of the nearby networks. In the second stage, algorithms decide when to change and to which network. Finally, in the third stage, the necessary procedures are carried out to connect to the new network and reestablish services.
Regarding problems related to the execution of the handover, IP requires that the address of every device operating within a network be derived from that network. Under this scheme, if a computer moves from its original network to a new network, it will experience the following problems:
- (1) All communication becomes impossible because its IP address is not valid in the new network.
- (2) Communications in progress are lost.
- (3) Mobile nodes disappear from the global network.
The problems related to the handover decision arise in the design of an algorithm that determines the right moment to change networks, because the decision-making algorithm may force the MN to switch successively between adjacent networks even though the MN is not moving. This happens when the algorithm is based on a measurement of the received signal strength (RSS) and the MN observes a very similar received signal strength indicator (RSSI) for both adjacent networks.
This is known as the ping-pong effect [5]. It is mainly generated when the decision-making algorithm selects the network with the highest RSSI: when two networks have similar power, the algorithm repeatedly changes to whichever signal is momentarily stronger. The ping-pong effect is a very common phenomenon that degrades the quality of services. It occurs not only in network exchange but also in various other areas, and a very common strategy to combat it is to establish limits by ranges. For example, if the frontier is 6, the system does not go to the next level until the power is 6.5 or higher, and it does not return to the lower level until the power is 5.5 or lower. In this way, the border around 6 is blurred, which reduces the ping-pong effect. Another way is through algorithms such as fuzzy logic [6,7,8,9], where the idea is very similar. In other words, these proposals use such policies to make change decisions and at the same time try to balance the load of the networks efficiently, using input parameters such as RSSI, latency, and data rate.
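The range-based strategy described above can be sketched as a small hysteresis switch. This is a minimal illustration only, assuming a two-level setting with the frontier at 6 and a margin of 0.5 as in the example; the function names `next_level` and `count_switches` and the sample readings are hypothetical and do not come from the cited proposals.

```python
def next_level(current_level, power, frontier=6.0, margin=0.5):
    """Return the level (0 = lower, 1 = upper) after observing `power`.

    Hypothetical helper: it switches up only at frontier + margin and
    down only at frontier - margin, so readings near the frontier do
    not cause repeated switching.
    """
    if current_level == 0 and power >= frontier + margin:
        return 1
    if current_level == 1 and power <= frontier - margin:
        return 0
    return current_level


def count_switches(readings, use_hysteresis=True):
    """Count level changes over a sequence of power readings."""
    margin = 0.5 if use_hysteresis else 0.0
    level, switches = 0, 0
    for power in readings:
        new_level = next_level(level, power, margin=margin)
        if new_level != level:
            switches += 1
            level = new_level
    return switches


# Illustrative readings that hover around the frontier of 6.
readings = [5.9, 6.1, 5.9, 6.2, 5.8, 6.6, 6.1, 5.9, 5.4]
print(count_switches(readings, use_hysteresis=False))
print(count_switches(readings, use_hysteresis=True))
```

With these illustrative readings, the plain threshold changes level on almost every sample, while the version with a margin changes only when the power clearly leaves the band around the frontier.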
Generally speaking, the decision-making algorithm is fed with the data provided by the network and, after processing the data, decides which network to switch to. The candidates are usually wireless local area network (WLAN), universal mobile telecommunications system (UMTS), and long term evolution (LTE) networks, among others. Multiple proposals use artificial intelligence techniques, such as artificial neural networks (ANN), fuzzy logic (FL), genetic algorithms (GA), and decision trees (DT), among others [10,11,12,13]. In [14], the authors proposed an algorithm that optimizes the feature boundary of deep convolutional neural networks (CNN) in order to reduce the overfitting problem, as a strategy to deal with the two-stage training process. In [15], the authors proposed an algorithm to estimate the depth map and faces from monocular 2D images. This algorithm helps to deal with difficulties such as accessibility, cost, and privacy of data. In [16], an algorithm for facial diagnosis that uses a small dataset is formulated and evaluated. Similarly, [17] gives an overview of reinforcement learning algorithms for handover management in 5G ultra-dense small cell (UDSC) networks. In that study, a variety of artificial intelligence algorithms is presented, and the authors give future directions and challenges for 5G UDSC networks. In [18], a data augmentation framework is formulated to improve the accuracy of deep convolutional neural networks, which helps to reduce the training cost and improve generalization ability. A detailed analysis of decision-making algorithms is carried out in the related works section, although few works consider the prognosis of MN movement.
This research proposes a framework to reduce packet loss during handover by forecasting the movement of the mobile node using classification techniques. The aim is to anticipate the network processes required to carry out the handover, yielding a proactive proposal that avoids packet loss in mobile WLAN environments. Thus, the main contribution of the proposed framework is to keep communication on track by anticipating the network switch in order to ensure continuity of services.
The main challenge is to delimit the area used to train the classifiers: a large amount of data leads to a longer training time, and it is possible that the model will never fit. This challenge will be analyzed in future research.
This work is organized as follows: in Section 2, a state-of-the-art review of related works is presented; in Section 3, the problem of selecting the next network to connect a mobile device is described; in Section 4, a proactive cross-layer framework design based on artificial intelligence techniques for seamless handover in mobile WLAN environments is proposed and formulated. In Section 5, the results are presented and discussed. Finally, the conclusions are given.
2. Related Works
The different proposals can be divided into two areas: (1) proposals to improve handover protocols and (2) decision-making proposals.
In the first area, there are protocols that focus on reducing the scanning time, such as [19,20,21], and proposals that focus on reducing the handover time, such as [22,23]. This work focuses on improving decision-making, that is, on area (2).
Neural networks have been applied to handover decision-making in heterogeneous networks. In [24], the authors propose a back-propagation artificial neural network (ANN) that uses the RSS and the traffic intensity in the target networks as input parameters, monitoring the training of the network. However, the delay caused by the training stage is a problem. In [25], a middleware based on an ANN is proposed to select the best network based on user preferences. However, it increases latency during handover execution due to the size of the signaling packets used and the training time. Another proposal based on neural networks is made in [26], where a neural network with the RSSI and the speed of the MN as input parameters is proposed, reducing the number of unnecessary handovers.
Vertical handover algorithms involve several factors that can be difficult to quantify. Fuzzy logic can be applied to solve change decision problems, as in [7,8,9,27], which use these policies to make change decisions while trying to balance the load of the networks efficiently, using input parameters such as RSSI, latency, and data rate.
The trend in the reviewed literature shows that networks will become more heterogeneous, so vertical handover will become more common. Some of the reviewed algorithms are classified based on the main criteria used to make the decision.
Regarding the decision-making proposals, multiple algorithms have been proposed. These use schemes based on received signal strength (RSS), quality of service (QoS), decision functions based on multi-criteria, and algorithms based on artificial intelligence techniques. In general terms, the decision-making algorithm feeds on the data provided by the network. After processing the data, it decides which network to switch to. Some of the criteria commonly used to make this decision to change are illustrated in the following topic: network parameters.
Network Parameters
The network parameters or criteria frequently used in handover decision-making are mentioned below, along with their description in Table 1.
In addition, some user preferences such as the cost of the network and security are considered.
In Figure 1, a generic algorithm for the handover decision phase is shown, which is basically a representation of the input data that is passed to the algorithm in order to make a change decision, that is, to select the next network [19,28,29,30,31,32]. This work focuses on the movement of the mobile node to predict the next network to connect to. For this reason, this article proposes parameters such as position, speed, acceleration, and received signal strength (RSS) as features.
The algorithms that make the change decision based on artificial intelligence are associated with high algorithmic complexity. Therefore, in this paper, different criteria are proposed to forecast a network change based on classification techniques. In this context, the next section presents the problem to be solved after the simulation of the proactive cross-layer protocol based on artificial intelligence techniques for seamless handover in mobile WLAN environments.
3. The Problem of Selecting the Next Network
The problem of selecting the next network can be formulated in terms of three sets. The first is a set R of candidate networks to carry out the handover, consisting of triples {Extended Service Set Identifier (ESSID), Received Signal Strength Indicator (RSSI), MAC Address (MA)}, that is, R = {(ESSID_1, RSSI_1, MA_1), …, (ESSID_|R|, RSSI_|R|, MA_|R|)}, |R| ≥ 2. The second is a set GPS formed by the positions of the MN in decimal degrees, each constituted by the triple {latitude (lat), longitude (lon), height (alt)}, that is, GPS = {(lat_1, lon_1, alt_1), …, (lat_|N|, lon_|N|, alt_|N|)}, |N| ≥ 2. The third is the set AV, consisting of the acceleration in the three axes, x-axis (ax), y-axis (ay), and z-axis (az), and the velocity (v), that is, AV = {(ax_1, ay_1, az_1, v_1), …, (ax_|A|, ay_|A|, az_|A|, v_|A|)}, where |A| ≥ 2. The solution is modeled with the set D_t, i = 1, …, |R| and t > 0.
Then the data set is expressed as follows:
D_t = {lat, lon, alt, ax, ay, az, v, ESSID, RSSI, MA}
For example, at time t = 1, there are three networks available to perform the network change.
D_1 = {19.41356, −98.90195, 2242, 1.41617, 3.84269, 7.96311, 5.01, Net1, −78, e4:3e:d7:26:cd:a7, Net2, −78, 48:8d:36:3d:04:b8, Net3, −85, 94:8f:cf:84:67:32}
Moreover, at time t = 2, there are two networks available to perform the network change.
D_2 = {19.491490, −98.892851, 2267, 5.75562, −0.153, 0.306, 10.726, INFINITUM68xx, −78, 50:4e:dc:2b:fe:18, INFINITUMr5xz, −78, 7c:b1:5d:5e:4e:f8}
The data set D varies depending on the coverage of the networks close to the current position. In the trivial case where only one network is available, that network is by definition the best and worst network at the same time, so the decision-making algorithm can only evaluate and select that one option. For these reasons, this article considers an environment with at least two adjacent networks.
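Because the number of candidate networks changes from one sample to the next, each sample is naturally a fixed part (position and motion) followed by a variable-length list of network triples. The following sketch shows one possible in-memory representation, using the t = 1 sample above as input; the class names `Network` and `Sample` and the helper `parse_sample` are assumptions made for illustration and are not part of the framework.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Network:
    essid: str
    rssi: int   # received signal strength indicator
    mac: str    # MAC address of the access point


@dataclass
class Sample:
    lat: float
    lon: float
    alt: float
    ax: float
    ay: float
    az: float
    v: float
    networks: List[Network]


def parse_sample(fields):
    """Build a Sample from a flat list: 7 motion values, then (ESSID, RSSI, MA) triples."""
    motion, rest = fields[:7], fields[7:]
    if len(rest) % 3 != 0:
        raise ValueError("network fields must come in (ESSID, RSSI, MA) triples")
    networks = [Network(str(rest[i]), int(rest[i + 1]), str(rest[i + 2]))
                for i in range(0, len(rest), 3)]
    return Sample(*[float(x) for x in motion], networks=networks)


# The sample at t = 1 from the text, with three candidate networks.
d1 = parse_sample([19.41356, -98.90195, 2242, 1.41617, 3.84269, 7.96311, 5.01,
                   "Net1", -78, "e4:3e:d7:26:cd:a7",
                   "Net2", -78, "48:8d:36:3d:04:b8",
                   "Net3", -85, "94:8f:cf:84:67:32"])
print(len(d1.networks), d1.networks[2].essid)
```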
4. Proactive Cross-Layer Framework Design Based on Artificial Intelligence Techniques for Seamless Handover in Mobile WLAN Environments
This section explains the design of the test environment in the proactive cross-layer framework. It is assumed that the MN is moving in an area fully covered by multiple WLAN networks, as shown in Figure 2. The main objective of the protocol is to forecast the next network the MN will connect to. In this scenario, eight access points were placed along a path.
4.1. Network Selection Process according to the Proactive Cross-Layer Protocol
The network selection process is based on past, present, and future data, as modeled by the block diagram of Figure 3.
In block A, the historical data is used to know where the mobile device has been physically connected. That is, it uses the GPS data set.
In block B, the current data is sampled in order to analyze the current state of the system and compensate if necessary. To do so, it uses the Kalman filter, proposed by Rudolf E. Kalman in 1960, in which a set of mathematical equations efficiently estimates the state of a linear system while minimizing the estimation error. The state vector x is defined by the set of data [p_x, v_x, a_x, p_y, v_y, a_y], which describes the movement of the mobile at time t, where p_x, v_x, a_x correspond to the latitude position, velocity, and acceleration, and p_y, v_y, a_y correspond to the same parameters for the longitude of the mobile. To measure the change in the state of the mobile over an interval Δt, we use the kinematic equation of movement (Kalman, 1960). This block is supported by the AV data set. Next, it uses Equation (1) to draw a possible route and anticipate a possible network change.

$$x = x_0 + v\,t + \frac{1}{2}\,a\,t^2 \tag{1}$$

where:
x_0 is the initial state of the vehicle, v is the velocity, and a is the acceleration of the vehicle (assuming constant acceleration at time t).
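The constant-acceleration motion model used in block B can be illustrated with a short numeric sketch. It covers only the prediction of position and velocity along one axis, not the full Kalman filter, and the function names and the sample state (in metres, m/s, and m/s²) are assumptions chosen for illustration.

```python
def predict_position(p0, v, a, dt):
    """Constant-acceleration kinematics: p0 + v*dt + 0.5*a*dt**2."""
    return p0 + v * dt + 0.5 * a * dt ** 2


def predict_state(state, dt):
    """Advance a (position, velocity, acceleration) triple by dt seconds."""
    p, v, a = state
    return (predict_position(p, v, a, dt), v + a * dt, a)


# Hypothetical one-axis state: position 0 m, velocity 5 m/s, acceleration 2 m/s^2.
state = (0.0, 5.0, 2.0)
print(predict_state(state, 1.0))
```

Applying `predict_state` repeatedly with the sampling interval as `dt` traces a possible route, which is the information block C needs to forecast the next network.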
In block C, a possible future location is forecast and, using artificial intelligence classification techniques, the network that is likely to be reached is predicted. In this block, a Bayesian classifier is used.
4.1.1. Model of Naive Bayes
Naive Bayes is a classification model based on Bayes' theorem. This model estimates the per-class probability by assuming that the attributes are conditionally independent. Let C denote the class variable, in this case the WLAN network, let E_k denote the k-th conditional attribute used for classification, and let P(C | E) denote the conditional probability of class C given that the features E of GPS_t and AV_t have been observed. The probability model for the naive Bayes classifier can be defined as shown in Equation (3).

$$P(C \mid E) = \frac{P(C)\,P(E \mid C)}{P(E)} \tag{3}$$

Under the independence assumption, instead of computing the conditional probability for every combination of attributes E_1, …, E_n given class C, we can derive Equation (4) from Equation (3) as follows:

$$P(C \mid E_1, \dots, E_n) = \frac{P(C)\,\prod_{k=1}^{n} P(E_k \mid C)}{P(E)} \tag{4}$$

where:
C_1, …, C_m are the available WLAN networks to change to.
E_1, …, E_n are the features of GPS_t and AV_t.
From a classification point of view, Equations (3) and (4) are also called likelihood functions when they are expressed as a function of C given E. For class variables C_1, …, C_m, we can evaluate the evidence under m likelihood values and then assign the evidence to the class with maximum likelihood. When we compare likelihood functions, we can drop P(E) from Equation (4) because it is constant. Therefore, let us rewrite the likelihood function as follows:

$$L(C \mid E) = P(C)\,\prod_{k=1}^{n} P(E_k \mid C) \tag{5}$$

Equation (5) is more practical because it does not require a very large training set. The reason is that the evidence is partitioned into multiple attributes by the independence assumption. In other words, it allows the class conditional densities to be calculated separately for each attribute and reduces a multidimensional task to a number of one-dimensional tasks. Therefore, the next network to switch to is NR_i:

$$NR_i = \arg\max_{C_i \in \{C_1, \dots, C_m\}} P(C_i)\,\prod_{k=1}^{n} P(E_k \mid C_i)$$

where:
C_1, …, C_m are the available WLAN networks to change to.
NR_i is the next network to switch to.
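The maximum-likelihood rule described in this subsection can be sketched in a few lines of plain Python. The sketch assumes Gaussian class-conditional densities for each attribute and uses a tiny invented data set with two features and two networks; the function names, the variance floor of 1e-9, and the data are illustrative assumptions, not the configuration used in the experiments.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-attribute (mean, variance)."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small floor keeps the variance strictly positive (an assumption).
        variances = [sum((x - m) ** 2 for x in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def log_likelihood(model, label, row):
    """log P(C) plus the sum over attributes of log P(attribute | C)."""
    prior, means, variances = model[label]
    total = math.log(prior)
    for x, m, var in zip(row, means, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
    return total


def predict(model, row):
    """Pick the class (network) with maximum likelihood."""
    return max(model, key=lambda label: log_likelihood(model, label, row))


# Invented two-feature samples for two hypothetical networks.
X = [(0.0, 1.0), (0.2, 1.1), (0.1, 0.9), (5.0, 3.0), (5.2, 3.1), (4.9, 2.9)]
y = ["Net1", "Net1", "Net1", "Net2", "Net2", "Net2"]
model = fit_gaussian_nb(X, y)
print(predict(model, (0.1, 1.0)), predict(model, (5.1, 3.0)))
```

Working with log-probabilities turns the product over attributes into a sum, which avoids numerical underflow when many attributes are used.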
It is important to mention that the present data will become historical but the forecast data will not necessarily be part of the historical data set.
4.1.2. Model of Support-Vector Machines
Support-vector machines (SVM) were developed to solve classification and regression problems. The foundations of SVM, a powerful computational intelligence theory, were developed by Vapnik in 1995. Regarding the classification problem, two classes are separated by a hyperplane:

$$w^{T} x + b = 0$$

where:
w is the weight matrix.
b is the bias matrix.
x is the feature input.
The set of vectors is said to be optimally separated by the hyperplane if it is separated without error and the distance between the closest vectors to the hyper-plane is maximal. Therefore, Equation (8) is as follows.
4.1.3. Model of Multinomial Logistic Regression
A logistic regression classification model is used for data in which the dependent variable is unordered or polytomous, and the independent variables are continuous or categorical predictors. Under a multinomial logistic regression model, the probability that x belongs to class i is written as:

$$P(y = i \mid x, w) = \frac{\exp\left(w_i^{T} x\right)}{\sum_{j=1}^{m} \exp\left(w_j^{T} x\right)}$$

for i ∈ {1, …, m}, where w_i is the weight vector corresponding to class i (in this case the class is a network) and the superscript T denotes matrix transpose. For binary problems (m = 2), this is known as a logistic regression model; for m > 2, the usual designation is multinomial logistic regression (or soft-max in the neural networks literature).
Because of the normalization condition (the class probabilities sum to one), the weight vector for one of the classes need not be estimated. Without loss of generality, we thus set w_m = 0, and the only parameters to be learned are the weight vectors w_i for i ∈ {1, …, m − 1}.
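As an illustration of the multinomial (soft-max) model, the sketch below computes class probabilities from linear scores. It assumes that the weight vector of the last class is fixed to zero, so only m − 1 weight vectors are supplied; the function name `class_probabilities` and the weights are hypothetical values chosen for the example, not learned parameters.

```python
import math


def class_probabilities(weights, x):
    """Soft-max over linear scores; the last class has its weights fixed to zero."""
    scores = [sum(w_k * x_k for w_k, x_k in zip(w, x)) for w in weights]
    scores.append(0.0)  # reference class with a zero weight vector
    top = max(scores)   # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical weights for m = 3 classes (two learned vectors plus the reference class).
weights = [[1.0, -0.5], [0.2, 0.3]]
probs = class_probabilities(weights, [2.0, 1.0])
print(probs, sum(probs))
```

The predicted network is the class with the largest probability; because the probabilities are normalized, fixing one weight vector does not restrict the model.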
4.1.4. Model of Decision Tree
The decision tree algorithm works well even when the data are discontinuous or noisy. Moreover, it can handle collinearity efficiently and provides excellent explanations of its predictions. On the other hand, DTs suffer from higher complexity, especially when dealing with complicated datasets, and they may lose valuable information in the case of continuous variables.
The decision tree model aims to find the best split node that guarantees high accuracy. The information gain (IG) method seeks the nodes that return the highest information gain, which can be measured using an entropy factor. The entropy factor is used to determine the degree of disorganization in the system. The entropy for the output can be calculated using the following formula:

$$H = -\sum_{c \in C} p_c \log_2 p_c$$

where p_c is the proportion of samples that belong to class c for a particular node, and C is the set of classes, in this case the available WLAN networks to change to.
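The entropy and information gain of a candidate split can be computed directly from the class labels at a node. The sketch below assumes base-2 logarithms (entropy in bits) and uses invented labels for two hypothetical networks; the helper names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the class labels at a node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


# A node with two networks equally represented, split perfectly in two.
parent = ["Net1", "Net1", "Net2", "Net2"]
print(entropy(parent))
print(information_gain(parent, [["Net1", "Net1"], ["Net2", "Net2"]]))
```

A split that separates the networks perfectly recovers the full entropy of the parent node, while a split that leaves both children as mixed as the parent yields no gain.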
4.1.5. Model of k-Nearest Neighbors
The k-nearest neighbors (KNN) model is one of the simplest and most straightforward supervised learning models in machine learning. The key idea of this algorithm is to decide a predicted value based on the labeled data points of the training set that are near the query data point.
KNN starts by loading the training data points in memory. Then, the classification task is completed by finding the nearest k data points. Finally, a vote of the k closest points to the query point will determine the class of the query data point. One critical decision that needs to be made is the selection of the distance function. Several distance functions have been proposed to compute the distance between two data points; however, the most common methods are cosine similarity and Euclidean distance.
The Euclidean distance between the point to be classified, x, and a training data point, y, is computed from the differences of their n coordinates, as in Equation (12).

$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} \tag{12}$$

In this calculation, we determine the data point class based on the best posterior probability value. Although KNN has a limited number of hyper-parameters (i.e., the k-value and distance function), which makes it a simple model, the k-value can dramatically affect the model performance.
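The KNN procedure (store the training points, find the k nearest ones, and take a vote) can be sketched as follows. The training points, the labels, and the value k = 3 are invented for illustration, and the function names are hypothetical; the experiments in this work use k = 8.

```python
import math
from collections import Counter


def euclidean(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_predict(train, query, k=3):
    """train: list of (point, label). Majority vote among the k nearest points."""
    nearest = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented 2D points for two hypothetical networks.
train = [((0.0, 0.0), "Net1"), ((0.0, 1.0), "Net1"), ((1.0, 0.0), "Net1"),
         ((5.0, 5.0), "Net2"), ((5.0, 6.0), "Net2"), ((6.0, 5.0), "Net2")]
print(euclidean((0.0, 0.0), (3.0, 4.0)))
print(knn_predict(train, (0.5, 0.5)))
```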
Five classification models were compared to each other to get the best fit for the data set on the network. They are the support-vector machines (SVM), naive bayes (NB), logistic regression (LR), decision tree (DT), and k-nearest neighbors (KNN).
4.2. System Architecture
The system uses the native Android operating system APIs for WiFi, location, and sensors, together with the native supplicant, to obtain the data and store it in a separate file, as illustrated in Figure 4.
4.3. Decision Taking Procedure
The system takes a sample of the position, acceleration, and available networks every second. Then, based on the data set, it predicts the next network, as illustrated in Figure 5.
To collect the data, an application for mobile devices is used. The data is then exported to a CSV file and processed with a MATLAB script. Finally, the next network is forecast.
4.4. Data Acquisition
The data was collected using an application for Android mobile devices programmed in Android Studio 3.5.2. Basically, the application scans the network environment, the GPS position, speed, and acceleration every 5 s and stores them in a CSV file. The user interface of the application is very simple because it only shows the results every second, as illustrated in Figure 6.
Next, the data is stored in a CSV file in the root of the internal storage of the mobile device. The file is organized by columns: latitude, longitude, height, velocity, acceleration on the x, y, and z axes, best ESSID, RSSI, MAC address, and other available networks following the same sequence.
This application allows viewing the proximity of the access point in four dimensions. The longitude, latitude, and height dimensions are represented on the x, y, and z axes, respectively. The fourth dimension is the power of the received signal and is represented by a colored bar. Different views of the graph obtained for a single access point are shown in Figure 7.
Note that closer to the access point the power increases, and further from it, the power decreases. This application for mobile devices can also be used to map the power of the received signal.
4.5. Implementation and Hyper-Parameters
The software is programmed in Python 3.8 using the scikit-learn (sklearn) API and the Spyder IDE from Anaconda, running on an HP Omen laptop with a 7th-generation Intel Core i7, 16 GB of RAM, and an NVIDIA GeForce GTX 1050 with 4 GB. The KNN classifier uses Euclidean distance and k = 8 neighbors, and the naive Bayes classifier is Gaussian naive Bayes. The decision tree, support-vector machine, and logistic regression classifiers use default parameters. No hyper-parameter tuning was performed for any classifier.
Table 2 shows the result of the instruction “get_params”, which returns all hyper-parameters of each model.
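A configuration along these lines could be reproduced with scikit-learn as sketched below. The choice of `SVC` for the support-vector machine and the dictionary keys are assumptions, since the text does not name the exact estimator classes; all remaining arguments are left at the library defaults.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# KNN with k = 8 and Euclidean distance; every other model uses defaults.
models = {
    "KNN": KNeighborsClassifier(n_neighbors=8, metric="euclidean"),
    "NB": GaussianNB(),
    "DT": DecisionTreeClassifier(),
    "SVM": SVC(),  # assumed estimator class for the SVM
    "LR": LogisticRegression(),
}

# get_params() returns a dictionary with all hyper-parameters of a model.
for name, model in models.items():
    print(name, model.get_params())
```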
5. Results
In the scenario of Figure 8, one hundred samples were taken per access point; 50% were used for training and the rest for testing, in order to use cross validation. Figure 8 graphs the data that was used for training and the data that was used for testing. The data that was correctly classified is very close to the access point to which it belonged, so the accuracy there is expected to be very high. When samples are taken very far from the access point, the accuracy of the classifier decreases drastically, as shown in Figure 8. This is because there is a balance between the train data and the test data.
Figure 9 plots the accuracy of each algorithm, and Table 3 shows the percentage of data used to test the different algorithms. Together, Figure 9 and Table 3 show the relationship between the amount of test data and accuracy. We observe from the graph that the algorithm that best fits the data is the decision tree, followed by naive Bayes, k-nearest neighbors, logistic regression, and finally the support-vector machines.
In Figure 10, the results are grouped by classifier and shown from the least to the greatest amount of test data, from right to left: purple = 20%, orange = 30%, red = 40%, and blue = 50% test data. As can be seen, the greater the amount of test data, the lower the accuracy, and in the center, where the data is equally proportioned, the accuracy is similar.
6. Discussion
This section discusses the handover decision and is divided into two parts: (A) the mobility forecast part, in which the articles consulted usually use techniques for estimating the position of a SL, and (B) the classification part, which uses techniques to determine which characteristics belong to a class.
With regard to mobility prognosis, the techniques used in this work are the Kalman filter (KF), particle filter (PF), and artificial neural networks (ANN).
The specialized literature usually uses different filters to estimate the position of an MN, as in (Yan, 2019), which proposes a channel state information (CSI) scheme and contrasts its results with the Kalman filter of [33]; please see Figure 11. This proposal obtains an error improvement of 0.63 m on the x axis and 0.75 m on the y axis, considering that the node moves within a range of 100 m on the x axis and 8 m on the y axis. An error of 0.63% on the x axis and 9.37% on the y axis is estimated.
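These percentages follow from dividing each positional error by the extent of movement on the same axis. As a quick check of the arithmetic, assuming the relative error is defined in this way:

```latex
\[
\frac{0.63\ \text{m}}{100\ \text{m}} \times 100\% = 0.63\%,
\qquad
\frac{0.75\ \text{m}}{8\ \text{m}} \times 100\% = 9.375\% \approx 9.37\%
\]
```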
In contrast, the proposal presented in this paper was tested with a movement of 2000 m on the x-axis and 40 m on the y-axis, and it yielded an average error of 33.58% with the artificial neural network and 45.74% with the particle filter. Considering that this is a random movement that in theory does not follow any pattern, it was decided to define a test scenario, as in all the articles consulted in this research. In this work, a test scenario with uniformly accelerated movement was defined in a circuit with seven access points.
Regarding the classification techniques used, five classifiers with seven input characteristics and one output with seven possible networks are proposed.
Figure 12 shows a set of graphs that are contrasted with each other; that is, in the first row latitude is plotted on the y-axis against the remaining characteristics in order to observe their distribution and to know whether they are linearly separable. On the other hand, if acceleration is plotted against velocity, it can be seen that the points are very close; therefore, a classifier with these characteristics would not perform well.
The diagonal of Figure 12 shows the distribution of the eight variables of our data set. The other cells of the plot matrix show correlation plots for each combination of variables in our data frame. In the first row and second column, we can see latitude on the ordinate and longitude on the abscissa; the graph to its right shows latitude on the ordinate against altitude on the abscissa, and so on for each combination with the other variables.
Support-vector machines work very well with a lot of data because they can have more support vectors; however, in the two case studies, the classifier whose percentage accuracy behaves consistently is the naive Bayes classifier, with an approximate accuracy of 80%. For this reason, we recommend the naive Bayes classifier as the best option, because the other classifiers vary depending on the amount of data, the data distribution, or imbalanced data.
Figure 13 shows the confusion matrix results.
The confusion matrices (B), (D), and (E) in Figure 13 have similar behaviors, which indicates that the NB, KNN, and DT models fit the data set better than LR and SVM.
The time to train and test the models is presented in Table 4. NB, KNN, and DT have similar behavior, while SVM and LR take a longer time to fit the model to the data set.
Other proposals in the literature have obtained similar results in similar scenarios, as shown in Table 5; however, this approach is part of an application for mobile devices that will be extended in the future with geo-fences to delimit the behavior of the mobile node.
7. Conclusions and Future Work
This section describes the conclusions obtained from this research. The techniques for estimating the next access point (AP) achieved an average accuracy of 92%; when the predicted AP is correct, this implies a seamless handover without data loss. An exponential increase in the acquisition time of an IP address was also observed depending on the signal strength. Therefore, performing the jump with a power greater than −40 dB or a link quality of 60% is recommended. As future work, this algorithm will be implemented for streaming data or voice calls in order to measure the quality of experience.