1. Introduction
Nowadays, access to internet applications has increased drastically as people rely more and more on the internet. Consequently, there is a significant demand for good management and classification of network traffic [1]. Our research falls into the area of network traffic classification, which has been extremely active for more than a decade. The applications of traffic analysis range from security and anomaly detection to network management and traffic engineering [2]. The uses of the internet have undergone several changes in recent years; starting as a network for the simple transfer of binary and textual data without constraints of time or speed, the internet experienced its first revolution with the appearance and democratization of the web and an ever-growing demand for bandwidth. Nowadays, with the appearance of streaming applications, such as real-time video transmission, IP telephony, and internet television, its use imposes even stronger constraints. The internet must, therefore, be able to provide users and their applications with the quality of service (QoS) they need. It must evolve from offering a single "best effort" service to a multi-service offering. Over the last few years, QoS has emerged as a major issue in the internet [3].
Furthermore, the increase in the complexity of the internet and its numerous interconnections, the heterogeneity of its resources in terms of technologies and dimensions, the characteristics of its traffic, and new applications with diverse and evolving needs give internet traffic many characteristics that are far from the traditional ones. In particular, it has been shown that applications used to exchange large volumes of data change the distribution of file sizes [3], which creates long-range dependence and self-similarity in traffic. These translate into high traffic variability, which affects the stability and quality of the services offered, decreases overall network performance, and, thus, results in QoS/performance degradation. The improvement of internet architectures and protocols (e.g., software-defined networks (SDNs), an emerging technology for reducing network complexity by separating the data and control planes to allow better management and innovation) is closely linked to knowledge and understanding of the characteristics of internet traffic, because these characteristics indicate the types of mechanisms to be deployed to adequately address user needs and network constraints. Consequently, the development of tools based on intelligent metrology, technologies for collecting information on internet traffic, and methods for analyzing and classifying its characteristics by using, for example, machine learning are very important subjects for network engineering and research. Accurate traffic classification, especially for real-time traffic, is very important not only for network management, but also for intrusion detection and security monitoring in order to discover abnormal behaviors and prevent data breaches.
There are multiple approaches to traffic classification [1], as shown in Figure 1. The port-based method matches traffic to its source application via the port numbers that the application uses; although it was previously the quickest and most effective method, it is used less now, given the many recent changes in the field. The payload-based technique uses measurable properties of packet payloads, but it faces many restrictions, such as encryption and confidentiality issues. The last method in the schema, which is based on flow statistics combined with machine learning, is the most promising technique and is the approach used in our work.
Machine learning (ML) is a field of artificial intelligence and a scientific discipline that covers several areas of study: mathematics, statistics, and algorithms [1]. The diversity of these anchors certainly contributes to the success of this discipline, whose work has been characterized by great creativity in the design of algorithmic methods, the search for a solid mathematical foundation to support the methods developed, and attention to natural mechanisms of learning and generalization.
A great deal of attention has been given to the natural fields of application, as evidenced by the exponential growth of learning in areas such as data mining and pattern recognition. The aim is to improve the performance of a machine on a task through training experience. The multiple ways of expressing this triplet (performance, task, experience) make it possible to develop many theoretical frameworks for a field, including that of statistical learning, to develop different models by using generative or discriminative approaches, and to apply them to various tasks, such as chess, driving, and classification.
Machine learning algorithms can be categorized according to the type of learning that they employ:
Supervised learning includes the tasks of classification, regression, and ranking. It usually involves dealing with a prediction problem. Supervised classification consists of analyzing new data and assigning them to a predefined class according to their characteristics or attributes. The algorithms are based on decision trees, neural networks, the Bayes rule, and k-nearest neighbors. Moreover, an expert is employed to correctly label examples. The learning must then find or approximate the function that allows the assignment of the correct label to these examples. Linear discriminant analysis and the support vector machine (SVM) are typical examples.
Unsupervised learning (clustering, segmentation) differs in that there are no predefined classes; the objective is to group records that appear to be similar into the same class. The problem is that of finding homogeneous groups in a population. Techniques for aggregation around mobile centers or hierarchical bottom-up classification are often used. The essential difficulty in this type of construction is validation. No experts are required. The algorithm must discover the structure of data by itself. Clustering and Gaussian mixtures are examples of unsupervised learning algorithms.
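The contrast between the two learning paradigms can be illustrated with a short, hypothetical scikit-learn sketch (this is an illustration on synthetic data, not part of the paper's testbed or experiments):

```python
# Illustrative sketch: supervised classification vs. unsupervised clustering.
# Synthetic two-class data stand in for labeled traffic flows.
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

# Supervised: the learner is given the correct labels y by an "expert".
clf = GaussianNB().fit(X, y)
supervised_acc = clf.score(X, y)

# Unsupervised: the learner must discover the group structure by itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_assignments = km.labels_

print(supervised_acc)
print(sorted(set(cluster_assignments)))
```

On well-separated synthetic blobs, both approaches recover the two groups; the supervised model additionally ties each group to its known label.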
The main contribution of this paper is a study of the classification of cloud network traffic managed by an SDN by using machine learning techniques. A rising number of papers work separately on traffic classification in SDNs or on cloud traffic, but our proposed integration of the two environments (an SDN controller and a cloud platform) has not been discussed before; our goal in this work is to study this traffic for classification within our deployed testbed's real traffic. The generated traffic flows belonged to a Linux virtual machine that was installed and run on a cloud platform whose network was managed by the OpenDaylight SDN controller. We conducted controlled experiments to build models for four different supervised ML algorithms (Naive Bayes, SVM, Random Forest, and C4.5) with the aim of obtaining the most accurate model for classifying an application's traffic. These experiments consisted of the separate use of two applications (YouTube and Facebook), where the user interacted in each case to generate traffic flows, so that the application actually in use was known. The data collected in each experiment were used as training data, and test data were randomly collected to test the trained models. Moreover, we show how the selection of features can affect the final results by using two different feature sets in this paper.
This paper is organized as follows. Section 2 describes related work on traffic classification in SDN and cloud environments and the latest trends and challenges, while Section 3 details the process of collecting traffic data and selecting traffic features for the experiments. Section 4 describes the methodology of the training and classification process, as well as the experimental design, while in Section 5, the results are analyzed and discussed in more detail. Finally, Section 6 concludes the paper.
2. Related Work
The literature on traffic classification alone is extremely rich; the methods proposed in the papers that we reviewed vary from one to another, but they share a common trait: they all combine more than one algorithm to achieve better classification and accuracy. We observed that the number of articles related to this theme was limited, and they were already summarized in a previous survey [1] (August 2018); in a more recent study, the authors of [4] (September 2022) updated that survey.
For instance, in [5], the authors used ML algorithms to classify network traffic by application, similarly to the approach in this study, but in a traditional network. By using data labeled by application, they trained ML models to recognize applications including Skype, Post Office Protocol 3 (POP3), the Domain Name System (DNS), Torrent, and Telnet. They tested the following six classification models and compared their accuracy: Support Vector Machine (SVM), Random Forest, C4.5, AdaBoost, MLP, and Radial Basis Function (RBF).
At this point, we can also say that few papers have discussed traffic classification in SDNs. Papers mentioning the SDN controller OpenDaylight were mainly focused on security and anomaly detection, rather than classifying traffic.
However, in [6], the authors used both statistical and deep packet inspection (DPI)-based approaches in their classification model. They assumed that combining DPI techniques with a machine learning classifier would help achieve higher accuracy. However, that did not change the fact that DPI techniques require more calculation time and face encryption issues. The paper did not mention which SDN controller was used, only that the classifier was implemented in the control plane.
On the other hand, the authors of [7] proposed a new method for classifying traffic in an SDN by using four different neural network algorithms: feedforward, multilayer perceptron (MLP), nonlinear autoregressive exogenous multilayer perceptron (NARX (Levenberg–Marquardt)), and Naive Bayes. They used the OpenDaylight controller with real-time campus traffic to test their model (but this was not a cloud-based environment), and they used 13 features in their training set. However, the last feature that they used (AppPRO: protocol of the application layer) was said to be added manually, and its utility and values were not clear, nor was it clear whether it affected the final results.
In [2], the authors tackled traffic classification in an SDN by using a network application to collect traffic flows—exclusively transmission control protocol (TCP) traffic. Several pieces of packet information were gathered by using different methods, such as Packet-In messages, to extract source/destination IP addresses and port numbers. They then used the controller to collect the sizes and timestamps of the first five packets, i.e., the next five packets after the initial handshake between the server and the client that went through the controller. The flow duration was calculated by subtracting the timestamp of the initial packet from the timestamp of the message received by the controller regarding the removal of the flow entry. Overall, they used 12 features, which were similar to the features studied in this work; they then compared the accuracies of three ML algorithms' models, which were lower than those in our tests: Random Forest (highest 96%), Stochastic Gradient Boosting (highest 93%), and Extreme Gradient Boosting (highest 95%).
In [8], Eom et al. proposed a machine-learning-based network classification system using an SDN-only technique implemented at the network controller level (RYU). The system was used for traffic classification, and the overall system performance significantly improved in terms of some standard parameters, such as accuracy, precision, and training and classification time, in comparison with other classifiers under the same testing conditions. The highest accuracy reached was 96.15%, when using the LightGBM algorithm model.
In [9], based on the POX controller and the Mininet network emulator, the authors studied traffic classification in an SDN by using three supervised learning algorithms: SVM, Naive Bayes, and the nearest centroid. They classified various applications based on the number of flow instances collected over the first five minutes and over ten minutes. The highest accuracy obtained was 96.79%, with Naive Bayes, but they basically used ICMP ping traffic generated with Mininet.
Furthermore, multiple recent papers have studied traffic in complex environments. In [10], the authors performed a systematic review of the use of artificial intelligence (AI) in SDNs, outlining the latest achievements in the field. The authors of [11] presented the results obtained by applying a deep learning recurrent algorithm to classify traffic over cloud-implemented Internet of Things (IoT) systems, and they concluded that the results outperformed those of other similar implementations with respect to latency, throughput, and transmission rate. In [12], the authors defined, implemented, and evaluated the performance of a deep learning SDN used for the detection of several security threats in an IoT environment, and they validated the algorithm by using the CICIDS2018 dataset, showing that the proposed algorithm achieved better results with respect to accuracy, precision, speed, and other evaluation metrics. The authors of [13] highlighted the effectiveness of applying artificial intelligence models, namely neural networks, as an answer to the security issues raised by the development of IoT. Indeed, a neural network is applicable to the analysis of intrusions; in addition, that work made further use of genetic algorithms, which improved the performance of the neural network by optimizing its parameters through a trial-and-error approach. In [14], the authors presented a comprehensive overview of a large number of classification methods and compared them by means of standard metrics, such as speed, accuracy, recall, etc.
In [15], the authors proposed traffic classification using supervised machine learning techniques in an SDN-enhanced FiWi–IoT environment. The tests showed the Random Forest algorithm to be the best classifier (with an accuracy of 99%), followed by the KNN Tree, Neural Network, Naïve Bayes, Logistic Regression, and SVM, as in our findings. Different data were captured from IoT and non-IoT devices as PCAP files, similarly to our procedure, but the authors did not give the exact list of features used. The authors of [16] presented an SDN-based framework that used a machine learning classification model and was intended to detect and mitigate DDoS attacks on IoT devices. A new approach was proposed in [17] to detect DDoS attacks in SDNs by combining six classifiers (SVMs, Random Forests, and Gradient-Boosted Machines) with an optimal set of weights. They developed a testbed with Mininet and a POX controller loaded with different datasets, and the accuracy of their tests reached up to 99%. On the other hand, in [18], supervised machine learning models were also used to detect DDoS attacks; this time, the authors only compared two algorithms—SVM (80% accuracy) and Naive Bayes (76%)—and they used Mininet and an RYU controller for their testbed.
Furthermore, the authors of [19] proposed an integrated approach to benchmarking and profiling vertical industrial applications with regard to resource utilization, capacity limits, and reliability characteristics. This was based on data analysis to extract information, as well as benchmarking experiments, which would later be used in the development of artificial intelligence algorithms. This was all done with the purpose of optimizing the use of the infrastructure, guaranteeing the quality of service (QoS) provided, and, obviously, supporting Industry 4.0 in its evolution. The authors of [20] suggested a mechanism for addressing the problem of optimal resource allocation by profiling services and predicting the resources that would guarantee the desired service quality. To do so, the adopted approach was based on the collection of data from the hardware and the virtual and service infrastructure, the study of well-known container-based implementations following the microservice paradigm, and the use of machine learning algorithms for the prediction of the resources required for service requests. Moreover, the authors of [21] presented a subsystem of the METRO-HAUL tool that enabled network resource optimization from two different perspectives: offline network design and online resource allocation. Indeed, the offline network design algorithms aimed at planning the resource allocation, while the online resource allocation took into account the different requirements of end-user services in terms of bandwidth, delay, QoS, and the set of VNFs.
The motivation behind our study was the development of an application for traffic classification within our implemented SDN/cloud environment, which is where the novelty resides. So, the first step presented in this paper consisted of demonstrating how we could use the traditional way of classifying traffic by using supervised ML algorithms. Thus, unlike in the studied papers, we based our study, on the one hand, on controlled experiments in order to concretely label training traffic data and have robust models with our set of algorithms. On the other hand, our study was completed by comparing two sets of different features in a search for better results, as the choice of features is a very important parameter in classification.
4. Methodology
Classification is the most studied challenge in the field of ML. Its concept is simple: it assigns the right label or class to an item of data according to its characteristics, and many tasks can be reduced to it. To define a classification problem, one starts by identifying the number of existing classes and representing the data in the vector space of their characteristics; one then carries out calculations on this representation of the data and assigns a score that can be translated into a certain class.
This work presents a set of machine learning algorithms used with a labeled dataset that we trained in order to identify and classify the flows of the studied applications on an SDN/cloud platform. This research aims to give an overview of traffic classification in such an environment. As shown in Figure 5, the classification training model was generated by using the training data. In the classification phase, a sample of traffic with the chosen features was used to classify the entire stream.
After the data collection and feature calculation, we generated a file with a .csv extension. This type of file is usually accepted by Weka, but we decided to convert it into the ARFF format, since this is the default input in Weka. In addition, the file size decreased considerably compared to that of the .csv file. The structure of a file in the ARFF format is very simple: ARFF files are divided into three parts: @relation, @attribute, and @data.
@relation <name>: Every ARFF file must start with this declaration on its first line, ideally with the name of the file (one cannot leave blank lines at the beginning). <name> is a string, and if it contains spaces, it must be enclosed in quotes.
@attribute <name> <data_type>: This section includes one line for each attribute in the dataset to indicate its name and type. <name> indicates the name of the attribute; it must begin with a letter and, if it contains spaces, must be enclosed in quotation marks. <data_type> gives the type of data for the attribute, which can be:
- numeric, meaning that the attribute is a number;
- integer, indicating that the attribute contains whole numbers;
- string, indicating that the attribute represents text strings;
- date [<date-format>], where <date-format> indicates the date format, of the type "yyyy-MM-dd'T'HH:mm:ss";
- <nominal-specification>, a data type defined by the user that can take the indicated values, each separated by commas.
@data: This section includes the data themselves. Each element is separated by commas, and all lines must have the same number of elements, which is a number that matches the number of declarations in @attribute, which was added in the previous section. If no data are available, a question mark (?) is written in their place, since this character represents the absence of a value in Weka. The decimal separator must be a period, and strings must be enclosed in single quotes.
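For illustration, a minimal ARFF file following this layout might look as follows (the attribute names and values here are hypothetical examples, not the exact features or data of our experiments):

```
@relation 'traffic-flows'

@attribute duration numeric
@attribute packet_count integer
@attribute mean_packet_size numeric
@attribute class {youtube, facebook}

@data
12.5,340,812.4,youtube
3.2,58,431.0,facebook
?,77,502.3,youtube
```

Note the question mark in the last row, which marks a missing value, and the period used as the decimal separator, as described above.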
There is a large set of classification algorithms in Weka, but we limited ourselves to the algorithms most widely used for traffic classification in the related work listed in Section 2. Thus, we only made use of the following ones:
Naive Bayes—Naive Bayes is a supervised classification and prediction technique that builds probabilistic models; it is based on Bayes' theorem and on the assumption of conditional independence between features. It is a supervised technique, since it requires previously classified examples for its operation. In general terms, Bayes' theorem expresses the probability that an event occurs given that another event has occurred. Bayesian statistics are used to calculate estimates based on prior subjective knowledge.
The implementations of this theorem are adapted with use and allow the combination of data from various sources and their expression in the degree of probability.
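A minimal sketch of Naive Bayes classification, written here with scikit-learn's GaussianNB as a stand-in for Weka's NaiveBayes (the feature values are hypothetical toy flow statistics, not our dataset):

```python
# Naive Bayes sketch: P(class | features) via Bayes' theorem, under the
# assumption that features are conditionally independent given the class.
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Each row: [flow duration (s), packet count]; labels: 0 = app A, 1 = app B.
X_train = np.array([[1.0, 10], [1.2, 12], [8.0, 200], [7.5, 180]])
y_train = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X_train, y_train)

print(model.predict([[1.1, 11]])[0])   # -> 0 (resembles app A flows)
print(model.predict([[7.8, 190]])[0])  # -> 1 (resembles app B flows)
```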
SVM—Support vector machines, or wide-margin separators, are a set of supervised learning techniques designed to solve problems of discrimination and regression. SVMs were developed in the 1990s based on the theoretical considerations of Vladimir Vapnik [31] concerning the development of a statistical theory of learning: the Vapnik–Chervonenkis theory. SVMs were quickly adopted for their ability to work with large data sizes and few hyperparameters, their solid theoretical foundations, and their good results in practice.
Wide-margin separators are classifiers that rely on two key ideas to deal with nonlinear discrimination problems and to reformulate classification problems as quadratic optimization problems. The first key idea is the notion of the maximum margin. The margin is the distance between a separation boundary and the nearest samples. The latter are called support vectors.
In order to be able to deal with cases in which the data are not linearly separable, the second key idea of SVMs is to transform the representation space of the input data into a larger (possibly infinite) dimensional space in which a linear separator is likely to exist.
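The two key ideas can be sketched with scikit-learn's SVC (a toy linearly separable dataset is used here for determinism; our experiments used Weka's SVM implementations instead):

```python
# Wide-margin SVM sketch: the fitted model keeps only the support vectors,
# i.e., the training samples closest to the separating boundary.
import numpy as np
from sklearn.svm import SVC

# Toy, linearly separable flows: [duration (s), packet count].
X = np.array([[0.5, 5], [0.7, 6], [9.0, 210], [8.5, 190]])
y = np.array([0, 0, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)

# For nonlinear data, a kernel (e.g., kernel="rbf") implicitly maps the
# inputs to a higher-dimensional space where a linear separator may exist.
print(len(svm.support_vectors_))
print(svm.predict([[0.6, 5]])[0])   # -> 0
```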
Random Forest—This algorithm is a combination of predictor trees such that each tree depends on the values of a random vector sampled independently and with the same distribution for each tree.
Random Forest makes use of an aggregation technique developed by Leo Breiman that improves the classification precision by incorporating randomness into the construction of each individual classifier. This randomization is introduced both in the construction of the tree and in the training samples. This classifier, which is simple to train and adjust, is often used.
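A short sketch of this ensemble idea with scikit-learn (hypothetical toy data; each of the 100 trees is grown on a bootstrap sample with randomized feature selection, following Breiman's aggregation technique described above):

```python
# Random Forest sketch: bagging over randomized decision trees.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy flows: [duration (s), packet count]; labels: 0 = app A, 1 = app B.
X = np.array([[1.0, 10], [1.1, 12], [0.9, 9],
              [8.0, 200], [7.9, 205], [8.2, 198]])
y = np.array([0, 0, 0, 1, 1, 1])

rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# The final prediction aggregates (majority-votes) the trees' predictions.
print(rf.predict([[1.0, 11]])[0])   # -> 0
print(rf.predict([[8.1, 201]])[0])  # -> 1
```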
J48 tree (C4.5)—This is one of the most widely used ML algorithms. Its operation is based on generating a decision tree from the data through partitions that are made recursively according to the depth-first strategy. Before each data partition, the algorithm considers all of the possible tests that can divide the dataset and selects the test that produces the highest information gain.
For each discrete attribute, a test with "n" outcomes is considered, where "n" is the number of possible values that the attribute can take. For each continuous attribute, a binary test is performed on each of the values that the attribute takes in the data.
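The split-selection idea can be sketched with scikit-learn's DecisionTreeClassifier using criterion="entropy", which, like C4.5/J48, picks splits by information gain (scikit-learn's tree is a CART-style binary tree, not a full C4.5 implementation, and the data here are hypothetical):

```python
# Decision tree sketch: recursive partitioning by information gain.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy flows: [duration (s), packet count]; labels: 0 = app A, 1 = app B.
X = np.array([[1.0, 10], [1.2, 12], [8.0, 200], [7.5, 180]])
y = np.array([0, 0, 1, 1])

# criterion="entropy" selects, at each node, the test that maximizes
# information gain over the candidate attribute thresholds.
tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

print(tree.predict([[1.1, 11]])[0])  # -> 0
print(tree.get_depth())
```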
Furthermore, to take a deeper dive into our results, we ran the same tests in a second iteration with a different selection of features. This time, we used the Netmate [32] tool, which generated its set of default features [33]; we then used this set of features as the flow attributes on which to base the classification with the same four algorithms.
Figure 6 illustrates the steps of our methodology.
The captured traffic data remained the same; they were duplicated to run the same experiment while following the same training and classification processes as those explained at the beginning of this section, but with different sets of features. The first set contained the features that we selected from our study of the literature, for which we developed a script to calculate each of the nine features. The second set was generated by Netmate, with a total of 44 features. The results of our approach are presented and discussed in the next section.
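The kind of per-flow statistics in the first feature set can be sketched as follows (the feature names, formulas, and toy flow below are illustrative assumptions, not the exact nine features or the script used in the study):

```python
# Illustrative per-flow feature calculation from packet records.
from statistics import mean, pstdev

# A flow as a list of (timestamp_seconds, packet_size_bytes) tuples.
flow = [(0.00, 120), (0.05, 1500), (0.09, 1500), (0.30, 40)]

timestamps = [t for t, _ in flow]
sizes = [s for _, s in flow]

features = {
    "duration": timestamps[-1] - timestamps[0],
    "packet_count": len(flow),
    "total_bytes": sum(sizes),
    "mean_packet_size": mean(sizes),
    "std_packet_size": pstdev(sizes),
    "mean_inter_arrival": mean(b - a for a, b in zip(timestamps, timestamps[1:])),
}

print(features["packet_count"])  # -> 4
print(features["total_bytes"])   # -> 3160
```

In our workflow, such per-flow rows were then written out and converted to ARFF for Weka; Netmate produces its 44 default features in an analogous per-flow fashion.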
5. Results and Discussion
This section describes the results of the experiments that were carried out to determine which model was the best for classifying our traffic data. As mentioned in the previous sections, the results presented for our experiments were based on labeled training data that were used to build the classification model for each algorithm; these data were then loaded into Weka as a training set. It should be noted that the success percentage of each algorithm varied depending on the type of evaluation that was performed. Typically, when we used the training file itself as the evaluation method, we obtained a higher hit percentage than when we used cross-validation, because in cross-validation, one part of the dataset is used for training and another for testing, so the percentage of correctly classified data can be lower.
For the following results, we used the default settings of Weka for each algorithm both to build the trained model and for the classification tests with the different feature cases.
Table 4 lists the two sets of features used.
All classifiers were tested with the trained data.
Figure 7 shows the results for the accuracy of the different algorithms that were tested on the first set of features under study, based on Equation (1):

Accuracy = (TP + TN) / (TP + TN + FP + FN),        (1)

where:
TP: true positives—the average number of samples from the positive class that the model correctly classified.
TN: true negatives—the average number of samples from the negative class that the model correctly classified.
FP: false positives—the average number of samples that the model classified as positive when, in fact, they belonged to the negative class.
FN: false negatives—the average number of samples that the model classified as negative when, in fact, they belonged to the positive class.
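These standard metrics can be computed directly from the four confusion-matrix counts; the sketch below uses hypothetical counts for illustration, not the counts from our experiments:

```python
# Evaluation metrics from confusion-matrix counts.
def accuracy(tp, tn, fp, fn):
    # Fraction of all samples classified correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def precision(tp, fp):
    # Of the samples predicted positive, the fraction that truly are.
    return tp / (tp + fp)

def recall(tp, fn):
    # Of the truly positive samples, the fraction found by the model.
    return tp / (tp + fn)

# Hypothetical example: 38 TP, 38 TN, 1 FP, 1 FN.
print(round(accuracy(38, 38, 1, 1), 4))  # -> 0.9744
print(round(precision(38, 1), 4))        # -> 0.9744
print(round(recall(38, 1), 4))           # -> 0.9744
```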
After running these tests for each of the selected algorithms, we believe that the best model was the one obtained with the Random Forest as the classifier, since it gave a very good percentage of accuracy (∼97%) compared to those of the other algorithms, followed by Naive Bayes and C4.5 (∼82%), and then the SVM.
There were two SVM algorithms in Weka that we made use of: SMO (79.49%) and LibSVM (76.92%). The first was developed within the tool and is automatically available in the UI; it uses J. Platt's sequential minimal optimization algorithm [34] with the RBF kernel by default. LibSVM was developed separately from the platform by Chih-Chung Chang and Chih-Jen Lin [35] and is simply integrated with a .jar call; it is a wrapper class for the libsvm library [36] within the tool. LibSVM showed a slightly lower accuracy (∼77%) compared to that of SMO.
SVM algorithms’ results can vary and can be affected by the implementation and settings of the kernel configuration used, the dataset size, and the features used. Even though the results were expected to be similar when using the SVM method with the same dataset and features in the first iteration, the observed difference was mainly explained by the default settings of the two algorithms within the tool. Still, they seemed to have the lowest accuracies in our tests with the set of studied features compared to the other classifiers.
Otherwise, in Figure 8, we show the accuracy of the same four algorithms on the second set of features in our experiments. As we can see, Random Forest still showed the best classification results for our data. However, its accuracy slightly decreased compared to the first results, so we can say that it classified better with our selected features. In contrast, the remaining three algorithms clearly performed better with the Netmate features, as they showed increased accuracy; in particular, C4.5 jumped to 92.86% (more than a 10% difference) and the SVM jumped to 85.29% (a difference of 6%).
In terms of the average recall and precision of the classification algorithms, Figure 9 and Figure 10 confirm the Random Forest's high rates of correctly classifying the applications' different flows, with the highest precision and recall compared to those of the other algorithms with both the studied and the Netmate features.
We also observed that SMO had a good precision of 0.846 with the studied features and 0.843 with the Netmate features, but its poor recall with the studied features indicated a large number of false negatives, which explained the low overall accuracy of the classifier when using this set of features compared to the other. We found that SMO showed better results with a larger number of features, as there were four times as many Netmate features as there were features developed in this study.
On the other hand, we found that Naive Bayes and C4.5 (J48 in Weka) showed exactly the same average recall when classifying our data and a small difference in precision when it came to the studied features. However, with the Netmate features, C4.5 showed a higher precision and recall—over 0.9—which explained the increased accuracy of the classifier with this set of features. The same went for Naive Bayes, with a smaller difference in the metrics between the use of the two different sets of features.
Classifiers can also be compared according to their ability to individually classify the matrix classes; in our case, these would be the flows belonging to each application, with their exact precision and recall based on Equations (2) and (3):

Precision = TP / (TP + FP),        (2)

Recall = TP / (TP + FN).        (3)

To look further at the given results, in Table 5, we present these metrics for each classifier for the Facebook and YouTube flows regarding the two feature sets (S_f refers to the studied features, while N_f refers to Netmate's features).
Precision gave us the number of correctly classified traffic flows for each application over all of the existing flows for that specific application. For instance, we can see that all Facebook flows were correctly classified by the Random Forest algorithm as Facebook flows, while SMO showed the lowest precision other than that of Naive Bayes in the case of the studied features. Nevertheless, the three algorithms showed better classification with the Netmate features, except for Random Forest. However, in the case of YouTube, Random Forest kept showing the best results in comparison with the rest of the classifiers when using the studied and Netmate features, and it was ranked before SMO with 0.833 (versus Netmate’s 0.806) and Naive Bayes with 0.8 (versus Netmate’s 0.781).
Moreover, we had the recall metric, which showed the number of correctly classified flows over all of the flows of the two applications. For example, in the case of the YouTube application, all flows were correctly classified with the Random Forest algorithm when using the studied features. The poorest recall went to the SVM algorithm (SMO) when using the same set of features. Nevertheless, the Facebook flows' recall showed more consistent results for the four algorithms when using the studied features, but SMO and Naive Bayes were less able to correctly classify the Facebook flows when using the Netmate features.
To sum up the obtained results, we can say that Random Forest showed the best results for both applications with both feature sets, but the results were more significant with our studied and selected features, which gave the highest accuracy. The accuracy obtained in our tests was as good as that stated in previous papers on SDN traffic classification [2,6,7,8,9]. However, the remaining three algorithms—SVM, Naive Bayes, and C4.5—were able to better classify the network flows of our dataset when using the Netmate features. In the experiments in [5], Random Forest and C4.5 were also the classifiers that gave the best accuracy, while SVM did not show good results when using Weka. In addition, the authors of [7] claimed that of all of the proposed methods for classifying traffic in SDNs, theirs gave the highest accuracy (97%), similar to the results in this paper with the Random Forest algorithm (97.44%). In [6], the authors included a study of the appropriate number of features to use for better results, and they concluded that the number should be equal to or higher than eight, which was the case in our experiments. The Netmate features (44 in total) were roughly four times more numerous than those that we selected (nine features). Most papers using supervised learning have stated the use of the Netmate features for their classification studies, but did not compare them with other sets of features; their comparisons were only based on the chosen algorithms' metrics (i.e., accuracy, recall, precision, etc.).
Overall, this work’s originality can be reduced to three main pillars:
First: An undiscussed platform setup that included two known platforms of the SDN/cloud fields (OpenDaylight and OpenNebula) was studied. As the integration of the cloud and an SDN is a great facilitator when it comes to moving towards automation, artificial intelligence and machine learning techniques were used to ensure that all digitization initiatives were integrated in a coherent way. This was all with the ultimate goal of delivering the best end-user experience and quality via traffic classification mechanisms.
Second: An accurate training data was created, where we knew exactly which traffic was generated by each application, and this was then used to train our classifiers.
Third: Not only were four popular algorithms that are commonly used in related work compared, but a further comparison based on two sets of features was also included. Therefore, the increased performance observed for classifying application flows in our tests was basically related to the set of features, as the whole configuration remained unchangeable, except for the change in the feature set in each iteration.