Article

Machine Learning Classification and Regression Approaches for Optical Network Traffic Prediction

Faculty of Electronics, Wrocław University of Science and Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wrocław, Poland
*
Authors to whom correspondence should be addressed.
Electronics 2021, 10(13), 1578; https://doi.org/10.3390/electronics10131578
Submission received: 27 May 2021 / Revised: 22 June 2021 / Accepted: 25 June 2021 / Published: 30 June 2021
(This article belongs to the Special Issue Telecommunication Networks)

Abstract
The rapid growth of network traffic drives the development of new network technologies. Artificial intelligence provides suitable tools to improve currently used network optimization methods. In this paper, we propose a procedure for network traffic prediction. Based on the characteristics of optical networks (and other network technologies), we focus on the prediction of fixed bitrate levels called traffic levels. We develop and evaluate two approaches based on different supervised machine learning (ML) methods: classification and regression. We examine four different ML models with various selected features. The tested datasets are based on real traffic patterns provided by the Seattle Internet Exchange Point (SIX). The obtained results are analyzed using a new quality metric, which allows researchers to find the best forecasting algorithm in terms of network resource usage and operational costs. Our research shows that regression provides better results than classification for all analyzed datasets. Additionally, the final choice of the most appropriate ML algorithm and model should depend on the network operator's expectations.

1. Introduction

The rapid, global development of network technologies such as the Internet of things, 5G, or cloud computing causes fast growth in the number of endpoint devices. According to the Cisco Annual Internet Report, the number of Internet users will grow from 3.9 billion in 2018 to 5.3 billion in 2023 [1]. Moreover, a recent Nokia report [2] presents and discusses various network traffic trends in 2020, showing significant growth of network traffic due to the COVID-19 pandemic. To overcome a possible capacity crunch in the Internet, network operators build and continually improve backbone networks utilizing various optical technologies [3,4]. However, the constantly growing network traffic presents new challenges to network operators. To improve the performance of future optical networks beyond the conventional mechanisms currently in use, the concept of a cognitive optical network [5] has been proposed. In more detail, a cognitive optical network is a network with a cognitive process that can monitor current network conditions and then adjust the network operation to those conditions. The cognitive process, which uses history to improve performance, usually employs machine learning (ML) algorithms [6]. ML techniques, by analyzing and finding dependencies in historical data, e.g., traffic flows, learn and apply the gained knowledge to make decisions in the future. More information about the application of ML techniques in optical networks can be found in [7,8,9,10,11].
Knowledge about future traffic in a network allows operators to optimize resource usage, provide better quality of service for users [12], reduce the cost of network operation, or detect anomalies in the traffic dataflow [13]. There are four forecasting types in terms of horizon time [14]: (i) real-time, in case of on-line traffic prediction, (ii) short-term, which forecasts up to several hours of future traffic, (iii) middle-term, predicts a few days traffic, and (iv) long-term, which can predict years of future network traffic. Knowledge obtained from each of the forecasting strategies can be used as valuable information for different network optimization tasks, e.g., traffic flow control, network operational cost reduction, anomalies detection, or physical network expansion. In our work, we focus on a short-term traffic prediction; however, our approach can be successfully applied to all forecasting types.
In this paper, we consider an optical network in which transmission between all physically connected nodes occurs continuously. Traffic flows between pairs of nodes can be modeled as continuous and regular flows, which change in time. We use ML classifiers and regressors to forecast future traffic. As a consequence of everyday user activities, working tasks, etc., some daily and weekly patterns in network traffic can be observed [15]. Because of this, some dependencies and periodicity are created over traffic flows. ML algorithms locate, analyze, and learn those dependencies to forecast future network traffic flows.
A single optical channel supported by a single transceiver can carry a fixed amount of data. As a result, the information required to establish a connection is the number of optical channels required to carry a transmission, for instance, the number of 100 Gbps channels in a WDM (wavelength division multiplexing) network. Additionally, most transport network technologies, such as the Optical Transport Network and various versions of Ethernet, are also provisioned in some granularity of bitrate. Therefore, during traffic prediction we focus on predicting traffic levels rather than the exact traffic volume.
This work is a continuation and extension of our recent paper [16]. The contribution of this paper is threefold. First, we compare two supervised ML approaches: classification, where we classify traffic into traffic levels, and regression, where we first forecast the value of the traffic flow (bitrate in a given time point), and then based on the obtained result, the traffic level is calculated by rounding up the obtained prediction to the closest traffic level. Second, we propose and test for various classifiers and regressors four different ML models based on various features. Last, we introduce a new quality metric, which allows the network operator to compare forecasting algorithms concerning the usage of network resources and operational costs.
The rest of this paper is organized as follows. Section 2 presents related works from a field of study. Next, in Section 3, the dataset creation method is depicted. The network model, ML approach, ML models, and evaluation metrics are described in Section 4. Section 5 shows numerical results and a brief description of them. At the end, final conclusions are presented in Section 6.

2. Related Works

Typically, the problem of forecasting network traffic is formulated as a time series problem. A large majority of works in the field solve it using approaches based on the autoregressive integrated moving average (ARIMA) and its numerous variations, as well as ML techniques [17]. The authors in [18] compared ARIMA, Holt-Winters, and neural network algorithms for forecasting the amount of traffic in TCP/IP-based networks. In [19], ARIMA and SARIMA models were used for short-term and long-term traffic volume predictions. As a result, the required bandwidth was reduced by 18.9%. In Reference [20], the allocation of data center traffic with and without traffic prediction was compared. The performance of all traffic allocation algorithms was improved by using ML techniques. In Reference [21], the authors used a graph convolutional generative adversarial network model to predict burst events in an optical network. The proposed method outperformed the LSTM reference method.
Besides the traffic prediction problem, ML techniques can be successfully employed for other purposes in optical networks. In Reference [22], the authors used ML techniques for the problem of fault localization in optical networks. They successfully localized single-link failures using a Gaussian process classifier trained on data that described the network state upon current and past failure incidents. The presented approach achieves high localization accuracy ranges from 91% to 99%. A similar problem, connected with failure localization, is considered in [23]. Authors presented an ML system for detection and identification of equipment failures in optical networks. They tested several ML methods, a random forest, a neural network with a single hidden layer, and different variants of the support vector machine. As a result, accuracy above 98% was obtained. Authors in [24] used ML classification techniques such as decision tree and naïve Bayes discretization to classify traffic flows into mouse flows (occur frequently but carry a small number of bytes) and elephant flows (occur occasionally but have a huge number of bytes). The paper presented classifiers performance in terms of accuracy and classification speed. Another well-examined issue is estimation of the quality of service. In Reference [25], the authors introduced an alien wavelength performance monitoring technique and ML quality of service estimation for lightpath provisioning of intradomain and interdomain traffic. Obtained results reached up to 95% of prediction accuracy. Authors in [26] proposed an ML regression approach to predict the quality of transmission of an unestablished lightpath. They used a neural network as a base algorithm for prediction. The evaluation was carried out considering the generalized signal-to-noise ratio metric. In Reference [27], the authors presented an intelligent module in the form of an ML application using deep learning modeling. 
The system described in that publication uses a neural network to perform proactive network monitoring for the security and protection of computing infrastructures.
Despite the fact that many works have presented promising results, the application of ML methods to network problems is still in its early stage [8]. Thus, there is a high demand for exploring the topic of ML usage for solving network problems [7]. Most related works implemented the traffic prediction task as a prediction of exact traffic bitrates. In this paper, we formulate the prediction problem as a prediction of fixed traffic levels. Such a concept follows from the characteristics of optical networks and other transport network technologies. To the best of our knowledge, the only paper that focuses on the prediction of traffic levels in optical networks is our previous paper [16], where we presented some preliminary results on this problem. To fill the research gap, in this paper we introduce and examine two supervised ML approaches: classification and regression. We also present four ML models used to test a wide set of various ML algorithms. All experiments were conducted using datasets generated based on real traffic characteristics. The results reported in the next sections prove that the proposed approaches outperform the methods described in [16].

3. Datasets

This section describes the method of dataset generation together with traffic flows used in research.

3.1. Traffic Generator

Datasets used in the research were created with a custom traffic generator. The overall shape of the output traffic reflects real-world traffic based on time-varying data taken from the Internet Exchange Point in Seattle, Washington (SIX). Traffic statistics for this exchange point are publicly available at its official website (https://www.seattleix.net/ (accessed on 19 December 2019)). The data was gathered over a nearly two-month interval, from 24 October to 19 December 2019, from rrd traffic files which were uploaded periodically with a granularity of 5 min per time slot. Every week the recent data was downloaded, read, and stored. This data was then applied as an input for the described traffic generator, providing the general shape of the time-varying traffic for the whole interval.
The generator is an ensemble of smaller request generators that represent several web services, each with an assigned share [1] and its own properties, such as a set of combinations of stochastic processes with individually assigned parameters and contribution scales. The assumed stochastic processes are the Poisson process (PP), the Poisson Pareto burst process (PPBP) [28], and constant traffic (CT) with a uniformly distributed random offset. The considered web services are:
  • Internet video with a share of 51% of overall bitrate made of two different PPs, PPBP and CT;
  • IP VOD with a share of 22% of overall bitrate made of a single PP;
  • Web data with a share of 18% of overall bitrate made of the different PPs;
  • File sharing with a share of 8% of overall bitrate made of a single CT;
  • Gaming with a share of 1% of overall bitrate made of a single CT.
Such differentiation of the traffic characteristics of the given web services aims to reflect the diverse nature of Internet traffic over time. The overall bitrate of generated requests varies through time depending on the provided traffic characteristics. This traffic is distributed between nodes of a given network topology. In this research, we considered the Euro28 backbone network [29] with 28 nodes, 84 links, and an average link length of 625 km. The distribution of traffic between nodes is inversely proportional to the distance between each pair of nodes. The output traffic is a series of tuples of 756 numbers, representing the volume of traffic for each pair of nodes in each time slot. The expected summed bitrate for each time slot reflects the overall required bitrate for the considered network and keeps the time-varying trends similar to the provided traffic characteristics. The division of traffic between nodes provides a more insightful perspective on the traffic in the network and allows for focusing on particular nodes.
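The ensemble idea can be sketched as follows. This is a simplified, hypothetical sketch: the service names and shares come from the list above, but each service is reduced to a plain Poisson draw around its share of the target bitrate, and the PPBP component and per-service parameterization are omitted.

```python
import numpy as np

rng = np.random.default_rng(7)

# Service -> share of the overall bitrate (from the list above).
SERVICES = {
    "internet_video": 0.51,
    "ip_vod": 0.22,
    "web_data": 0.18,
    "file_sharing": 0.08,
    "gaming": 0.01,
}

def generate(profile, services=SERVICES):
    """profile: target overall bitrate per 5-min slot (e.g., the SIX shape).

    Each service contributes a Poisson draw whose mean is its share of the
    target bitrate; summing the services yields the overall generated traffic.
    """
    profile = np.asarray(profile, dtype=float)
    total = np.zeros_like(profile)
    for share in services.values():
        total += rng.poisson(np.maximum(share * profile, 0.0))
    return total

traffic = generate([400.0, 500.0, 450.0])
```

Because the draws are stochastic, the generated series fluctuates around the provided profile while keeping its time-varying shape, which mirrors the role of the generator described above.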

3.2. Datasets

Note that, to measure the fluctuation of the various traffic flows obtained from the traffic generator described above, we used the mean absolute percentage error (MAPE). MAPE describes how the values of one function differ from the values of a base function, which in this case was the original SIX traffic profile used to generate traffic for particular node pairs, as described in Section 3.1. To calculate the MAPE of a single traffic flow from the generated data, we used the SIX bitrate flows as the base function and normalized them to the common range of the considered flow's bitrate values. In other words, MAPE indicates how much the considered traffic flow differs from the SIX traffic. For further experiments, we selected the following 5 datasets with various values of the MAPE parameter:
  • dataset_1–traffic flow between a single node pair, MAPE equal to 3.39%,
  • dataset_2–traffic flow between a single node pair, MAPE equal to 8.21%,
  • dataset_3–traffic flow between a single node pair, MAPE equal to 13.35%,
  • dataset_4–whole traffic incoming to a single network node, MAPE equal to 1.33%,
  • dataset_5–original SIX traffic, MAPE equal to 0%.
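A minimal sketch of the MAPE computation described above (the function name is an assumption, and the normalization to a common range is applied to the inputs beforehand):

```python
import numpy as np

def mape(flow, base):
    """Mean absolute percentage error of `flow` against the base function
    (here, the normalized SIX traffic profile)."""
    flow = np.asarray(flow, dtype=float)
    base = np.asarray(base, dtype=float)
    return 100.0 * np.mean(np.abs((flow - base) / base))

# An identical series yields 0%, as for dataset_5 (the original SIX traffic).
score = mape([110.0, 190.0], [100.0, 200.0])
```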
Figure 1 presents a visual representation of the datasets used during the experiments. The vertical axis shows the normalized bitrate. Because the datasets' bitrate ranges differ from each other, the bitrates were normalized to a common range for the sake of visualization. The figure clearly shows that the traffic flows differ from each other in terms of their fluctuation.

4. Proposed Models and Algorithms

The following section briefly reviews the background concepts of the considered optical network, ML approaches, ML models’ creation, evaluation metrics, and configurations of conducted experiments.

4.1. Network Model

Let n be the number of all nodes in the network. In our work, we assumed that the network is modeled as a directed graph G = (N, E), where N represents the set of n physical nodes and E represents the set of links connecting them (reflecting the set of physical links between nodes). The time scale of the network operation was divided into time intervals of the same size (e.g., 5 min). For consecutive time intervals, traffic volumes (bitrates) related to a single pair of nodes or whole traffic going through a single node created continuous and regular flows.

4.2. ML Approach

Usually, the task of forecasting the future traffic in an optical network is to predict the shape of the flow over a selected period of time. Because of that, such a problem is considered a time series problem that can be solved with regression methods. In our work, traffic prediction was realized by an indication of a traffic level in a particular time interval (rather than the exact traffic bitrate). The motivation behind such an approach comes from optical technologies such as wavelength division multiplexing (WDM) and elastic optical networks (EON) [3,4,30]. In more detail, in optical technologies, each optical channel (i.e., lightpath) can carry a fixed amount of data (fixed bitrate) based on the characteristics of the used transceiver. For example, suppose that a single optical channel supported by a single transceiver can carry 100 Gbps. To serve traffic with a bitrate of 110 Gbps or 190 Gbps, regardless of the fact that the bitrates are different, we need to establish two optical channels (two transceivers) for transmission, since 200 Gbps is needed to provision each of the considered bitrates. Therefore, we focused on the number of optical channels (defined by the provisioned bitrate) needed to send data, instead of the exact value of the traffic bitrate. Another motivation for considering the prediction of traffic levels was the Ethernet standards, which are very popular in optical networks and offer transmission at defined bitrate levels, such as 100 Gb Ethernet [4].
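The channel arithmetic from this example can be sketched as follows (a minimal illustration; the function name and the default 100 Gbps channel capacity are assumptions):

```python
import math

def traffic_level(bitrate_gbps, channel_gbps=100.0):
    """Map a bitrate to the number of fixed-capacity optical channels it
    needs and to the corresponding traffic level (a multiple of the channel
    bitrate)."""
    channels = math.ceil(bitrate_gbps / channel_gbps)
    return channels, channels * channel_gbps

# 110 Gbps and 190 Gbps both require two 100 Gbps channels (level 200 Gbps).
a = traffic_level(110.0)
b = traffic_level(190.0)
```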
Figure 2 illustrates the process of defining traffic levels for traffic flows. The blue line represents real bitrate values and the green line traffic levels that correspond to them. Possible classes (determined by a single channel width) are represented by grey horizontal lines. In such cases, traffic forecasting can be carried out by predicting future green line levels. We examined two different supervised ML approaches for defining traffic levels:
1. Classification approach–the problem is considered as a classification task, where the possible classes are determined by traffic levels (multiples of a single channel's bitrate). The following classifiers were tested in this approach: linear discriminant analysis (LDA), k neighbors classifier (KNN), Gaussian naïve Bayes (GNB), and decision tree (DT).
2. Regression approach–first, using a selected regression algorithm, the exact value of the bitrate in a particular time interval is predicted. Next, based on the obtained results, the traffic levels in time intervals are calculated by rounding up the predicted bitrate to the closest traffic level. The following regressors were tested in this approach: linear regression (LR), passive aggressive regressor (PAR), k neighbors regressor (KNNR), and multi-layer perceptron regressor (MLPR).
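A minimal sketch of the regression approach on a toy flow, assuming scikit-learn's LinearRegression (one of the regressors listed above) and a window of one previous time interval; the helper name and toy data are illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def predict_levels(X_train, y_train, X_test, channel=100.0):
    """Fit a regressor on exact bitrates, then round each prediction UP to
    the closest traffic level (a multiple of the channel bitrate)."""
    reg = LinearRegression().fit(X_train, y_train)
    y_pred = reg.predict(X_test)
    return np.ceil(y_pred / channel) * channel

# Toy flow where each bitrate follows the previous time interval (window = 1).
flow = np.array([310.0, 320.0, 330.0, 340.0, 350.0, 360.0])
X, y = flow[:-1].reshape(-1, 1), flow[1:]
levels = predict_levels(X[:4], y[:4], X[4:])
```

The rounding step is what turns the regression output into the same class space the classification approach predicts directly, which is what makes the two approaches comparable in Section 5.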
Figure 2. Traffic levels.
Particular traffic flows change over time; thus, the data used in the experiment can be characterized as data streams [31]. Therefore, we did not use classic k-fold cross-validation; instead, the training datasets were created based on consecutive iterations (time intervals). In more detail, the whole dataset for the experiment consisted of traffic flow information lasting about 58 days. We took the first 28 days to create a training dataset and the last 30 days to create a testing dataset.
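The chronological split can be sketched as follows (a minimal illustration; slot counts follow the 5-min granularity described in Section 3, and the names are assumptions):

```python
SLOTS_PER_DAY = 288  # 5-min slots per day

def chrono_split(series, train_days=28):
    """Non-shuffled, time-ordered split: the first `train_days` days form the
    training set and the remainder forms the testing set."""
    cut = train_days * SLOTS_PER_DAY
    return series[:cut], series[cut:]

series = list(range(58 * SLOTS_PER_DAY))  # ~58 days of slots
train, test = chrono_split(series)
```

Keeping the split chronological (rather than shuffling, as k-fold cross-validation would) prevents the model from training on samples that occur after the ones it is tested on, which matters for stream-like data.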

4.3. ML Models

The training dataset was created based on historical traffic flows. The selection of suitable features for the dataset instances is of utmost importance for solving ML problems. Figure 3 presents the autocorrelation of the dataset_1 traffic flow. In detail, it determines how much the value of a particular bitrate depends on bitrate values from previous time intervals. The further a value is from zero, the greater the correlation. The vertical axis shows autocorrelation values and the horizontal axis contains time intervals. It can be clearly seen that the highest correlations occurred for the closest preceding iterations. Additionally, a meaningful correlation was observed for time intervals that occurred every 12 h (every 144 time intervals, since a single time interval represented 5 min). In addition, the positive correlations (greater than zero, occurring every 24 h, which corresponds to 288 time intervals) were higher in magnitude than the negative correlations. The variability of traffic flows was correlated with time; therefore, information about the minute of the day and the day of the week when a traffic volume occurred was also essential for the prediction task.
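A minimal sketch of the autocorrelation computation at a given lag, illustrated on a synthetic signal with a 24 h period rather than on the actual datasets (the function name is an assumption):

```python
import numpy as np

def autocorr(x, lag):
    """Autocorrelation of a series at a given lag (in time intervals)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# A pure 24 h sine sampled every 5 min (288 slots/day) correlates strongly
# at lag 288 (one day) and negatively at lag 144 (half a day).
t = np.arange(4 * 288)
daily = np.sin(2 * np.pi * t / 288)
```

This reproduces the qualitative pattern in Figure 3: strong positive peaks at daily lags and negative dips at half-day lags.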
Let window denote the number of preceding time intervals, of which bitrates are taken as features. We propose four different ML models based on different selection of features:
1. window = 1 (i.e., only the previous time interval is considered as an input feature);
2. window = 10 (i.e., the 10 previous time intervals are included in the model);
3. window = 10, minute, day (i.e., additionally, the information on the time stamp of the considered time interval is included in the model; the time stamp includes day, which denotes the day of the week, and minute, which denotes the number of minutes from the beginning of the day);
4. window = 10, minute, day, traffic values (bitrates) from the previous 24 h, 48 h, …, 336 h (two weeks) time intervals.
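For illustration, building training instances for model 3 could look as follows (a hypothetical sketch; the helper name and the ordering of features within an instance are assumptions):

```python
import numpy as np

SLOTS_PER_DAY = 288  # 5-min slots per day

def make_instances(flow, window=10):
    """Model 3 features for each time interval t:
    bitrates of the `window` previous slots + minute-of-day + day-of-week.
    The target is the bitrate at t."""
    X, y = [], []
    for t in range(window, len(flow)):
        minute = (t % SLOTS_PER_DAY) * 5        # minutes from start of day
        day = (t // SLOTS_PER_DAY) % 7          # day of week
        X.append(list(flow[t - window:t]) + [minute, day])
        y.append(flow[t])
    return np.array(X), np.array(y)

flow = np.arange(600.0)  # toy flow: bitrate equals the slot index
X, y = make_instances(flow)
```

Note that the first `window` intervals produce no instance, matching the omission described below for the model creation process.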
The number of features of each instance in the training dataset depends on the selected ML model. Figure 4 depicts the model creation process for the classification task. The feature list includes information about the day, the minute, and the bitrate in the previous time interval (window = 1). Possible traffic levels (classes) are multiples of 100. The first window iterations were omitted during model creation, because there was no information about all of their previous time intervals. During classification, the classifier takes the instance feature values and returns the bitrate level for the future time interval.
Figure 5 presents a diagram illustrating the implemented system. At the input, the system took the number of the considered model, historical traffic flows, and list of tested ML algorithms. During the data preprocessing phase, datasets for training and testing were created. The number of instance features depended on the considered model. Datasets were created based on historical traffic flows, as Figure 4 presents. Next, ML algorithms were trained and tested. At the output, the system returned an evaluation metric value for each ML algorithm.

4.4. Quality Metric

Typically, the base metric used to compare ML algorithms is accuracy. In the considered problem, it carries information about the percentage of correctly predicted traffic levels; however, such an analysis omits information about too-low and too-high traffic level predictions. This knowledge can be significant for a network operator, because it can result in better resource usage and cost management. For instance, underprediction of the future traffic can result in the allocation of too few optical channels and, as a consequence, the blocking of some traffic, which can force the operator to pay additional penalties due to SLA agreements. In turn, overprediction of the future traffic has less severe consequences, such as greater power consumption; however, it does not yield traffic blocking.
Therefore, to gain a better ML algorithms assessment specific for the analyzed data related to network traffic, we propose a new evaluation metric called Traffic Level Prediction Quality (TLPQ), which is calculated based on the key prediction variants. We distinguish the following prediction variants:
a. the predicted traffic level is correct;
b. the predicted traffic level is too low and differs from the correct level by one class, i.e., underprediction by one traffic level;
c. the predicted traffic level is too low and differs from the correct level by more than one class, i.e., underprediction by more than one traffic level;
d. the predicted traffic level is too high and differs from the correct level by one class, i.e., overprediction by one traffic level;
e. the predicted traffic level is too high and differs from the correct level by more than one class, i.e., overprediction by more than one traffic level.
The equation below indicates how to calculate the TLPQ metric:
TLPQ = a + α × d − β × (b + c)
where a, b, c, and d denote the percentages of predictions that fall into the corresponding prediction variants. TLPQ varies from 0 to 100, assuming that the parameters α and β do not exceed 1. The higher the value, the better the performance. Note that the values of the α and β tuning parameters can be freely determined by network operators, based on their expectations and needs. For instance, α = 1 means that overprediction by one traffic level is accepted (it increases the value of the TLPQ). In turn, β = 1 indicates that underprediction is not acceptable and leads to a decrease of the TLPQ value.
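A minimal sketch of the TLPQ computation, under the sign convention implied by the descriptions of α and β above (reward for overprediction by one level, penalty for underprediction); a, b, c, and d are the percentages of predictions in the corresponding variants:

```python
def tlpq(a, b, c, d, alpha, beta):
    """Traffic Level Prediction Quality: correct predictions plus a weighted
    reward for one-level overprediction, minus a weighted penalty for
    underprediction (by one level or more)."""
    return a + alpha * d - beta * (b + c)

# Scenario 1 (alpha = 1, beta = 0): one-level overprediction counts as good,
# underprediction is ignored.
score = tlpq(a=80.0, b=5.0, c=2.0, d=10.0, alpha=1.0, beta=0.0)
```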

5. Numerical Results

Experiments were conducted on five datasets described in Section 3 using the ML models and algorithms presented in Section 4. The analyzed ML models and algorithms were implemented in Python, using the scikit-learn ML package [32]. All classifiers’ and regressors’ parameters were left as default. In order to examine performance of the proposed ML methods in various situations that can occur in optical networks, for each analyzed dataset we created three different cases in terms of the number of traffic levels, namely, we considered 7, 12 and 20 traffic levels. The motivation behind this approach was that, based on the traffic bitrates and applied technology (e.g., various bitrates per one optical channel (transceiver)), the number of possible traffic levels can vary in real-world scenarios.

5.1. Accuracy and Root Mean Square Percentage Error

Table 1 reports the accuracy and root mean square percentage error (RMSPE) results obtained for traffic level prediction for all considered regressors and models, based on datasets where traffic was divided into seven traffic levels. The individual models differ in the feature lists of the dataset instances. RMSPE indicates by how many percent, on average, the considered regressor's prediction differs from the real value. Such a measure allows researchers to evaluate regressors more precisely. Cells highlighted in green contain the best results for the considered model and dataset, i.e., for each model (four adjacent rows) and dataset (one column), the best result is highlighted in green. Results in blue are the best for the whole dataset (across all models), i.e., one cell in each column is marked in blue.
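The RMSPE measure can be sketched as follows (a minimal illustration; the function name is an assumption and any normalization details are omitted):

```python
import numpy as np

def rmspe(pred, real):
    """Root mean square percentage error between predicted and real bitrates."""
    pred = np.asarray(pred, dtype=float)
    real = np.asarray(real, dtype=float)
    return 100.0 * np.sqrt(np.mean(((pred - real) / real) ** 2))

err = rmspe([110.0, 190.0], [100.0, 200.0])
```

Compared with MAPE, the squaring inside the mean weights large deviations more heavily, which is why it evaluates regressors "more precisely" in the sense used above.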
Based on the presented results, the following general trend can be noticed. For the datasets with lower MAPE values (i.e., dataset_4 and dataset_5), the accuracy was quite high; the best results were 92.05% and 96.68%, respectively. In turn, for dataset_1–dataset_3, with higher MAPE values, the accuracy was much lower. The general observation is that the accuracy decreases as the MAPE parameter grows. Analyzing the results of the different models, it is clear that adding consecutive features to the dataset allows regressors to yield better results, i.e., models 3 and 4, with additional features on previous seasonal time stamps, outperform models 1 and 2. For dataset_1 to dataset_3 (with traffic between a single node pair), the best accuracy for traffic level prediction was achieved using model 4. In turn, for dataset_4 and dataset_5 (with incoming traffic to a node and SIX traffic), the best performance was yielded by model 3. In almost every case, the highest accuracy for traffic level prediction was obtained by the LR regressor. The exception to this rule was model 1 for dataset_4 and dataset_5, where the best accuracy was obtained by the MLPR regressor; however, its results were close to those of the LR regressor.
With the increase in dataset MAPE value, the RMSPE of traffic level prediction also increased. It was equal to 2.79% for dataset_5 and reached 15.24% for dataset_3.
Figure 6 and Figure 7 compare the real bitrate with the bitrate predicted by the LR regressor for dataset_2 (Figure 6) and dataset_4 (Figure 7) in the case of traffic with seven possible traffic levels. The blue line represents the real bitrate and the orange line depicts the traffic predicted by the regressor. The dotted lines show the possible traffic levels. The charts present data from the first day of the tested datasets. They demonstrate that the traffic from dataset_2, with a higher MAPE value (Figure 6), was more jagged, while the traffic from dataset_4, with a lower MAPE value (Figure 7), was rather smooth. In the former case, the traffic levels determined from the real traffic line and from the predicted traffic line varied more often. This causes lower accuracy and is reflected in the results presented in Table 1.

5.2. TLPQ Metric

As stated above, from the point of view of the network operator, analyzing the results of classifiers/regressors considering only the accuracy of the traffic level prediction can be insufficient. It overlooks information about the number of instances for which the traffic level is underpredicted (based on these predictions, the network operator cannot serve the traffic) and overpredicted (based on these predictions, the operator can serve the traffic; however, it requires additional network resources). To face this problem, our proposed metric takes the above-mentioned variants into account. In our work, we consider three different network operator scenarios, differing from each other in the values of the tuning parameters α and β used in the TLPQ (1):
  • Scenario 1 (S_1)–α = 1 and β = 0;
  • Scenario 2 (S_2)–α = 0.7 and β = 0.3;
  • Scenario 3 (S_3)–α = 0.5 and β = 0.5.
Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7 present TLPQ values for all scenarios based on data from each dataset. Table 2, Table 3 and Table 4 refer to the regression approach, while Table 5, Table 6 and Table 7 refer to the classification approach. The individual tables contain results for datasets with different numbers of possible traffic levels. Again, cells highlighted in green contain the best results for the considered model and dataset, i.e., for each model (four adjacent rows) and dataset (one main column), the best result is highlighted in green. Results in blue are the best for the whole dataset (among all models) in a particular scenario, i.e., one cell in each column is marked in blue.
Analyzing the results presented in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7, it is difficult to point out one ML algorithm for which the TLPQ values were the best for all datasets, i.e., one algorithm which can be considered the best regardless of the dataset. The regressors that most often obtained the best results were PAR, MLPR, and LR. In the case of the classification approach, the best results were most often obtained by LDA. Other classifiers that achieved the best results in some cases were KNN, GNB, and DT, mainly for dataset_4 and dataset_5; however, their TLPQ values differed only slightly from the LDA results. Choosing the best model is difficult because, for different dataset types and different scenarios, the best models differ.
It is hard to define explicitly which value of TLPQ can be considered good and which bad. The TLPQ value is strongly correlated with the α and β parameters. Based on the results in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7, it can be seen that the best TLPQ values for a single dataset often vary significantly between scenarios; nevertheless, they point to the same classifier/regressor, e.g., Table 2 shows that for dataset_1 and model 1, PAR yielded the best results in each scenario while returning a different TLPQ value in each. Only TLPQ values for the same scenario can be compared. Assessing the results obtained for the classification and regression approaches, it is clear that the TLPQ values were higher in the case of regression; in detail, calculating traffic levels from a predicted traffic flow is more accurate than predicting traffic levels directly as the result of classification.

5.3. Classification vs. Regression

Table 8 presents the difference between the best TLPQ values yielded by the regression and classification approaches for all scenarios and all datasets; for instance, the value 3.2 in the first cell indicates that regression outperforms classification by 3.2 points in terms of the TLPQ metric. The higher the value, the better regression performs compared with classification. The goal of Table 8 is to compare the general performance of the two main applied approaches, regression and classification. Green marks the biggest differences, while red marks the smallest. It can be seen that the regression approach obtained better results in every case. As the dataset MAPE value increased, the gap between the two approaches also increased and reached 10.6 points for dataset_1 in scenario 1. The difference was generally higher for datasets with a greater number of possible traffic levels, with exceptions for dataset_2 and dataset_3, where the highest difference generally occurred for seven possible traffic levels.
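The two approaches compared here can be sketched end-to-end on synthetic data. The snippet below is only a toy illustration, not the paper's experimental setup: the sinusoidal traffic curve, the single previous-bitrate feature (window = 1), and the seven equal-width traffic levels are all assumptions, whereas the real experiments use SIX traces and richer feature models.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic daily traffic pattern (Gbps); hypothetical stand-in for SIX data.
t = np.arange(400)
bitrate = 400 + 150 * np.sin(2 * np.pi * t / 100) + rng.normal(0, 10, t.size)

# Seven equal-width traffic levels over the observed bitrate range.
edges = np.linspace(bitrate.min(), bitrate.max(), 8)[1:-1]
levels = np.searchsorted(edges, bitrate) + 1          # values in 1..7

X = bitrate[:-1].reshape(-1, 1)                       # feature: bitrate at t-1
y_reg, y_cls = bitrate[1:], levels[1:]
split = 300

# Regression approach: predict the bitrate, then derive the traffic level.
reg = LinearRegression().fit(X[:split], y_reg[:split])
lv_from_reg = np.searchsorted(edges, reg.predict(X[split:])) + 1

# Classification approach: predict the traffic level directly.
cls = LinearDiscriminantAnalysis().fit(X[:split], y_cls[:split])
lv_from_cls = cls.predict(X[split:])

acc_reg = np.mean(lv_from_reg == y_cls[split:])
acc_cls = np.mean(lv_from_cls == y_cls[split:])
```

On data of this shape, the regression route typically matches or beats direct level classification, mirroring the trend reported in Table 8, although the margin depends on the noise level and the level grid.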

5.4. Algorithms Performance Summary

Figure 8, Figure 9 and Figure 10 show the best TLPQ values for all datasets and all scenarios: Figure 8 refers to scenario 1, Figure 9 to scenario 2, and Figure 10 to scenario 3. They summarize the results presented in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. For each scenario and dataset, the best TLPQ values were obtained for the dataset cases with seven traffic levels. For the datasets with a lower MAPE value, i.e., dataset_4 and dataset_5, similar results were obtained for each traffic level case. As the dataset MAPE value increased, the difference between TLPQ values for different traffic level cases also increased.
Table 9 lists the regressors and models that obtained the best results for each dataset according to the TLPQ value and accuracy (e.g., MLPR 3 means that the best regressor for the considered dataset and metric was MLPR using model 3). If a cell contains more than one regressor and model, different regressors and models were the best in different scenarios, e.g., for dataset_3 and the TLPQ metric; in all other cases, a single regressor and model was the best for all scenarios. Depending on the measure considered, different regressors and models can be described as the best. This shows that even if a regressor is the best in terms of accuracy for the final level, it is not necessarily the best choice for the operator when network resource usage is taken into account.

6. Conclusions

In this paper, we proposed ML approaches for traffic level prediction in optical networks based on regression and classification methods, using four ML models with different features. To compare the proposed approaches, we formulated a new aggregate quality measure, TLPQ, which accounts for predictions of too low or too high traffic levels and which, in our opinion, is more informative for network operators. We examined four different classifiers and four different regressors.
Our study showed that the regression approach outperforms the classification approach in all analyzed cases. The TLPQ difference between the two approaches varies from one to ten points depending on the dataset. To recall, with regression, the traffic bitrate is predicted first and the traffic level is then calculated from the obtained bitrate; with classification, the traffic level is predicted directly, which leads to more frequent wrong predictions.
Comparing the performance of the various regressors and models, we noticed that the final selection of the most appropriate regressor and model should follow the network operator's expectations. A different choice suits an operator who cannot afford to waste network resources than one who cares more about the volume of served traffic than about resource usage. Therefore, the network operator should test various regressors and models to find the configuration guaranteeing the best prediction performance for their needs. The reported results show that no single best model can be indicated for every dataset. This is consistent with the no-free-lunch theorem formulated by Wolpert [33]: no single algorithm suits every problem best. Among the tested regressors, PAR and MLPR most often achieved the best results. In the classification approach, LDA obtained the best results for almost all datasets.
Another observation is that the accuracy and TLPQ values are strongly correlated with the dataset MAPE value. Classifiers and regressors obtained better results for datasets characterized by a lower MAPE value; traffic in those datasets rarely changes level, which makes prediction easier.
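For reference, the two error measures used to characterize the datasets and regressors, MAPE and RMSPE (see Table 1), can be computed in their standard forms as below; this assumes the paper uses the conventional definitions of both measures.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def rmspe(y_true, y_pred):
    """Root Mean Squared Percentage Error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.sqrt(np.mean(((y_true - y_pred) / y_true) ** 2))
```

Because RMSPE squares the relative errors before averaging, it penalizes occasional large deviations more heavily than MAPE does.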
In future work, we plan to develop an optimization framework that uses the obtained knowledge about future traffic to improve routing algorithms applied in optical networks. We also plan to study the possibility of middle-term and long-term prediction using data stream methods. Moreover, we would like to test our approaches on datasets generated from other Internet exchange points.

Author Contributions

Conceptualization, D.S. and K.W.; methodology, D.S. and K.W.; software, D.S. and A.W.; validation, D.S. and K.W.; formal analysis, D.S. and K.W.; resources, D.S., A.W., K.W.; data curation, D.S., A.W.; writing—original draft preparation, D.S., A.W., K.W.; writing—review and editing, D.S., A.W., K.W.; visualization, D.S.; supervision, K.W.; project administration, K.W.; funding acquisition, K.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Science Centre, Poland under Grant 2017/27/B/ST7/00888.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ARIMA: Autoregressive Integrated Moving Average
CT: Constant Traffic
DT: Decision Tree
EON: Elastic Optical Network
GNB: Gaussian Naive Bayes
KNN: K Neighbors Classifier
KNNR: K Neighbors Regressor
LDA: Linear Discriminant Analysis
LR: Linear Regression
MAPE: Mean Absolute Percentage Error
ML: Machine Learning
MLPR: Multi-Layer Perceptron Regressor
PAR: Passive Aggressive Regressor
PP: Poisson Process
PPBP: Poisson Pareto Burst Process
RMSPE: Root Mean Squared Percentage Error
S_1: Scenario 1
S_2: Scenario 2
S_3: Scenario 3
SIX: Seattle Internet Exchange Point
TLPQ: Traffic Level Prediction Quality
WDM: Wavelength Division Multiplexing

References

  1. CISCO. Cisco Annual Internet Report (2018–2023); CISCO: San Jose, CA, USA, 2020. [Google Scholar]
  2. Nokia. Deepfield Network Intelligence Report Networks in 2020; Nokia: Espoo, Finland, 2020. [Google Scholar]
  3. Walkowiak, K. Modeling and Optimization of Cloud-Ready and Content-Oriented Networks; Studies in Systems, Decision and Control; Springer: Cham, Switzerland, 2016; Volume 56. [Google Scholar]
  4. Mukherjee, B.; Tomkos, I.; Tornatore, M.; Winzer, P.; Zhao, Y. Springer Handbook of Optical Networks; Springer: Cham, Switzerland, 2020. [Google Scholar]
  5. Chan, V.W.S. Cognitive optical networks. In Proceedings of the IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018. [Google Scholar] [CrossRef]
  6. Chan, V.W.S.; Jang, E. Cognitive all-optical fiber network architecture. In Proceedings of the International Conference on Transparent Optical Networks (ICTON), Girona, Spain, 2–6 July 2017. [Google Scholar] [CrossRef]
  7. Boutaba, R.; Salahuddin, M.A.; Limam, N.; Ayoubi, S.; Shahriar, N.; Estrada-Solano, F.; Caicedo, O.M. A comprehensive survey on machine learning for networking: Evolution, applications and research opportunities. J. Internet Serv. Appl. 2018, 9, 16. [Google Scholar] [CrossRef] [Green Version]
  8. Mata, J.; de Miguel, I.; Duran, R.J.; Merayo, N.; Singh, S.K.; Jukan, A.; Chamania, M. Artificial intelligence (AI) methods in optical networks: A comprehensive survey. Opt. Switch. Netw. 2018, 28, 43–57. [Google Scholar] [CrossRef]
  9. Musumeci, F.; Rottondi, C.; Nag, A.; Macaluso, I.; Zibar, D.; Ruffini, M.; Tornatore, M. An Overview on Application of Machine Learning Techniques in Optical Networks. IEEE Commun. Surv. Tutor. 2019, 21, 1383–1408. [Google Scholar] [CrossRef] [Green Version]
  10. Rafique, D.; Velasco, L. Machine Learning for Network Automation: Overview, Architecture, and Applications [Invited Tutorial]. J. Opt. Commun. Netw. 2018, 10, D126–D143. [Google Scholar] [CrossRef] [Green Version]
  11. Khan, F.N.; Lu, C.; Lau, A.P.T. Optical Performance Monitoring in Fiber-Optic Networks Enabled by Machine Learning Techniques. In Proceedings of the Optical Fiber Communications Conference and Exposition (OFC), San Diego, CA, USA, 11–15 March 2018. [Google Scholar]
  12. Alarcon-Aquino, V.; Barria, J.A. Multiresolution FIR neural-network-based learning algorithm applied to network traffic prediction. IEEE Trans. Syst. Man Cybern. Part C 2006, 36, 208–220. [Google Scholar] [CrossRef] [Green Version]
  13. Krishnamurthy, B.; Sen, S.; Zhang, Y.; Chen, Y. Sketch-based change detection: Methods, evaluation, and applications. In Proceedings of the 3rd ACM SIGCOMM Conference on Internet Measurement, Miami Beach, FL, USA, 27–29 October 2003. [Google Scholar] [CrossRef]
  14. Ding, X.; Canu, S.; Denoeux, T.; Rue, T.; Pernant, F. Neural network based models for forecasting. In Proceedings of the Applied Decision Technologies Conference, San Antonio, TX, USA, 4–8 June 1995; pp. 243–252. [Google Scholar]
  15. Argon, O.; Shavitt, Y.; Weinsberg, U. Inferring the periodicity in large-scale Internet measurements. In Proceedings of the IEEE INFOCOM, Turin, Italy, 14–19 April 2013. [Google Scholar] [CrossRef]
  16. Szostak, D.; Walkowiak, K.; Włodarczyk, A. Short-term Traffic Forecasting in Optical Network using Linear Discriminant Analysis Machine Learning Classifier. In Proceedings of the International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 19–23 July 2020. [Google Scholar] [CrossRef]
  17. Rzym, G.; Boryło, P.; Chołda, P. A Time-Efficient Shrinkage Algorithm for Fourier-Based Prediction Enabling Proactive Optimization in Software Defined Networks. Int. J. Commun. Syst. 2019, 33, e4448. [Google Scholar] [CrossRef]
  18. Cortez, P.; Rio, M.; Rocha, M.; Sousa, P. Multi-scale Internet traffic forecasting using neural networks and time series methods. Expert Syst. 2012, 29, 143–155. [Google Scholar] [CrossRef] [Green Version]
  19. Otoshi, T.; Ohsita, Y.; Murata, M.; Takahashi, Y.; Ishibashi, K.; Shiomoto, K. Traffic prediction for dynamic traffic engineering. Comput. Netw. 2015, 85, 36–50. [Google Scholar] [CrossRef]
  20. Aibin, M. Traffic prediction based on machine learning for elastic optical networks. Opt. Switch. Netw. 2018, 30, 33–39. [Google Scholar] [CrossRef]
  21. Vinchoff, C.; Chung, N.; Gordon, T.; Lyford, L.; Aibin, M. Traffic Prediction in Optical Networks Using Graph Convolutional Generative Adversarial Networks. In Proceedings of the International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 19–23 July 2020; pp. 3–6. [Google Scholar] [CrossRef]
  22. Panayiotou, T.; Chatzis, S.P.; Ellinas, G. Leveraging Statistical Machine Learning to Address Failure Localization in Optical Networks. J. Opt. Commun. Netw. 2018, 10, 162–173. [Google Scholar] [CrossRef]
  23. Shahkarami, S.; Musumeci, F.; Cugini, F.; Tornatore, M. Machine Learning-Based Soft-Failure Detection and Identification in Optical Networks. In Proceedings of the Optical Fiber Communications Conference and Exposition (OFC), San Diego, CA, USA, 11–15 March 2018. [Google Scholar]
  24. Wang, L.; Wang, X.; Tornatore, M.; Kim, K.J.; Kim, S.M.; Kim, D.U.; Han, K.E.; Mukherjee, B. Scheduling with Machine-Learning-Based Flow Detection for Packet-Switched Optical Data Center Networks. J. Opt. Commun. Netw. 2018, 10, 365–375. [Google Scholar] [CrossRef]
  25. Proietti, R.; Chen, X.; Zhang, K.; Liu, G.; Shamsabardeh, M.; Castro, A.; Velasco, L.; Zhu, Z.; Yoo, S.J.B. Experimental Demonstration of Machine-Learning-Aided QoT Estimation in Multi-Domain Elastic Optical Networks with Alien Wavelengths. J. Opt. Commun. Netw. 2018, 11, A1–A10. [Google Scholar] [CrossRef]
  26. Khan, I.; Bilal, M.; Siddiqui, M.; Khan, M.; Ahmad, A.; Shahzad, M.; Curri, V. QoT Estimation for Light-path Provisioning in Un-Seen Optical Networks using Machine Learning. In Proceedings of the International Conference on Transparent Optical Networks (ICTON), Bari, Italy, 19–23 July 2020. [Google Scholar] [CrossRef]
  27. Nguyen, G.; Dlugolinsky, S.; Tran, V.; Lopez Garcia, A. Deep learning for proactive network monitoring and security protection. IEEE Access 2020, 8, 19696–19716. [Google Scholar] [CrossRef]
  28. Zukerman, M.; Neame, T.D.; Addie, R.G. Internet traffic modeling and future technology implications. In Proceedings of the IEEE INFOCOM 2003, San Francisco, CA, USA, 30 March–3 April 2003. [Google Scholar] [CrossRef] [Green Version]
  29. Walkowiak, K.; Klinkowski, M.; Lechowicz, P. Dynamic Routing in Spectrally Spatially Flexible Optical Networks with Back-to-Back Regeneration. J. Opt. Commun. Netw. 2018, 10, 523–534. [Google Scholar] [CrossRef]
  30. Simmons, J. Optical Network Design and Planning, 2nd ed.; Springer: Cham, Switzerland, 2014. [Google Scholar]
  31. Bifet, A.; Morales, G.D.F. Big Data Stream Learning with SAMOA. In Proceedings of the IEEE International Conference on Data Mining Workshop, Shenzhen, China, 14 December 2014. [Google Scholar] [CrossRef]
  32. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine learning in python. J. Mach. Learn. Res. 2011, 12, 2825–2830. [Google Scholar]
  33. Wolpert, D.H. The Supervised learning no-free-lunch theorems. In Proceedings of the 6th Online World Conference on Soft Computing in Industrial Applications, Online. 10–24 September 2001. [Google Scholar] [CrossRef]
Figure 1. Traffic flows.
Figure 3. Autocorrelation.
Figure 4. ML model creation example, features: day, minute, window = 1, classification.
Figure 5. System diagram.
Figure 6. Real and predicted traffic for dataset_2, LR regressor.
Figure 7. Real and predicted traffic for dataset_4, LR regressor.
Figure 8. Scenario 1 (S_1): the best TLPQ values.
Figure 9. Scenario 2 (S_2): the best TLPQ values.
Figure 10. Scenario 3 (S_3): the best TLPQ values.
Table 1. Regression approach, accuracy and RMSPE for datasets with 7 possible traffic levels.
- | - | DATASET_1 | | DATASET_2 | | DATASET_3 | | DATASET_4 | | DATASET_5 |
Model | Reg | Accuracy | RMSPE | Accuracy | RMSPE | Accuracy | RMSPE | Accuracy | RMSPE | Accuracy | RMSPE
1 | LR | 84.55% | 6.60% | 68.55% | 11.14% | 62.55% | 15.57% | 91.77% | 3.80% | 95.54% | 3.28%
 | PAR | 84.23% | 6.71% | 60.04% | 11.66% | 34.15% | 24.73% | 91.71% | 3.75% | 95.37% | 3.39%
 | KNNR | 83.44% | 6.80% | 66.55% | 11.51% | 60.50% | 16.28% | 91.21% | 3.89% | 95.17% | 3.35%
 | MLPR | 84.50% | 6.57% | 68.36% | 11.08% | 61.86% | 15.24% | 91.83% | 3.76% | 95.59% | 3.24%
2 | LR | 84.81% | 6.54% | 69.43% | 10.90% | 63.66% | 15.21% | 92.04% | 3.69% | 96.64% | 2.81%
 | PAR | 83.18% | 6.47% | 30.17% | 19.03% | 61.05% | 16.28% | 87.88% | 4.71% | 95.27% | 3.30%
 | KNNR | 83.24% | 6.79% | 65.23% | 11.76% | 59.67% | 16.23% | 90.15% | 4.07% | 94.79% | 3.49%
 | MLPR | 84.60% | 6.33% | 69.19% | 10.92% | 63.40% | 15.14% | 91.12% | 3.76% | 95.49% | 3.19%
3 | LR | 84.96% | 6.51% | 69.43% | 10.90% | 63.73% | 15.20% | 92.05% | 3.68% | 96.68% | 2.79%
 | PAR | 74.33% | 7.65% | 64.60% | 12.33% | 63.14% | 15.42% | 91.49% | 3.79% | 89.23% | 4.57%
 | KNNR | 83.05% | 6.85% | 64.10% | 12.06% | 57.92% | 16.80% | 90.25% | 4.04% | 94.79% | 3.49%
 | MLPR | 83.97% | 6.89% | 69.21% | 10.75% | 63.54% | 15.50% | 91.53% | 3.86% | 95.35% | 3.23%
4 | LR | 85.47% | 6.41% | 70.00% | 10.83% | 64.38% | 15.02% | 91.85% | 3.78% | 96.50% | 2.90%
 | PAR | 83.69% | 6.43% | 24.01% | 21.35% | 60.02% | 15.88% | 88.93% | 4.51% | 90.43% | 5.09%
 | KNNR | 79.81% | 7.48% | 61.44% | 12.89% | 55.56% | 17.48% | 83.40% | 5.52% | 84.03% | 6.31%
 | MLPR | 84.07% | 6.86% | 69.51% | 11.04% | 63.33% | 15.73% | 90.42% | 4.05% | 91.38% | 4.77%
Table 2. Regression, TLPQ values for datasets with 7 possible traffic levels.
- | - | DATASET_1 | | | DATASET_2 | | | DATASET_3 | | | DATASET_4 | | | DATASET_5 | |
Model | Reg | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3
1 | LR | 92.1 | 87.5 | 84.4 | 83.4 | 74.1 | 67.8 | 79.8 | 68.6 | 61.2 | 95.9 | 93.4 | 91.8 | 97.8 | 96.5 | 95.6
 | PAR | 93.3 | 88.6 | 85.5 | 64.7 | 52.7 | 44.7 | 86.8 | 70.3 | 59.3 | 94.8 | 92.3 | 90.7 | 98.4 | 97.1 | 96.1
 | KNNR | 91.4 | 86.4 | 83.1 | 81.7 | 71.7 | 65.1 | 81.1 | 69.4 | 61.6 | 95.2 | 92.6 | 90.8 | 97.5 | 96.0 | 95.0
 | MLPR | 92.1 | 87.5 | 84.4 | 82.5 | 73.1 | 66.8 | 74.9 | 63.5 | 55.9 | 96.0 | 93.5 | 91.9 | 97.8 | 96.5 | 95.6
2 | LR | 92.2 | 87.7 | 84.6 | 84.2 | 75.0 | 68.9 | 80.7 | 69.9 | 62.6 | 95.9 | 93.5 | 91.9 | 98.3 | 97.2 | 96.6
 | PAR | 86.6 | 81.6 | 78.2 | 90.3 | 72.1 | 60.0 | 86.7 | 75.3 | 67.6 | 98.8 | 95.1 | 92.7 | 98.1 | 96.7 | 95.7
 | KNNR | 91.0 | 86.0 | 82.7 | 81.9 | 71.5 | 64.5 | 78.2 | 66.2 | 58.2 | 94.3 | 91.4 | 89.4 | 97.0 | 95.4 | 94.4
 | MLPR | 89.2 | 84.6 | 81.5 | 85.0 | 75.7 | 69.6 | 81.0 | 70.1 | 62.8 | 93.3 | 90.6 | 88.8 | 98.5 | 97.2 | 96.3
3 | LR | 92.4 | 87.9 | 84.8 | 84.2 | 75.1 | 69.0 | 80.7 | 69.9 | 62.7 | 95.9 | 93.5 | 91.9 | 98.3 | 97.3 | 96.6
 | PAR | 75.2 | 67.4 | 62.3 | 92.0 | 81.5 | 74.5 | 82.6 | 71.6 | 64.3 | 95.2 | 92.6 | 90.9 | 89.3 | 86.0 | 83.9
 | KNNR | 91.1 | 86.0 | 82.6 | 81.4 | 70.7 | 63.6 | 78.0 | 65.6 | 57.3 | 94.4 | 91.5 | 89.5 | 97.0 | 95.4 | 94.4
 | MLPR | 95.6 | 90.8 | 87.6 | 81.6 | 72.4 | 66.2 | 83.4 | 72.6 | 65.3 | 96.7 | 94.1 | 92.4 | 97.7 | 96.3 | 95.4
4 | LR | 93.0 | 88.6 | 85.7 | 85.1 | 76.2 | 70.2 | 81.2 | 70.6 | 63.5 | 96.3 | 93.9 | 92.2 | 98.3 | 97.2 | 96.5
 | PAR | 87.6 | 82.7 | 79.4 | 88.4 | 68.9 | 55.9 | 81.4 | 69.6 | 61.7 | 96.1 | 92.8 | 90.6 | 98.1 | 95.2 | 93.3
 | KNNR | 89.6 | 83.6 | 79.5 | 81.7 | 70.3 | 62.6 | 75.6 | 62.5 | 53.8 | 91.8 | 86.9 | 83.5 | 94.1 | 89.3 | 86.1
 | MLPR | 94.9 | 90.2 | 87.0 | 85.5 | 76.4 | 70.3 | 85.0 | 74.1 | 66.8 | 94.6 | 91.7 | 89.8 | 98.7 | 96.1 | 94.3
Table 3. Regression, TLPQ values for datasets with 12 possible traffic levels.
- | - | DATASET_1 | | | DATASET_2 | | | DATASET_3 | | | DATASET_4 | | | DATASET_5 | |
Model | Reg | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3
1 | LR | 86.1 | 77.9 | 72.5 | 73.3 | 59.2 | 49.8 | 68.6 | 53.1 | 42.7 | 92.9 | 88.5 | 85.7 | 96.4 | 94.2 | 92.8
 | PAR | 83.2 | 74.9 | 69.3 | 40.1 | 20.9 | 8.1 | 72.4 | 57.3 | 47.3 | 69.2 | 59.9 | 53.7 | 96.5 | 94.3 | 92.9
 | KNNR | 84.9 | 76.1 | 70.2 | 71.2 | 56.2 | 46.3 | 66.0 | 50.1 | 39.5 | 92.5 | 87.9 | 84.8 | 96.0 | 93.6 | 92.0
 | MLPR | 86.1 | 78.0 | 72.7 | 72.1 | 58.2 | 48.9 | 68.7 | 53.1 | 42.8 | 92.7 | 88.4 | 85.5 | 96.4 | 94.2 | 92.8
2 | LR | 86.7 | 78.7 | 73.4 | 74.6 | 60.9 | 51.8 | 69.5 | 54.2 | 44.0 | 93.1 | 89.0 | 86.3 | 97.4 | 96.0 | 95.1
 | PAR | 88.0 | 80.1 | 74.8 | 76.8 | 63.1 | 54.0 | 55.7 | 38.8 | 27.5 | 94.6 | 90.2 | 87.3 | 83.0 | 77.9 | 74.5
 | KNNR | 84.5 | 75.4 | 69.3 | 71.1 | 56.0 | 46.0 | 65.2 | 48.8 | 37.9 | 91.3 | 86.3 | 82.9 | 96.5 | 94.3 | 92.8
 | MLPR | 88.6 | 80.6 | 75.2 | 76.3 | 62.6 | 53.5 | 71.1 | 55.9 | 45.8 | 92.5 | 88.3 | 85.5 | 97.0 | 94.9 | 93.4
3 | LR | 86.6 | 78.7 | 73.4 | 74.7 | 61.0 | 51.9 | 69.5 | 54.2 | 44.0 | 93.1 | 89.1 | 86.4 | 97.4 | 96.0 | 95.0
 | PAR | 91.1 | 82.7 | 77.1 | 79.4 | 63.4 | 52.7 | 22.3 | −2.5 | −19.0 | 96.3 | 74.4 | 59.7 | 96.7 | 94.6 | 93.3
 | KNNR | 85.2 | 76.4 | 70.5 | 70.9 | 55.8 | 45.6 | 64.1 | 47.3 | 36.2 | 91.4 | 86.5 | 83.2 | 96.4 | 94.2 | 92.7
 | MLPR | 85.6 | 77.8 | 72.7 | 73.9 | 60.2 | 51.1 | 72.4 | 57.2 | 47.1 | 95.3 | 90.8 | 87.8 | 98.0 | 95.6 | 94.1
4 | LR | 87.6 | 79.9 | 74.8 | 75.8 | 62.3 | 53.3 | 69.8 | 54.4 | 44.1 | 93.8 | 89.7 | 86.9 | 97.6 | 96.2 | 95.2
 | PAR | 80.0 | 71.5 | 65.8 | 84.2 | 69.6 | 59.8 | 20.9 | −3.2 | −19.3 | 94.5 | 88.9 | 85.1 | 94.8 | 91.0 | 88.4
 | KNNR | 82.8 | 71.8 | 64.4 | 69.5 | 53.7 | 43.1 | 61.1 | 44.2 | 32.9 | 89.2 | 80.9 | 75.3 | 90.6 | 83.6 | 78.9
 | MLPR | 83.1 | 74.7 | 69.1 | 72.1 | 58.4 | 49.3 | 71.0 | 55.7 | 45.5 | 87.0 | 81.9 | 78.5 | 97.0 | 94.6 | 93.0
Table 4. Regression, TLPQ values for datasets with 20 possible traffic levels.
- | - | DATASET_1 | | | DATASET_2 | | | DATASET_3 | | | DATASET_4 | | | DATASET_5 | |
Model | Reg | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3
1 | LR | 76.1 | 62.9 | 54.1 | 54.2 | 36.9 | 25.4 | 47.5 | 29.7 | 17.8 | 86.3 | 78.0 | 72.5 | 93.2 | 89.1 | 86.3
 | PAR | 84.8 | 69.4 | 59.1 | 51.0 | 33.1 | 21.2 | 39.6 | 29.3 | 22.4 | 91.3 | 82.8 | 77.1 | 75.1 | 67.3 | 62.2
 | KNNR | 74.4 | 60.5 | 51.2 | 50.8 | 33.3 | 21.6 | 45.1 | 28.1 | 16.8 | 85.1 | 76.2 | 70.2 | 91.5 | 86.6 | 83.3
 | MLPR | 75.4 | 62.2 | 53.3 | 52.2 | 34.4 | 22.5 | 47.3 | 29.5 | 17.5 | 86.4 | 78.2 | 72.7 | 93.1 | 88.9 | 86.1
2 | LR | 76.5 | 63.5 | 54.8 | 55.9 | 38.6 | 27.1 | 48.5 | 30.7 | 18.9 | 86.7 | 78.8 | 73.5 | 95.3 | 92.4 | 90.4
 | PAR | 78.0 | 65.0 | 56.3 | 28.7 | 22.0 | 17.5 | 40.8 | 20.6 | 7.2 | 91.7 | 82.7 | 76.8 | 77.0 | 69.9 | 65.2
 | KNNR | 73.3 | 59.0 | 49.5 | 50.4 | 32.9 | 21.2 | 43.6 | 25.5 | 13.5 | 84.6 | 75.5 | 69.5 | 92.4 | 88.1 | 85.3
 | MLPR | 81.1 | 68.0 | 59.2 | 58.2 | 42.6 | 32.2 | 47.2 | 29.2 | 17.1 | 88.2 | 79.7 | 74.1 | 89.9 | 85.7 | 82.9
3 | LR | 76.4 | 63.3 | 54.6 | 56.0 | 38.7 | 27.2 | 48.5 | 30.8 | 18.9 | 86.6 | 78.7 | 73.5 | 95.3 | 92.3 | 90.4
 | PAR | 86.9 | 71.8 | 61.8 | 53.6 | 40.1 | 31.1 | 9.7 | −17.8 | −36.2 | 53.3 | 38.9 | 29.2 | 86.3 | 81.4 | 78.2
 | KNNR | 74.2 | 60.2 | 50.9 | 50.2 | 32.8 | 21.2 | 42.3 | 24.7 | 13.0 | 84.5 | 75.4 | 69.2 | 92.5 | 88.2 | 85.3
 | MLPR | 71.8 | 58.7 | 49.9 | 56.1 | 39.0 | 27.6 | 49.0 | 31.7 | 20.2 | 84.6 | 76.4 | 70.9 | 88.8 | 84.5 | 81.5
4 | LR | 78.0 | 65.0 | 56.4 | 56.9 | 39.8 | 28.4 | 48.9 | 31.1 | 19.3 | 88.0 | 80.1 | 74.7 | 95.5 | 92.6 | 90.6
 | PAR | 54.9 | 39.0 | 28.4 | 56.4 | 39.4 | 28.1 | 28.7 | 7.7 | −6.3 | 18.7 | −5.7 | −22.0 | 98.2 | 85.8 | 77.6
 | KNNR | 68.1 | 52.3 | 41.7 | 47.4 | 30.1 | 18.5 | 40.6 | 22.8 | 10.9 | 77.4 | 63.3 | 53.8 | 82.2 | 69.9 | 61.6
 | MLPR | 76.3 | 62.8 | 53.8 | 56.7 | 39.5 | 28.1 | 35.1 | 13.4 | −1.1 | 87.0 | 77.2 | 70.6 | 91.2 | 86.3 | 83.0
Table 5. Classification, TLPQ values for datasets with 7 possible traffic levels.
- | - | DATASET_1 | | | DATASET_2 | | | DATASET_3 | | | DATASET_4 | | | DATASET_5 | |
Model | Class | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3
1 | LDA | 91.6 | 87.0 | 84.0 | 84.0 | 74.7 | 68.4 | 79.8 | 68.7 | 61.3 | 93.7 | 91.0 | 89.1 | 97.0 | 94.8 | 93.3
 | KNN | 91.1 | 86.0 | 82.6 | 79.2 | 68.4 | 61.2 | 78.9 | 66.7 | 58.6 | 94.9 | 92.1 | 90.3 | 97.7 | 96.2 | 95.2
 | GNB | 91.9 | 87.3 | 84.3 | 83.2 | 73.8 | 67.6 | 78.9 | 67.9 | 60.5 | 94.0 | 91.3 | 89.5 | 97.2 | 95.3 | 94.0
 | DT | 90.1 | 84.9 | 81.5 | 81.9 | 72.2 | 65.8 | 80.0 | 68.8 | 61.4 | 93.0 | 89.7 | 87.4 | 96.8 | 94.9 | 93.7
2 | LDA | 91.8 | 87.2 | 84.2 | 84.2 | 74.7 | 68.5 | 80.4 | 69.4 | 62.1 | 93.8 | 91.1 | 89.3 | 97.5 | 95.6 | 94.4
 | KNN | 90.5 | 85.1 | 81.5 | 80.3 | 69.5 | 62.2 | 74.6 | 61.7 | 53.1 | 94.0 | 90.9 | 88.8 | 96.8 | 95.1 | 94.0
 | GNB | 89.9 | 82.9 | 78.2 | 78.1 | 65.5 | 57.2 | 72.5 | 58.4 | 49.0 | 89.7 | 83.5 | 79.4 | 92.0 | 86.9 | 83.5
 | DT | 89.2 | 82.8 | 78.6 | 78.4 | 66.0 | 57.7 | 72.9 | 58.3 | 48.7 | 94.3 | 91.1 | 88.9 | 96.4 | 94.5 | 93.3
3 | LDA | 92.3 | 87.7 | 84.7 | 84.9 | 75.5 | 69.2 | 80.6 | 69.6 | 62.3 | 93.1 | 90.1 | 88.2 | 97.4 | 95.1 | 93.6
 | KNN | 90.4 | 85.1 | 81.5 | 79.5 | 68.3 | 60.8 | 73.9 | 60.5 | 51.6 | 94.1 | 91.1 | 89.1 | 96.8 | 95.1 | 94.0
 | GNB | 90.1 | 83.1 | 78.5 | 78.0 | 65.4 | 57.0 | 72.5 | 58.3 | 48.9 | 90.1 | 84.1 | 80.1 | 92.0 | 87.0 | 83.7
 | DT | 88.6 | 82.1 | 77.7 | 77.6 | 65.3 | 57.1 | 72.6 | 58.1 | 48.5 | 94.1 | 90.7 | 88.3 | 96.5 | 94.4 | 93.0
4 | LDA | 92.4 | 87.4 | 84.0 | 85.1 | 75.4 | 68.9 | 80.6 | 69.5 | 62.1 | 93.5 | 90.3 | 88.1 | 96.2 | 92.9 | 90.7
 | KNN | 89.2 | 83.0 | 78.8 | 79.0 | 66.8 | 58.6 | 70.5 | 56.1 | 46.6 | 91.7 | 86.6 | 83.2 | 94.1 | 89.2 | 85.9
 | GNB | 90.4 | 83.6 | 79.1 | 78.1 | 65.1 | 56.5 | 72.8 | 58.1 | 48.2 | 90.8 | 85.3 | 81.7 | 92.7 | 87.4 | 83.9
 | DT | 88.2 | 81.3 | 76.7 | 78.8 | 66.3 | 58.0 | 74.2 | 60.0 | 50.6 | 92.7 | 88.3 | 85.3 | 95.1 | 92.4 | 90.5
Table 6. Classification, TLPQ values for datasets with 12 possible traffic levels.
- | - | DATASET_1 | | | DATASET_2 | | | DATASET_3 | | | DATASET_4 | | | DATASET_5 | |
Model | Class | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3
1 | LDA | 86.2 | 78.0 | 72.5 | 73.6 | 59.6 | 50.3 | 68.3 | 52.6 | 42.1 | 92.7 | 88.2 | 85.3 | 96.0 | 93.7 | 92.1
 | KNN | 84.7 | 75.7 | 69.7 | 64.3 | 48.3 | 37.7 | 60.6 | 43.8 | 32.5 | 92.5 | 87.8 | 84.7 | 96.3 | 93.8 | 92.1
 | GNB | 85.8 | 77.7 | 72.2 | 72.4 | 58.3 | 48.9 | 66.2 | 50.5 | 40.1 | 92.6 | 88.3 | 85.4 | 95.9 | 93.6 | 92.0
 | DT | 82.9 | 73.8 | 67.6 | 71.0 | 56.7 | 47.1 | 67.2 | 51.5 | 41.1 | 89.1 | 83.4 | 79.7 | 94.7 | 91.5 | 89.4
2 | LDA | 86.3 | 78.1 | 72.7 | 74.6 | 60.8 | 51.6 | 68.9 | 53.3 | 42.9 | 92.9 | 88.6 | 85.7 | 97.0 | 95.3 | 94.2
 | KNN | 82.8 | 73.2 | 66.8 | 64.4 | 48.5 | 37.9 | 58.0 | 40.5 | 28.9 | 91.1 | 85.9 | 82.5 | 96.4 | 94.0 | 92.5
 | GNB | 79.9 | 68.5 | 60.9 | 65.1 | 49.6 | 39.3 | 57.6 | 40.3 | 28.7 | 85.1 | 76.3 | 70.4 | 88.8 | 81.9 | 77.3
 | DT | 79.7 | 68.4 | 60.9 | 64.2 | 47.9 | 37.1 | 57.1 | 40.1 | 28.8 | 90.3 | 84.6 | 80.7 | 95.6 | 93.2 | 91.6
3 | LDA | 86.4 | 78.1 | 72.6 | 74.6 | 60.8 | 51.5 | 69.1 | 53.4 | 43.0 | 92.4 | 87.9 | 84.9 | 95.4 | 92.9 | 91.3
 | KNN | 83.8 | 74.4 | 68.1 | 63.8 | 47.5 | 36.7 | 56.6 | 38.9 | 27.1 | 90.9 | 85.9 | 82.5 | 96.3 | 94.0 | 92.4
 | GNB | 80.4 | 69.2 | 61.6 | 65.3 | 49.8 | 39.5 | 57.4 | 40.3 | 28.8 | 85.4 | 76.7 | 71.0 | 88.6 | 81.6 | 77.0
 | DT | 80.3 | 69.1 | 61.7 | 65.2 | 49.0 | 38.3 | 57.2 | 40.1 | 28.8 | 90.5 | 84.6 | 80.7 | 96.2 | 93.8 | 92.3
4 | LDA | 86.4 | 77.6 | 71.7 | 74.7 | 60.5 | 51.0 | 67.6 | 51.7 | 41.0 | 91.8 | 85.9 | 82.0 | 93.3 | 89.4 | 86.8
 | KNN | 81.0 | 69.5 | 61.8 | 63.0 | 46.0 | 34.7 | 53.3 | 35.1 | 23.0 | 89.0 | 80.4 | 74.6 | 90.3 | 83.2 | 78.4
 | GNB | 83.0 | 72.2 | 65.0 | 66.0 | 50.6 | 40.4 | 57.3 | 40.4 | 29.1 | 87.5 | 79.0 | 73.4 | 88.0 | 81.0 | 76.4
 | DT | 79.0 | 66.8 | 58.7 | 63.2 | 47.2 | 36.5 | 57.5 | 40.7 | 29.5 | 88.0 | 80.5 | 75.6 | 94.8 | 91.5 | 89.3
Table 7. Classification, TLPQ values for datasets with 20 possible traffic levels.
- | - | DATASET_1 | | | DATASET_2 | | | DATASET_3 | | | DATASET_4 | | | DATASET_5 | |
Model | Class | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3 | S_1 | S_2 | S_3
1 | LDA | 76.2 | 62.9 | 54.0 | 53.9 | 36.8 | 25.3 | 46.9 | 28.9 | 16.8 | 86.3 | 78.0 | 72.5 | 92.2 | 87.8 | 84.9
 | KNN | 69.6 | 54.8 | 45.0 | 44.0 | 25.2 | 12.7 | 37.9 | 19.0 | 6.4 | 83.7 | 74.3 | 68.1 | 91.4 | 86.4 | 83.0
 | GNB | 75.7 | 62.4 | 53.6 | 52.7 | 34.9 | 23.1 | 45.8 | 27.2 | 14.9 | 86.1 | 77.8 | 72.3 | 92.1 | 87.7 | 84.7
 | DT | 70.1 | 55.5 | 45.9 | 50.3 | 32.6 | 20.8 | 45.4 | 27.3 | 15.2 | 78.0 | 67.3 | 60.1 | 89.1 | 83.1 | 79.1
2 | LDA | 76.3 | 63.1 | 54.2 | 54.5 | 37.2 | 25.7 | 46.9 | 29.2 | 17.4 | 86.6 | 78.6 | 73.3 | 93.8 | 90.6 | 88.5
 | KNN | 66.8 | 51.6 | 41.5 | 42.0 | 23.2 | 10.7 | 35.6 | 16.2 | 3.3 | 82.8 | 73.1 | 66.5 | 91.6 | 87.0 | 83.9
 | GNB | 66.7 | 51.5 | 41.3 | 43.7 | 26.5 | 15.0 | 37.5 | 19.7 | 7.8 | 74.6 | 61.4 | 52.6 | 81.4 | 71.0 | 64.0
 | DT | 66.6 | 50.8 | 40.2 | 42.1 | 24.8 | 13.3 | 34.6 | 17.4 | 5.9 | 81.3 | 70.3 | 63.0 | 91.4 | 86.8 | 83.7
3 | LDA | 76.2 | 62.8 | 54.0 | 54.2 | 37.0 | 25.5 | 46.8 | 29.3 | 17.7 | 86.0 | 77.6 | 72.1 | 92.7 | 89.0 | 86.6
 | KNN | 67.3 | 52.0 | 41.8 | 42.3 | 23.4 | 10.8 | 34.0 | 14.7 | 1.8 | 82.7 | 72.9 | 66.3 | 91.5 | 86.9 | 83.9
 | GNB | 67.3 | 52.2 | 42.1 | 43.5 | 26.1 | 14.6 | 37.0 | 19.1 | 7.1 | 74.8 | 61.8 | 53.2 | 81.2 | 70.7 | 63.8
 | DT | 67.1 | 51.1 | 40.5 | 42.3 | 25.0 | 13.5 | 36.1 | 19.0 | 7.6 | 81.6 | 70.8 | 63.6 | 91.3 | 86.4 | 83.2
4 | LDA | 75.5 | 61.6 | 52.2 | 53.6 | 36.7 | 25.4 | 45.1 | 27.7 | 16.0 | 84.3 | 73.7 | 66.6 | 92.0 | 86.3 | 82.5
 | KNN | 62.7 | 46.2 | 35.2 | 39.2 | 20.5 | 8.0 | 32.5 | 12.9 | −0.1 | 75.5 | 60.8 | 51.0 | 80.8 | 68.4 | 60.1
 | GNB | 69.4 | 54.3 | 44.1 | 44.0 | 26.9 | 15.6 | 35.6 | 18.2 | 6.5 | 77.1 | 64.0 | 55.3 | 82.5 | 71.8 | 64.7
 | DT | 68.1 | 52.5 | 42.0 | 43.1 | 25.9 | 14.4 | 35.9 | 18.7 | 7.2 | 78.8 | 66.2 | 57.8 | 88.2 | 81.2 | 76.6
Table 8. Difference of TLPQ value between regression and classification approaches.
Levels | DATASET_1 | DATASET_2 | DATASET_3 | DATASET_4 | DATASET_5
Scenario 1 (S_1)
7 | 3.2 | 6.9 | 6.1 | 3.9 | 1.0
12 | 4.7 | 9.5 | 3.4 | 3.4 | 1.0
20 | 1.6 | 3.8 | 2.1 | 5.1 | 4.4
Scenario 2 (S_2)
7 | 3.0 | 6.0 | 5.6 | 3.0 | 1.1
12 | 4.6 | 8.8 | 3.9 | 2.2 | 0.9
20 | 8.8 | 5.4 | 2.4 | 4.2 | 2.0
Scenario 3 (S_3)
7 | 2.9 | 5.4 | 5.3 | 2.4 | 1.5
12 | 4.5 | 8.2 | 4.3 | 2.2 | 1.0
20 | 7.5 | 6.5 | 4.7 | 3.8 | 2.1
Table 9. The best regressors for datasets based on different measures.
Levels | Measure | DATASET_1 | DATASET_2 | DATASET_3 | DATASET_4 | DATASET_5
7 | TLPQ | MLPR 3 | PAR 3 | PAR 1 / PAR 3 | PAR 2 | MLPR 4 / LR 3
 | Accuracy | LR 4 | LR 4 | LR 4 | LR 3 | LR 3
12 | TLPQ | PAR 3 | PAR 4 | MLPR 3 / PAR 1 | PAR 3 / MLPR 3 | MLPR 3 / LR 4
 | Accuracy | LR 4 | LR 4 | LR 3 | LR 3 | LR 2
20 | TLPQ | PAR 3 | MLPR 2 | MLPR 3 / PAR 1 | PAR 2 / PAR 1 | PAR 4 / LR 4
 | Accuracy | LR 4 | LR 4 | LR 4 | LR 2 | LR 2
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Szostak, D.; Włodarczyk, A.; Walkowiak, K. Machine Learning Classification and Regression Approaches for Optical Network Traffic Prediction. Electronics 2021, 10, 1578. https://doi.org/10.3390/electronics10131578