5.2. Multilayer Perceptron Model (MLP)
The proposed MLP model is a supervised machine learning model that, by training on a training data set, learns a function f: R^m → R^o that maps the input space to the output space. In this case, the dimension of the input space is m = 17, and that of the output space is o = 2.
According to the algorithm given in Section 3, the first step in creating an MLP model in the IBM SPSS technology involves preprocessing the data, i.e., rescaling all input values to the same range. In this study, a standardization method was used, according to which the variables are rescaled so that they have an arithmetic mean equal to 0 and a standard deviation equal to 1.
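This standardization step can be sketched as follows (a minimal NumPy sketch of z-score rescaling; the `standardize` helper and the example matrix are illustrative, not part of the SPSS workflow):

```python
import numpy as np

def standardize(X):
    """Rescale each column to arithmetic mean 0 and standard deviation 1
    (z-score standardization), as in the preprocessing step above."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Illustrative 3x2 matrix; the study's input vectors are 17-dimensional.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = standardize(X)
```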
In order to select the best final model, a total of 30 variants of the MLP model were tested for prediction of the observed dependent variables with different combinations of ratios of the set sizes for training and testing (30%:70%, 40%:60%, 50%:50%, 60%:40%, 70%:30%), and different values of the following metaparameters:
(1) Number of neurons in hidden layers. The architecture of all tested model variants consists of one input layer with 17 neurons, one output layer with two neurons and two hidden layers between them, whose size varies in combinations and can be: 10,10; 20,20; 30,30.
(2) The shape of the output function of neurons in hidden layers. The IBM SPSS technology offers the possibility to select one of two functions, which were examined in combination with the first metaparameter of the model, namely the hyperbolic tangent (Hyp. Tg.) and the sigmoid function (Sig.).
Each of the 30 variants of the MLP model with the above combinations was tested by multiple passes through the multilayer network architecture, i.e., three consecutive trainings and tests, and the results are presented as relative testing errors in Table 7 for both dependent variables. The final model was selected based on the minimum value of the arithmetic mean of this relative error criterion over the three measurements.
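The selection rule above, minimizing the arithmetic mean of the relative error over the three runs, can be sketched as follows (the `re_runs` values are illustrative placeholders, not the actual Table 7 entries):

```python
import numpy as np

# Hypothetical relative-error matrix: one row per tested variant
# (30 in the study), one column per repeated train/test run (3 runs).
re_runs = np.array([
    [0.130, 0.128, 0.131],
    [0.104, 0.098, 0.110],
    [0.121, 0.119, 0.123],
])

mean_re = re_runs.mean(axis=1)   # arithmetic mean over the three runs
best = int(np.argmin(mean_re))   # variant with the minimum mean RE
```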
According to [48], machine learning models are most often trained with 70% of the total data set, while the remaining 30% of the set is reserved for testing, i.e., for assessing the accuracy and quality of the model. Additionally, in [49] the division of data in the mentioned ratio showed a good influence on the accuracy of the classifier. The random forest model presented in [50] showed the lowest mean standard error when the data were divided in the ratio of 70%:30%.
The selection of this ratio when creating predictive models is one of the unavoidable steps that affect their performance. Based on the results of the multiple tests of the 30 different model variants in Table 7, it is concluded that the best performance is achieved by the MLP model variant in which the total data set of 64,301 input/output vectors was divided in the ratio 70%:30%.
The average RE of the selected model for the output variable Cell Downlink Average Throughput is RE = 0.104, which means that the degree of accuracy is equal to 89.6%. For the variable Average User Downlink Throughput, the average relative error is RE = 0.120, i.e., the degree of accuracy is 88%.
Table 7 also shows that each hidden layer of the selected model's architecture consists of 20 neurons, whose output function is the hyperbolic tangent tanh x (Figure 5). The output function of the two neurons in the output layer, for the selected as well as for all tested variants of the MLP model, is linear (Identity).
Table 8 shows the variants of the tested MLP models that have the lowest average testing errors for the individual ratios of the training and testing data sets; it is a summary of Table 7. Based on Table 8, it can be concluded that as the percentage of training data increases, the average relative prediction error decreases. For example, for a ratio of 50%:50% with Hyp. Tg. and 20 neurons in each hidden layer, the model has an average RE = 0.115, i.e., a degree of accuracy of 88.5%. If this value is compared with RE = 0.104, i.e., the degree of accuracy of 89.6% achieved by the model with a ratio of 70%:30%, it follows that reducing the ratio to 50%:50% loses only 1.1% of the model's prediction accuracy. This means that the model can perform the prediction quite well even with a 50%:50% split.
According to Table 7, the lowest individual relative testing errors among the multiple iterations for the selected MLP model are RE = 0.098 for the Cell Downlink Average Throughput variable and RE = 0.115 for the Average User Downlink Throughput variable.
Figure 6 shows the two-layer architecture of the MLP model which achieved the above performance, where the synaptic connections, which are represented by dark blue, have weight factors less than zero, while the lighter color represents positive values of the weight factors.
The metaparameters of the multilayer perceptron training do not change during the testing of the different models. The adjustment of the weighting factors is performed after processing each individual input/output vector from the training set (individual or online learning). The training algorithm is backpropagation, which implements the gradient descent method. The initial learning rate has a great influence on the convergence of the algorithm, and its value is usually chosen from the range 10⁻⁶ to 1. A higher learning rate accelerates the training process of the MLP model but negatively affects its prediction performance. The initial value of the learning rate gradually decreases in each epoch, where an epoch represents one pass of the complete training data set through the training algorithm, down to a certain lower limit. The IBM SPSS technology specifies a default initial learning rate of 0.4, while its lower limit is 0.001. The maximum number of epochs is determined automatically.
The following part of this section provides detailed training and testing results for the selected final MLP model with the lowest individual RE values for the two dependent variables (RE = 0.098 for the
Cell Downlink Average Throughput variable and RE = 0.115 for the
Average User Downlink Throughput variable). Based on the results shown in
Table 9, generated by IBM SPSS technology, it can be seen that the relative error, i.e., the degree of accuracy of the model over the training data, for the variable
Cell Downlink Average Throughput is 89.8% (RE = 0.102) and is slightly lower in relation to the degree of accuracy for the test set (the degree of accuracy is 90.2%, RE = 0.098), while for the variable
Average User Downlink Throughput, the degree of accuracy over the training data is 88.6% (RE = 0.114), and for testing 88.5% (RE = 0.115). During the training process, the goal is to minimize the
error function which, in the observed case, is represented by the Sum of Squares Error (SSE) and is calculated according to the equation:

SSE = Σᵢ₌₁ᴺ (RVᵢ − PVᵢ)²,

where RV (Real Value) represents the actual value of the observed dependent variable, PV (Predicted Value) represents the prediction value of the observed dependent variable, and N is the number of all measurements. In the initial state, when the model is not yet trained, the value of the sum of squared errors is significantly higher and is calculated by using the arithmetic mean of the dependent variable as the prediction value for all measurements. According to
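Assuming the relative error is the ratio of the model SSE to the SSE of the untrained baseline that predicts the arithmetic mean (an assumption consistent with the reported accuracies, e.g., 1 − RE matching the R² values quoted later; the `relative_error` helper and sample data are illustrative):

```python
import numpy as np

def relative_error(rv, pv):
    """Model SSE divided by the SSE of the untrained baseline that predicts
    the arithmetic mean of the dependent variable for every measurement."""
    rv, pv = np.asarray(rv, float), np.asarray(pv, float)
    sse_model = ((rv - pv) ** 2).sum()
    sse_mean = ((rv - rv.mean()) ** 2).sum()
    return sse_model / sse_mean

# Illustrative values of RV and PV, not measurements from the study.
rv = np.array([10.0, 12.0, 14.0, 16.0])
pv = np.array([10.5, 11.5, 14.5, 15.5])
re = relative_error(rv, pv)
accuracy = 1.0 - re   # "degree of accuracy" as used in the text
```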
Table 9, the error function after model training has a value of 4882.801 and when testing the model, its value is smaller and amounts to 2089.438.
Two criteria for stopping the training process were set: (1) expiration of a time period of 15 min and (2) reaching the number of five consecutive iterations without reducing the error function. By meeting at least one of the above criteria, the training process is interrupted, and in the specific case of the selected MLP model, condition 2 is met, as can be seen in
Table 9.
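A minimal sketch of the training procedure described above: online gradient-descent updates with a decaying learning rate (initial value 0.4, floor 0.001) and the stall-based stopping criterion. A single linear neuron trained on synthetic data stands in for the full MLP; the data, the decay factor, and the omission of the 15 min time limit are illustrative assumptions:

```python
import numpy as np

# Synthetic data: a single linear neuron y = w·x stands in for the MLP.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr, lr_min, decay = 0.4, 0.001, 0.9   # initial rate and floor per the text; decay assumed
best_sse, stall = np.inf, 0

for epoch in range(500):
    for xi, yi in zip(X, y):                  # online learning: update per input/output vector
        w += lr * (yi - xi @ w) * xi          # gradient step on the squared error
    sse = float(((y - X @ w) ** 2).sum())     # error function (SSE) after the epoch
    if sse < best_sse - 1e-12:
        best_sse, stall = sse, 0
    else:
        stall += 1
        if stall >= 5:                        # stop: five epochs without improvement
            break
    lr = max(lr * decay, lr_min)              # decaying learning rate with lower limit
```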
Figure 7 graphically shows the best individual result from the set of 30 tested variants of the MLP model, in the form of scatter plots, i.e., the prediction results for the two observed dependent variables. The abscissa shows the values from the test set, while the ordinate indicates the prediction values. Based on the obtained coordinates of the points and their arrangement, it is possible to visually evaluate the accuracy of the model. This way of presenting the prediction results is suitable for determining the coefficient of determination (R2), which describes the percentage of the variance of the target variable explained by the model. For the dependent variable Cell Downlink Average Throughput, the best individual result of training and testing gives a coefficient of determination R2 = 0.899, while for the Average User Downlink Throughput, this value is R2 = 0.885. The coefficient of determination is higher if the points are grouped around the line defined by the equation y = x, which is confirmed in Figure 7. If the R2 values obtained in this way are compared with those calculated for the linear models, it can be concluded that the MLP has significantly higher prediction accuracy, i.e., model quality.
The Chaddock scale, presented in Table 10 [51], was used to qualitatively evaluate the coefficient of determination R2. It can be seen that the accuracy performance of the MLP model falls in the range 0.7–0.9 and qualifies as high. On the other hand, based on their coefficients of determination, the prediction accuracy of the linear models can be assessed as salient. Comparing the prediction accuracies, the MLP model clearly achieves higher, i.e., high, prediction accuracy, but it is also characterized by incomparably greater complexity than the linear model.
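A qualitative labeling of R² on the Chaddock scale can be sketched as below; the band edges used are the commonly cited ones and are an assumption here, since the authoritative bands are those of Table 10:

```python
def chaddock_label(r2):
    """Qualitative label for a coefficient of determination. The band
    edges are the commonly cited Chaddock values (an assumption here;
    Table 10 is the authoritative source)."""
    bands = [(0.9, "very high"), (0.7, "high"), (0.5, "salient"),
             (0.3, "moderate"), (0.1, "weak")]
    for lower, label in bands:
        if r2 >= lower:
            return label
    return "negligible"

label_mlp = chaddock_label(0.899)    # best MLP result above -> "high"
label_lin = chaddock_label(0.6)      # a salient-range value for comparison
```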
As a quality parameter of the created predictive model, the residual (r) is used, which represents the difference between the real value (RV) and the prediction value (PV) of the observed variable: r = RV − PV. The arrangement of the residuals for the two output variables is shown in Figure 8. The model is of better quality and more accurate if its residuals are grouped around the horizontal zero line, which, in this case, is clearly visible.
One of the graphical results generated by the IBM SPSS technology, which represents the ranking of the independent variables according to the importance of their influence on the variability of the dependent variable in the model, is shown in Figure 9. The importance of an independent variable is a measure of how much the predicted value changes for different values of that variable; it is expressed numerically on the lower abscissa of the graph, while the normalized importance, obtained by dividing each importance by the largest individual value, is expressed as a percentage on the upper abscissa of the graph in Figure 9.
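The normalization described above, dividing each importance by the largest individual value and expressing the result as a percentage, can be sketched as follows (the importance values are illustrative, not the Figure 9 results, and the predictor name other than DL Cell Traffic Volume and Average CQI is a hypothetical placeholder):

```python
# Illustrative raw importance values (not the Figure 9 results).
importance = {
    "DL Cell Traffic Volume": 0.42,
    "Other predictor": 0.21,     # hypothetical placeholder name
    "Average CQI": 0.03,
}

max_imp = max(importance.values())
# Normalized importance: each value divided by the largest, as a percentage.
normalized = {name: value / max_imp * 100.0 for name, value in importance.items()}
```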
The interpretability of the model, i.e., its ability to present the results to the researcher in an understandable form, can be seen in Figure 9, which indicates that the DL Cell Traffic Volume variable has the greatest importance for the changes in the values of the dependent variables. This conclusion is logical and expected, considering that the throughput in the network, by definition, directly depends on the total realized traffic, i.e., the amount of data transmitted during the observed time. In contrast, the Average CQI variable is at the bottom of the ranking and has the least importance for the changes in the values of the dependent variables. This is explained by the fact that the MLP models well the nonlinear dependences of the output on the input, whereas the observed variable is linearly related to the corresponding modulation and coding scheme, which determines the size of the transport block when transmitting data through a channel and thus directly influences the throughput of specific traffic classes.
Based on the fact that the creation of the model in this paper requires previously collected research data related to the network traffic and its parameters, it is concluded that, in this case, a statistical approach to traffic modeling is applied, which is otherwise based on traces (recordings of combined traffic). Another possible approach is through an emulator, and, in some cases, a hybrid approach is possible. Since the DL Cell Traffic Volume variable represents the total traffic in the DL direction at the cell level, the created predictive model is a model of combined traffic. When models are created for each service, i.e., for each of the nine traffic classes shown in Table 5, an approach based on models of information sources is applied; such models are characterized by greater accuracy, but also greater complexity, than the previous one. According to [22], models of combined traffic are, in most cases, more suitable for traffic prediction in networks, and [23] presents a good overview of the traffic models.
The analysis of possible differences in the training and testing results of the different model variants across the three shown iterations was carried out by appropriate statistical tests. If the three measurements of the RE values are considered as three statistical groups, each tested variant of the MLP model represents one experimental unit over which the experiment is repeated three times. A parametric technique for testing statistical hypotheses about the equality of arithmetic means for three or more groups with repeated measurements is the repeated-measures analysis of variance (ANOVA). One of the basic preconditions for its application, in this case, is the normality of the RE distribution across the observed groups, i.e., the repeated measurements.
Table 11 shows the results of the Kolmogorov–Smirnov and Shapiro–Wilk normality tests, based on which it can be concluded that the distribution of the RE deviates from the normal distribution in all three groups for the Cell Downlink Average Throughput variable (Sig. < 0.05) [52], while only the first and second groups of the repeated relative error measurements for the Average User Downlink Throughput variable follow the normal distribution (Sig. > 0.05).
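The normality check can be sketched with a plain Kolmogorov–Smirnov distance against a fitted normal distribution (a rough illustration only: SPSS applies the Lilliefors correction when the parameters are estimated, the Shapiro–Wilk statistic is not reproduced here, and the sample data are synthetic):

```python
import numpy as np
from math import erf, sqrt

def ks_normal_stat(x):
    """Kolmogorov-Smirnov distance between a sample and a normal
    distribution fitted to it (mean and std estimated from the data)."""
    x = np.sort(np.asarray(x, float))
    n = x.size
    z = (x - x.mean()) / x.std(ddof=1)
    # Normal CDF via the error function.
    cdf = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in z])
    i = np.arange(1, n + 1)
    return float(max((i / n - cdf).max(), (cdf - (i - 1) / n).max()))

# Synthetic group of repeated RE measurements (not the Table 11 data).
rng = np.random.default_rng(2)
group = rng.normal(0.11, 0.01, size=100)
d_normal = ks_normal_stat(group)   # small distance -> consistent with normality
```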
Thus, repeated-measures ANOVA cannot be used in this case to determine potential statistical equalities or differences between the mentioned groups. An appropriate nonparametric technique, applied when the prerequisites for the analysis of variance are not fulfilled, is the Friedman test, which was conducted in the IBM SPSS technology; its results are shown in Table 12 for both dependent variables.
Based on the values of Asymp. Sig. (0.384 > 0.05 and 0.530 > 0.05), it can be concluded that there are no statistically significant differences in the relative errors, i.e., in the prediction accuracy, across the individual training and testing iterations of the 30 observed MLP models.
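The Friedman test for three repeated measurements can be sketched directly, since with k = 3 the statistic has 2 degrees of freedom and the chi-square p-value reduces to the closed form exp(−Q/2); the `re_runs` matrix below is illustrative, not the Table 7 data:

```python
import numpy as np
from math import exp

def friedman_test(re_matrix):
    """Friedman test for k repeated measurements over n experimental units
    (rows). Ranks are assigned within each row; ties are broken by position,
    which is adequate for continuous RE values. The closed-form p-value
    exp(-Q/2) is valid only for k = 3 (chi-square with 2 degrees of freedom)."""
    x = np.asarray(re_matrix, float)
    n, k = x.shape
    ranks = x.argsort(axis=1).argsort(axis=1) + 1.0  # within-row ranks 1..k
    col_rank_sums = ranks.sum(axis=0)
    q = 12.0 / (n * k * (k + 1)) * (col_rank_sums ** 2).sum() - 3.0 * n * (k + 1)
    p = exp(-q / 2.0)  # chi-square survival function for df = 2
    return q, p

re_runs = np.array([[0.10, 0.11, 0.12],
                    [0.12, 0.10, 0.11],
                    [0.11, 0.12, 0.10]])
q, p = friedman_test(re_runs)
# p > 0.05 -> no significant difference between the iterations
```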