## **2. Materials and Methods**

### *2.1. Dataset Description*

The MQTT (Message Queuing Telemetry Transport) protocol is an application-layer protocol that runs on top of TCP (Transmission Control Protocol). It is one of the most widely used protocols in IoT systems [3]. It is based on a star architecture, which pivots on a central broker that manages the network messages. Message exchange follows a publish/subscribe approach, where each message is characterised by a topic string with a nested (hierarchical) structure.
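The publish/subscribe mechanism and the nested topic strings can be illustrated with a minimal in-memory sketch. It is not the broker used in this work: the `InMemoryBroker` class, the `topic_matches` helper and the topic names are hypothetical. The wildcard rules follow the usual MQTT convention (`+` for one level, `#` for all remaining levels), and special cases such as `$`-prefixed topics are ignored.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a topic matches a subscription filter.

    '+' matches exactly one level; '#' matches all remaining levels
    (including none) and must be the last level of the filter.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


class InMemoryBroker:
    """Hypothetical stand-in for a broker: the hub of the star architecture."""

    def __init__(self):
        self.subscriptions = []  # list of (topic_filter, callback)

    def subscribe(self, topic_filter, callback):
        self.subscriptions.append((topic_filter, callback))

    def publish(self, topic, payload):
        """Deliver the payload to every matching subscriber; return the count."""
        delivered = 0
        for topic_filter, callback in self.subscriptions:
            if topic_matches(topic_filter, topic):
                callback(topic, payload)
                delivered += 1
        return delivered


# Illustrative usage with made-up topics.
received = []
broker = InMemoryBroker()
broker.subscribe("home/+/temperature", lambda t, p: received.append((t, p)))
broker.subscribe("home/#", lambda t, p: received.append((t, p)))
n_temperature = broker.publish("home/kitchen/temperature", "21.5")
n_humidity = broker.publish("home/kitchen/humidity", "40")
n_other = broker.publish("office/temperature", "19.0")
```

Because every message passes through the broker, a flood of publications or connections aimed at it affects all clients, which is why the broker is the natural target of a DoS attack.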

To generate the dataset, a server running the Aedes library acted as the broker. An ESP8266 device was in charge of establishing the connection with the various sensors and actuators.

**Citation:** Michelena, A.; Zayas-Gato, F.; Jove, E.; Calvo-Rolle, J.L. Detection of DoS Attacks in an IoT Environment with MQTT Protocol Based on Intelligent Binary Classifiers. *Eng. Proc.* **2021**, *7*, 16. https://doi.org/10.3390/engproc2021007016


Academic Editors: Joaquim de Moura, Marco A. González, Javier Pereira and Manuel G. Penedo

Published: 9 October 2021

However, the broker was vulnerable to DoS attacks through the MQTT port, 1883. The MQTTmalaria program was used to carry out these attacks.

The traffic registered during the experiments contained a total of 94,624 samples, with 65 variables containing network information and a label indicating whether the instance is "normal" or "attack". After an initial analysis of the original dataset, the repeated samples were removed and the constant variables were deleted. Furthermore, the categorical variables were transformed following a natural coding criterion. The resulting dataset comprised 39 variables and 59,339 instances: 49,910 normal instances and 9,429 attacks.
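The three cleaning steps can be sketched as follows. This is only an illustration under assumptions: the column names and values are invented, and "natural coding" is interpreted here as mapping each category to a consecutive integer in sorted order, which the text does not specify.

```python
def preprocess(rows):
    """Clean a list of dict records that all share the same keys.

    Returns the cleaned rows and the category-to-integer maps that were used.
    """
    if not rows:
        return [], {}
    columns = list(rows[0].keys())

    # 1. Remove repeated samples, keeping the first occurrence.
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[c] for c in columns)
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)

    # 2. Delete constant variables (a single distinct value).
    kept = [c for c in columns if len({r[c] for r in unique_rows}) > 1]

    # 3. Natural coding (assumed): categories -> 0, 1, 2, ... in sorted order.
    codes = {}
    for c in kept:
        if all(isinstance(r[c], str) for r in unique_rows):
            categories = sorted({r[c] for r in unique_rows})
            codes[c] = {category: i for i, category in enumerate(categories)}

    cleaned = [
        {c: codes[c][r[c]] if c in codes else r[c] for c in kept}
        for r in unique_rows
    ]
    return cleaned, codes


# Made-up records; the real dataset has 65 variables plus the label.
rows = [
    {"flags": "0x18", "msgtype": "publish", "length": 60, "version": 4, "label": "normal"},
    {"flags": "0x18", "msgtype": "publish", "length": 60, "version": 4, "label": "normal"},
    {"flags": "0x02", "msgtype": "connect", "length": 74, "version": 4, "label": "attack"},
    {"flags": "0x18", "msgtype": "connect", "length": 90, "version": 4, "label": "attack"},
]
cleaned, codes = preprocess(rows)
```

In this toy input the second record is a duplicate and `version` is constant, so three rows and four columns remain.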

### *2.2. Used Techniques*

#### 2.2.1. Principal Component Analysis

This dimensionality reduction technique aims to find the directions of highest variability in a dataset, known as principal components [4]. They are obtained by computing the eigenvalues and eigenvectors of the correlation matrix. Then, by projecting the data onto the eigenvectors associated with the largest eigenvalues, the initial set can be linearly transformed into a lower-dimensional space.
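A minimal NumPy sketch of this procedure is given here. The function name `pca_correlation` and the synthetic data are assumptions made for illustration; it also assumes that constant variables have already been removed, since they would have zero standard deviation.

```python
import numpy as np


def pca_correlation(X, n_components):
    """Project X onto the leading eigenvectors of its correlation matrix.

    Returns the projected data and the fraction of variance explained
    by each retained component.
    """
    X = np.asarray(X, dtype=float)
    # Standardise each variable so that the covariance of Z is the correlation of X.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(X, rowvar=False)

    # eigh is suited to symmetric matrices; it returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(corr)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues = eigenvalues[order]
    eigenvectors = eigenvectors[:, order]

    explained = eigenvalues / eigenvalues.sum()
    scores = Z @ eigenvectors[:, :n_components]
    return scores, explained[:n_components]


# Synthetic data: five variables, two of them strongly correlated.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * rng.normal(size=200)
scores, explained = pca_correlation(X, n_components=2)
```

The explained-variance fractions are the kind of quantity that is typically inspected when choosing how many components to keep.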

#### 2.2.2. Classification Techniques

##### Logistic Regression

The Logistic Regression (LR) classification technique uses a sigmoid function to calculate the class membership probability; its parameters are fitted by gradient descent [5].
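As a rough sketch of this idea, the following pure-Python code fits the weights with batch gradient descent on the log-loss. The function names, the learning rate, the number of epochs and the toy data are all illustrative assumptions, not the configuration used in the experiments.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(X, weights, bias):
    """Probability of class 1 for each sample."""
    return [sigmoid(sum(w * x for w, x in zip(weights, xi)) + bias) for xi in X]


def fit_logistic_regression(X, y, learning_rate=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        probabilities = predict_proba(X, weights, bias)
        errors = [p - yi for p, yi in zip(probabilities, y)]
        # Gradient of the mean log-loss with respect to each parameter.
        grad_w = [
            sum(e * xi[j] for e, xi in zip(errors, X)) / n_samples
            for j in range(n_features)
        ]
        grad_b = sum(errors) / n_samples
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy, linearly separable data: class 0 on one side of the origin, class 1 on the other.
X = [[-2.0, -1.0], [-1.0, -2.0], [-1.0, -1.0], [1.0, 1.0], [1.0, 2.0], [2.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = fit_logistic_regression(X, y)
probabilities = predict_proba(X, weights, bias)
```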

##### K Nearest Neighbours

This classification method uses the local data density to label a new instance. To estimate the class membership, it finds the K Nearest Neighbours (KNN) of the instance and counts the number of neighbours belonging to each class [5].
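The neighbour-counting step can be sketched in a few lines. The Euclidean distance, the value of `k` and the toy points are assumptions chosen for illustration; the text does not state which distance or `k` was used.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),  # Euclidean distance
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


# Two made-up clusters in a 2-D space (e.g. after a two-component projection).
train_X = [(0.0, 0.1), (0.2, -0.1), (-0.1, 0.3), (5.0, 5.1), (4.8, 5.3), (5.2, 4.9)]
train_y = ["normal", "normal", "normal", "attack", "attack", "attack"]

label_near_origin = knn_predict(train_X, train_y, (0.5, 0.5))
label_near_far_cluster = knn_predict(train_X, train_y, (4.5, 5.2))
```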

##### Decision Trees

A Decision Tree (DT) is built by repeatedly splitting the dataset using a criterion that maximises the separation between classes. Each split is chosen so that the decrease in entropy it produces is as large as possible [5].
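The selection of a single split can be sketched as a search for the threshold with the largest entropy decrease (information gain). This shows only one splitting step, not a full tree; the function names and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def best_split(X, y):
    """Return (feature_index, threshold, gain) for the best binary split."""
    parent_entropy = entropy(y)
    best = (None, None, 0.0)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [label for row, label in zip(X, y) if row[j] <= threshold]
            right = [label for row, label in zip(X, y) if row[j] > threshold]
            children_entropy = (
                len(left) * entropy(left) + len(right) * entropy(right)
            ) / len(y)
            gain = parent_entropy - children_entropy
            if gain > best[2]:
                best = (j, threshold, gain)
    return best


# Feature 0 separates the classes perfectly; feature 1 carries no information.
X = [[1, 10], [2, 20], [3, 10], [4, 20]]
y = [0, 0, 1, 1]
feature, threshold, gain = best_split(X, y)
```

A complete tree would apply `best_split` recursively to the left and right subsets until a stopping rule is met.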

##### Deep Neural Networks

Deep Neural Networks (DNNs) are based on an architecture made of multiple layers, whose neurons are connected to the neurons of adjacent layers. The weight of each connection and the parameters of the activation functions are tuned during the training process to minimise an error criterion [5].
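A small NumPy sketch of this training process follows. The layer sizes, the `tanh` and sigmoid activations, the cross-entropy loss, the learning rate and the synthetic data are all assumptions made for illustration; the text does not describe the network that was actually used.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_network(layer_sizes, rng):
    """One (weights, biases) pair per connection between adjacent layers."""
    return [
        (rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


def forward(params, X):
    """Return the activations of every layer, input included."""
    activations = [X]
    for i, (W, b) in enumerate(params):
        z = activations[-1] @ W + b
        is_output = i == len(params) - 1
        activations.append(sigmoid(z) if is_output else np.tanh(z))
    return activations


def cross_entropy(params, X, y):
    p = forward(params, X)[-1].ravel()
    eps = 1e-12
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_step(params, X, y, learning_rate):
    """One gradient-descent update computed with backpropagation."""
    activations = forward(params, X)
    # Gradient of the mean cross-entropy w.r.t. the output pre-activation.
    delta = (activations[-1] - y.reshape(-1, 1)) / len(X)
    updated = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = activations[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            # Propagate through the tanh of the previous layer (derivative 1 - a^2).
            delta = (delta @ W.T) * (1.0 - activations[i] ** 2)
        updated.append((W - learning_rate * grad_W, b - learning_rate * grad_b))
    return updated[::-1]


# Synthetic two-class data and a 2-4-4-1 network.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, (20, 2)), rng.normal(1.0, 0.5, (20, 2))])
y = np.concatenate([np.zeros(20), np.ones(20)])
params = init_network([2, 4, 4, 1], rng)

initial_loss = cross_entropy(params, X, y)
for _ in range(200):
    params = train_step(params, X, y, learning_rate=0.5)
final_loss = cross_entropy(params, X, y)
probabilities = forward(params, X)[-1]
```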

## **3. Experiments and Results**

### *3.1. Experimental Setup*

Different experiments were carried out to identify the best classifier. First, with the aim of reducing computation time and improving classifier performance, dimensionality reduction was carried out using PCA. Two reductions were considered: two components and five components. A 10-fold cross-validation was performed, measuring the accuracy, F1 score, precision, recall, specificity, and the Area Under the Receiver Operating Characteristic Curve (AUC) [6]. This last measure was the one selected to determine the best classifier, because it is insensitive to the class distribution, which matters here given the imbalance between normal and attack instances.
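The evaluation measures and the fold construction can be sketched in pure Python. This is an illustration under assumptions: the folds are contiguous and unshuffled, the positive class is taken to be 1, the AUC is computed through its rank interpretation, and the scores are invented.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def confusion_metrics(y_true, y_pred):
    """Accuracy, precision, recall, specificity and F1 for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


def auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented scores for eight samples, thresholded at 0.5 to obtain predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = confusion_metrics(y_true, y_pred)
auc_value = auc(y_true, scores)
folds = list(k_fold_indices(25, k=10))
```

The AUC depends only on how positives are ranked relative to negatives, so duplicating every negative sample would leave it unchanged, whereas accuracy and precision would shift.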

### *3.2. Results*

First, an initial analysis of the PCA result was conducted. Based on the results shown in Figure 1, two and five components were selected. With these two configurations, the four classification techniques were tested, leading to the final results shown in Figure 2.

**Figure 1.** Result of PCA.

**Figure 2.** Boxplot representing AUC results for 2 and 5 components.

## **4. Conclusions**

The present paper deals with the detection of DoS attacks by means of intelligent classifiers. LR classifiers do not perform as well as the rest of the techniques. Furthermore, using two or five components does not significantly affect the classifiers' performance. The implementation of this approach could bring significant benefits to IoT environments based on the MQTT protocol.

**Acknowledgments:** CITIC, as a Research Center of the University System of Galicia, is funded by Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF) and the Secretaría Xeral de Universidades (Ref. ED431G 2019/01).

