#### **1. Introduction**

The IoT consists of myriad smart devices capable of data collection, storage, processing, and communication. The adoption of the IoT has brought about tremendous innovation opportunities in industries, homes, the environment, and businesses, and it has enhanced quality of life, productivity, and profitability. However, the infrastructures, applications, and services associated with the IoT have introduced several threats and vulnerabilities, as emerging protocols and workflows have exponentially increased attack surfaces [1]. For instance, the Mirai botnet exploited IoT vulnerabilities and crippled several websites and domain name systems [2].

Securing IoT devices is challenging for several reasons. The devices are heterogeneous; traditional security controls are impractical for resource-constrained devices; distributed IoT networks fall outside the scope of perimeter security; and existing solutions such as the cloud suffer from centralisation and high delay. In addition, IoT device vendors commonly overlook security requirements due to a rush-to-market mentality, and the lack of security standards adds a further dimension to the complexity of securing IoT devices. These challenges and the nature of IoT applications call for monitoring systems, such as anomaly detection, at the device and network levels beyond the organisational boundary.

**Citation:** Diro, A.; Chilamkurti, N.; Nguyen, V.-D.; Heyne, W. A Comprehensive Study of Anomaly Detection Schemes in IoT Networks Using Machine Learning Algorithms. *Sensors* **2021**, *21*, 8320. https://doi.org/10.3390/s21248320

Academic Editors: Zihuai Lin and Wei Xiang

Received: 8 November 2021 Accepted: 8 December 2021 Published: 13 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

An anomaly is a pattern or sequence of patterns in IoT networks or data that deviates significantly from normal behaviour. Anomalies can be classified as point, contextual, or collective based on their source [3]. A point anomaly is a specific data point that falls outside the norm; it indicates a random irregularity, an extremum, or a deviation with no particular meaning, and such points are often known as outliers. A contextual anomaly is a data point that deviates from the norm in a specific context, such as a time window. That is, an observation that is normal in one context can be abnormal in another. Contextual anomalies are driven by contextual features such as time and space and by behavioural features such as the application domain. A collective anomaly is formed by a collection of related data points, specifically in sequential, spatial, and graph data, that fall outside normal behaviour. It is a group of interconnected, correlated, or sequential instances in which the individual members are not anomalous themselves, but the collective sequence is. Anomalous events rarely occur; however, they can have dramatic negative impacts on businesses and governments using IoT applications [4].
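The distinction between the three anomaly types can be illustrated with a minimal sketch. The temperature readings, the season labels, the function names, and the threshold of three standard deviations are all hypothetical choices made for illustration; they are not taken from the surveyed works.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical temperature readings (deg C), grouped by context (season).
HISTORY = {
    "summer": [28.0, 30.0, 31.0, 29.0, 32.0, 30.0],
    "winter": [4.0, 6.0, 5.0, 3.0, 7.0, 5.0],
}
ALL_READINGS = HISTORY["summer"] + HISTORY["winter"]
THRESHOLD = 3.0  # assumed cut-off, in standard deviations


def z_score(value, reference):
    """Absolute distance of value from the reference mean, in standard deviations."""
    return abs(value - mean(reference)) / stdev(reference)


def is_point_anomaly(value):
    """Flag a reading that is extreme relative to all data, ignoring context."""
    return z_score(value, ALL_READINGS) > THRESHOLD


def is_contextual_anomaly(value, context):
    """Flag a reading that is extreme relative to its own context."""
    return z_score(value, HISTORY[context]) > THRESHOLD


def is_collective_anomaly(window, context):
    """Flag a run of readings whose mean is implausible for the context,
    even if no single reading crosses the threshold on its own."""
    reference = HISTORY[context]
    standard_error = stdev(reference) / sqrt(len(window))
    return abs(mean(window) - mean(reference)) / standard_error > THRESHOLD
```

With these toy data, a reading of 90 is a point anomaly. A reading of 30 is unremarkable overall but is a contextual anomaly in winter. Six consecutive winter readings of 8.5 each pass the contextual check individually, yet the run as a whole is flagged as a collective anomaly.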

To protect IoT and I.T. applications, intrusion detection systems (I.D.S.s) have been developed that raise alerts on abnormal events or suspicious activities that might lead to an attack. I.D.S.s can be divided into two main categories: anomaly-based and signature-based. With anomaly-based I.D.S.s, unidentified attacks or zero-day attacks can be detected as deviations from normal activities [5]. However, signature-based I.D.S.s cannot identify unknown attacks until the vendors release updated versions containing the new attack signatures [5]. This indicates that anomaly-based I.D.S.s are better positioned than signature-based I.D.S.s to secure IoT devices. Moreover, IoT devices generate a large amount of raw data, and the noise in these data makes identifying suspicious behaviour computationally expensive. Hence, lightweight distributed anomaly-based I.D.S.s play a significant role in thwarting cyberattacks in the IoT network.

In recent years, using machine learning techniques to develop anomaly-based I.D.S.s to protect the IoT system has produced encouraging results as machine learning models are trained on normal and abnormal data and then used to detect anomalies [1,2]. However, building effective and efficient anomaly detection modules is a challenging task as machine learning has the following drawbacks:


However, with advances in hardware such as GPUs and in neural network methods such as deep learning, machine learning has constantly improved. This makes it promising for anomaly detection on emerging platforms such as blockchain.

This paper aims to provide an in-depth review of current work on developing anomaly detection solutions using machine learning to protect IoT systems, which can help researchers and developers design and implement new anomaly-based I.D.S.s. Our contributions are summarised as follows: first, we present the significance of anomaly detection in the IoT system (Section 2); then, we identify the challenges of applying anomaly detection to an IoT system (Section 3); after that, we describe the state-of-the-art machine learning techniques for detecting anomalies in the system (Section 4); finally, we analyse the use of machine learning techniques for IoT anomaly detection (Section 5). In particular, this paper also covers federated learning, which helps to collaboratively train effective machine learning models to detect anomalies (Section 4), and indicates that the use of blockchain for anomaly detection is a novel contribution, as the inherent characteristics of a distributed ledger make it an ideal solution for defeating adversarial learning systems (Section 5).

#### **2. Significance of Anomaly Detection in the IoT**

Over the years, anomaly-based I.D.S.s have been applied in a wide range of IoT applications, as illustrated in Table 1. This section will focus on the important roles of anomaly detection systems in industries, smart grids, and smart cities.


**Table 1.** Anomaly-Based I.D.S.s according to Anomaly Types and Applications.

Industrial IoT is one of the beneficiaries of anomaly detection tools. Anomaly detection has been leveraged for industrial IoT applications such as power systems, health monitoring [28], fault detection in heating, ventilation, and air conditioning systems [29], production plant maintenance scheduling [30], and manufacturing quality control systems [31]. In [32], machine learning approaches such as linear regression were applied to sensor readings of engine-based machines to learn deviations from normal system behaviours. The study demonstrated that anomaly detection plays a significant role in preventive maintenance by detecting machine failures and inefficiencies. In another study, autoencoder (A.E.)-based outlier detection was investigated in audio data using reconstruction error [33]. The study showed that early detection of anomalies could be used as responsive maintenance for machine failures, thereby reducing downtime. Furthermore, water facilities have used IoT anomaly detection [34] to monitor and identify certain chemical concentration levels as a reactive alerting mechanism. These studies show that IoT anomaly detection provides mechanisms for improving the efficiency and up-time of industrial machines by monitoring machine health.
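The idea of learning normal machine behaviour with a regression model and flagging deviations can be sketched as follows. This is only an illustration under stated assumptions and not the method of [32]: the engine-speed and temperature readings are invented, and the tolerance of three residual standard deviations is an assumed choice.

```python
from statistics import mean, stdev

# Hypothetical readings from healthy operation:
# engine speed (rpm) and temperature (deg C).
NORMAL_RPM = [1000, 1500, 2000, 2500, 3000, 3500]
NORMAL_TEMP = [50.2, 54.9, 60.1, 65.0, 69.8, 75.1]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


SLOPE, INTERCEPT = fit_line(NORMAL_RPM, NORMAL_TEMP)

# Residuals on the normal data describe how far healthy readings
# typically sit from the fitted line.
RESIDUALS = [t - (SLOPE * r + INTERCEPT) for r, t in zip(NORMAL_RPM, NORMAL_TEMP)]
LIMIT = 3.0 * stdev(RESIDUALS)  # assumed tolerance


def is_anomalous(rpm, temp):
    """Flag a reading whose temperature deviates from the learned behaviour."""
    return abs(temp - (SLOPE * rpm + INTERCEPT)) > LIMIT
```

For example, `is_anomalous(2000, 60.0)` returns `False`, whereas `is_anomalous(2000, 75.0)` returns `True` because the temperature is far above what the fitted line predicts for that engine speed.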

The power sector, including existing smart grids, has also adopted anomaly detection systems to identify power faults and outages. The study in [35] utilised statistical methods to develop an anomaly detection framework using smart meter data. The authors argue that hierarchical network data can be used to model anomaly detection for power systems. Another study [36] employed high-frequency signals to detect fault-related anomalies in power networks. The article concludes that local anomaly detection depends more on network size than topology. In [37], big data analysis schemes were explored to detect and localise failures and faults in power systems. The study showed that the compensation theorem in circuit theory could be applied to event detection in power networks. Physical attacks on smart grids, such as energy theft, can also be detected by using anomaly detection systems, as shown in [38]. Taken together, these studies indicate that anomaly detection plays a paramount role in detecting failures and faults in power systems, enhancing system reliability and efficiency.

Anomaly detection can also be used for smart city facilities such as roads and buildings. Road surface anomalies were studied in [39]. The study indicated that damage to private vehicles can be reduced if the road surface is monitored for anomalies so that timely measures such as maintenance are taken before road incidents occur. In [40], pollution monitoring and control were modelled as an anomaly detection problem to support policymakers' decisions on health, traffic, and the environment. Similarly, assisted living can benefit from IoT-based anomaly detection, as deviations from normal behaviour can alert caregivers, as studied in [41]. In summary, abnormal situations in smart cities and buildings can be detected using anomaly detection systems, and the results can be provided to policymakers for decision-making purposes.

#### **3. Challenges in IoT Anomaly Detection Using Machine Learning**

The development of anomaly detection schemes in the IoT environment is challenging due to several factors such as (1) scarcity of IoT resources; (2) profiling normal behaviours; (3) the dimensionality of data; (4) context information; and (5) the lack of resilient machine learning models [15]. These factors will be explained in this section.

#### *3.1. Scarcity of IoT Resources*

Deploying anomaly detection at the device level can be hindered by constraints in storage, processing, communication, and power resources. To compensate for this, the cloud can be adopted as a data collection, storage, and processing platform. However, the remoteness of the cloud can introduce high latency due to resource scheduling and round-trip time. This delay may not be acceptable given the real-time requirements of detecting suspicious IoT events [15]. The scale of IoT traffic may also degrade the performance of the anomaly detection system if it exceeds the capacity of the devices. A better solution is to offload certain storage and computations from devices to edge nodes or to send aggregated data to the cloud. Sliding window techniques can also reduce storage requirements by retaining only the most recent data points, though the anomaly detection system may require longer-term patterns or trends [26].
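A sliding window with bounded memory can be sketched as below. The class name, the window size, and the z-score threshold are hypothetical choices for illustration; the sketch is not drawn from [26].

```python
from collections import deque
from statistics import mean, stdev


class SlidingWindowDetector:
    """Streaming detector that stores only the most recent readings."""

    def __init__(self, window_size=20, threshold=3.0):
        # deque(maxlen=...) discards the oldest reading automatically,
        # so memory use stays bounded on a constrained device.
        self.window = deque(maxlen=window_size)
        self.threshold = threshold  # assumed cut-off, in standard deviations

    def update(self, value):
        """Return True if value deviates strongly from the current window."""
        is_anomaly = False
        if len(self.window) >= 3:  # wait for a minimal baseline
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                is_anomaly = True
        if not is_anomaly:
            # Only readings judged normal update the baseline.
            self.window.append(value)
        return is_anomaly
```

Feeding the stream `[10.0, 10.2, 9.9, 10.1, 10.0, 9.9, 10.2, 25.0, 10.1]` through `update` flags only the reading `25.0`. Excluding flagged readings from the window keeps the baseline clean, but it also means a genuine, lasting shift in level would keep being reported; whether that is acceptable depends on the application. The sketch also shows the limitation noted above: anything older than the window is forgotten, so long-term trends are not visible to the detector.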

#### *3.2. Profiling Normal Behaviours*

The success of an anomaly detection system depends on gathering sufficient data about normal behaviours; however, defining normal activities is challenging. Because anomalies occur rarely, anomalous behaviours may be inadvertently captured within the data collected as normal. There is also a lack of datasets representing both normal and abnormal IoT data, making supervised learning impractical, specifically for massively deployed IoT devices. This drives the need to model IoT anomaly detection systems in unsupervised or semi-supervised schemes, where data deviating from those collected in normal operations are taken as anomalous [3].
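One way to build a profile from data collected during normal operation, while tolerating a few anomalies hidden in it, is to use robust statistics. The following is a minimal sketch under assumptions: the sensor readings are invented, and the median/MAD profile with a modified z-score cut-off of 3.5 is a commonly used heuristic substituted here for illustration, not a technique prescribed by the surveyed works.

```python
from statistics import mean, median, stdev

# Hypothetical readings gathered during "normal" operation.
# One anomalous value (55.0) has slipped into the training data.
TRAINING = [20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 55.0, 20.1]


def fit_robust_profile(samples):
    """Profile normal behaviour by median and median absolute deviation (MAD)."""
    centre = median(samples)
    mad = median([abs(x - centre) for x in samples])
    return centre, mad


CENTRE, MAD = fit_robust_profile(TRAINING)


def is_anomalous(value, cutoff=3.5):
    """Modified z-score test; assumes MAD > 0 (more than half the samples differ)."""
    return 0.6745 * abs(value - CENTRE) / MAD > cutoff


def is_anomalous_naive(value, cutoff=3.0):
    """Mean and standard deviation profile, shown only for comparison."""
    return abs(value - mean(TRAINING)) / stdev(TRAINING) > cutoff
```

With these data, `is_anomalous(25.0)` returns `True`, while `is_anomalous_naive(25.0)` returns `False` because the single contaminated sample inflates both the mean and the standard deviation of the naive profile.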

#### *3.3. Dimensionality of Data*

IoT data can be univariate, as a key-value pair $x_t$, or multivariate, as a set of temporally correlated univariate series $\mathbf{x}_t = [x_t^{1}, \ldots, x_t^{n}]$. IoT anomaly detection using univariate series compares current data against the historical time series. In contrast, multivariate detection captures both historical stream relationships and relationships among attributes at a given time. Thus, the choice of a specific anomaly detection mechanism in IoT applications depends on data dimensionality because of the associated processing overheads [3,29]. Furthermore, multivariate data increase the processing complexity for models, which calls for dimension reduction techniques such as principal component analysis (P.C.A.) and A.E.s. On the other hand, univariate data may not capture the patterns and correlations that enhance machine learning performance.
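The role of dimension reduction in multivariate detection can be sketched with a dependency-free P.C.A. that keeps only the first principal component, found by power iteration. The two-attribute readings (temperature and power draw), the function names, and the iteration count are hypothetical; a production system would more likely rely on a numerical library.

```python
from math import sqrt

# Hypothetical multivariate readings: (temperature in deg C, power draw in W).
ROWS = [(20.0, 101.0), (22.0, 109.0), (24.0, 121.0),
        (26.0, 129.0), (28.0, 141.0), (30.0, 149.0)]


def fit_first_component(rows, iterations=100):
    """Return the column means and the first principal component."""
    n, d = len(rows), len(rows[0])
    centre = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - centre[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # starting vector for power iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return centre, v


def reconstruction_error(row, centre, component):
    """Distance between a reading and its projection onto the component."""
    c = [x - m for x, m in zip(row, centre)]
    score = sum(a * b for a, b in zip(c, component))  # 1-D representation
    residual = [a - score * b for a, b in zip(c, component)]
    return sqrt(sum(x * x for x in residual))


CENTRE, COMPONENT = fit_first_component(ROWS)
```

The reading `(29.0, 105.0)` illustrates why attribute relationships matter: each value lies within the range seen for its own attribute, so a per-attribute univariate range check would pass it, yet its reconstruction error is far larger than that of any training reading because the combination breaks the correlation between temperature and power draw.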
