*3.4. Context Information*

The distributed nature of IoT devices lends itself to using context information for anomaly detection. However, the challenge is to capture both temporal context (how an input at time *t*1 relates to an input at time *tn*) and spatial context in large IoT deployments where some devices are mobile in their operations. This means that introducing context enriches anomaly detection systems but increases complexity if the right context is not captured [3].

#### *3.5. Lack of Machine Learning Model Resiliency against Adversarial Attacks*

The high false-positive rates of existing machine learning models and their vulnerability to adversarial attacks during training and detection call for both accurate algorithms and resilient models. Model poisoning and evasion can degrade the utility of machine learning models, as adversaries can introduce fake data during training or tamper with the model. On the other hand, the massive deployment of IoT devices could be leveraged for collective anomaly detection, as most of the devices in the network exhibit similar characteristics. This large number of devices helps to harness the power of cooperation against cyber-attacks such as malware [42].

#### **4. Machine Learning Techniques for Detecting Anomalies in the IoT**

Several aspects of IoT anomaly detection using machine learning must be considered. Learning algorithms can be categorised into three groups: supervised, unsupervised, and semi-supervised. The technique of training learning algorithms across many decentralised IoT devices is known as federated learning. In addition, anomaly detection can be viewed in terms of the dimensionality of the available data, leading to univariate- and multivariate-based approaches. In the rest of this section, we present anomaly detection schemes based on (1) machine learning algorithms; (2) federated learning; and (3) data sources and dimensions.

#### *4.1. Detection Schemes Based on Machine Learning Algorithms*

Supervised algorithms, also known as discriminative algorithms, learn classification from labelled instances. They include classification algorithms such as the K-nearest neighbour (K.N.N.), support vector machine (SVM), Bayesian network, and neural network (N.N.) [43,44]. K.N.N. is a distance-based anomaly detection algorithm in which a point is anomalous when its distance from the majority of the dataset exceeds a specific threshold. Calculating these distances is computationally expensive, which makes on-device anomaly detection with this algorithm impractical. SVM, in contrast, finds a hyperplane that separates data points for classification. As with K.N.N., it is so resource-intensive that its application to IoT anomaly detection is impractical. Since the Bayesian network may not require prior knowledge of neighbour nodes for anomaly detection, it can be adopted for resource-constrained devices, albeit with low accuracy. Finally, N.N. algorithms have been extensively used to train on normal data so that anomalous data can be detected as deviations from normal; however, their resource requirements make them challenging to adapt to the IoT environment. Hence, supervised algorithms are the least applicable to IoT anomaly detection systems because of their labelled-dataset and extensive resource requirements.
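To illustrate the distance-based idea behind K.N.N. anomaly scoring, the following minimal sketch (using scikit-learn) scores a point by its distance to its *k*-th nearest normal neighbour; the synthetic data, the value of *k*, and the percentile threshold are illustrative assumptions rather than settings from the surveyed works.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Illustrative normal readings (e.g., sensor values); 2-D for simplicity.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# Fit on normal data; score a point by its distance to the k-th neighbour.
k = 5
nn = NearestNeighbors(n_neighbors=k).fit(normal)

def knn_score(points):
    """Distance to the k-th nearest normal point; larger means more anomalous."""
    dist, _ = nn.kneighbors(points)
    return dist[:, -1]

# Threshold chosen from the normal data itself (assumption: 99th percentile).
threshold = np.percentile(knn_score(normal), 99)

test = np.array([[0.1, -0.2],   # close to the normal cluster
                 [6.0, 6.0]])   # far away: should be flagged
print(knn_score(test) > threshold)  # e.g., [False  True]
```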

Commonly known as generative algorithms, unsupervised algorithms use unlabelled data to learn hierarchical features. Clustering-based algorithms such as K-means and density-based spatial clustering of applications with noise (D.B.S.C.A.N.) are unsupervised techniques that apply similarity and density attributes to classify data points into clusters [43,44]. Abnormal points are small groups of data points significantly far from the dense areas, while normal points lie close to or within the clusters. Usually, clustering algorithms are combined with classification algorithms to enhance anomaly detection accuracy. Because of their resource usage, most clustering algorithms cannot be directly applied on IoT devices for anomaly detection. Another unsupervised learning technique involves dimension-reduction approaches such as P.C.A. and A.E., which remove noise and redundancy from data to reduce the dimension of the original data [44,45]. P.C.A. has been extensively applied to anomaly detection, but it struggles in the dynamic IoT environment. A.E. has produced promising results in IoT anomaly detection, both in reducing data sizes and in using reconstruction error to identify anomalous points. However, these techniques have mostly been used as a feature-extraction stage for classification algorithms. The dimensionality-reduction algorithms of unsupervised learning can be adapted to IoT anomaly detection.

Semi-supervised algorithms combine discriminative and generative algorithms by training on normal data instances so that deviation from normal behaviour is flagged as abnormal. Hence, anomaly detection in the IoT is geared toward unsupervised or semi-supervised algorithms, where a profile of normal system behaviour is used as the baseline [46].
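As a concrete illustration of the clustering approach above, the following sketch uses scikit-learn's D.B.S.C.A.N., which labels low-density points as noise; treating those noise points as anomalies, along with the `eps` and `min_samples` values, is an illustrative assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Illustrative unlabelled readings: two dense clusters plus a stray point.
rng = np.random.default_rng(1)
data = np.vstack([
    rng.normal(0.0, 0.3, size=(200, 2)),
    rng.normal(5.0, 0.3, size=(200, 2)),
    [[10.0, 10.0]],               # isolated point far from both clusters
])

# D.B.S.C.A.N. labels low-density points as -1 ("noise"), which we treat
# as anomalies; eps and min_samples are illustrative tuning choices.
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(data)
anomalies = data[labels == -1]
print(f"{len(anomalies)} point(s) flagged as anomalous")
```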

Table 2 shows the state-of-the-art machine learning algorithms according to three anomaly types.


**Table 2.** Learning Algorithms According to Anomaly Types and Machine Learning Schemes.

#### *4.2. Training Detection Schemes Based on Federated Learning Algorithms*

Federated learning, also known as collaborative learning, allows IoT devices to train machine learning models locally and send the trained models, not the local data, to the server for aggregation [47,48]. This training method is different from the standard machine learning training approaches that require centralising the training data in one place such as a server or data centre.

The federated learning method consists of four main steps. First, the server initialises a global machine learning model for anomaly detection and selects a subset of IoT devices to which it sends the initialised model. Second, each selected IoT device trains the model using its local data, then sends the trained model back to the server. Next, the server aggregates the received models to form the global model. Finally, the server sends the final model to all IoT devices to detect anomalies. Note that the server can repeat the tasks of selecting a subset of IoT devices, sending the global model, receiving the trained models, and aggregating them over multiple rounds, as some devices may not be available at the time of federated computation or may drop out during a round.
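The following is a minimal sketch of federated averaging rounds following the four steps above; plain least-squares models stand in for the neural networks a real deployment would train, and all names and hyperparameters are illustrative assumptions.

```python
import numpy as np

# A minimal federated-averaging loop, following the four steps above.
# Models are plain weight vectors; real systems would use neural networks.

def local_train(global_weights, local_data, lr=0.01, epochs=5):
    """Step 2: each device refines the global model on its own data
    (here: gradient steps on a least-squares objective, for illustration)."""
    w = global_weights.copy()
    X, y = local_data
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def fed_avg(updates, sizes):
    """Step 3: server aggregates device models, weighted by data size."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(updates, sizes))

rng = np.random.default_rng(2)
true_w = np.array([2.0, -1.0])
devices = []
for _ in range(10):
    X = rng.normal(size=(50, 2))
    devices.append((X, X @ true_w + rng.normal(0, 0.1, 50)))

global_w = np.zeros(2)                      # Step 1: server initialises
for round_ in range(20):                    # repeated federated rounds
    selected = rng.choice(len(devices), size=5, replace=False)  # subset
    updates = [local_train(global_w, devices[i]) for i in selected]
    global_w = fed_avg(updates, [len(devices[i][1]) for i in selected])
print(global_w)                             # Step 4: distribute final model
```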

By using federated learning, data in the IoT system remains decentralised and data privacy is protected. Other advantages of federated learning include lower latency, reduced network load, lower power consumption, and applicability across multiple organisations. However, federated learning also suffers from drawbacks such as inference attacks [49] and model poisoning [50].

#### *4.3. Detection Mechanisms Based on Data Sources and Dimensions*

Univariate IoT data consists of readings from a single IoT device over time. In reality, anomaly detection systems utilise data from multiple IoT devices deployed in complex environments. These multivariate, multi-source data feed richer contexts than a single source by providing noise-tolerant temporal and spatial information.

#### 4.3.1. Univariate Using Non-Regressive Scheme

In the non-regressive scheme, threshold-based mechanisms can be leveraged by setting low and high thresholds on univariate stationary observations and flagging a data point that falls outside the boundary. More advanced mechanisms, such as mean and variance thresholds computed over historical data, can replace this min–max approach. A similar approach uses a box plot to split the data distribution into a range of small categories against which new data points are compared. These non-regressive approaches are ideal for saving resources such as processing power and memory on IoT devices. However, because they operate on isolated univariate observations, the range-based schemes fail to detect contextual and collective anomalies, as they cannot capture temporal relationships [3].
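A minimal sketch of the mean/variance thresholding described above; the historical window and the 3-sigma band are illustrative assumptions.

```python
import numpy as np

# Mean/variance thresholding over historical data, as described above.
# The window of "historical" readings and the 3-sigma rule are assumptions.
history = np.array([20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7])
mu, sigma = history.mean(), history.std()
low, high = mu - 3 * sigma, mu + 3 * sigma

def is_anomaly(reading):
    """Flag a reading that falls outside the learned [low, high] band."""
    return reading < low or reading > high

print(is_anomaly(20.1))  # False: within the band
print(is_anomaly(27.5))  # True: outside the band
```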

N.N.s such as A.E.s, recurrent neural networks (R.N.N.), and long short-term memory (L.S.T.M.) can be used as non-regressive models for anomaly detection on univariate time series data in the IoT ecosystem. An A.E. reconstructs data symmetrically from the input to the output layer, and a high reconstruction error likely indicates abnormality [13]. A.E.s can also run on resource-constrained IoT devices, conserving resources and battery power. An R.N.N., on the other hand, provides memory in the network by feeding previous outputs back to the neurons through feedback loops, which enables the capture of temporal contexts over time. However, the vanishing gradient problem in R.N.N.s makes them unsuitable for large IoT networks. L.S.T.M. mitigates this gradient problem and can provide semi-supervised learning on normal time series data, identifying anomalous sequences from reconstruction error. Hence, combining A.E. and L.S.T.M. appears able to meet both the resource-saving and accuracy requirements of IoT anomaly detection tasks.
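The following PyTorch sketch combines the two ideas by training an L.S.T.M. autoencoder on normal univariate windows and thresholding the reconstruction error; the layer sizes, synthetic sine-wave data, and percentile threshold are illustrative assumptions.

```python
import torch
import torch.nn as nn

# A minimal L.S.T.M. autoencoder for univariate sequences (PyTorch sketch).
class LSTMAutoencoder(nn.Module):
    def __init__(self, seq_len, hidden=16):
        super().__init__()
        self.seq_len = seq_len
        self.encoder = nn.LSTM(1, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):                      # x: (batch, seq_len, 1)
        _, (h, _) = self.encoder(x)            # compress to final hidden state
        z = h[-1].unsqueeze(1).repeat(1, self.seq_len, 1)
        dec, _ = self.decoder(z)               # unfold back over time
        return self.out(dec)                   # reconstruction

seq_len = 30
model = LSTMAutoencoder(seq_len)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Train on normal data only (semi-supervised): sine waves as stand-ins.
t = torch.linspace(0, 6.28, seq_len)
normal = torch.stack([torch.sin(t + p) for p in torch.rand(256) * 6.28])
normal = normal.unsqueeze(-1)                  # (256, seq_len, 1)

for _ in range(200):
    opt.zero_grad()
    loss = loss_fn(model(normal), normal)
    loss.backward()
    opt.step()

# A window is anomalous if its reconstruction error exceeds a threshold
# derived from errors on normal data (assumption: 99th percentile).
with torch.no_grad():
    errs = ((model(normal) - normal) ** 2).mean(dim=(1, 2))
    threshold = errs.quantile(0.99)
print(float(threshold))
```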

#### 4.3.2. Univariate Using Regressive Scheme

Predictive approaches, also known as regressive schemes, identify anomalies by comparing predicted values with actual values in time series data. Parametric models such as the autoregressive moving average (A.R.M.A.) are popular techniques despite seasonality and mean-shift problems in non-stationary datasets. These problems can be addressed by enhanced variants of A.R.M.A. such as the autoregressive integrated moving average (A.R.I.M.A.) and seasonal A.R.M.A. As another approach to predictive IoT anomaly detection, N.N.-based predictive models such as M.L.P., R.N.N., L.S.T.M., and others can be applied to capture the dynamics of a time series on complex univariate data [46]. For instance, R.N.N., L.S.T.M., and G.R.U. models can represent the variability in time series data to predict the expected values of time sequences. Recently, attention-based models have been applied to IoT anomaly detection on complex, long sequential data. As in the non-regressive scheme, sequential models can boost the accuracy of IoT anomaly detection if dimensionality-reduction algorithms are used for feature extraction.
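A minimal sketch of the regressive scheme using the `statsmodels` A.R.I.M.A. implementation: fit the model on a univariate series and flag points with large residuals. The model order, the injected anomaly, and the 3-sigma rule are illustrative assumptions.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Fit A.R.I.M.A. on a univariate series and flag points whose one-step
# prediction residual is large.
rng = np.random.default_rng(3)
series = np.cumsum(rng.normal(0, 0.5, 300))       # random-walk-like signal
series[250] += 8.0                                # injected anomaly

fit = ARIMA(series, order=(1, 1, 1)).fit()
resid = np.abs(fit.resid[1:])                     # skip the start-up residual

threshold = resid.mean() + 3 * resid.std()        # 3-sigma rule on residuals
print(np.where(resid > threshold)[0] + 1)         # indices of flagged points
```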

#### 4.3.3. Multivariate Using Regressive Scheme

As additional variables increase data sizes, dimensionality-reduction techniques such as P.C.A., A.E., and others can be employed to decrease the overall data size. P.C.A. can capture the interdependence of variables across multivariate sources; it reduces the data size by decomposing multivariate data into a reduced set of components. However, the linearity and computational complexity of P.C.A. can limit its usage for IoT anomaly detection. A.E. works like P.C.A. and can discover anomalies in multivariate time series data using reconstruction error, in the same way as in univariate cases. The promising aspects of A.E. are its low resource usage and its non-linear feature extraction. As with predictive and non-predictive models on univariate data, schemes using L.S.T.M., C.N.N., D.B.N., and others can also be applied to identify anomalies in multi-source IoT systems. Specifically, C.N.N. and L.S.T.M. algorithms can be preceded by an A.E. for important feature extraction and resource savings. These deep learning schemes can learn the spatio-temporal aspects of multivariate IoT data [12].
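A minimal sketch of P.C.A.-based multivariate detection with scikit-learn: project readings onto a few principal components, reconstruct them, and flag points with high reconstruction error. The synthetic data, component count, and threshold percentile are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

# P.C.A. reconstruction error on multivariate readings: points that the
# low-dimensional subspace reconstructs poorly are flagged as anomalous.
rng = np.random.default_rng(4)
base = rng.normal(size=(1000, 2))
train = np.hstack([base, base @ rng.normal(size=(2, 8))])  # 10 correlated dims
train += rng.normal(0, 0.05, train.shape)                  # measurement noise

pca = PCA(n_components=2).fit(train)

def recon_error(X):
    """Squared error between X and its projection through the P.C.A. subspace."""
    return ((X - pca.inverse_transform(pca.transform(X))) ** 2).sum(axis=1)

threshold = np.percentile(recon_error(train), 99)
test = train[:3].copy()
test[0, 5] += 10.0                       # corrupt one variable of one point
print(recon_error(test) > threshold)     # e.g., [ True False False]
```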

Clustering mechanisms are another approach to detecting anomalies in multivariate data. In addition, graph networks can be used to learn models of variable or sequence relationships, where the weakest weight between graph nodes is considered anomalous.
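One way to read the graph-based idea is sketched below: variables are nodes, the absolute pairwise correlation within a window is the edge weight, and a historically strong edge that suddenly weakens signals an anomaly. This construction and the window sizes are illustrative assumptions, not a method taken from the surveyed literature.

```python
import numpy as np

# Variables as graph nodes; absolute correlation within a window as the
# edge weight. A strong baseline edge that collapses in a recent window
# suggests the relationship between those sensors has broken down.
def edge_weights(window):
    """Absolute pairwise correlations for one window of shape (time, vars)."""
    return np.abs(np.corrcoef(window, rowvar=False))

rng = np.random.default_rng(5)
x = rng.normal(size=500)
y = x + rng.normal(0, 0.1, 500)          # y normally tracks x
y[400:] = rng.normal(size=100)           # relationship breaks down

data = np.column_stack([x, y])
baseline = edge_weights(data[:300])      # learned "normal" graph
recent = edge_weights(data[400:])        # latest window
print(baseline[0, 1], recent[0, 1])      # strong edge collapses: anomaly
```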

#### **5. Analysis of Machine Learning for IoT Anomaly Detection**

Anomaly detection systems have proven capable of defending traditional networks by detecting suspicious behaviours. However, standalone anomaly detection systems designed for classical networks do not fit the architecture of distributed IoT networks, in which the compromise of a single node could damage the entire network. By collecting traffic from various points, a collaborative anomaly detection framework plays a paramount role in thwarting cyber threats. However, trust relationships and data sharing form two major challenges [42,51]. In such a massive network, insider attacks can be a serious issue.

Furthermore, as most anomaly detection systems apply machine learning, nodes may not be willing to share normal profiles for training or performance optimisation due to privacy concerns. The trust problem can be addressed by implementing a central server that handles trust computation and data sharing. However, this approach could introduce a single point of failure and a security bottleneck, especially for large-scale deployments of IoT devices. Recently, blockchain has attracted much interest in the financial sector for its capability of forming trust among mistrusting entities using contracts and consensus. Blockchain could provide an opportunity to solve the problem of collaborative anomaly detection by providing trust management and a data-sharing platform. In the remainder of this section, we will focus on analysing (1) the collaborative architecture for IoT anomaly detection using blockchain; (2) datasets and algorithms for IoT anomaly detection; and (3) resource requirements of IoT anomaly detection.

#### *5.1. Collaborative Architecture for IoT Anomaly Detection*

Blockchain is a decentralised ledger that provides immutability, trustworthiness, authenticity, and accountability for the maintained records based on majority consensus. Though it was originally applied to digital currency systems, blockchain can be applied in various fields. With the power of public-key cryptography, strong hash functions, and consensus algorithms, participating nodes in a blockchain can verify the formation of new blocks. A block typically consists of a group of records, a timestamp, the previous block's hash, a nonce, and the block's own hash. Thus, any change to a record or group of records would invalidate the previous-hash field stored in the next block, which makes the ledger resistant to adversarial change [42].
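A minimal sketch of the block structure and hash chaining described above, showing why tampering with an earlier record is detectable; the field names and the SHA-256 choice are illustrative.

```python
import hashlib
import json
import time

# A minimal hash-chained block structure matching the fields listed above
# (records, timestamp, previous hash, nonce, block hash); illustrative only.
def block_hash(block):
    payload = json.dumps(
        {k: block[k] for k in ("records", "timestamp", "prev_hash", "nonce")},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

def make_block(records, prev_hash, nonce=0):
    block = {"records": records, "timestamp": time.time(),
             "prev_hash": prev_hash, "nonce": nonce}
    block["hash"] = block_hash(block)
    return block

genesis = make_block(["init"], prev_hash="0" * 64)
b1 = make_block(["device-7: model update"], prev_hash=genesis["hash"])

# Tampering with an earlier record breaks the chain: the recomputed hash
# no longer matches the prev_hash stored in the next block.
genesis["records"] = ["forged"]
print(block_hash(genesis) == b1["prev_hash"])   # False: tampering detected
```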

The powerful attributes of blockchain could provide a solid foundation for anomaly detection in distributed networks such as the IoT. Using a blockchain architecture, IoT devices can collaboratively develop a global anomaly detection model from local models while resisting adversarial attacks. As the IoT needs mutual trust to share local models in a secure and tamper-proof way, consensus algorithms and decentralised blockchain storage make it challenging for malicious actors to manipulate the network. However, consensus algorithms that have succeeded in financial applications, such as Bitcoin's proof-of-work, require extensive storage and processing capabilities. Ethereum has adopted proof-of-stake, where participants' stakes determine consensus; it uses smart contracts and is less computationally intensive. Hyperledger Fabric is another customisable blockchain platform that applies smart contracts to distributed systems rather than cryptocurrencies. As it relies on a central service to enable participants to endorse transactions, endorsing participants must agree on the value of a transaction before the changes are reflected in each participant's local ledger. None of these three popular blockchain systems appears to suit resource-constrained IoT devices [51].

Blockchain-based security solutions have been discussed for mixed traditional and IoT systems [52,53]. In these studies, a resource-rich device was connected to IoT devices, acting as a proxy that links the IoT devices to the blockchain. A similar study was conducted in [54]. The main advantage of these approaches lies in resource savings, but they may also create a central point of failure. In [55], the authors utilised smart contracts to integrate IoT devices into blockchain for communication integrity and authenticity, though resource requirements may make this impractical. The most promising result has been achieved in distributed and collaborative IoT anomaly detection [51]. The study uses a self-attestation mechanism to establish a dynamic trusted model against which nodes compare behaviour to detect anomalies. The model is cooperatively updated by majority consensus before being distributed to peers.

#### *5.2. Datasets and Algorithms for IoT Anomaly Detection*

The lack of labelled, realistic datasets has hampered anomaly detection research in the IoT. Existing datasets lack realistic representation of IoT traffic patterns and fail to capture the full range of anomalies that may occur in the IoT. Class imbalance between normal traffic and anomalous patterns is also present, which makes classification systems inefficient. Most IoT traffic can be represented as normal behaviour, although it changes dynamically over time. As contextual information such as time, environment, and neighbour-node profiles provides rich information for improving anomaly detection in the IoT, multivariate data appears to play a significant role. The challenges associated with the absence of truly representative, realistic, and balanced datasets favour anomaly detection schemes that profile normal behaviour and detect points that deviate from it [56]. Table 3 shows datasets that have been commonly used in recent studies in this research area. As can be seen, most datasets are not specific to IoT systems; however, they are still suitable for training and evaluating anomaly-based I.D.S.s because they contain both normal and abnormal data.


**Table 3.** Common Datasets for Anomaly Detection in the IoT System (Adapted from [1]).

The initial deployment of an IoT anomaly detection system lacks historical data specifying normal and anomalous points. This absence, together with the rare nature of anomalies, challenges the use of traditional machine learning schemes. Though several techniques for handling imbalanced data have been proposed, such methods cannot maintain the temporal context of anomalies. In addition, supervised algorithms capture only known anomalies and fail to detect novel attacks. Thus, unsupervised or semi-supervised approaches can be used to overcome the limitations of supervised algorithms [54].

While several techniques have been used in IoT anomaly detection, most approaches have failed to satisfy the resource and power requirements of IoT devices [54]. Though there is no single best anomaly detection approach, deep learning techniques, specifically A.E. and C.N.N., have shown promising results in delivering resource savings and accuracy, respectively [64]. While algorithms such as C.N.N. and L.S.T.M. can boost detection accuracy, A.E. can be used to reduce the dimension of the data and extract representative features by eliminating noise. Specifically, L.S.T.M. can be applied to dynamic and complex observations over long sequences of time-series IoT data. Thus, these techniques, or combinations of them, merit further exploration for detecting anomalies in the IoT ecosystem [65].

#### *5.3. Resource Requirements of IoT Anomaly Detection*

The resource-constrained nature of IoT devices prohibits the deployment of traditional host-based intrusion detection tools such as anti-malware and anti-virus software. As traffic analysis consumes substantial computational resources during anomaly detection, incremental approaches such as sliding windows can reduce the processing and storage requirements of IoT devices, as sketched below. It is also critical that the anomaly detection engine of an IoT system operate in near real-time for reliable detection. This suggests that adaptive techniques help improve the detection model over time without major retraining; however, offline training may be applied for the initial deployment.
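A minimal sketch of the sliding-window idea: a constant-memory detector that updates running sums in O(1) per reading and flags values outside a 3-sigma band. The window size, warm-up length, and threshold rule are illustrative assumptions.

```python
from collections import deque
import math

# Sliding-window z-score detector: constant memory and O(1) updates per
# reading, in line with the incremental approach described above.
class SlidingDetector:
    def __init__(self, size=64):
        self.window = deque(maxlen=size)
        self.total = 0.0
        self.total_sq = 0.0

    def update(self, x):
        """Returns True if x is anomalous w.r.t. the current window."""
        flagged = False
        if len(self.window) >= 8:                 # warm-up period
            n = len(self.window)
            mean = self.total / n
            var = max(self.total_sq / n - mean * mean, 1e-12)
            flagged = abs(x - mean) > 3 * math.sqrt(var)
        if len(self.window) == self.window.maxlen:
            old = self.window[0]                  # about to be evicted
            self.total -= old
            self.total_sq -= old * old
        self.window.append(x)
        self.total += x
        self.total_sq += x * x
        return flagged

det = SlidingDetector()
readings = [20.0 + 0.1 * (i % 5) for i in range(100)] + [35.0]
print([r for r in readings if det.update(r)])    # flags the 35.0 spike
```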
