In this section, the related work is categorized into ML-based methods, DL-based models, combined ML and DL approaches, Transformer-based approaches, and other methods, as discussed in the following subsections.
3.1. Machine Learning Methods
In [
11], the authors produced a novel healthcare IoT dataset (WUSTL-EHMS) [
12] for an enhanced healthcare monitoring system. The database contains 28 features related to network traffic data and eight biometric features. For intrusion detection, four machine learning algorithms were evaluated: random forest (RF), KNN, SVM, and ANN. The best results (lowest prediction time and highest area under the curve) were achieved using the ANN model. The experiments confirmed the positive impact of combining network features with features gathered from patients’ biometrics. The authors planned to enhance their work by tuning the model’s hyperparameters and improving the data quality using feature engineering. One possible limitation of the work in [
11] is the assumption that the data collected using medical sensors are transmitted in plain text to avoid high processing power.
The work in [
13] addressed three main challenges: designing a distributed security framework, ensuring security while dealing with big data, and designing a robust anomaly-based IDS. The proposed IDS [
13] uses a fog cloud architecture and ensemble learning. The first-level learners in the model are decision trees (DT), Naive Bayes (NB), and RF. Using stacking, the output of the ML classifiers is input to XGBoost to determine the final classification of the system. The model was investigated using the ToN-IoT database. The reported evaluation metrics were the accuracy, precision, detection rate, false alarm rate, and F1 score.
The model has the following strengths: it is simple, it relies on few parameters, and it can be updated in real time. A possible limitation of this approach is that it tackles only binary classification. The authors planned to extend their work so that it could detect different network attacks. Another future direction that they mentioned was to adopt various techniques in the feature selection stage.
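To make the stacking arrangement in [13] concrete, the following sketch shows how such an ensemble could be assembled with scikit-learn and XGBoost. The hyperparameter values and the dataset-loading step are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch of the stacked ensemble in [13]: DT, NB, and RF as first-level
# learners, with XGBoost as the meta-learner for binary classification.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

def build_stacked_ids():
    base_learners = [
        ("dt", DecisionTreeClassifier(max_depth=10)),
        ("nb", GaussianNB()),
        ("rf", RandomForestClassifier(n_estimators=100)),
    ]
    # Out-of-fold predictions of the base learners become the input
    # features of the XGBoost meta-learner (stacking).
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=XGBClassifier(n_estimators=200, eval_metric="logloss"),
        cv=5,
    )

# X, y = load_ton_iot()  # placeholder: preprocessed ToN-IoT flows, 0/1 labels
# model = build_stacked_ids().fit(X, y)
```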
The paper [14] tackled the crucial problem of cybersecurity in the IoMT framework. The authors aimed to create efficient techniques for the identification and cessation of cyberattacks in IoMT environments. The study used the WUSTL-EHMS 2020 dataset, which includes biometric and network information. Data cleaning and feature selection were the two preparatory procedures the authors used. Several performance metrics, including the accuracy, precision, recall, F1 score, and mean-squared error loss, were employed to evaluate the ML models. The RF method produced accuracy of 96.9% and achieved values above 96% in terms of precision, recall, and F1 score. Gradient boosting (GB) produced comparable outcomes in terms of precision, recall, and F1 score, with accuracy of 96.5%. The SVM achieved accuracy of 95.85%, with precision, recall, and F1 score marginally lower than those of the other two models. The recommended machine learning models were evaluated against existing methods, and the suggested random forest and GB models performed better than the conventional techniques, while the SVM model produced results competitive with those of earlier research.
The research in [
15] presented an investigation into creating an ML intrusion detection system (IDS) for the IoMT. The authors examined several ensemble learning strategies, such as GB, extreme GB, bagging, RF, and ensemble voting classifiers. These ensemble approaches enhance intrusion detection by combining the predictions of several models. For every classification model, the hyperparameters were optimized to guarantee optimal performance. The authors carried out feature selection by choosing critical features for intrusion detection, including the source and destination IP bytes, protocol, and connection state. The AdaBoost (ADB) classifier outperformed the other examined classification models regarding all performance measures, recording the lowest false discovery rate (FDR) and the highest F1 score, accuracy, and precision. Two existing models (Model 1 and Model 2) were compared to the proposed ADB-based IDS for the IoMT. The proposed model performed better than the compared models in terms of the false positive rate (FPR), FDR, accuracy, and precision.
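The ADB-based pipeline in [15] can be illustrated with a short scikit-learn sketch; the feature names and search grid below are assumptions made for illustration only.

```python
# Hypothetical sketch: AdaBoost over a reduced feature set with tuned
# hyperparameters, mirroring the setup described in [15].
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import GridSearchCV

SELECTED = ["src_ip_bytes", "dst_ip_bytes", "proto", "conn_state"]  # assumed names

def tuned_adaboost(X, y):
    grid = {"n_estimators": [50, 100, 200], "learning_rate": [0.1, 0.5, 1.0]}
    search = GridSearchCV(AdaBoostClassifier(), grid, scoring="f1", cv=5)
    return search.fit(X[SELECTED], y).best_estimator_  # X: pandas DataFrame
```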
To provide an efficient and effective security solution while considering the limitations of Internet of Medical Things (IoMT) networks, one work [
16] offered an anomaly intrusion detection system (AIDS) explicitly designed for IoMT networks. The authors provided an all-inclusive AIDS framework for IoMT networks that included modules for dataset generation, feature normalization, data collection, and central detection, using ML methods to detect intrusions. Data were gathered at the network’s core gateway and included disk and Wi-Fi bandwidth utilization, CPU and memory utilization, and energy consumption data. They also contained network data, such as the protocol details, packet sizes, and source and destination IP addresses. CSV datasets containing gateway-, network-, and device-specific feature sets were created from the processed data. By balancing the feature values, feature normalization prevented dominance problems. The central detection module used machine learning methods to find irregularities and intrusions in the IoMT network. Regarding binary classification tasks, DT, RF, and KNN performed better than other popular machine learning methods. They demonstrated high recall, accuracy, precision, and F1 scores, which qualified them for IoMT intrusion detection. Future work, including the installation of hardware prototypes, multi-class support, hyperparameter optimization, and the investigation of deep learning for cloud-based solutions, was also outlined in the study.
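The normalization step can be made concrete with a short sketch; min-max scaling is one common way to prevent large-valued features (e.g., byte counts) from dominating distance-based learners such as KNN, although [16] may use a different scheme.

```python
# Min-max scaling before KNN so that no single feature dominates the
# distance computation (a plausible reading of the normalization in [16]).
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

knn_ids = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
# knn_ids.fit(X_train, y_train); knn_ids.predict(X_test)
```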
The authors in [
17] addressed the challenge of the lack of openly accessible datasets on the Internet of Healthcare Things (IoHT). They contributed by designing the ECU-IoHT database, thus encouraging more researchers to develop robust models for IoHT security. The authors launched different types of network attacks, such as network mapper (Nmap), address resolution protocol (ARP) spoofing, DoS, and smurf attacks. Numerous anomaly detection methods were investigated: six variations of nearest neighbor algorithms, three clustering algorithms, two statistical methods, and one kernel-based algorithm. Specifically, the list of tested techniques included KNN, the approximate local correlation integral (aLOCI), the local outlier probability (LoOP), influenced outlierness (INFLO), the cluster-based local outlier factor (CBLOF), and the clustering-based multi-variate Gaussian outlier score (CMGOS). Additionally, the list included the local density cluster-based outlier factor (LDCOF), robust principal component analysis (RPCA), and the histogram-based outlier score (HBOS). Lastly, the one-class support vector machine (implemented with LIBSVM) was also evaluated. Possible limitations of the work in [17] included the following. The class imbalance problem was not addressed, even though the distribution of benign versus attack instances made it clear that some attacks represented minority classes compared to normal network traffic. In addition, the DoS attacks appeared to be artificial and could not be detected with good performance (as noted from the F1 score bar graph in [
17]). Moreover, each attack type was tested independently (i.e., the system used binary, anomaly-based detection). In other words, the authors used four different subsets of the data and tackled a single type of attack in each test.
In [
18], the authors’ approach proved effective in mitigating cyberattacks compared to other machine learning approaches. A framework combining blockchain-based task scheduling with deep reinforcement learning was developed to enhance the performance of healthcare applications in a distributed IoMT environment. The proposed framework utilized a temporal LSTM deep neural network for disease detection and anticipation, and Bayesian optimization was used to tune the parameters of an extreme machine learning (EML)-based model for IoT security attack detection. The proposed approach showed superior precision, recall, F1 score, and receiver operating characteristic–area under the curve (ROC-AUC) performance compared to other approaches. Overall, it achieved a 32% reduction in time complexity and a 15% increase in accuracy when evaluated against several ML methods (LR, RF, Naive Bayes, decision trees, and extreme machine learning) and ensemble approaches (EML with genetic algorithms and EML with RS), resulting in the successful classification and prediction of attacks in the IoMT environment. EML works for some patterns but may not be suitable for large, nonlinear datasets; deep learning neural networks are recommended for untrained features.
Wazid et al. [
19] reported the effectiveness of a new model, EID-HS (Envisioned Intrusion Detection in Industry 5.0-driven Healthcare Applications), which ensembles SVM, DT, and KNN with custom weights to detect new malware using traffic analysis on a large-scale network. Industry 5.0 healthcare systems focus on delivering personalized products for patients given their unique needs, and this study demonstrated that ensemble learning yields promising results in such systems. The EID-HS system was robust against cyberattacks and was shown to outperform existing approaches. The experimental results on the NSL-KDD dataset, with 81,161 intrusion instances, indicated accuracy of 95.12%.
Recent studies exploring machine learning methods for cybersecurity within IoT and IoMT environments have shown notable advancements but have also revealed several key areas in need of enhancement. In the study conducted by Hady et al. [
11], a novel dataset integrating both biometric and network features was introduced. However, the research assumed that data collected from medical sensors are transmitted in plain text, potentially compromising the data security. This assumption reveals a critical need for more robust data handling protocols that enhance the security without overwhelming the processing capabilities. Moreover, Kumar et al. [
13] utilized a fog cloud architecture and ensemble learning, primarily handling binary classification. This approach may be inadequate against more complex, multi-class cyber threats, which are becoming increasingly common in modern networks. This limitation suggests a need for models capable of efficiently differentiating among a broader array of attack vectors. Additionally, the study by Tauqeer et al. [
14] aimed to address sophisticated cyberattacks in IoMT settings. However, the effectiveness of these machine learning models could be limited by the current approaches to data preprocessing and feature selection. This indicates the potential for improvements in feature engineering techniques to better capture and utilize the nuances of cybersecurity data.
To address these identified challenges, the following enhancements could be beneficial. First, for research like that of Hady et al. [
11], integrating advanced encryption methods during data transmission could significantly mitigate the risk of security breaches. Employing lightweight cryptographic algorithms might provide an optimal balance between security and operational efficiency. Second, in response to the gaps identified in Kumar et al.’s work [
13], it is crucial to develop machine learning models capable of effective multi-class classification. Incorporating sophisticated algorithms could improve the system’s ability to manage diverse cyber threats. Third, to augment the performance of the models discussed in Tauqeer et al.’s study [
14], implementing more advanced feature engineering methods, including automated feature learning through deep learning techniques, could enhance both the accuracy and detection capabilities. Finally, integrating federated learning could enhance the scalability and robustness of intrusion detection systems by allowing multiple decentralized devices to train models collaboratively, without compromising data privacy.
Table 2 compares the cutting-edge ML-based IDS techniques discussed in this section, including their methodologies, classification types, datasets, evaluation measures, and constraints. The table provides useful information and aids in identifying research gaps, which will drive future research into network intrusion detection systems.
3.2. Deep Learning Methods
Marwa et al. [
20] presented a recent anomaly detection technique built on deep learning and deep clustering. The paper addressed two main challenges, namely feature randomness and feature drift. The first challenge, feature randomness, occurs when training a neural network encoder without a decoder, based on hypothetical similarities. This method may result in the encoder producing features that do not accurately represent the distinguishing characteristics of the data. On the other hand, feature drift arises when combining clustering and reconstruction objectives in the use of autoencoders. Clustering aims to simplify data by removing unimportant details, whereas reconstruction strives to preserve all information. Consequently, feature drift occurs when there is a failure to balance these conflicting objectives. The proposed algorithm utilized the concept of deep subclass dispersion within a one-class support vector machine (deep SDOSVM). The main steps of the algorithm were feature mapping, feature selection, clustering using a dynamic autoencoder (DynAE), feature normalization, subclass matrix calculation, model training, model testing, and a performance assessment. The authors used performance metrics such as the false positive rate, true positive rate, number of support vectors (SVs), ROC curve, AUC, probability values (
p-values), and training time. For future work, the authors planned to implement an incremental IDS to overcome the batch learning limitation. The upgraded version should effectively manage sequential and large-scale datasets while learning from the limited available data samples.
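The final detection stage of the deep SDOSVM pipeline is a one-class SVM trained on normal traffic only; the sketch below shows that stage in isolation, assuming the clustered and normalized features produced by the earlier steps are already available.

```python
# One-class SVM stage of a deep SDOSVM-style pipeline (sketch): the model
# learns the boundary of benign traffic and flags everything outside it.
from sklearn.svm import OneClassSVM

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")  # nu is assumed
# ocsvm.fit(X_normal)             # X_normal: benign feature vectors only
# labels = ocsvm.predict(X_test)  # +1 = benign, -1 = anomaly/intrusion
```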
The study in [
21] introduced an ensemble deep learning technique for the classification of network attacks. The authors built a robust generative adversarial network based on ensemble convolutional neural networks (GANsECNN). The model was used to generate synthetic data for each type of network attack. Experiments were performed using two publicly available datasets, NSL-KDD and UNSW-NB15. The performance was measured using the following metrics: accuracy, precision, recall, and generator and discriminator loss. The experiments indicated that the suggested method could enhance multi-class classification by around 10% using the generated samples. The proposed method presented a stable architecture, and the model converged rapidly. However, the reported results were not superior to related results from experiments on the same datasets.
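As a rough illustration of how a GAN can synthesize minority-class attack records (this is not the GANsECNN architecture of [21]), consider the minimal tabular GAN below; the latent size, layer widths, and feature count are placeholders, and features are assumed to be scaled to [-1, 1].

```python
# Minimal tabular GAN (illustrative): the generator learns to produce
# synthetic flow records for one attack class, which can then be added
# to the training set to rebalance it.
import torch
import torch.nn as nn

LATENT, N_FEATURES = 32, 41  # placeholder sizes

G = nn.Sequential(nn.Linear(LATENT, 64), nn.ReLU(),
                  nn.Linear(64, N_FEATURES), nn.Tanh())
D = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.LeakyReLU(0.2),
                  nn.Linear(64, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

def train_step(real_batch):
    b = real_batch.size(0)
    # Discriminator step: real records -> 1, generated records -> 0.
    fake = G(torch.randn(b, LATENT)).detach()
    loss_d = bce(D(real_batch), torch.ones(b, 1)) + bce(D(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator step: try to make the discriminator output 1 for fakes.
    fake = G(torch.randn(b, LATENT))
    loss_g = bce(D(fake), torch.ones(b, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```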
A new deep learning method was adopted in [
22] for application to medical cyber-physical systems. The authors proposed a federated learning (FL) architecture that utilized generative adversarial networks (GANs). The GAN models were trained on two categories of data: medical and network traffic data. The CHARIS [
23,
24] clinical dataset and the UNSW-NB15 dataset were used for medical anomaly detection and network traffic data detection, respectively. Data modification and scrambling attacks were launched on the medical data, while the algorithm detected several types of attacks on the network traffic, including backdoors, Denial of Service attacks, shellcodes, and worms. Five performance metrics were reported: accuracy, recall, precision, F1 score, and AUC. The authors concluded that the federated models achieved better results than non-federated models. However, the results were not very high for network flow anomaly detection; for example, the F1 scores were 0.77 and 0.78 for the non-federated and federated network flow models, respectively. As part of their future work to enhance their model, the authors planned to augment a range of deep learning methods with privacy preservation policies.
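The defining property of such a federated setup is that only model weights, never raw medical or traffic records, leave each site. A FedAvg-style aggregation step, sketched below under the assumption of equally sized clients and float-valued parameters, captures the idea.

```python
# FedAvg-style aggregation (sketch): average the clients' weights into the
# global model; raw data never leaves the clients.
import copy
import torch

def federated_average(global_model, client_models):
    avg_state = copy.deepcopy(client_models[0].state_dict())
    for key in avg_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in client_models])
        avg_state[key] = stacked.mean(dim=0)  # equal client weights assumed
    global_model.load_state_dict(avg_state)
    return global_model
```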
Intrusion detection using a cloud-based model was introduced in [
25]. The authors implemented a hierarchical federated learning (HFL) algorithm, while the proposed hierarchical long short-term memory (HLSTM) model was used to classify health records and detect intrusions in the incoming network traffic. The proposed model was reported to require minimal training while safeguarding IoMT networks against various network attacks. The IDS was tested using the ToN-IoT and NSL-KDD datasets. Various metrics, including the accuracy, precision, recall, and F1 score, were utilized to assess the model’s performance. The experimental outcomes demonstrated the effectiveness of the proposed model. The authors suggested including the Gurobi optimization solver in future extensions to optimize the performance. In addition, they planned to explore how the model performs with respect to heterogeneity, interoperability, and scalability [
25].
Two multi-class classification models, namely DenseNet and Inception Time, were proposed in [
26]. The models were used to identify cyberattacks on an IoT network. The proposed models were trained on three publicly available benchmark datasets: ToN-IoT, Edge-IIoT, and UNSW2015. The evaluation measures used were the accuracy, recall, precision, and F1 score. The DenseNet model achieved a remarkable 99.9% accuracy on the ToN-IoT dataset, while the Inception Time architecture reached a perfect 100% accuracy on the same dataset. On the Edge-IIoT dataset, the Inception Time architecture reached accuracy of 94.94%.
The necessity of an IDS in the IoMT is emphasized in the paper [
27], seeking to identify and alert administrators to potentially dangerous activity. Federated learning (FL), a method of fitting machine learning models across distributed platforms without the need for data exchange, was introduced in this work. The authors described how their suggested model, which used deep neural network (DNN) methods, operated. The model was composed of a global model disseminated to local edge devices after being trained on a source domain. Using local datasets for training, the local models’ expertise was fed back into the global model without jeopardizing the integrity of the local datasets. This procedure enabled rapid, safe, customized intrusion detection on edge devices. The learning method, including layer freezing and CORAL loss minimization, was described in the study; this procedure aimed to enhance the model performance with each local dataset and fine-tune the global model. The model demonstrated greater accuracy compared to classic machine learning and deep learning techniques, underscoring its efficacy in intrusion detection. It maintained realistic prediction times for real-time detection on edge devices without regular connectivity to cloud servers.
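The CORAL loss mentioned above aligns the second-order statistics of source- and target-domain features; a standard formulation (a sketch, not the authors' exact code) is shown below.

```python
# Deep CORAL loss (standard formulation): squared Frobenius distance
# between the feature covariance matrices of the two domains.
import torch

def coral_loss(source, target):
    d = source.size(1)
    def cov(x):
        x = x - x.mean(dim=0, keepdim=True)
        return (x.t() @ x) / (x.size(0) - 1)
    return ((cov(source) - cov(target)) ** 2).sum() / (4 * d * d)
```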
In recent years, the use of DNNs for the detection of cyberattacks in IoT networks has been on the rise. However, this technique for cyberattack detection brings challenges, as it is computationally complex to apply and vulnerable to adversarial samples. The study in [
28] aimed to address these challenges by enhancing the accuracy of DNN models while reducing the computational complexity, especially in resource-constrained environments. The fully connected neural network (FCNN) model was presented in the paper as the baseline model for cyberattack detection. The authors then proposed a performance enhancement technique that integrated pruning, simulated micro-batching, and parameter optimization to handle the computational complexity problems of the DNN models.
By integrating the proposed optimization method into the baseline model, the authors obtained a refined model, namely the Robust Effective and Resource-Efficient DNN (REDNN). Three publicly available benchmark datasets—N-BaIoT, Kitsune, and WUSTL—were used to test the performance of the newly suggested model. The robustness of the proposed model was tested against various factors, including the number of epochs, clipped perturbation samples, and model variations. The efficiency of the REDNN model was then compared against that of the baseline model and state-of-the-art techniques. The proposed REDNN model exhibited robustness against adversarial attacks and achieved an exceptionally high level of accuracy in detecting cyberattacks within IoT networks, while also demonstrating significant resource savings. Notably, the suggested model demonstrated considerable decreases in memory and time utilization compared to the benchmark in simulated virtual worker environments. Additionally, its effectiveness was demonstrated in a federated learning (FL) setting, highlighting its robustness and efficiency in real-world scenarios.
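One ingredient of such an optimization pipeline, magnitude-based weight pruning, can be illustrated with PyTorch's built-in utilities; the pruning ratio below is an assumption, and the paper's full procedure (including simulated micro-batching) is not reproduced.

```python
# L1-magnitude pruning of the fully connected layers (sketch): zero out the
# smallest weights, then make the sparsity permanent.
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_fcnn(model: nn.Module, amount: float = 0.5):
    for module in model.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)
            prune.remove(module, "weight")  # bake the mask into the weights
    return model
```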
The primary focus of the work in [
29] was to reduce the complexity and classification time while improving the accuracy of DL-based techniques in a cybersecurity IDS for IoT networks. The authors proposed three different models using various deep learning techniques: a feed-forward neural network (FFNN), LSTM, and a random neural network (RandNN). Each model was trained on the CIC IoT 2022 dataset. The proposed framework consisted of five stages. First, features were extracted using CICFlowMeter 4.0. Second, the data preprocessing stage took place, including data cleaning, encoding, and scaling. Then, data balancing, feature selection using principal component analysis (PCA), and data splitting were performed. For binary classification, the proposed models classified instances as “Normal” or “Attack”; for multi-class problems, they classified instances as “Normal” or identified their specific attack types. These newly suggested models were then compared with one another and with traditional ML IDS models and state-of-the-art IDS models. The RandNN model showed promising performance, as it could capture complex dependencies, and the LSTM model also performed well because it captured the time-based dynamics within IoT data. The newly proposed FFNN model demonstrated enhanced performance compared to the proposed LSTM, the RandNN, and other ML- and DL-based models.
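A condensed sketch of the binary pipeline described above (scaling, PCA, then a small feed-forward network) might look as follows; the component count and layer sizes are assumptions rather than the values used in [29].

```python
# Scale -> PCA -> FFNN pipeline for "Normal" vs. "Attack" classification
# (sketch; sizes are illustrative).
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

ffnn_ids = make_pipeline(
    StandardScaler(),
    PCA(n_components=20),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=300),
)
# ffnn_ids.fit(X_train, y_train); ffnn_ids.predict(X_test)
```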
Recent advancements in deep learning methods have significantly improved anomaly detection and cyberattack classification within IoT and IoMT environments. However, several studies highlight persistent research gaps that could hinder their broader application and effectiveness. Notably, Marwa et al. [
20] and other subsequent studies reveal issues such as feature randomness and feature drift, where deep learning models struggle to balance data simplification with accurate information preservation. This challenge points to an inherent limitation in the current autoencoder architectures used for cybersecurity purposes. Furthermore, while methods like those proposed by Raha et al. [
21] for the generation of synthetic data to train robust models show promise, they often do not outperform existing benchmarks, indicating a gap in the efficacy of such generative approaches. Similarly, the use of federated learning in medical cyber-physical systems, as discussed in Ilias et al. [
22], although improving the performance over non-federated models, still shows sub-optimal results in certain key areas like network flow anomaly detection.
To address these gaps, this paper proposes several solutions aimed at enhancing the current state of deep learning techniques in cybersecurity. First, to tackle feature randomness and drift, an advanced deep learning architecture that integrates enhanced regularization techniques could be developed. This would involve sophisticated training regimes that more effectively capture the nuances of cybersecurity data, thereby producing features that better represent the underlying patterns without oversimplification. Second, to improve the performance of generative adversarial networks in cybersecurity, it is essential to integrate novel adversarial training frameworks that can generate more diverse and challenging synthetic datasets. This approach will ensure that the models are not only robust against known types of attacks but are also prepared for zero-day exploits. Lastly, the application of federated learning models in cybersecurity could be enhanced by incorporating multi-modal learning strategies that leverage both structured and unstructured data from various IoT devices. This method would help in better understanding the context of the data, thus improving the detection accuracy of network anomalies. Additionally, exploring advanced optimization algorithms could significantly reduce the computational overhead, making these models feasible for deployment in resource-constrained environments.
Table 3 summarizes the DL-based methods mentioned in this section.
3.3. Combined Machine Learning and Deep Learning Methods
The primary goal of the study in [
30] was to develop an effective IDS to safeguard IoMT networks against cyberattacks. The author highlighted the growing significance of the IoMT in healthcare, but he also drew attention to the vulnerabilities brought about by medical devices’ interconnectedness. The paper recommended using a fog cloud architecture to solve the security concerns related to the IoMT. The ensemble learning technique used by the suggested IDS system integrates several long short-term memory (LSTM) networks. The author presented a deployment methodology that offers Infrastructure as a Service (IaaS) in the cloud and Software as a Service (SaaS) in the fog. The paper also covered several data preprocessing methods, such as feature mapping, data imputation, and feature selection, to prepare the dataset for intrusion detection. Learning curves and misclassification errors were used to assess the performance of the proposed technique, which demonstrated that the ensemble approach performed much better than a decision tree. The ROC curve was used to evaluate the classifier’s performance; the AUC showed that the proposed method performed better than the baseline. The work in [
31] aimed to provide a reliable approach for the detection of anomalies and attacks in IoMT devices used in healthcare. The study used four real IoMT datasets gathered from actual healthcare devices: WUSTL-EHMS, TON-IoT, ICU, and ECU-IoHT. The suggested approach used machine learning techniques to perform multiple phases of dataset cleaning, feature selection, feature extraction, and classification. The study used the recursive feature elimination (RFE) technique to choose the most crucial characteristics. The first method used was the KNN classifier. Then, a multi-layer perceptron (MLP) classifier with hyperparameters adjusted was employed to improve the classification. The efficacy of the suggested approach in identifying abnormalities and cyberattacks was demonstrated by its excellent accuracy rates across all IoMT datasets. Better performance was obtained by combining the hyperparameter-tuned MLP with XGBRegressor-based feature selection. The authors intended to set up an IoMT laboratory to improve the attack detection accuracy and investigate potential new research avenues.
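The best-performing combination reported in [31], XGBRegressor-based feature selection feeding a hyperparameter-tuned MLP, can be sketched as a scikit-learn pipeline; the number of selected features and the layer sizes are assumptions.

```python
# RFE driven by XGBRegressor importances, followed by a tuned MLP (sketch).
from sklearn.feature_selection import RFE
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from xgboost import XGBRegressor

pipeline = make_pipeline(
    RFE(XGBRegressor(n_estimators=100), n_features_to_select=15),
    MLPClassifier(hidden_layer_sizes=(128, 64), max_iter=300),
)
# pipeline.fit(X_train, y_train)
```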
The work in [
32] focused on intrusion detection in the context of the IoMT, recognizing two forms of attacks: data spoofing and data alteration. The primary goal was to develop an effective intrusion detection model to secure healthcare systems that handle sensitive information. The study used a range of assessment metrics to evaluate the intrusion detection model’s performance, including the ROC-AUC, prediction time, F1 score, accuracy, precision, false acceptance rate (FAR), and recall (detection rate). To improve the precision and effectiveness of intrusion detection, the suggested model combined several machine learning methods, feature scaling strategies, data augmentation, and class weight ratios. One of the most important findings was a noteworthy average testing accuracy of 94.23%, indicating the model’s superiority over an existing method. The model’s efficiency was mainly ascribed to the shorter prediction times attained through the choices made in feature selection, algorithm selection, and data preparation. However, there were also acknowledged drawbacks, such as the model’s narrow focus on only two types of attacks and its assessment in a simulated network context. For further research, the author suggested including more attack types to create a more comprehensive intrusion detection model and examining deep learning models of minimal complexity to increase the AUC and detection rates.
In [
33], network traffic and patient biometric data were combined into a dataset using the ARGUS tool to improve intrusion detection in the IoMT. There were 44 features in the dataset, comprising 9 biometric features and 35 network traffic features. The dataset was initially composed of 2046 attack samples and 14,272 normal samples; since attacks are infrequent in real-time networks, 1400 attack samples were chosen at random to replicate such conditions. A CNN, a DNN, and an LSTM were utilized for intrusion detection, alongside classical ML models, among which AdaBoost performed the best, with accuracy of 91.6%; all ML models displayed high recall, F1 score, and precision. The DNN achieved maximum accuracy of 96%, surpassing both the CNN and LSTM, and the precision, recall, and F1 score of the DL models were consistently good. The suggested particle swarm optimization deep neural network (PSO-DNN) algorithm achieved accuracy of 96%, which was 3.2% better than that in previous studies, outperforming cutting-edge techniques. To improve IoMT attack categorization in the future, the author proposed integrating particle swarm optimization (PSO) for feature selection and DNNs for intrusion detection.
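The PSO component of such an approach can be sketched as a generic optimization loop in which each particle encodes a candidate configuration (e.g., a feature mask) scored by the validation accuracy of a DNN; fitness() is a placeholder, and the inertia and acceleration constants are conventional defaults rather than the values in [33].

```python
# Bare-bones particle swarm optimization (sketch). fitness(p) is assumed to
# train/evaluate a DNN for the configuration encoded by particle p and
# return a score to maximize (e.g., validation accuracy).
import numpy as np

def pso(fitness, dim, n_particles=20, iters=30, w=0.7, c1=1.5, c2=1.5):
    pos = np.random.rand(n_particles, dim)          # particle positions
    vel = np.zeros_like(pos)
    pbest = pos.copy()                              # per-particle best
    pbest_val = np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_val.argmax()]               # swarm-wide best
    for _ in range(iters):
        r1, r2 = np.random.rand(2)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        vals = np.array([fitness(p) for p in pos])
        improved = vals > pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[pbest_val.argmax()]
    return gbest  # e.g., threshold at 0.5 to obtain a binary feature mask
```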
Developing a strong IDS to safeguard sensitive medical data against attacks and breaches was the main goal of the paper [
34]. The authors used a globally benchmarked Kaggle intrusion detection dataset that covers a variety of attack methods along with typical network behavior. PCA was used as a dimensionality reduction approach to solve the dataset’s high dimensionality problem; by reducing the number of attributes, this strategy increased the efficiency. The use of the Grey Wolf Optimization (GWO) method makes this research distinctive. After PCA, GWO was employed as a second-level optimization method, aiding in further dimensionality reduction while maintaining crucial characteristics. Various classification models, including NB, RF, SVM, KNN, and a DNN, were applied to the dataset for intrusion detection. According to the paper’s findings, the suggested classifier model consistently performed better than alternative models with respect to sensitivity, specificity, and accuracy. The classification model became more efficient and required less training time as the dimensionality was reduced.
By highlighting the possible dangers and repercussions of security holes or online attacks in these systems, the authors of the paper [
35] sought to address the security weaknesses in IoMT contexts. Their primary objective was to create a reliable method of identifying malicious activity in IoMT networks. The authors set up 100 IoT nodes in a simulated square field using an Intel Xeon system with a particular hardware configuration, and several DDoS attacks were simulated throughout the studies. Wireshark was used to collect data packets, and a Python script was developed to extract IoT end-level layer characteristics from packet capture (.pcap) files, which were then transformed into CSV files for analysis. The data preprocessing steps included transforming categorical elements into numerical values with a label encoder and managing missing values. The authors employed numerous optimization approaches, including Spider Monkey Optimization, Salp Swarm Optimization, Whale Optimization, and a hybrid Lion and Salp Swarm Optimization Algorithm (LSSOA). These methods were used to improve the detection of malicious traffic. Several metrics, including the accuracy, precision, recall, F1 score, invalid positive rate (IPR), and invalid negative rate (INR), were used to assess the effectiveness of these optimization strategies. According to the authors, the suggested LSSOA achieved recall and an F1 score of 98.0% and could be applied to a variety of tasks, such as resource allocation, network security, hybrid optimization, the management of numerous variables, and scalability.
This section, with studies on combined machine learning and deep learning methods for cybersecurity in the IoMT, reveals significant advancements and also exposes critical gaps that need to be addressed. In the study by Khan [
30], while the fog cloud architecture and the use of LSTM networks indicate a robust approach to safeguard IoMT networks, the research lacks details on the real-world application and scalability of the proposed system. Similarly, Kilincer et al. [
31] demonstrate improved classification through machine learning techniques and feature selection, but the reliance on conventional methods like KNN and MLP might not be sufficient against more sophisticated or evolving cyber threats. Furthermore, Gupta’s work [
32] points out the limitations of focusing only on two specific types of cyberattacks and conducting assessments in a simulated environment, which might not translate effectively into real-world settings. This highlights a broader issue in the current research: a narrow focus on limited attack types and a lack of comprehensive testing across diverse operational conditions. In addition, the study by Chaganti [
33] shows promising results using a CNN, DNN, and LSTM for intrusion detection, with high accuracy rates. However, the best performance is limited to a DNN model, which suggests the need to explore more integrated or hybrid approaches that can leverage the strengths of various machine learning and deep learning models more effectively.
For systems like those proposed by Khan [
30], it is critical to extend the testing beyond controlled or simulated environments to real-world applications. This would involve deploying the proposed IDS in actual IoMT settings to observe its performance under real operational pressures and attack scenarios. Considering the limitations in the studies by Kilincer et al. [
31] and Chaganti [
33], there is a clear need to develop hybrid models that integrate multiple machine learning and deep learning techniques. Such models could utilize the strengths of different algorithms to enhance their detection capabilities, especially for complex and evolving attack vectors. In response to the narrow focus observed in Gupta’s study [
32], future research should incorporate a wider array of attack types and test the IDS models against a broader spectrum of cyber threats. This would ensure that the IDS is robust and versatile enough to handle various types of cyberattacks. To improve the efficacy of the IDS, as observed across these studies, the implementation of more advanced feature selection techniques such as deep learning-based feature extraction could be explored. Techniques like autoencoders or deep belief networks might offer better feature representation and thus enhance the model’s predictive accuracy.
Table 4 summarizes the methods based on both ML and DL discussed in this section.
3.4. Transformer-Based Methods
The author in [
3] introduced a framework to enhance the security of medical systems. The author built a hybrid security system consisting of two components. The first component of the IoMT system was an intrusion detection system, which aimed to monitor the system for any unauthorized access or breaches. Another element was a malware detection system designed to protect the computers used by medical professionals. The approach used a BERT-based Transformer and a light gradient boosting machine (LightGBM). The recommended method consisted of three main phases, as outlined in [
3]. First, the network flow was derived from the recorded activities. The collected data then underwent preprocessing. Finally, each network activity was classified as either benign or an attack using the two machine learning algorithms. The model was assessed using four datasets: ECU-IoHT, ToN-IoT, Edge-IIoTset, and EMBER [
36]. The proposed model was found to be capable of detecting many types of attacks, regardless of the specific equipment being targeted. An identified constraint of the model was its intricate deployment process. In addition, a correlation calculation was required to combine the obtained data. In further research, the author intended to include other sophisticated malware families and investigate the use of an analytical approach to merge the system’s outcomes and expedite decision-making in the IoMT setting.
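The flow-classification half of this hybrid system can be illustrated with LightGBM alone; the parameters below are assumptions, and the BERT-based malware component is omitted.

```python
# LightGBM flow classifier (sketch of one half of the hybrid system in [3]).
import lightgbm as lgb

clf = lgb.LGBMClassifier(n_estimators=300, learning_rate=0.05,
                         num_leaves=63, objective="binary")
# clf.fit(flow_features_train, labels_train)  # benign = 0, attack = 1
# preds = clf.predict(flow_features_test)
```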
A research study [
37] was published on a modified Transformer neural network (MTNN) designed for the detection of intrusions in IoT systems. The authors proposed a unique approach to identifying cybersecurity vulnerabilities in IoT devices using the MTNN model. The MTNN model, with its smaller parameter count, utilizes information gain for feature selection and achieves acceptable accuracy. This makes it suitable for implementation in distributed IoT systems, distinguishing it from RNN and LSTM models. The experimental findings on the ToN-IoT dataset [
38] showed large improvements in accuracy, precision, recall, and F1 score. The article examined the use of Transformers in detecting cyberattacks and intrusions in IoT systems and the possibility of using GANs to generate false data injection attacks. The authors further emphasized the need for hyperparameter optimization to enhance the efficacy of Transformer-based models, suggesting grid search or Bayesian optimization (BO) as potential approaches. The authors also discussed the possibility of using generative adversarial networks and federated learning to improve distributed learning in IoT systems in the future.
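The information gain feature selection used by the MTNN can be approximated with scikit-learn's mutual information estimator, as in the short sketch below.

```python
# Rank features by (estimated) information gain and keep the top k.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def top_k_features(X, y, k=10):
    scores = mutual_info_classif(X, y)
    return np.argsort(scores)[::-1][:k]  # indices of the k most informative
```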
The authors in [
39] presented a novel method for intrusion detection systems called the robust Transformer-based intrusion detection system (RTIDS). This innovative method significantly improved the classification accuracy compared to many present detection techniques: RTIDS outperformed SVM-IDS by 4.56%, RNN-IDS by 1.67%, LSTM-IDS by 0.81%, and FNN-IDS by 3.03%. The system’s performance was further confirmed by comprehensive assessments conducted on two datasets, namely CICIDS2017 and CIC-DDoS2019. On CICIDS2017, RTIDS exhibited remarkable accuracy of 98.45%, precision of 98.32%, recall of 98.73%, and an F1 score of 98.02%. On the CIC-DDoS2019 dataset, the system achieved accuracy of 98.58%, precision of 98.82%, recall of 98.66%, and an F1 score of 98.45%. The underlying robustness of RTIDS resided in its ability to accurately identify network abnormalities and traffic violations, exceeding the capabilities of traditional and DL-based IDS. Further exploring the architecture of the suggested system, the authors emphasized the importance of self-attention mechanisms and strategic data preparation strategies. The comprehensive examination of a dataset including more than 30 million records highlighted the potential of RTIDS in practical applications. A potential area for future research would be optimizing the Transformer algorithm used in the intrusion detection system to enhance its speed and better address the consequences of abnormal occurrences. In addition, the authors anticipated that incorporating meta-learning would be a viable approach to address the difficulties presented by few-shot categorization situations.
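A skeleton of a Transformer-encoder flow classifier in the spirit of RTIDS is given below; the positional embedding scheme and dimensions of the original paper are not reproduced, and all sizes shown are assumptions.

```python
# Transformer encoder over per-feature tokens, pooled into class logits
# (sketch; not the exact RTIDS architecture).
import torch
import torch.nn as nn

class TransformerIDS(nn.Module):
    def __init__(self, n_features, d_model=64, n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        self.proj = nn.Linear(1, d_model)  # embed each scalar feature
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                          # x: (batch, n_features)
        tokens = self.proj(x.unsqueeze(-1))        # (batch, n_features, d_model)
        encoded = self.encoder(tokens).mean(dim=1) # mean-pooled representation
        return self.head(encoded)                  # class logits
```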
A recent study [
40] developed a new intrusion detection model that combined multi-head attention with bidirectional long short-term memory (BiLSTM). The proposed model employed an embedding layer to transform the intrusion data into a vector format, improving the data representation; the embedding process converted the initial vectors into two-dimensional vectors. The multi-head attention mechanism enabled the model to focus specifically on crucial characteristics within the vector, improving its interpretability. This mechanism was seamlessly integrated with BiLSTM, which, although the intrusion records are not time series data, can discern connections between distant characteristics, hence linking various features together for predictive purposes. The research used datasets such as KDDCUP99, NSL-KDD, and CICIDS2017 for training and testing. Optimal model performance was ensured by using data processing methods such as normalization and one-hot encoding, and the Synthetic Minority Oversampling Technique (SMOTE) was used to tackle the issue of imbalanced class distributions. Compared to other models, the recommended model demonstrated greater performance in terms of accuracy and F1 score. The researchers highlighted specific constraints, such as the model’s inability to precisely detect or report new forms of intrusion; nevertheless, such intrusions might still be categorized for further examination.
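The attention-plus-BiLSTM arrangement can be sketched as follows; SMOTE rebalancing would be applied to the training set beforehand (e.g., with imblearn), and all layer sizes are assumptions.

```python
# Embedding -> multi-head self-attention -> BiLSTM -> classifier (sketch).
import torch
import torch.nn as nn

class AttnBiLSTM(nn.Module):
    def __init__(self, n_features, d_model=32, n_heads=4, n_classes=5):
        super().__init__()
        self.embed = nn.Linear(1, d_model)  # per-feature embedding
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.bilstm = nn.LSTM(d_model, 64, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, x):                        # x: (batch, n_features)
        t = self.embed(x.unsqueeze(-1))          # (batch, n_features, d_model)
        t, _ = self.attn(t, t, t)                # attend to salient features
        out, _ = self.bilstm(t)                  # (batch, n_features, 128)
        return self.head(out[:, -1])             # logits from the last step
```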
The study conducted in [
41] showed that an IDS using a hierarchical attention model obtained detection accuracy of 98.76% and a false alarm rate of 1.49% when the timestep was set to 10. Prior research has suggested the use of ML methods for IDS, including approaches such as feature selection and the utilization of DNNs. The use of the attention mechanism in the model facilitates the capturing of relevant characteristics and offers the potential for advancements in feature selection and parallel computing. The proposed model exhibited commendable performance on the UNSW-NB15 dataset, with accuracy above 98.76% and a false alarm rate below 1.2%, and it demonstrated a 3.05% enhancement in comparison to the BiLSTM model. A total of 82,332 records were used in the investigation. The authors suggested that future research should prioritize categorizing various forms of attacks using the presented approach. The proposed system does not yet have a second-phase detection capability. In [
42], the authors proposed an intrusion detection method for IoT networks that utilizes an attention mechanism and a bidirectional gated recurrent unit (BiGRU). The authors tackled the issues of unbalanced datasets and insufficient feature information learning in state-of-the-art DL models. The paper introduced SEW-MBiGD, a hybrid intrusion detection model that integrates the SEW model with a BiGRU fusion neural network and an attention mechanism. The SEW model can detect the characteristics of minority groups within a dataset. Furthermore, the data quality was enhanced using model balancing techniques. The studies conducted on NSL-KDD [
43] showed that the SEW model effectively addressed the problem of dataset imbalance; therefore, the suggested method successfully achieved minority-class learning. In addition, the MBiGD model enhanced its feature acquisition by including multi-head self-attention (MHSA) in the BiGRU. This improvement allowed the model to better evaluate the connections between features and enabled attention to be focused on temporal class information. The SEW-MBiGD model was found to outperform other models and exhibited superior data comprehension capabilities. For example, it enhanced the precision of the SVM from 77.7% to 81.2% and resulted in around a 1% improvement for KNN and DT. The inclusion of the BiGRU model with the MHSA layer resulted in a notable improvement in accuracy, with a 5.3% increase for binary classification and a 4.7% increase for multi-classification. The accuracy was further improved by training on a balanced dataset.
In the framework introduced by Abdallah [
3], namely a hybrid system combining BERT-based Transformers and LightGBM for intrusion detection and malware prevention, the complexity of deployment and the need for correlation calculations to integrate data streams highlight significant operational challenges. These issues could impede the method’s scalability and real-world applicability. The research by Ahmed [
37] on a modified Transformer neural network (MTNN) demonstrates improvements in precision and recall using Transformers. However, the reliance on traditional feature selection methods like information gain might limit the model’s ability to process more complex or subtle patterns in IoMT environments. Additionally, the use of generative adversarial networks (GANs) to simulate attack scenarios raises questions about the model’s performance against real-world, novel cyber threats. Wu et al.’s study [
39], which developed a robust Transformer-based intrusion detection system (RTIDS), shows impressive performance metrics. Nevertheless, the system’s focus on known datasets like CICIDS2017 and CIC-DDoS2019 means that it may not fully represent the dynamic nature of cyberattacks in IoMT contexts, suggesting a gap in the adaptability and ongoing learning capabilities of the model. Zhang’s approach [
40] to integrating multi-head attention with BiLSTM to enhance the data interpretability and predictive accuracy is promising but highlights an ongoing challenge in machine learning-based IDS: the detection of new, previously unreported types of cyber threats. This limitation underscores the need for models that can evolve and adapt to new threats dynamically. Liu’s application of a hierarchical attention model [
41] demonstrated high accuracy and low false alarm rates, but the research suggests potential overspecialization to specific dataset characteristics, meaning that the model might not generalize well across different IoMT platforms or attack vectors.
For complex systems like the one proposed by Abdallah [
3], developing streamlined deployment processes and automated data integration tools could reduce the operational complexity and enhance the scalability. This could involve creating modular frameworks that allow for easier customization and integration into existing IoMT infrastructures. To address the limitations noted in Ahmed’s MTNN model [
37], incorporating deep learning approaches such as autoencoders or deep belief networks for feature learning could uncover more nuanced data patterns and improve the model’s efficacy in detecting sophisticated cyber threats. Regarding adaptive and evolving models, for systems like RTIDS [
39], integrating continuous learning mechanisms, such as online learning or reinforcement learning, could allow the system to adapt to new threats dynamically. This would help to maintain high performance even as the attack strategies evolve. Zhang’s use of multi-head attention with BiLSTM [
40] could be enhanced by hybridizing these models with unsupervised learning techniques to detect anomalies that do not fit any known patterns, thereby improving the system’s ability to identify novel threats. Regarding cross-platform validation and testing, ensuring that models like Liu’s hierarchical attention system [
41] are tested across diverse IoMT environments and against a variety of attack simulations could improve their generalizability and robustness.
Table 5 summarizes the Transformer-based methods discussed in this section.
3.5. Other Methods
The paper [
44] proposed an innovative framework, namely the IoT Security Simulator (IoTSecSim). Graphical security modeling (GSM) was utilized to create IoTSecSim. This framework focuses on modeling IoT networks with diverse IoT devices and various network protocols, helping researchers to simulate different cyberattacks and cybersecurity defenses in IoT networks. Additionally, the effectiveness of these defenses can be assessed using various security metrics embedded in the software. To evaluate the framework, the authors utilized botnet malware such as Mirai, and three defense methods were tested: firewalls, NIDS, and vulnerability patching. The security generator produced a two-layered hierarchical attack representation model (HARM) to capture malware propagation data. Several security metrics were utilized to assess the computational time for malware infection and its spread across four stages: scanning, accessing, reporting, and installation. Additionally, the authors presented four permutations of attacker behaviors that could impact the spread of malware within a network. The paper provided evidence of the simulator’s correctness through a simulation and sensitivity analysis. The suggested software offered versatile and intricate functionalities for the modeling of existing and upcoming cyberattacks targeting IoT networks. Nevertheless, it exhibited certain constraints: IoTSecSim lacked real-time simulation capabilities for packet flows, and the proposed defense methods could not identify anomalies.
The authors in [
45] proposed a new edge-directed graph multi-head attention network model (EDGMAT) for NIDS. This contemporary framework applied a multi-head attention mechanism to perform weighted aggregation on nodes and edges; the model exploited traffic directionality and utilized a graph attention network. The model was assessed using four publicly available NIDS datasets, and the evaluation findings demonstrated that EDGMAT performed better in multi-class classification, with higher accuracy, recall, and F1 score than state-of-the-art techniques. A limitation of the model is that it uses large amounts of GPU memory and requires a long training time to achieve comparable levels of intrusion detection accuracy.
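The flavor of an edge-aware graph attention NIDS can be conveyed with PyTorch Geometric's GATConv, which accepts edge features; this is a simplification of EDGMAT's edge-directed multi-head aggregation, with assumed dimensions.

```python
# Two GAT layers over a flow graph: nodes are hosts, edge features are flow
# statistics (sketch; not the exact EDGMAT model).
import torch
import torch.nn as nn
from torch_geometric.nn import GATConv

class FlowGraphIDS(nn.Module):
    def __init__(self, node_dim, edge_dim, n_classes):
        super().__init__()
        self.gat1 = GATConv(node_dim, 32, heads=4, edge_dim=edge_dim)
        self.gat2 = GATConv(32 * 4, 32, heads=1, edge_dim=edge_dim)
        self.head = nn.Linear(32, n_classes)

    def forward(self, x, edge_index, edge_attr):
        h = torch.relu(self.gat1(x, edge_index, edge_attr))
        h = torch.relu(self.gat2(h, edge_index, edge_attr))
        return self.head(h)  # per-node class logits
```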
Rayan et al. [
46] proposed a security framework for IoMT devices. The method used machine learning and blockchain technology. The authors used a tri-layered feed-forward neural network (TNN) to classify network traffic into normal traffic or attacks. Anomaly detection was implemented using the TNN, while the blockchain helped to secure the data. The proposed blockchain architecture guaranteed the privacy and integrity of the dataset. For performance evaluation, the ICUDatasetProcessed [
47,
48] dataset was used. It contains 42 features and around 187K records. The presented performance parameters were the confusion matrix, classification accuracy, precision, recall, and F1 score. The method was found to exhibit superior performance. However, when comparing the approach with other cutting-edge techniques, the dataset used appeared to have certain limitations: the results for most baseline methods already approached 99%, so the suggested method could only slightly improve on them.
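In scikit-learn terms, a tri-layered feed-forward classifier corresponds to an MLP with three hidden layers; the sizes below are assumptions rather than the values used in [46], and the blockchain layer is outside the scope of this sketch.

```python
# Tri-layered feed-forward network for normal-vs-attack traffic (sketch).
from sklearn.neural_network import MLPClassifier

tnn = MLPClassifier(hidden_layer_sizes=(64, 32, 16), max_iter=300)
# tnn.fit(X_train, y_train)  # e.g., ICUDatasetProcessed: 42 features
```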
The exploration of diverse methods in cybersecurity for IoT and IoMT devices, as described in recent studies, unveils several research gaps that require attention for enhanced security solutions. The study by Chee et al. [
44] introduces IoTSecSim, a security simulator utilizing graphical security modeling (GSM) to simulate cyberattacks and defenses. Despite its capabilities, the simulator lacks real-time simulation for packet flows, which is critical for dynamic network environments. Additionally, the defense methods proposed do not include anomaly detection, a key component in identifying unforeseen or zero-day attacks. Li’s research [
45] on the edge-directed graph multi-head attention network model (EDGMAT) for network intrusion detection systems (NIDS) demonstrates superior classification performance. However, the model’s strong reliance on GPU memory and extended training times could hinder its practical deployment, especially in resource-constrained environments. Rayan et al. [
46] present a novel framework combining machine learning with blockchain technology to enhance IoMT security. While the blockchain architecture ensures data integrity and privacy, the study reveals a potential overfitting issue or dataset limitations, as indicated by the unusually high performance metrics close to 99 percent for the baseline methods. This raises questions about the robustness and generalizability of the proposed model.
To address the limitations of IoTSecSim noted in Chee et al.’s study [
44], incorporating real-time data processing capabilities could significantly enhance its utility. Integrating more advanced real-time simulation engines and developing capabilities to monitor live network traffic could provide more accurate and timely insights into network security vulnerabilities. For the EDGMAT model presented by Li [
45], optimizing the model to reduce its dependency on extensive GPU resources and training time is crucial. Techniques such as model pruning, quantization, and efficient training algorithms like federated learning could be explored to improve the efficiency without compromising the model’s performance. To strengthen the framework proposed by Rayan et al. [
46], conducting extensive validation against a broader set of attacks and in more diverse network environments would be beneficial. Enhancing the dataset’s diversity and complexity could help to ascertain the true efficacy of the combined blockchain and machine learning approach and identify any overfitting issues. Given the absence of effective anomaly detection in the IoTSecSim framework, incorporating sophisticated anomaly detection algorithms such as unsupervised learning or semi-supervised learning models could fill this gap. These methods could potentially identify novel attack vectors that are not part of the existing threat models used for training.
By addressing these gaps, future research can significantly enhance the effectiveness of cybersecurity measures in IoT and IoMT environments. Improved real-time simulation capabilities, resource-efficient models, robust validation methods, and sophisticated anomaly detection are pivotal in developing resilient security solutions that can adapt to the evolving landscape of cyber threats.
Table 6 summarizes the other surveyed methods discussed in this section.