**1. Introduction**

Cloud computing is one of the latest service innovations in the field of IT. The primary advantage of cloud computing is that it enables access without constraints of location and time. Cloud computing supports mobile and collaborative applications/services, enables the flexibility of controlling storage capacities, and provides lower costs. Moreover, cloud services are multisource, permitting the end-users to use multiple service providers based on their requirements. The use of cloud computing also reduces capital expenditures, power usage, and physical space and maintenance requirements for on-site storage.

As cloud computing services have become more common, a large number of companies, banks, and governments have adopted this technology. This transition has also exposed their systems to many kinds of cyberattacks by hackers and intruders, which calls for robust security mechanisms. Many cloud service companies provide security services as applications. An example is the Amazon Web Services (AWS) store, which offers services whose validity is limited to the period of the service license.

With increasing volumes of data, particularly important medical records, the need for continuous backup and updates is evident. Healthcare data is confidential and is an attractive target for hackers to manipulate or use for illegitimate purposes such as financial gain or political motives [1]. Health data and medical records can include specific patient history, information on prescription drugs and medical devices, and other confidential patient information. The privacy of this information is of primary importance. Therefore, online attacks including identity theft should be avoided, as they lead to illegitimate access to financial, social, and banking records [1].

**Citation:** Aldallal, A.; Alisa, F. Effective Intrusion Detection System to Secure Data in Cloud Using Machine Learning. *Symmetry* **2021**, *13*, 2306. https://doi.org/10.3390/sym13122306

Academic Editors: Peng-Yeng Yin, Ray-I Chang, Youcef Gheraibia, Ming-Chin Chuang, Hua-Yi Lin and Jen-Chun Lee

Received: 10 October 2021; Accepted: 16 November 2021; Published: 3 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

When adopting cloud computing technologies, healthcare services must treat cybersecurity as a priority because patient and operational data are confidential. Existing intrusion detection systems (IDS) rely on either signature detection or anomaly detection. When the detection mechanism is not suited to the threat, patient and operational data may be stolen. A cybersecurity strategy helps to detect and protect against malicious actors while improving an organization's defense against cyberattacks.
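The difference between the two detection mechanisms can be illustrated with a minimal Python sketch. It is not taken from any deployed IDS: the signature list, the function names, the z-score rule, and the traffic figures are all hypothetical and chosen only to make the contrast visible.

```python
from statistics import mean, stdev

# Hypothetical signature database: substrings associated with known attacks.
SIGNATURES = ("' OR 1=1", "../../", "<script>")

def signature_alert(payload: str) -> bool:
    """Signature detection: flag a payload that contains a known attack pattern."""
    return any(sig in payload for sig in SIGNATURES)

def anomaly_alert(value: float, baseline: list[float], z_limit: float = 3.0) -> bool:
    """Anomaly detection: flag a value far from the baseline mean (z-score rule)."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit

# Hypothetical baseline: requests per minute observed during normal operation.
baseline_requests_per_min = [48, 52, 50, 47, 53, 51, 49, 50]

print(signature_alert("GET /records?id=7' OR 1=1"))    # known pattern
print(signature_alert("GET /records?id=7"))            # no known pattern
print(anomaly_alert(400, baseline_requests_per_min))   # traffic spike
print(anomaly_alert(52, baseline_requests_per_min))    # within normal range
```

In this toy setting, a previously unseen payload passes the signature check because no pattern matches it, whereas an unusual traffic volume is caught only by the anomaly check. This is the kind of mismatch between mechanism and threat that the paragraph above refers to.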

Existing studies in international hospitals have been limited to user awareness of the importance of cybersecurity such as the use of strong passwords, deletion or filtration of unwanted emails, data encryption, confidential treatment of credentials, careful access of information, and swift reporting of any security breaches [2]. Intrusion detection systems are also in place for health sectors, but their local infrastructure may not be free from vulnerabilities. In contrast, the cloud provides a significantly useful and secure service to manage the operations within hospitals or any other organization.

Sooner or later, all government IT operations are bound to be run on the cloud. Currently, the Ministry of Health in Bahrain is switching its systems over to the cloud environment [3]. The cloud provides specific types of security depending on the services offered by the cloud service provider. In certain cases, highly confidential data cannot be hosted on the cloud, while some operational data can be stored online. Data managed on in-house servers must be protected at the same level of security as data in the cloud. This requires significant work to optimize cloud resource security, which helps to build trust for deploying confidential and sensitive medical data on the cloud.

Several studies have been conducted on hybrid IDS applying different techniques, some of which have proven highly efficient while others achieved only acceptable efficiency. In this research, a hybrid IDS was implemented by combining an improved genetic algorithm (GA) with a support vector machine (SVM) to achieve the best possible results. The proposed system can be used to secure health data in the cloud.
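To make the GA and classifier combination concrete, the following Python sketch shows one common way a genetic algorithm can drive feature selection: each individual is a bit mask over the features, and its fitness is the accuracy of a classifier trained on the selected columns. This is a sketch under stated assumptions, not the authors' method. This section does not describe what makes the GA "improved", so a textbook GA is used; a nearest-centroid classifier stands in for the SVM to keep the example dependency-free; and the data, the function names, and the 0.01 per-feature penalty are hypothetical.

```python
import random

def centroid_accuracy(mask, rows, labels):
    """Accuracy of a nearest-centroid classifier on the selected columns.
    Assumption: a dependency-free stand-in for the SVM named in the text."""
    cols = [i for i, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    centroids = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        centroids[cls] = [sum(r[i] for r in members) / len(members) for i in cols]
    correct = 0
    for r, y in zip(rows, labels):
        dist = {cls: sum((r[i] - c) ** 2 for i, c in zip(cols, cen))
                for cls, cen in centroids.items()}
        correct += min(dist, key=dist.get) == y
    return correct / len(rows)

def evolve_feature_mask(fitness, n_features, pop_size=20, generations=30,
                        mutation_rate=0.1, seed=0):
    """Plain GA over bit masks: truncation selection, one-point crossover,
    bit-flip mutation, and elitism. Returns (best_mask, best_fitness_history)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]          # keep the fitter half
        children = [ranked[0][:]]                  # elitism: copy the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)     # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]             # bit-flip mutation
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
        history.append(fitness(best))
    return best, history

# Hypothetical toy data: features 0 and 2 depend on the class, the rest are noise.
data_rng = random.Random(1)
labels = [i % 2 for i in range(40)]
rows = [[y + data_rng.gauss(0, 0.3), data_rng.random(),
         2 * y + data_rng.gauss(0, 0.3), data_rng.random(),
         data_rng.random(), data_rng.random()] for y in labels]

def fitness(mask):
    # Reward accuracy and lightly penalise every extra feature kept.
    return centroid_accuracy(mask, rows, labels) - 0.01 * sum(mask)

best_mask, history = evolve_feature_mask(fitness, n_features=6)
print(best_mask, round(history[-1], 3))
```

Because `fitness` is passed in as a callable, the stand-in classifier could be replaced by the validation accuracy of an SVM without changing the evolutionary loop.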

The dataset used to evaluate an IDS plays a crucial role in establishing its effectiveness. A large number of studies have adopted the KDD CUP 99 dataset to evaluate their systems, for example [4–7]. Another well-known dataset in this field is NSL-KDD, which was examined by [8–11]. These datasets are relatively old and have a limited number of features, which makes them unreliable for simulating current systems and environments.

This work is intended to be applied to cloud data for Bahrain hospitals, but because of the difficulty in accessing their data to develop the intended system, a predefined dataset that mimics the existing cloud data of hospitals, CICIDS2017, was used instead. CICIDS2017 is one of the most recent datasets applied in machine learning applications. The present study is significant because it aimed to:


This system will be effective in cloud computing, as it is expected to provide a high level of information security, which will protect data in the cloud environment against several types of malicious attacks.

The rest of this paper is organized as follows. Section 2 presents the background on IDS and related studies using GA and SVM. Section 3 presents the proposed hybrid IDS and explains the improved GA and the SVM in detail. Section 4 describes the experiments and data processing and reports the results. Finally, Section 5 presents the conclusion and future work.

#### **2. Background and Literature Review**

Intrusion detection systems (IDS) constitute a vital topic studied extensively by researchers. The techniques applied to IDS are broadly categorized into anomaly-based IDS, signature-based IDS, and hybrid techniques. When handling the large amounts of data involved in intrusion detection, the first step is dataset preprocessing, and the main stage of preprocessing is feature selection. Once features are selected, machine learning algorithms are applied to classify behavior as normal or abnormal. In the following text, we highlight the categories of IDS, followed by feature selection techniques, and afterwards we introduce the hybrid machine learning techniques used in the IDS in the present study.
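The preprocessing stage described above can be sketched in a few lines of Python. The example is illustrative only: the cleaning rule, min-max scaling, the removal of constant columns as a very simple filter-style feature selection, and the label strings are assumptions made for this sketch, not the preprocessing applied in this study.

```python
import math

def clean(rows, labels):
    """Drop records that contain missing or non-finite feature values."""
    kept = [(r, y) for r, y in zip(rows, labels)
            if all(v is not None and math.isfinite(v) for v in r)]
    return [r for r, _ in kept], [y for _, y in kept]

def min_max_scale(rows):
    """Rescale every feature column to the range [0, 1]."""
    lows = [min(col) for col in zip(*rows)]
    highs = [max(col) for col in zip(*rows)]
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, lo, hi in zip(r, lows, highs)] for r in rows]

def drop_constant_features(rows):
    """Filter-style selection: keep only the columns that vary across records."""
    keep = [i for i, col in enumerate(zip(*rows)) if len(set(col)) > 1]
    return [[r[i] for i in keep] for r in rows], keep

def encode_labels(labels, normal="BENIGN"):
    """Binary encoding: 0 for normal traffic, 1 for any attack class."""
    return [0 if y == normal else 1 for y in labels]

# Hypothetical records: three flow features per connection and a text label.
raw_rows = [
    [10.0, 0.0, 200.0],
    [20.0, 0.0, float("inf")],   # non-finite value, removed by clean()
    [30.0, 0.0, 400.0],
    [50.0, 0.0, 1000.0],
]
raw_labels = ["BENIGN", "DoS", "BENIGN", "PortScan"]

rows, labels = clean(raw_rows, raw_labels)
X, kept = drop_constant_features(min_max_scale(rows))
y = encode_labels(labels)
print(X, kept, y)
```

The resulting `X` and `y` are what a classifier would consume in the final step of the pipeline. A wrapper method such as a GA would take the place of `drop_constant_features` when a more selective feature subset is needed.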

#### *2.1. Concepts of IDS and Related Work*

An IDS is a system that monitors all incoming and outgoing requests in a cloud computing environment and distinguishes normal from abnormal activity. It thereby serves as a method of detecting both anomalies and misuse intrusions.

Infiltration and intrusion are the main problems in networks and cloud computing, as all services are provided over the internet and are susceptible to cyberattacks. IDS software should be scalable, dynamic, self-adaptive, and highly efficient. The intrusion detection process is divided into three steps [12]:

