Article

EISM-CPS: An Enhanced Intelligent Security Methodology for Cyber-Physical Systems through Hyper-Parameter Optimization

by Zakir Ahmad Sheikh 1, Yashwant Singh 1, Sudeep Tanwar 2,*, Ravi Sharma 3, Florin-Emilian Turcanu 4 and Maria Simona Raboaca 5,6,7,*
1 Department of Computer Science and IT, Central University of Jammu, Rahya Suchani, Bagla, Samba Jammu 181143, India
2 Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad 382481, India
3 Centre for Inter-Disciplinary Research and Innovation, University of Petroleum and Energy Studies, P.O. Bidholi Via-Prem Nagar, Dehradun 248007, India
4 Department of Building Services, Faculty of Civil Engineering and Building Services, Technical University of Gheorghe Asachi, 700050 Iași, Romania
5 National Research and Development Institute for Cryogenic and Isotopic Technologies—ICSI Rm. Vâlcea, Uzinei Street, No. 4, P.O. Box 7 Râureni, 240050 Rm. Vâlcea, Romania
6 Doctoral School, University Politehnica of Bucharest, Splaiul Independentei Street, No. 313, 060042 Bucharest, Romania
7 Faculty of Electrical Engineering and Computer Science, Ștefan cel Mare University, 720229 Suceava, Romania
* Authors to whom correspondence should be addressed.
Mathematics 2023, 11(1), 189; https://doi.org/10.3390/math11010189
Submission received: 19 October 2022 / Revised: 30 November 2022 / Accepted: 27 December 2022 / Published: 29 December 2022
(This article belongs to the Special Issue Recent Advances in Security, Privacy, and Applied Cryptography)

Abstract

The increased usage of cyber-physical systems (CPS) has attracted the attention of cybercriminals, particularly where these systems are connected to the internet, which enlarges the attack surface. Their widespread usage also generates heavy data flows, which must be analyzed to ensure security. In particular, machine learning (ML) and deep learning (DL) algorithms have shown feasibility and promising results in fulfilling the security requirement through the adoption of intelligence. However, the performance of these models strongly depends on the model structure, hyper-parameters, dataset, and application, and developers only control the definition of the model structure and its hyper-parameters across diversified applications. Generally, not all models perform well under default hyper-parameter settings, and specifying them is a challenging, complex task that requires significant expertise. This problem can be mitigated by hyper-parameter optimization (HPO) techniques, which automatically find efficient hyper-parameters of a learning model for a specific application or dataset. This paper proposes an enhanced intelligent security mechanism for CPS that utilizes HPO. Specifically, exhaustive HPO techniques are evaluated in terms of both performance and computational requirements to analyze their capability to build an effective intelligent security model that copes with security infringements in CPS. Moreover, we analyze the contributions of various HPO techniques, normalization, and feature selection. To this end, we evaluated the effectiveness of a DL-based artificial neural network (ANN) on a standard CPS dataset under manual hyper-parameter settings and under exhaustive HPO techniques, namely random search, directed grid search, and Bayesian optimization. We used the min-max algorithm for normalization and SelectKBest for feature selection. The HPO techniques performed better than the manual hyper-parameter settings, achieving an accuracy, precision, recall, and F1 score of more than 98%. The results highlight the importance of HPO for performance enhancement and for reducing computational requirements, human effort, and expertise.

1. Introduction

Cyber-physical systems (CPS) are utilized in critical applications such as smart grids, transportation, avionics, nuclear systems, and many more. Any exploitation of vulnerabilities in these applications, which rely on fully automated CPS, can lead to devastating consequences [1]. For instance, we have witnessed many CPS security breaches in the past, namely, the DTrack malware attack at the Kudankulam nuclear power plant [2], the automotive Jeep hack [3], the Stuxnet attack [4], the RQ-170 UAV attack [5], the Maroochy attack [6], and so on. Researchers have evaluated the feasibility of machine learning (ML) and deep learning (DL) for cyber security in general and the cyber security of CPS in particular. Considering the high data flow of CPS, DL-based approaches offer greater feasibility and performance [1]. However, the performance of learning models (whether ML or DL) depends on the dataset, the model, and the model structure (i.e., the hyper-parameter settings) [7,8]. A good combination of hyper-parameters not only produces good performance but also reduces training and testing time. These models perform differently on diversified datasets or applications [9]. Hence, some expertise is required to define the model structure, and a lack of expertise leads to poor model performance [8]. To ease this task, researchers have developed many hyper-parameter optimization (HPO) techniques that find the optimal combination of hyper-parameter values resulting in good performance.

1.1. Hyper-Parameter Optimization (HPO)

The performance of machine learning (ML) and deep learning (DL) models mainly depends on the dataset and the model hyper-parameters. A model performs well when its structure is based on optimal hyper-parameters. Learning models in general, and deep learning models in particular, require hyper-parameter optimization to perform well and to speed up training and testing [10]. Hyper-parameters can be categorized into two groups: (a) hyper-parameters that define the model architecture, such as the number and type of layers; and (b) hyper-parameters that affect model training, such as the batch size, learning rate, and number of epochs. Moreover, hyper-parameters can be continuous, discrete, or categorical [11]. Different learning models possess different hyper-parameters, so the diversity of models and datasets must be considered when optimizing the hyper-parameters that drive performance enhancement.
Researchers have developed many HPO techniques and frameworks, each with its own pros and cons. These techniques can be categorized along various aspects and criteria. Based on existing categorizations, we formulate a hybrid categorization, as depicted in Figure 1. It includes categorization aspects such as model-based, model-free, exhaustive methods, intelligent methods, framework-based, and speed-up-criteria-based methods [10,12,13,14,15]. Some techniques fall into more than one category and have several alternative implementation mechanisms. For instance, Bayesian optimization is both an exhaustive and a model-based technique and can be implemented through the Tree Parzen Estimator (TPE) or Spearmint [14,16].
In manual search, the hyper-parameters are selected by hand, which requires significant human expertise to achieve good performance. The model is defined with hard-coded hyper-parameter values. This approach is usually followed by practitioners who do not pay much attention to the model structure; in some instances, the chosen structure results in poor model performance, whereas sufficient expertise can make this methodology work well. As mentioned by Nazir et al. [12], manual search can produce promising results because a human with insight into the relative importance of a model's hyper-parameters can quickly rule out sub-optimal values. However, this approach may not yield good results for users who lack the expertise to guess the best hyper-parameter values. Grid search is an intuitive and exhaustive solution that tests a discretized hyper-parameter combination space. This method does not scale, as the number of combinations rises exponentially with the number of hyper-parameters, and it is thus rarely used for neural networks [11,12]. With $n$ hyper-parameters and $m$ values per hyper-parameter, the number of possible hyper-parameter combinations (the grid size) is $m^n$ [11]. For instance, with 10 hyper-parameters and 5 values per hyper-parameter, the grid size is $5^{10} = 9{,}765{,}625$. If the number of values varies per hyper-parameter, the grid size is the product of the sizes of the individual hyper-parameters. For instance, if $k_1, k_2, k_3, \ldots, k_n$ are the numbers of values of hyper-parameters $h_1, h_2, h_3, \ldots, h_n$, respectively, then the total grid size is $\prod_{i=1}^{n} k_i$.
A grid search evaluates all the possible combinations of hyper-parameters. Thus, it guarantees the evaluation of optimal solutions out of possible combinations but requires greater computational power [12,17]. Random search experiments are more efficient because not all hyper-parameters are equally essential to tune [9,18]. Moreover, grid search explores too many unimportant trials and suffers from poor coverage in essential dimensions. A critical property of the random search approach is that it does not confine itself to exploring grid points only but chooses random values out of the hyper-parameter space defined. This property of random search lets it obtain optimal values that were unexplored in grid search [18].
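To make the growth of the grid concrete, the following minimal Python sketch (illustrative only; the small search space shown is hypothetical and not one of the grids evaluated later in this paper) counts the combinations of a discretized space and then draws a fixed random-search budget from the same space:

```python
from functools import reduce
import itertools
import random

# Grid size for n hyper-parameters with k_i values each is the product of the k_i
# (m**n when every hyper-parameter has m values).
sizes = [5] * 10                                   # 10 hyper-parameters, 5 values each
print(reduce(lambda a, b: a * b, sizes, 1))        # 9765625 = 5**10

# A small hypothetical space: grid search enumerates the full Cartesian product,
# whereas random search samples a fixed budget of combinations from it.
space = {"units": [8, 16, 32], "dropout": [0.1, 0.3], "activation": ["relu", "tanh"]}
grid = list(itertools.product(*space.values()))
print(len(grid))                                   # 3 * 2 * 2 = 12 combinations

budget = 5
trials = [{k: random.choice(v) for k, v in space.items()} for _ in range(budget)]
print(trials)                                      # 5 randomly drawn combinations
```

Because random search only draws a fixed number of points, its cost is governed by the chosen budget rather than by the size of the grid.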
Bayesian optimization is also known as sequential model-based optimization [19]. It is based on Bayes' theorem and uses a surrogate function that estimates the objective function [20]. It is beneficial when complex, noisy, and expensive-to-evaluate functions must be optimized, and it is more efficient and less time-consuming because it keeps track of past optimization results; this informed method of optimization therefore tends to deliver better performance [20]. As a sequential model-based method, it uses the results of the previous iteration to improve the sampling for the next iteration [21]. A probabilistic model of the objective function is built from the observed data points, and an acquisition function based on the current model trades off exploration and exploitation [16,22]. On this basis, each iteration involves three steps: (a) selecting the point that maximizes the acquisition function, (b) evaluating the objective function at that point, and (c) augmenting the data and refitting the model. Many acquisition functions exist, but the expected improvement (EI) is the most commonly used and is defined as Equation (1) [22]:
$EI = \mathbb{E}\left[\max\left(f_{\min} - f,\ 0\right)\right]$    (1)
Tree Parzen Estimator is a well-known Bayesian optimization method that models densities over the input configuration space instead of modeling the objective function through the use of a kernel density estimator [16].
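As a worked illustration of Equation (1), the closed-form expected improvement under a Gaussian surrogate posterior can be sketched as follows; this is a generic textbook formulation used only for illustration, not the exact acquisition function implemented by the libraries cited above:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_min):
    """EI of Equation (1) in closed form, assuming the surrogate's posterior at a
    candidate point is Gaussian with mean mu and standard deviation sigma."""
    sigma = np.maximum(sigma, 1e-12)               # guard against zero uncertainty
    z = (f_min - mu) / sigma
    return (f_min - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# Toy usage: rank three candidate hyper-parameter points by expected improvement.
mu = np.array([0.30, 0.25, 0.40])                  # predicted loss at each candidate
sigma = np.array([0.05, 0.10, 0.02])               # predictive uncertainty
f_min = 0.28                                       # best (lowest) loss observed so far
ei = expected_improvement(mu, sigma, f_min)
print(ei, np.argmax(ei))                           # the next point to evaluate
```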

1.2. Contributions

This paper proposes an enhanced DL-based security mechanism for CPS and ensures the model's performance through feature selection and dedicated HPO techniques. Moreover, we focus on reducing training and testing time. We evaluate performance on an IoT-based CPS dataset, namely the TON IoT train-test network dataset [23,24]. Based on these criteria, we incorporate the methods listed in Table 1 into our learning-based security framework to ensure the security of CPS.

1.3. Novelty

The performance and computational requirements of intelligent or learning models depend on the dataset and the model itself. Defining the structure of a deep learning model is more challenging than that of a shallow learning model because the former has more hyper-parameters. Deep learning models also take longer to train, so defining, training, and then redefining and retraining a model through manual settings is difficult and requires considerable expertise in fine-tuning. Compared to existing works, our paper makes unique contributions in several respects, which can be seen by relating it to similar studies. Aledhari et al. [20] implemented Bayesian optimization and random search on shallow learning models, which have fewer hyper-parameters than deep learning models. We consider a deep learning-based ANN, for which defining the model hyper-parameters is more challenging. In addition to Bayesian optimization and random search, we also implement manual search and directed grid search and provide a comparative analysis in terms of performance parameters and computational requirements. Other factors affecting performance and computational requirements, such as normalization and feature selection, are also considered. Khan et al. [25] implemented the Opt-aiNet HPO framework and evaluated several shallow learning models but considered only a single performance parameter, i.e., accuracy (AC). In a similar work, Purohit et al. [26] applied an automated HPO procedure to a deep auto-encoder Gaussian mixture model (DAGMM) for anomaly detection. Compared to [25,26], our exhaustive HPO-based methods achieve better performance.
Our work evaluates and enhances the performance and computational requirements of an ANN under different settings and highlights the factors that affect them. It evaluates and highlights the importance of normalization, feature engineering, and HPO. Specifically, random search, directed grid search, and Bayesian optimization are implemented to enhance the performance of an ANN for the security of CPS. Considering the complexity of grid search, a directed grid search is implemented: instead of initializing a new, blind search space, the grid search is fed the search space identified through random search, thereby minimizing the requirement for human expertise. The best hyper-parameters under different settings are evaluated, providing information about the most useful hyper-parameter values in each case. The factors considered are assessed for their importance in enhancing performance and reducing computational requirements in the constrained CPS domain, and they provide a future direction for building an intelligent security model.

1.4. Related Works

HPO techniques have made promising contributions to enhancing the performance of intelligent models based on machine learning and deep learning, although their effectiveness varies across models and datasets. Based on existing works, irrespective of the dataset or application, we have observed a performance enhancement through the utilization of HPO [8,27]. Researchers have developed many HPO techniques and frameworks for model performance enhancement, such as random search, grid search, Bayesian optimization, smart frameworks, and so on [10,12,13,14]. We have analyzed the enhanced performance reported in existing works that utilize HPO; some of these related works are summarized in Table 2.

2. Problem Formulation and Methodology

2.1. HPO Problem Formulation

Let $M_{HP}$ be a machine learning or deep learning model with hyper-parameters $HP = \{h_1, h_2, \ldots, h_n\}$. For a model with $n$ hyper-parameters, each hyper-parameter has a range of possible values, called its search space. The values in a search space and their type depend on the hyper-parameter itself. For instance, a range of positive integers can be the search space for the epochs hyper-parameter or the number of iterations, a range of floating-point values can be the search space for the learning rate, and a set of well-defined string values can be the search space for the optimizer. In general, a model can therefore be written as $M(h_1: [\text{search space } 1], h_2: [\text{search space } 2], \ldots, h_n: [\text{search space } n])$, and the problem of hyper-parameter optimization is to derive a model $M(h_1: value_1, h_2: value_2, \ldots, h_n: value_n)$ with dedicated hyper-parameter values that provides enhanced model performance (for instance, accuracy) and a minimized loss function.
To find the optimal hyper-parameter values, the candidate combinations form a set $X$ (also known as the hyper-parameter space or hypercube), whose size is $|X| = |\text{search space } 1| \times |\text{search space } 2| \times \cdots \times |\text{search space } n|$. For example, consider a model with three hyper-parameters, namely epochs, learning_rate, and optimizer:
$M\left(\text{epochs}: [10, 15, 20, 30],\ \text{learning\_rate}: [0.1, 0.2],\ \text{optimizer}: [\text{Adam}, \text{RMSProp}]\right)$    (2)
The epochs hyper-parameter has a search space of size 4, the learning rate of size 2, and the optimizer of size 2, so the size of the hypercube is $|X| = 4 \times 2 \times 2 = 16$. $X$ is the hyper-parameter space (or hypercube) in which each dimension corresponds to one hyper-parameter; a typical hypercube of 2 hyper-parameters with 3 values each gives $3^2 = 9$ combinations. Let $x$ be a hyper-parameter combination point belonging to $X$, i.e., $x \in X$, and let $y = f(x)$ be a performance measure (usually a loss function) of the model defined with the hyper-parameter combination $x$. The HPO problem can then be stated as minimizing the output function (i.e., the loss function) $f$ by evaluating hyper-parameter combination points $x$ from the hyper-parameter space or hypercube $X$ and returning the minimizing combination $x^{*}$ as in Equation (3) [11]:
$x^{*} = \underset{x \in X}{\arg\min}\, f(x)$    (3)
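A minimal sketch of Equation (3), reusing the worked hypercube above, is given below; the evaluate_loss function is a placeholder for training the model with a given combination x and returning its validation loss, not the actual training code used in this paper:

```python
import itertools

# Illustrative sketch of Equation (3) on the worked hypercube above.
space = {
    "epochs": [10, 15, 20, 30],
    "learning_rate": [0.1, 0.2],
    "optimizer": ["Adam", "RMSProp"],
}

def evaluate_loss(x):
    # Placeholder objective; in practice: build, train, and validate the model.
    return abs(x["learning_rate"] - 0.1) + 1.0 / x["epochs"]

hypercube = [dict(zip(space, values)) for values in itertools.product(*space.values())]
x_star = min(hypercube, key=evaluate_loss)        # x* = arg min over X of f(x)
print(len(hypercube), x_star)                     # 16 combinations; the best point
```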

2.2. Proposed Methodology

Considering the constraints of the CPS domain and the performance requirements of an intelligent security model, we assess the factors affecting performance, such as data normalization, feature selection, and HPO. The effect of their absence and presence is evaluated on a CPS-based dataset to build an effective intelligent security model for CPS and to guide future work on similar models. Our proposed methodology relies on HPO for performance enhancement in the CPS domain, as shown in Figure 2. Initially, we perform data pre-processing to make the dataset ready for evaluation: non-numeric values are converted into numeric ones using LabelEncoder, and, to speed up training, the data are normalized with a min-max scaler to the range 0 to 1. In some cases, we apply chi-square (chi2)-based SelectKBest feature selection to exclude the input features with the least correlation with the output features. Once the data are pre-processed, we split the dataset into training and testing sets of 80% and 20%, respectively [29]. Within the training set, we also use k-fold cross-validation (CV): if the model were tuned against the testing set instead of a validation set, we would lose the ability to honestly assess its performance on unseen data, a problem known as data leakage. To avoid data leakage, we split the dataset into a training set and a testing set, where the training set contains both the training and validation data [21]. In k-fold CV, the training set is split into k folds, and in each iteration one fold is kept for validation and the remaining folds for training. We then train our ANN model on the training portion, evaluate it on the validation portion, and finally test it on the testing set at the end of model development. In our methodology, we utilize HPO techniques such as manual search, random search, directed grid search, and Bayesian optimization to evaluate the performance and computational requirements of the ANN under different settings. To run an HPO technique, a hyper-parameter search space (hypercube) is defined for the ANN and fed to the HPO algorithm, which evaluates the performance of the ANN under different combinations of hyper-parameters from the given search space. How combinations are selected from the search space depends on the HPO algorithm; grid search, the most exhaustive algorithm, evaluates all hyper-parameter combinations in the search space. Based on the HPO techniques implemented, we obtain the set of best hyper-parameters that defines our proposed ANN. Under the different settings explored through HPO, we evaluate the performance and computational requirements of each case on the training and testing sets.
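A minimal sketch of this pre-processing pipeline is shown below; it assumes the TON_IoT train-test network CSV is available locally, and the file path and output column names ("label", "type") are assumptions used for illustration rather than a prescription of the dataset's exact schema:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.model_selection import train_test_split

# Placeholder path and column names; adjust to the actual TON_IoT CSV schema.
df = pd.read_csv("train_test_network.csv")
X = df.drop(columns=["label", "type"])        # assumed 43 input features
y = df["label"]                               # assumed binary output label

# Encode non-numeric columns, then min-max scale everything to the range [0, 1].
for col in X.select_dtypes(include="object").columns:
    X[col] = LabelEncoder().fit_transform(X[col].astype(str))
X = pd.DataFrame(MinMaxScaler(feature_range=(0, 1)).fit_transform(X), columns=X.columns)

# Keep the 25 input features with the highest chi-square score against the label.
X_best = SelectKBest(score_func=chi2, k=25).fit_transform(X, y)

# 80/20 train/test split; k-fold CV is later applied inside the training set only.
X_train, X_test, y_train, y_test = train_test_split(
    X_best, y, test_size=0.2, random_state=42)
```

The min-max scaling to [0, 1] also keeps every feature non-negative, which the chi-square scoring used by SelectKBest requires.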

3. Results and Discussion

3.1. Dataset Consideration

For implementation purposes, we considered the existing TON IoT train-test network dataset, which contains 461,043 entries with 45 features [23,24]. The dataset contains both normal and attack entries, covering nine types of attacks: password, ransomware, scanning, XSS, backdoor, DDoS, DoS, injection, and MITM [23]. Moustafa et al. [24] list 46 features for the train-test network dataset, but feature number 34 (i.e., http_referrer) has been removed from the current version. The dataset statistics are depicted in Figure 3.
The TON IoT train-test network dataset contains entries for the nine attack types as well as normal dataflow entries. Out of 461,043 entries, 300,000 are normal flows, contributing about 65% of the overall dataset. Among the attacks, password, ransomware, scanning, XSS, injection, backdoor, DDoS, and DoS have 20,000 entries each, contributing about 5% each to the overall dataset. The MITM attack has the fewest entries, i.e., 1043, contributing only 0.226%. The dataset contains 45 features: 43 input features and 2 labeled output features. The service profile of the features is given in Table 3 [24].

3.2. Evaluations

As deep learning models consume more time for training and testing, the optimal model structure must be used to enhance performance and avoid wasting effort and computational power. In this paper, we evaluate the ANN performance on the TON IoT train-test network dataset. The implementations were performed on an i3 laptop with the configuration listed in Table 4. We used 80% of the dataset for training and 20% for testing. Initially, we assess the performance using hard-coded model hyper-parameters and structure; we then perform hyper-parameter optimization using random search, grid search, and Bayesian optimization. In each method, we used binary_crossentropy as the loss function, accuracy as the primary performance metric, ReLU as the activation function in the hidden layers, and the sigmoid activation function in the output layer. Moreover, we used a single-node output strategy, i.e., one neuron in the output layer. Each of the evaluation methods implemented is discussed in the following subsections.
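The ANN configuration described above can be sketched in Keras as follows; the builder reflects the stated loss, activations, and single-node output and uses the feature-selected 25-14-8-1 structure reported later, while any remaining details are assumptions rather than the authors' exact code:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_ann(optimizer="Adam", input_dim=25):
    """Four-layer ANN: the input layer holds the selected features, followed by
    two ReLU hidden layers and a single sigmoid output node."""
    model = keras.Sequential([
        layers.Dense(14, activation="relu", input_shape=(input_dim,)),  # hidden layer 1
        layers.Dense(8, activation="relu"),                             # hidden layer 2
        layers.Dense(1, activation="sigmoid"),                          # single output node
    ])
    model.compile(optimizer=optimizer, loss="binary_crossentropy", metrics=["accuracy"])
    return model

model = build_ann()
# Example manual-search style fit with hard-coded hyper-parameters:
# model.fit(X_train, y_train, epochs=50, batch_size=20)
```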

3.2.1. Manual Search

In the manual search, we utilized hardcoded model hyper-parameters and ran the model for three combinations. We implemented a four-layer ANN model wherein the four layers are the input layer, hidden layer 1, hidden layer 2, and output layer. The three cases of manual search are discussed as follows:
1. Case 1:
In this case, we did not perform data normalization and feature selection. Thus, the input dataset contains 43 features to be processed. Accordingly, we defined a four-layer ANN model with 43, 14, 24, and 1 neuron(s), respectively, as depicted in Figure 4.
In this case, we defined the model with hardcoded hyper-parameters such as optimizer as Adam, epochs as 50, and batch_size as 20. Once we executed our model, it took 31 min and 50 s for training and 6 s for testing. Once the model was trained thoroughly, we assessed its performance, and it resulted in an accuracy of 65%, a precision of 77%, a recall of 65%, and an F1 score of 51%.
2. Case 2:
In this case, we did not perform data normalization but implemented a feature selection. We used the chi-square (i.e., chi2) test through the SelectKBest method and selected the 25 best features for training purposes. Based on that, we modified our 4-layer ANN model, and the number of neurons was 25, 14, 8, and 1, respectively (from input to output layer), as depicted in Figure 5.
In this case, the hardcoded hyper-parameter values for the optimizer, epochs, and batch_size were set to Adam, 40, and 20, respectively. This model took 23 min and 10 s for training and 8 s for testing. We observed a reduction in training time owing to feature selection and the smaller number of epochs. However, the model achieved the same performance as in case 1 of the manual search, i.e., an accuracy of 65%, a precision of 77%, a recall of 65%, and an F1 score of 51%.
3. Case 3:
In this case, the number of layers and neurons in each layer were kept the same as in case 2 (also depicted in Figure 5). Moreover, we performed the normalization on a scale of 1 to 10 and utilized the 25 best features through the SelectKBest feature selection strategy. We defined the hardcoded hyper-parameter values for the optimizer, epochs, and batch_size with values Adadelta, 30, and 30, respectively. Once we evaluated the model, it showed an enhanced performance and a reduction in training time. The model took 13 min and 8 s for training, and 5 s for testing. It achieved an accuracy of about 89%, a precision of 90%, a recall of 89%, and an F1 score of 89%.
Based on our hyper-parameter combinations, we found it difficult to guess the optimal values for each hyper-parameter that could result in optimal model performance. Considering the same, we implemented some hyper-parameter optimization (also known as hyper-parameter tuning) methods such as random search, grid search, and Bayesian optimization to obtain optimal model performance. In each of these optimization techniques, we defined a 4-layer ANN with 25, 14, 8, and 1 neuron(s) from the input layer to the output layer, respectively. We optimized three important hyper-parameters, namely optimizer, epochs, and batch_size. Additionally, we considered the data normalization and feature selection and performed two-fold CV in each hyper-parameter optimization methodology to avoid any data leakage [30].

3.2.2. Random Search

Random search is a quick technique to find optimal hyper-parameters from a hypercube or search space. The hypercube contains all possible combinations of the defined hyper-parameters. Random search repeatedly chooses a hyper-parameter combination at random from the hypercube and evaluates it. We performed 5 random iterations to find the optimal hyper-parameter combination from the hypercube of size 45 defined by optimizer = ['Adam', 'RMSprop', 'SGD', 'Adagrad', 'Adadelta'], epochs = [20, 60, 90], and batch_size = [10, 30, 40]. The results achieved through random search are given in Table 5.
In random search, we achieved an accuracy of about 98.70% with the best hyper-parameter combination: Adam as the optimizer, 90 epochs, and a batch_size of 10. With the Adam optimizer and 90 epochs, a batch_size of 40 also performed well, with an average accuracy of about 98.20%. The search took 216 min and 36 s to evaluate. With the best-evaluated hyper-parameters, we tested the model and achieved an accuracy of 98.71% and a precision, recall, and F1 score of about 99%.
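A hedged sketch of this random-search step, wrapping the build_ann() helper from the earlier sketch in scikit-learn's RandomizedSearchCV, is given below; the Keras wrapper import depends on the installed Keras version (newer environments would use scikeras.wrappers.KerasClassifier instead), and X_train/y_train are the split produced in the pre-processing sketch:

```python
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier  # or scikeras.wrappers
from sklearn.model_selection import RandomizedSearchCV

param_space = {
    "optimizer": ["Adam", "RMSprop", "SGD", "Adagrad", "Adadelta"],
    "epochs": [20, 60, 90],
    "batch_size": [10, 30, 40],
}  # 5 * 3 * 3 = 45 combinations in the hypercube

ann = KerasClassifier(build_fn=build_ann, verbose=0)
search = RandomizedSearchCV(ann, param_distributions=param_space,
                            n_iter=5, cv=2, scoring="accuracy")
search.fit(X_train, y_train)              # 5 random combinations, 2-fold CV each
print(search.best_params_, search.best_score_)
```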

3.2.3. Directed Grid Search

The problem with random search is that it only evaluates a fixed number of randomly chosen hyper-parameter combinations, so there is a chance of missing the optimal combinations in a given hypercube. Grid search, on the other hand, is an exhaustive method that evaluates the whole hypercube and is therefore time-consuming. By utilizing the information obtained from the manual and random search analyses, we reduce the hypercube size. The random search analysis showed that the best two hyper-parameter combinations use Adam as the optimizer and 90 epochs. We also observed through the manual search that the larger the batch_size, the shorter the training time; the random search results indicate a trade-off between time and performance, with a slight reduction in performance as the batch_size changes. Based on these observations, we reduced the hypercube size to 6 (i.e., 2·3·1) with the hyper-parameter spaces epochs = [100, 110], batch_size = [50, 30, 70], and optimizer = ['Adam'] [31]. We call this methodology directed grid search. The results achieved through directed grid search are given in Table 6.
The directed grid search took 168 min and 6 s for evaluation and achieved an average accuracy of 99.15%. With the best hyper-parameter combination (Adam as the optimizer, 100 epochs, and a batch_size of 30), the model achieved an accuracy of 99.31%, a precision of 99.32%, and a recall and F1 score of 99.31%.
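The directed grid search can then be sketched as an exhaustive GridSearchCV over the reduced six-point hypercube, reusing the same wrapped ANN and training split from the earlier sketches (again an illustrative assumption, not the authors' exact code):

```python
from sklearn.model_selection import GridSearchCV
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier  # or scikeras.wrappers

directed_space = {
    "epochs": [100, 110],
    "batch_size": [50, 30, 70],
    "optimizer": ["Adam"],
}  # 2 * 3 * 1 = 6 combinations, narrowed down using the random-search findings

grid = GridSearchCV(KerasClassifier(build_fn=build_ann, verbose=0),
                    param_grid=directed_space, cv=2, scoring="accuracy")
grid.fit(X_train, y_train)
print(grid.best_params_)   # e.g., the Adam / 100 epochs / batch_size 30 point
```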

3.2.4. Bayesian Optimization

Bayesian optimization is an informed, sequential model-based optimization method that uses the results of previous iterations to improve; this informed approach therefore tends to deliver better performance [20,21]. For our security model, we implemented Bayesian optimization using Hyperopt, a Python-based library that supports parallelization [19]. We defined a hypercube of size 18 (i.e., 3 × 3 × 2) with the hyper-parameter spaces epochs = [70, 90, 120], batch_size = [40, 60, 80], and optimizer = ['Adam', 'Adagrad']. We utilized two-fold cross-validation to avoid data leakage and set the maximum number of iterations (i.e., max_iter) to 5. The Bayesian optimization took 156 min for evaluation and achieved an average accuracy of 99.13%, considering feature selection and normalization. With the best hyper-parameter combination (Adam as the optimizer, 120 epochs, and a batch_size of 80), the model achieved an accuracy, precision, recall, and F1 score of 99.13%. The results obtained with all of the implemented techniques are summarized in Table 7.
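Before turning to the summarized results, a hedged sketch of this Bayesian optimization step with Hyperopt's TPE algorithm is given below; the objective trains the previously sketched ANN with two-fold CV and returns a loss to minimize (1 minus the mean CV accuracy), the five-evaluation budget corresponds to the max_iter setting above, and everything beyond the stated search space is an assumption:

```python
from hyperopt import fmin, tpe, hp, Trials
from sklearn.model_selection import cross_val_score
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier  # or scikeras.wrappers

space = {
    "epochs": hp.choice("epochs", [70, 90, 120]),
    "batch_size": hp.choice("batch_size", [40, 60, 80]),
    "optimizer": hp.choice("optimizer", ["Adam", "Adagrad"]),
}

def objective(params):
    ann = KerasClassifier(build_fn=build_ann, optimizer=params["optimizer"],
                          epochs=params["epochs"], batch_size=params["batch_size"],
                          verbose=0)
    cv_accuracy = cross_val_score(ann, X_train, y_train, cv=2, scoring="accuracy").mean()
    return 1.0 - cv_accuracy          # Hyperopt minimizes, so invert the accuracy

trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=5, trials=trials)
print(best)   # indices of the chosen values within each hp.choice list
```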
From Table 7, we infer that, on average, each hyper-parameter evaluation takes about 43 min with RS, about 28 min with DGS, and about 31 min with BO. Based on the average iteration times of the implemented uninformed search techniques, namely RS and DGS, a full GS over the whole hypercube would theoretically take about 1948 min and 1260 min, respectively. Several attributes related to the time consumption of the various HPO techniques are presented in Table 8. GS, being a very exhaustive technique, could be difficult to run on resource-constrained devices or systems.
From Table 8, it can be seen that RS took about 43 min on average per hyper-parameter iteration, DGS took about 28 min, and the combined RS plus minimized GS used for DGS averaged about 34 min. Using the average times of RS and DGS, the time required for a full GS can be estimated theoretically at 1948 min and 1260 min, respectively; thus, GS, being a very exhaustive technique, could be difficult to run on resource-constrained devices or systems. On the other hand, DGS takes 168 min to evaluate the minimized search space, and the RS used to identify that search space takes 216 min; together, RS and DGS consume 384 min, which is less than the time GS would need to evaluate the whole non-minimized search space. BO, as an informed search, took an average of 31 min per iteration and achieved performance equivalent to DGS in just five iterations.
Comparing the optimization results, we can observe from Figure 6 that random search, directed grid search, and Bayesian optimization performed better than the three manual search strategies. Hence, we can say that the involvement of optimization techniques and expertise is crucial for enhancing the performance of deep learning models.

4. Conclusions

The feasibility of ML- and DL-based intelligent methods for the security of CPS raises the difficulty of selecting model hyper-parameters, on which model performance strongly depends. Given the expertise required to define model hyper-parameters, HPO techniques can be utilized to find the best hyper-parameters for diversified applications or datasets at the cost of computational power. In this paper, we evaluated the performance of an ANN on a TON IoT network-based CPS dataset using three manual, hard-coded hyper-parameter settings and exhaustive HPO techniques; specifically, of the exhaustive methods, we implemented random search, directed grid search, and Bayesian optimization. These HPO techniques performed better than the three manual hyper-parameter settings. The best performance achieved by the manual search was 89.58% accuracy, 90.11% precision, 89.58% recall, and an 89.70% F1 score, whereas the accuracies achieved through exhaustive HPO were 98.71%, 99.31%, and 99.13% for random search, directed grid search, and Bayesian optimization, respectively. Moreover, all the HPO techniques achieved a precision, recall, and F1 score of more than 99%, better than the manual search. The grid search algorithm is more computationally expensive than random search, but the latter does not always reach optimal performance; their collaborative use (i.e., directed grid search) can speed up the process by identifying a promising search space through random search and then exhaustively evaluating that identified space through grid search. In contrast, Bayesian optimization is an informed search algorithm that uses the results of previous iterations to make better decisions.
In the future, we will evaluate the performance of more learning models through exhaustive HPO techniques and other techniques on diversified datasets and real-time applications. Moreover, we intend to utilize the concept of transfer learning (adaptive learning) to facilitate the capability of enhanced learning models for similar applications with slightly different data flows.

Author Contributions

Conceptualization, Y.S., Z.A.S. and S.T.; writing—original draft preparation, Z.A.S., M.S.R., F.-E.T. and Y.S.; methodology, Y.S., R.S. and S.T.; writing—review and editing, S.T., M.S.R., Z.A.S. and R.S.; software, Y.S., R.S. and S.T.; visualization, F.-E.T., M.S.R. and R.S.; investigation, R.S., Z.A.S. and S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was partially supported by UEFISCDI Romania and MCI through BEIA projects AutoDecS, SOLID-B5G, T4ME2, DISAVIT, PIMEO-AI, AISTOR, MULTI-AI, ADRIATIC, Hydro3D, PREVENTION, DAFCC, EREMI, ADCATER, MUSEION, FinSESCo, iPREMAS, IPSUS, U-GARDEN, CREATE and by the European Union's Horizon 2020 research and innovation program under grant agreement No. 101073879 (FLEXI-cross). The results were obtained with the support of the Ministry of Investments and European Projects through the Human Capital Sectoral Operational Program 2014–2020 (POCU—InoHubDoc), Contract no. 62461/3 June 2022, SMIS code 153735. This work is also supported by a grant from the Gheorghe Asachi Technical University of Iași: postdoctoral research—2022. This work is further supported by the Ministry of Research, Innovation and Digitization of Romania through the National Plan of R&D, Project PN 19 11, Subprogram 1.1, Institutional performance-Projects to finance excellence in RDI, Contract No. 19PFE/30 December 2021, and a grant of the National Center for Hydrogen and Fuel Cells (CNHPC)—Installations and Special Objectives of National Interest (IOSIN).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Li, J.; Yang, Y.; Sun, J.S.; Tomsovic, K.; Qi, H. ConAML: Constrained Adversarial Machine Learning for Cyber-Physical Systems; Association for Computing Machinery: New York, NY, USA, 2021; Volume 1. [Google Scholar]
  2. Dilipraj, E. Supposed Cyber Attack on Kudankulam Nuclear Infrastructure—A Benign Reminder of a Possibile Reality. Cent. Air Power Stud. 2019, 129, 1–5. [Google Scholar]
  3. Greenberg, A. Hackers Remotely Kill a Jeep on the Highway—With Me in It. 2015. Available online: https://www.wired.com/2015/07/hackers-remotely-kill-jeep-highway/ (accessed on 29 July 2021).
  4. Ahmed, C.M.; Zhou, J. Challenges and Opportunities in Cyberphysical Systems Security: A Physics-Based Perspective. IEEE Secur. Priv. 2020, 18, 14–22. [Google Scholar] [CrossRef]
  5. Hartmann, K.; Steup, C. The vulnerability of UAVs to cyber attacks—An approach to the risk assessment. In Proceedings of the 5th International Conference on Cyber Conflict, Tallinn, Estonia, 4–7 June 2013; Volume 1, pp. 1–23. [Google Scholar]
  6. Dibaji, S.M.; Pirani, M.; Flamholz, D.B.; Annaswamy, A.M.; Johansson, K.H.; Chakrabortty, A. A systems and control perspective of CPS security. Annu. Rev. Control 2019, 47, 394–411. [Google Scholar] [CrossRef] [Green Version]
  7. Hutter, F.; Lücke, J.; Schmidt-Thieme, L. Beyond Manual Tuning of Hyperparameters. KI Kunstl. Intelligenz 2015, 29, 329–337. [Google Scholar] [CrossRef]
  8. Technology, C. Hepatitis Dataset Imputing Missing Values Data Transformation Training Dataset Training Dataset Grid Search Model Classification Report Best Parameters. FUDMA J. Sci. 2021, 5, 447–455. [Google Scholar]
  9. Mantovani, R.G. Effectiveness of Random Search in SVM hyper-parameter tuning. In Proceedings of the 2015 International Joint Conference on Neural Networks, Killarney, Ireland, 12–17 July 2015; 1, pp. 1–8. [Google Scholar]
  10. Lorenzo, P.R.; Nalepa, J.; Kawulok, M.; Ramos, L.S.; Pastor, J.R. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference, Berlin, Germany, 19 July 2017; pp. 481–488. [Google Scholar]
  11. Bertrand, H. Hyper-Parameter Optimization in Deep Learning and Transfer Learning: Applications to Medical Imaging. Ph.D. Thesis, Université Paris-Saclay, Paris, France, 2019. [Google Scholar]
  12. Nazir, S. Assessing Hyper Parameter Optimization and Speedup for Convolutional Neural Networks. Int. J. Artif. Intell. Mach. Learn. 2021, 10, 1–17. [Google Scholar] [CrossRef]
  13. Yu, T.; Zhu, H. Hyper-Parameter Optimization: A Review of Algorithms. arXiv 2020, arXiv:2003.05689. [Google Scholar]
  14. Evaluating Machine Learning Models using Hyperparameter Tuning. 2021. Available online: https://www.analyticsvidhya.com/blog/2021/04/evaluating-machine-learning-models-hyperparameter-tuning/ (accessed on 15 March 2022).
  15. Li, L.; Talwalkar, A. Random Search and Reproducibility for Neural Architecture Search. In Proceedings of the 6th ICML Workshop on Automated Machine Learning, Tel Aviv, Israel, 22–25 July 2019; Volume 6, pp. 1–20. [Google Scholar]
  16. Falkner, S.; Klein, A.; Hutter, F. BOHB: Robust and Efficient Hyperparameter Optimization at Scale. In Proceedings of the 35th International Conference on Machine Learning, ICML, Jinan, China, 19–21 May 2018; 4, pp. 2323–2341. [Google Scholar]
  17. Badvelu, J. Hyperparameter Tuning for Machine Learning Models. 2020. Available online: https://towardsdatascience.com/hyperparameter-tuning-for-machine-learning-models-1b80d783b946. (accessed on 5 April 2022).
  18. Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. [Google Scholar]
  19. Bergstra, J.; Komer, B.; Eliasmith, C.; Yamins, D.; Cox, D.D. Hyperopt: A Python library for model selection and hyperparameter optimization. Comput. Sci. Discov. 2015, 8, 1–24. [Google Scholar] [CrossRef]
  20. Aledhari, M.; Razzak, R.; Parizi, R.M. Machine learning for network application security: Empirical evaluation and optimization. Comput. Electr. Eng. 2021, 91, 107052. [Google Scholar] [CrossRef]
  21. Jordan, J. Hyperparameter Tuning for Machine Learning Models. 2017. Available online: https://www.jeremyjordan.me/hyperparameter-tuning/ (accessed on 15 March 2022).
  22. Gressling, T. Automated machine learning. In Data Science in Chemistry; Walter de Gruyter: Berlin, Germany, 2020. [Google Scholar] [CrossRef]
  23. Alsaedi, A.; Moustafa, N.; Tari, Z.; Mahmood, A.; Anwar, A. TON-IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 2020, 8, 165130–165150. [Google Scholar] [CrossRef]
  24. Moustafa, N. A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustain. Cities Soc. 2021, 72, 102994. [Google Scholar] [CrossRef]
  25. Khan, F.; Kanwal, S.; Alamri, S.; Mumtaz, B. Hyper-Parameter Optimization of Classifiers, Using an Artificial Immune Network and Its Application to Software Bug Prediction. IEEE Access 2020, 8, 20954–20964. [Google Scholar] [CrossRef]
  26. Purohit, H.; Tanabe, R.; Endo, T.; Suefusa, K.; Nikaido, Y.; Kawaguchi, Y. Deep Autoencoding GMM-Based Unsupervised Anomaly Detection in Acoustic Signals and Its Hyper-Parameter Optimization. In Proceedings of the Fifth Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE 2020), Tokyo, Japan, 2–4 November 2020; pp. 175–179. [Google Scholar] [CrossRef]
  27. Jahan, I.; Habiba, U.; Muntasir, M.; Al-monsur, A.; Mohammad, A.; Faisal, F.; Ridwan, M. Survival Prediction of Children Undergoing Hematopoietic Stem Cell Transplantation Using Different Machine Learning Classifiers by Performing Chi-squared Test and Hyper-parameter Optimization: A Retrospective Analysis. In Computational and Mathematical Methods in Medicine; Hindawi Limited: London, UK, 2011. [Google Scholar] [CrossRef]
  28. Jervis, M.; Liu, M.; Li, W.; Smith, R. Deep Learning Network Optimization and Hyper-parameter Tuning. SEG Tech. Progr. Expand. Abstr. 2019, 40, 2283–2287. [Google Scholar]
  29. Khatri, S.; Vachhani, H.; Shah, S.; Bhatia, J.; Chaturvedi, M.; Tanwar, S.; Kumar, N. Machine learning models and techniques for VANET based traffic management: Implementation issues and challenges. Peer Peer Netw. Appl. 2021, 14, 1778–1805. [Google Scholar] [CrossRef]
  30. Muralidhar, K. How to prevent Data Leakage while evaluating the performance of a Machine Learning model. 2021. Available online: https://towardsdatascience.com/how-to-avoid-data-leakage-while-evaluating-the-performance-of-a-machine-learning-model-ac30f2bb8586 (accessed on 2 April 2022).
  31. Jadav, N.K.; Gupta, R.; Alshehri, M.D.; Mankodiya, H.; Tanwar, S.; Kumar, N. Deep Learning and Onion Routing-Based Collaborative Intelligence Framework for Smart Homes Underlying 6G Networks. IEEE Trans. Netw. Serv. Manag. 2022, 19, 3401–3412. [Google Scholar] [CrossRef]
  32. Patel, K.; Mehta, D.; Mistry, C.; Gupta, R.; Tanwar, S.; Kumar, N.; Alazab, M. Facial Sentiment Analysis Using AI Techniques: State-of-the-Art, Taxonomies, and Challenges. IEEE Access 2020, 8, 90495–90519. [Google Scholar] [CrossRef]
Figure 1. Different hyper-parameter optimization techniques.
Figure 2. Methodology utilized for performance evaluations.
Figure 3. TON IoT train-test network dataset statistics.
Figure 4. ANN model structure used in manual search case 1.
Figure 5. ANN model structure used in manual search case 2.
Figure 6. Performance comparison of hardcoded model selection and exhaustive hyper-parameter optimization methods implemented [32].
Table 1. Paper contributions and methods implemented.
Criteria | Purpose | Method(s)
Artificial Neural Network (ANN) | Security of CPS | • A multilayer ANN under manual hyper-parameter settings; • The use of data normalization and feature selection to speed up the process of ANN training and testing; • The use of HPO to enhance the performance of ANN for detecting CPS breaches
Optimization | • Increased performance; • Reduced wastage of computational power and effort | • Feature selection (SelectKBest); • Normalization (min-max algorithm); • Hyper-parameter optimization (manual, random search, directed grid search, and Bayesian optimization)
Table 2. Impact of HPO in learning models based on existing works.
Reference(s) | Application | HPO Method | Model | Dataset | AC | PR | RC | F1
Aledhari et al. [20] | Network Application Security | Bayesian Optimization | DT | KDD-Cup 99 | 99.37 | 99.41 | 99.42 | 99.41
 | | | DT | NSL-KDD | 99.67 | 99.67 | 99.71 | 99.69
 | | | DT | ADFA IDS 2017 | 90.82 | 90.34 | 92.90 | 90.87
 | | | RF | KDD-Cup 99 | 92.16 | 99.13 | 98.60 | 93.04
 | | | RF | NSL-KDD | 94.56 | 92.73 | 98.73 | 95.04
 | | | RF | ADFA IDS 2017 | 85.25 | 79.51 | 95.04 | 86.57
 | | Random Search | DT | KDD-Cup 99 | 99.37 | 99.41 | 99.41 | 99.42
 | | | DT | NSL-KDD | 99.67 | 99.67 | 99.71 | 99.69
 | | | DT | ADFA IDS 2017 | 90.82 | 90.34 | 92.90 | 90.87
 | | | RF | KDD-Cup 99 | 92.09 | 99.13 | 98.69 | 92.97
 | | | RF | NSL-KDD | 94.56 | 92.73 | 98.78 | 95.04
 | | | RF | ADFA IDS 2017 | 84.99 | 79.17 | 95.04 | 86.36
Khan et al. [25] | Bug Prediction | Opt-aiNet | SVM | Eclipse JDT Core | 85.16 | --- | --- | ---
 | | | KNN | Eclipse JDT Core | 85.84 | --- | --- | ---
 | | | DT | Eclipse JDT Core | 83.46 | --- | --- | ---
 | | | RF | Eclipse JDT Core | 86.89 | --- | --- | ---
 | | | AdaBoost | Eclipse JDT Core | 85.75 | --- | --- | ---
Purohit et al. [26] | Anomaly Detection | Automated Procedure | DAGMM | Experimental Data of Industrial Fans | 94 | 93 | 94 | 96
Jervis et al. [28] | Seismic Facies Characterization | Random Search and Directed Random Search (Very Fast Simulated Annealing) | CNN | 3D Seismic Data | 82 | --- | --- | ---
Bolatito et al. [8] | Identification of hepatitis disease | Grid Search | KNN | Hepatitis Dataset of 155 patients | 80 | 82 | 80 | ---
 | | | NB | Hepatitis Dataset of 155 patients | 84 | 86 | 90 | ---
 | | | SVM | Hepatitis Dataset of 155 patients | 90 | 88 | 100 | ---
 | | | LR | Hepatitis Dataset of 155 patients | 88 | 80 | 100 | ---
 | | | DT | Hepatitis Dataset of 155 patients | 78 | 80 | 90 | ---
Table Legend: DT: Decision Tree; RF: Random Forest; SVM: Support Vector Machine; KNN: K-Nearest Neighbors; NB: Naive Bayes; DAGMM: Deep Auto-Encoder Gaussian Mixture Model; CNN: Convolutional Neural Network; LR: Logistic Regression.
Table 3. Service profile of TON IoT train-test network dataset features [24].
Feature Range | Feature(s) Purpose in Model | Service Profile
1 to 12 | Input Features | Connection activity
13 to 16 | Input Features | Statistical activity
17 to 24 | Input Features | DNS activity
25 to 30 | Input Features | SSL activity
31 to 40 | Input Features | HTTP activity
41 to 43 | Input Features | Violation activity
44 to 45 | Output Features | Data labeling
Table 4. System configurations for implementation.
Attribute | Value
Laptop Vendor | HP
Computing System | Intel(R) Core(TM) i3
Processor Generation | 4th
Processor Type | CPU
System Clock Speed | 1.70 GHz
Operating System | Windows 8.1
System RAM | 8 GB
System Type | 64-bit
Table 5. Random search evaluation results.
Iteration Number | CV Step | Optimizer | Epochs | Batch Size | Time Consumption (minutes) | Accuracy (percentage)
1 | 1 | Adadelta | 20 | 10 | 14.6 | 87.40
1 | 2 | Adadelta | 20 | 10 | 14.9 | 86.90
2 | 1 | Adam | 90 | 40 | 14.9 | 98.00
2 | 2 | Adam | 90 | 40 | 13.3 | 98.40
3 | 1 | Adagrad | 20 | 40 | 3.6 | 91.00
3 | 2 | Adagrad | 20 | 40 | 3.4 | 89.60
4 | 1 | Adam | 90 | 10 | 56.8 | 98.70
4 | 2 | Adam | 90 | 10 | 59.8 | 98.70
5 | 1 | Adadelta | 90 | 40 | 15.4 | 89.00
5 | 2 | Adadelta | 90 | 40 | 19.9 | 91.10
Table 6. Grid search evaluation results.
Iteration No. | CV Step | Optimizer | Epochs | Batch Size | Time Consumption (Minutes) | Accuracy (Percentage)
1 | 1 | Adam | 100 | 50 | 11.0 | 99.20
1 | 2 | Adam | 100 | 50 | 12.8 | 98.30
2 | 1 | Adam | 100 | 30 | 21.2 | 99.30
2 | 2 | Adam | 100 | 30 | 21.8 | 99.00
3 | 1 | Adam | 100 | 70 | 7.8 | 98.50
3 | 2 | Adam | 100 | 70 | 8.2 | 98.40
4 | 1 | Adam | 110 | 50 | 13.7 | 98.40
4 | 2 | Adam | 110 | 50 | 12.8 | 98.60
5 | 1 | Adam | 110 | 30 | 20.9 | 98.70
5 | 2 | Adam | 110 | 30 | 20.5 | 98.70
6 | 1 | Adam | 110 | 70 | 8.2 | 98.50
6 | 2 | Adam | 110 | 70 | 9.1 | 98.50
Table 7. Performance of deep learning model with manual hyper-parameter selection and through the utilization of exhaustive hyper-parameter optimization techniques.
Column groups: Model, Hyper-Parameter Optimization (HPO), and Model Structure (MO, HT, HC, BH, MS); Feature Engineering (NM, FS); Time Consumption (TRT, TST); Performance (AC, PR, RC, F1).
MO | HT | HC | BH | MS | NM | FS | TRT | TST | AC | PR | RC | F1
ANN | MS | epochs = 50, batch_size = 20, optimizer = 'Adam' | n/a | 43→14→24→1 | No | No | 31 m 50 s | 6 s | 65.18 | 77.00 | 65.00 | 51.00
ANN | MS | epochs = 40, batch_size = 20, optimizer = 'Adam' | n/a | 25→14→8→1 | No | Yes | 23 m 10 s | 8 s | 65.18 | 77.00 | 65.00 | 51.00
ANN | MS | epochs = 30, batch_size = 30, optimizer = 'Adadelta' | n/a | 25→14→8→1 | Yes | Yes | 13 m 8 s | 5 s | 89.58 | 90.11 | 89.58 | 89.70
ANN | RS (n_iter = 5, cv = 2) | epochs = [20, 60, 90], batch_size = [10, 30, 40], optimizer = ['Adam', 'RMSprop', 'SGD', 'Adagrad', 'Adadelta'] | epochs = 90, batch_size = 10, optimizer = 'Adam' | 25→14→8→1 | Yes | Yes | 216 m 36 s | 5 s | 98.71 | 99.06 | 99.06 | 99.06
ANN | DGS (cv = 2) | epochs = [100, 110], batch_size = [50, 30, 70], optimizer = ['Adam'] | epochs = 100, batch_size = 30, optimizer = 'Adam' | 25→14→8→1 | Yes | Yes | 168 m 6 s | 5 s | 99.31 | 99.32 | 99.31 | 99.31
ANN | BO (cv = 2, max_iter = 5) | epochs = [70, 90, 120], batch_size = [40, 60, 80], optimizer = ['Adam', 'Adagrad'] | epochs = 120, batch_size = 80, optimizer = 'Adam' | 25→14→8→1 | Yes | Yes | 156 m | 3 s | 99.13 | 99.13 | 99.13 | 99.13
Table Legend: MS: Manual Search; DGS: Directed Grid Search; RS: Random Search; BO: Bayesian Optimization; MO: Model Used; HT: HPO Technique; HC: Hyper-parameters Considered; BH: Best Hyper-parameters; MS: Model Structure; NM: Normalization; FS: Feature Selection; TRT: Training Time; TST: Testing Time; AC: Accuracy; PR: Precision; RC: Recall; F1: F1 score; cv: Cross Validation.
Table 8. Training time consumption by various HPO techniques.
HT | A | TT | HS | NI | TT | AT
RS | Implemented | Uninformed | 45 | 5 | 216 m 36 s | 43 m 19 s
DGS | Implemented | Uninformed | 6 | 6 | 168 m 6 s | 28 m 1 s
RS + DGS | Implemented | Uninformed | 51 | 11 | 384 m 42 s | 34 m 57 s
BO | Implemented | Informed | 18 | 5 | 156 m | 31 m 12 s
GS Case 1 | Theoretical Consideration | Uninformed | 45 | 45 | 1948 m 30 s | 43 m 19 s
GS Case 2 | Theoretical Consideration | Uninformed | 45 | 45 | 1260 m 45 s | 28 m 1 s
Table Legend: HT: HPO Technique, A: Aspect, TT: Type of Technique, HS: Hypercube or Search Space Size, NI: Number of Iterations, TT: Training Time, AT: Average Time Per Hyper-parameter Evaluation, GS Case 1: Considering the average time of RS, GS Case 2: Considering the average time of DGS.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
