*Article* **A Meta-Model to Predict and Detect Malicious Activities in 6G-Structured Wireless Communication Networks**

**Haider W. Oleiwi <sup>1,</sup>\*, Doaa N. Mhawi <sup>2</sup> and Hamed Al-Raweshidy <sup>1</sup>**


**Abstract:** The rapid leap in wireless communication systems has introduced a plethora of new features and challenges that accompany the era of 6G and beyond, which are currently being investigated and developed. Recently, machine learning techniques have been widely deployed in many fields, especially wireless communications. They have been used to improve network traffic performance regarding resource management, frequency spectrum optimization, latency, and security. Studies of modern wireless communications and the anticipated features of ultra-densified ubiquitous wireless networks have exposed a risky vulnerability and shown the necessity of developing a trustworthy intrusion detection system (IDS) with certain efficiency standards that have not yet been achieved by current systems. IDSs lack acceptable immunity against repetitive, updatable, and intelligent attacks on wireless communication networks, particularly with respect to the modern infrastructure of 6G communications, resulting in low accuracies/detection rates and high false-alarm/false-negative rates. To this end, this paper develops a unique meta-machine-learning model for anomaly detection that reduces IDS complexity. The five main stages of the proposed meta-model are as follows. The accumulated datasets (NSL KDD, UNSW NB15, CIC IDS17, and SCE CIC IDS18) comprise the initial stage. The second stage is preprocessing and feature selection, where preprocessing involves replacing missing values and eliminating duplicate values, leading to dimensionality minimization; the best-affecting subset of features from each dataset is selected using feature selection (i.e., Chi-Square). The third stage is the meta-model itself. On the training dataset, many classifiers are utilized (i.e., random forest, AdaBoost, GradientBoost, XGBoost, CatBoost, and LightGBM).
All the classifiers feed the meta-model classifier (i.e., a decision tree as the voting-technique classifier) to select the best-predicted result. Finally, the classification and evaluation stage involves the experimental results of testing the meta-model on the different datasets using binary-class and multi-class forms for classification. The results proved the proposed work's high efficiency and its outperformance compared to existing IDSs.

**Keywords:** 6G wireless communications; chi-square; cybersecurity; intrusion detection system; machine learning techniques; meta-model; stacking ensemble learning; voting techniques

### **1. Introduction**

The advancement of modernized wireless communication networks with their accompanying features, technologies, heterogeneously connected networks/gadgets, service demands, and the huge amount of data traffic has brought more complexity and sophistication to communication systems [1]. The 6G revolution and internet of everything (IoE) technology drive artificial intelligence (AI)-based incorporations (e.g., machine learning (ML)) in the ubiquitous connection of billions of sub-networks, users, and devices. Furthermore, the new features of 6G and beyond wireless communications, movable infrastructure, and the potential intelligent services add critical security risks to the network's core, edge, and associated devices [1–4]. Modern networks benefit significantly from AI and ML in various ways, such as intelligent communications, network optimization, and

**Citation:** Oleiwi, H.W.; Mhawi, D.N.; Al-Raweshidy, H. A Meta-Model to Predict and Detect Malicious Activities in 6G-Structured Wireless Communication Networks. *Electronics* **2023**, *12*, 643. https:// doi.org/10.3390/electronics12030643

Academic Editors: Shihao Yan, Guanglin Zhang, Li Sun, Tsz Hon Yuen, YoHan Park, Changhoon Lee and Tao Huang

Received: 3 January 2023 Revised: 20 January 2023 Accepted: 24 January 2023 Published: 28 January 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

big data analytics. However, the threats of renewable intelligent attacks on the networks increase proportionally with the complexity increase (caused by heterogeneity, enormous scale, and variety of applications these networks serve) [5–10]. The difficulty of creating adequate security procedures to defend the network increases due to the possibility of attackers discovering network vulnerabilities utilizing AI techniques. Thus, it is highly necessary to build a robust intelligent intrusion detection system (IDS) to comply with the evolution of intelligent attacks and to secure future networks [11–15]. The new networks connect a variety of billions of users/devices to serve people, providing a plethora of services/applications via the network's main components, e.g., the base station (BS) using the edge of technologies, e.g., terahertz communications, non-orthogonal multiple access, and IoE [12,15,16]. In risk-sensitive systems safety, the realization of a zero-day attack is not an easy process, especially with the proliferation of numerous malicious activities. Figure 1 demonstrates a sample of the 6G general expected infrastructure with a number of nominated applications and media over different areas [17].

**Figure 1.** A sample of 6G expected infrastructure and applications.

IDSs send out notifications when discovering unexpected activities or identified hazards. Any destructive behavior that interferes with an information system is considered an intrusion [18]. IDSs scan computers for unusual activities that a conventional packet filter may fail to catch, noting any indicator of potentially dangerous network packets and providing signals for highly resilient cyber defenses against disruptive activities and unauthorized access to a computer system. IDSs use two methods to detect intrusions (i.e., misuse and anomaly detection). A new IDS that includes these two methods was presented to overcome their limitations, increasing accuracy and decreasing the FAR [11,19–25]. Furthermore, feature selection (FS) is a useful approach for IDSs to retain the significant features and discard the useless ones with little performance degradation [26–28]. IDSs require classifier methods to produce the final results, and there are different AI methods for this task, e.g., ensemble learning (EL). EL techniques are used as building blocks for more complicated models by integrating many weak learners, e.g., bagging, boosting, AdaBoosting, and stacking (meta-model). Bagging is used to reduce variance, boosting manipulates high bias to obtain strong classifiers, and the main purpose of stacking (meta-model) is to combine the strengths of several effective models to provide predictions that perform better than any single model [29].
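The voting idea behind these ensemble methods can be sketched in plain Python. The snippet below is an illustrative toy, not the paper's classifiers: the thresholds and feature vectors are invented. Several weak base learners each predict a label, and the combiner keeps the label with the most votes.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine base-classifier predictions: the label with the most votes wins."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical weak base classifiers: each maps a feature vector to a label.
base_classifiers = [
    lambda x: "attack" if x[0] > 0.5 else "benign",
    lambda x: "attack" if x[1] > 0.7 else "benign",
    lambda x: "attack" if sum(x) > 1.0 else "benign",
]

sample = [0.9, 0.2]
votes = [clf(sample) for clf in base_classifiers]
print(majority_vote(votes))  # two of three base learners vote "attack"
```

In a full stacking (meta-model) setup, the base predictions would instead become input features for a second-level learner such as a decision tree, rather than being counted directly.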

However, IDSs still do not achieve the needed optimization of detection rate (DR), false alarm rate (FAR), or running time because of high-dimensional datasets and abundant zero-day attacks. Despite having a direct influence on resources, time complexity has not been given significant consideration. Besides, the technological realm envisions IoE and 6G networks depending on equipment programmed with lightweight algorithms.

This work targets more sufficient/robust ML-based attack-resistant detection to increase IDSs' stability and accuracy while reducing the amount of computation/time needed, using four different datasets. The proposed model trains the FS method and ML algorithms to realize accurate/efficient IDSs. The orientation of wireless communications toward utilizing AI systems must also be considered. Therefore, the contributions of this work are:


The remaining sections of this paper are organized as follows:

Section 2 implies several similar works, while Section 3 provides a detailed definition of the proposed system's methodology and addresses the experimental findings. Furthermore, it illustrates how the proposed method was implemented with the applied datasets and addresses the technical constraints. Finally, the conclusions are stated in Section 4, which summarizes the results, directions for further investigation, and future suggestions.

### **2. Literature Review**

In this section, the authors review related studies and summarize them in Table 1 for better readability. To distinguish each of those related studies, the table lists the main FS method with the number of selected features, the type of classification method, the experimental results, and the disadvantages.


**Table 1.** Similar related studies.



To the researchers' knowledge, the proposed system outperforms the earlier systems in terms of performance and outcomes. Using numerous datasets, it considerably exceeds the performance reported in the literature and delivers the highest results.

### **3. Methodology**

IDSs observe malicious or suspicious activities in the traffic across the whole communication network. They were introduced to wireless communication networks to examine any abnormal activity occurring throughout control/data communication. A hacker attempts to penetrate networks to stop communications or capture important data; by breaching network security and affecting the behavior of sensors/networks, the attacker inserts bugs into a network. To solve this sensitive issue and protect the system from malicious actors, a properly secured framework is required. The proposal's main structure is shown in Figure 2.

Figure 2 shows the different stages for detecting suspicious/malicious activities (anomalies) over the communication network. Before these stages, different types of datasets are collected and missing values are detected; null values are replaced with the column's average values. After that, duplicate values are deleted from the datasets (NSL\_KDD, UNSW\_NB15, CIC\_IDS17, and SCE\_CIC\_IDS18).

In the next step, data normalization and encoding are performed. The encoded data undergo dimensionality reduction to aid data handling. Accordingly, features are optimized to attain the optimal features out of the entire data, which helps to detect anomalies within the data. After preprocessing, the cleansed data are transferred to the next level, where only the features that impact the final results are retained by applying Chi-square. Ultimately, the proposed system uses a meta-ML model as a classifier to detect and predict malicious activities in the network traffic. The system comprises a number of stages, each including several steps with a dedicated task. Each stage's outcome represents the input to the next stage. The stages are described in detail successively.
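The preprocessing steps just listed (mean imputation of missing values, duplicate removal, and min-max normalization) can be sketched in stdlib Python; the function name and toy data below are invented for illustration:

```python
def preprocess(rows):
    """Replace missing values (None) with the column mean, drop duplicate
    rows, then min-max normalise each column into [0, 1]."""
    # 1. Mean imputation per column.
    cols = list(zip(*rows))
    means = [sum(v for v in c if v is not None) / sum(1 for v in c if v is not None)
             for c in cols]
    rows = [[v if v is not None else means[j] for j, v in enumerate(r)] for r in rows]
    # 2. Duplicate removal, preserving row order.
    seen, unique = set(), []
    for r in rows:
        key = tuple(r)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    # 3. Min-max scaling per column.
    cols = list(zip(*unique))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] != lo[j] else 0.0
             for j, v in enumerate(r)] for r in unique]

data = [[1.0, 10.0], [None, 30.0], [1.0, 10.0], [3.0, 20.0]]
print(preprocess(data))  # duplicate row dropped; all values scaled to [0, 1]
```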

### *3.1. First Stage: Datasets Collection*

The researchers' main problem is finding an appropriate dataset for evaluating IDSs. Therefore, four datasets with different features are used (NSL\_KDD, UNSW\_NB15, CIC\_IDS17, and SCE\_CIC\_IDS18). They were collected from different sites and contain different types of attacks. These datasets are used for the experiments, and each is briefly described as follows:

**Figure 2.** The proposed system's general structure.

### 3.1.1. First: NSL\_KDD Dataset

NSL-KDD is a dataset suggested to solve some of the inherent problems of the KDD'99 dataset. Because of the scarcity of freely available datasets for network-based IDSs, this new version is still in service as a high-impact benchmark dataset that helps researchers compare multiple intrusion detection strategies, despite the technical issues noted by McHugh. The NSL-KDD training and testing sets have a reasonable number of records, which enables cost-effective experimentation on the entire set without arbitrarily selecting a limited subset.

### 3.1.2. Second: UNSW\_NB15 Dataset

It is a network intrusion dataset collected at the University of New South Wales (UNSW) in 2015. It contains nine types of attacks, and raw network packets are included. There are 175,341 records in the training set and 82,332 records of various types of activities (attacks and normal) in the test set.

### 3.1.3. Third: CIC\_IDS17 Dataset

The CIC\_IDS17 dataset (compiled in 2017) was released by the Canadian Institute for Cybersecurity (CIC). It offers benign traffic together with the most current widespread attacks, and it presents the outcomes of network traffic analysis using the CICFlowMeter. Time-stamped flows exist for protocols, source/destination IPs, ports, and attacks. It is one of the most recent datasets, containing updated DDoS, Brute Force, XSS, SQL Injection, Infiltration, Port Scan, and Botnet attacks. There are 2,830,743 records in total, divided into eight files; each record comes with 78 unique features and a label. The same magnitude order is maintained for each dataset when multi-classification is required.

### 3.1.4. Fourth: SCE\_CIC\_IDS18 Dataset

The University of New Brunswick created this dataset for analyzing DDoS data. It was sourced entirely from 2018, after which updates stopped. The dataset was built from the university's servers' logs, which observed a variety of DoS attacks during the free-availability era. When working with the dataset, ML practitioners note that the label column is the most valuable portion, as it determines whether the transmitted packets are malicious or benign. The data are divided into various files based on date. Each file is unbalanced, and it is up to the notebook creator to re-divide the dataset into a balanced form for higher-quality predictions. It has eighty columns, each of which corresponds to an entry in the University of New Brunswick's IDS logging system, which divides traffic into forward and backward directions. The most important columns within this dataset are Destination Port, Protocol, Flow Duration, total forward packets (Tot Fwd Pkts), total backward packets (Tot Bwd Pkts), and Label.

### *3.2. Second Stage: Preprocessing and FS*

The datasets collected in the first stage undergo preprocessing and FS steps. The processing of these steps is demonstrated in Algorithm 1.

**Algorithm 1.** Preprocessing and FS.


In Algorithm 1, the raw data in each dataset pass through two main steps. First, preprocessing cleans and prepares the data (filtration); non-numerical values are then converted into numerical ones using one-hot encoding (transformation) and scaled into the [0, 1] range using the min-max scaling function (normalization). The outcome of this algorithm is the best subset of features for each dataset: 20, 30, 35, and 38 features for the NSL\_KDD, UNSW\_NB15, CIC\_IDS17, and SCE\_CIC\_IDS18 datasets, respectively.
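The Chi-square score used for FS in Algorithm 1 can be illustrated with a small self-contained sketch; the toy protocol/label data are invented, and a real pipeline would use a library routine such as scikit-learn's `chi2`:

```python
def chi_square(feature, labels):
    """Chi-square statistic between a categorical feature and the class label.
    Higher scores mean stronger dependence, i.e., a more useful feature."""
    n = len(labels)
    f_counts, l_counts, joint = {}, {}, {}
    for f, l in zip(feature, labels):
        f_counts[f] = f_counts.get(f, 0) + 1
        l_counts[l] = l_counts.get(l, 0) + 1
        joint[(f, l)] = joint.get((f, l), 0) + 1
    stat = 0.0
    for f, fc in f_counts.items():
        for l, lc in l_counts.items():
            expected = fc * lc / n           # count expected under independence
            observed = joint.get((f, l), 0)
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical toy traffic: protocol value vs. benign/attack label.
proto = ["tcp", "tcp", "udp", "udp", "tcp", "udp"]
label = ["attack", "attack", "benign", "benign", "attack", "benign"]
print(chi_square(proto, label))  # prints 6.0: perfectly dependent 2x2 table
```

Ranking features by this score and keeping the top-k is what yields subsets such as the 20/30/35/38 features reported above.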

### *3.3. Third and Fourth Stages: ML Techniques for NIDS (Training Set) and Voting Techniques (Meta-Model) for the Testing Set*

For the training stage, many different classifiers are used (i.e., XGBoost, random forest (RF), AdaBoost, GradientBoosting, LightGBM, and CatBoost), each of them considered a base classifier. Each of these classifiers manipulates the training data independently by taking the Di of each dataset. Afterward, the results (predictions) of each base classifier are aggregated into the meta-model (DT); Figure 3 demonstrates the main idea of the meta-model classifier. Furthermore, the testing stage begins in the meta-model to obtain the prediction results and check the evaluation and performance of the proposed meta-model. Algorithm 2 illustrates this stage.

The meta-model's working mechanism is demonstrated in detail in the following subsections.

### 3.3.1. The Datasets Partitioning Mechanism

It is necessary to aggregate the result of each classifier through the composite model and then send them to the stacking model to select the best result by voting. The voting technique is a type of EL method that combines the predictions of several different models (classifiers) and selects the prediction with the most votes.

As shown in Figure 3, the meta-model system has four traffic datasets; it uses three as source datasets to train the meta-model, whereas the fourth is used as a target to fine-tune and then test the model's performance. Each source dataset is split into training and validation partitions. During training, two batches of samples are randomly selected from the training partition: one batch to compute the task-specific parameters and the other to compute the loss. The same process is then repeated with the validation partition to select the best prediction model. After training, the model is fine-tuned on the target dataset.
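The partitioning and batch-sampling loop described above can be sketched as follows; this is a schematic with invented sizes and a stand-in list of records, not the actual datasets or training code:

```python
import random

def split_and_sample(dataset, val_ratio=0.2, batch=4, seed=0):
    """Split one source dataset into train/validation parts, then draw two
    disjoint batches from the training part: one batch for the task-specific
    parameters, the other for the loss (a sketch of the described loop)."""
    rng = random.Random(seed)
    data = dataset[:]
    rng.shuffle(data)
    cut = int(len(data) * (1 - val_ratio))
    train, val = data[:cut], data[cut:]
    sampled = rng.sample(train, 2 * batch)     # without replacement -> disjoint
    return train, val, sampled[:batch], sampled[batch:]

source = list(range(40))  # stand-in records for one source dataset
train, val, params_batch, loss_batch = split_and_sample(source)
print(len(train), len(val), len(params_batch), len(loss_batch))  # 32 8 4 4
```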

### 3.3.2. Classifiers Work and Aggregation Techniques

In Algorithm 2 there are different classifiers, each of which performs a specific process and handles problems precisely. RF is a meta-estimator that fits several DT classifiers on different sub-samples of the datasets, applying averaging to enhance predictive accuracy and control overfitting. The sub-sample and original input sample sizes are usually the same; however, samples are drawn with replacement if bootstrap = True. XGBoost optimizes gradient-boosted DTs; this classifier does not need normalized features and works well if the data are nonlinear, non-monotonic, or contain segregated clusters. The AdaBoost classifier fits a sequence of weak learners (e.g., models that are only slightly better than random guessing, such as small DTs) on repeatedly modified versions of the data. The predictions are then integrated by a weighted majority vote (or sum) to generate the final prediction. The data modifications at each so-called boosting iteration consist of applying weights ω1, ω2, ..., ω*<sup>N</sup>* to every training sample.

```
Algorithm 2. ML and meta-model techniques.
Input: Xi for each dataset [i] from Algorithm 1;
     K; /* the number of classifiers */
     Learning_Rate (LR);
     Random_State (RS);
     Mi; /* error rate of each classifier, i.e., Mi = Σ_{j=1..d} wj × err(Xj) */
     Number of Estimators (NS); /* subset number */
     Criterion; /* type of measure */
     Machine learning classifiers (base classifiers), i.e.,
          RandomForest (C1), XGBoosting (C2), AdaBoost (C3),
          GradientBoosting (C4), LightGBM (C5), and CatBoosting (C6);
     Meta-model classifier, i.e., DT (C8).
Output: A composite model.
Begin
1. ML techniques (base classifiers):
   Read the number of classifiers K.
   Loop: from 1 to K
      RandomForest (C1) attributes: RS = 1, NS = 10, LR = 0.01,
           max_features [integer] /* the number of features to consider
           when looking for the best split */.
      XGBoosting (C2) attributes: LR = 0.01, RS = 1.
      AdaBoosting (C3) attributes: RS = 1, NS = 10, wi = 1/N.
      GradientBoosting (C4) attributes: loss = 'deviance', LR = 0.1,
           number of estimators = 100, minimum split samples = 2,
           maximum depth = 3, validation fraction = 0.1.
      LightGBM (C5) attributes: RS = 1, NS = 10.
      CatBoosting (C6) attributes: RS = 1, NS = 10.
   Repeat
      For i = 1 to 6 do
         Compute Mi for the prediction by applying:
              Mi = Σ_{j=1..d} wj × err(Xj).
         If Mi is larger than half then
              adjust via log((1 − Mi)/Mi).
         End if
      End for
   Until the results of all 6 Ci are obtained.
   Return all Ci with minimum Mi.
2. Meta-model (DT) and voting techniques:
   Repeat
      Compute the average weighting for all Ci by:
           mj = (1/l) Σ_{i=1..l} p_Ci(wi, x).
      Compute the binary and multi-class measurements:
           DR, FNR, FPR, TPR, TNR, accuracy, FAR, precision, and recall.
   Until the result is the best.
   Return the composite model.
End
```
Since all weights are initially set to ωi = 1/N, the initial step trains a learning algorithm on the original data. The sample weights are individually adjusted at each further iteration, and the learning process is then performed once more on the reweighted data. Furthermore, to compute and adjust the weights, the algorithm undergoes the following steps:


The weights of training examples at a particular stage are changed to reflect whether or not the boosted model induced in the preceding step accurately predicted those training examples. Examples that are challenging to predict gain increasing importance during the iterative process. As a result, each subsequent weak learner in the chain is compelled to focus on the instances that its predecessors missed. Using gradient-boosting tree strategies has numerous benefits, which include:


LightGBM is a fast, distributed, high-performance gradient-boosting framework based on DT algorithms; it is used for ranking, classification, and many other ML tasks. The CatBoost classifier is an algorithm for gradient boosting on DTs, used for search, recommendation systems, personal assistants, self-driving cars, weather prediction, and many other tasks in different companies.
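The AdaBoost sample reweighting described earlier (weights ωi starting at 1/N and boosted for misclassified samples) can be sketched in a few lines. This shows the standard update rule as a hedged illustration, not the authors' exact implementation:

```python
import math

def adaboost_reweight(weights, correct):
    """One AdaBoost iteration: compute the weighted error, the learner's
    vote weight alpha, then boost the weights of misclassified samples."""
    err = sum(w for w, ok in zip(weights, correct) if not ok) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)  # learner's contribution to the vote
    new = [w * math.exp(-alpha if ok else alpha) for w, ok in zip(weights, correct)]
    total = sum(new)                          # renormalise so weights sum to 1
    return [w / total for w in new], alpha

# Four samples, initial weights 1/N; the weak learner misses only the last one.
weights = [0.25] * 4
weights, alpha = adaboost_reweight(weights, [True, True, True, False])
print([round(w, 3) for w in weights])  # [0.167, 0.167, 0.167, 0.5]
```

The misclassified sample now carries half the total weight, so the next weak learner is forced to focus on it.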

### *3.4. Fifth Stage: Implementation and Evaluation*

### 3.4.1. Implementation

It is carried out by applying the four datasets (NSL\_KDD, UNSW\_NB15, CIC\_IDS17, and SCE\_CIC\_IDS18). The training portion is 70%, while the testing portion is 30%, to evaluate the proposal.

System performance is evaluated by implementing the proposal using the four feature subsets selected with chi-square. Intrusions are detected using different ML techniques with multi-class and binary-class confusion matrices. Ultimately, performance is evaluated using multiple measurements: recall, precision, DR, FAR, and FNR. The implementation uses Anaconda Python 3.9 and the Colab platform with the Scikit-learn, Keras, and TensorFlow libraries, on a laptop with a 10th-generation Intel Core i7 CPU and a 64-bit Windows 11 operating system.
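All of the listed measurements derive directly from the binary confusion-matrix counts. A minimal sketch with invented counts (not the paper's results) shows how each is computed:

```python
def ids_metrics(tp, fp, tn, fn):
    """Standard IDS measurements from binary confusion-matrix counts."""
    return {
        "accuracy":  (tp + tn) / (tp + fp + tn + fn),
        "DR":        tp / (tp + fn),   # detection rate = recall = TPR
        "precision": tp / (tp + fp),
        "FAR":       fp / (fp + tn),   # false alarm rate = FPR
        "FNR":       fn / (fn + tp),   # missed attacks among all attacks
    }

m = ids_metrics(tp=950, fp=10, tn=990, fn=50)
print({k: round(v, 3) for k, v in m.items()})
```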

### 3.4.2. Evaluation and Experimental Results

1 Binary-Class and Multi-Class Confusion-Matrix forms

The experiment is conducted at this stage on the ML and meta-model (voting) techniques using the four different datasets. A confusion matrix is adopted for each class, which includes benign and attack network traffic. Furthermore, the four selected feature subsets are applied to detect suspicious activities in the network traffic. The proposed system uses both binary and multi-class confusion matrices.

The distribution of the four states of true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) with different numbers of selected features, together with the computed accuracy and FNR, is explained in Table 2.

Table 2 explains the best features and the resulting accuracy and FNR (i.e., false-negative detections are classified into FN and TP detections in the experiment) when using NSL\_KDD, UNSW\_NB15, CIC\_IDS17, and SCE\_CIC\_IDS18 with (20, 30, 35, and 38) features, respectively. This measurement is significant for assessing the efficiency and professionalism of the proposal, as it calculates the total number of errors in which an attack is diagnosed as normal. Additionally, applying other feature subsets leads to insufficient FNR and accuracy measures.

**Table 2.** Accuracy and FNR for (NSL\_KDD, UNSW\_NB15, CIC\_IDS17, and SCE\_CIC\_IDS18) datasets when applied to different FSs.


The core objective of utilizing different datasets is to train the proposed system for different types of attacks and make it more robust against suspicious traffic activities. Figures 4 and 5 demonstrate the final results of the binary form and multiclass form of the confusion matrix.

**Figure 4.** Binary-class confusion matrix.

Figure 4 shows that the proposed system achieves the best prediction results; it distinguishes benign activities and attacks precisely. It can be noticed that only one percent of benign activities are predicted as attacks; this does not affect the final results.

In Figure 5, irrespective of the individual class's accuracy, the accuracy of the entire system (i.e., 99%) depends on the average accuracy of all the classes.

Furthermore; Figures 6 and 7 demonstrate the training and testing confusion matrix with the final measurements' results.

**Figure 6.** Train and Test confusion matrix.

### 2 Big O Notation Measures

The time complexity of the proposed system is measured by applying Big O notation (i.e., O(N²)). Figure 8 illustrates the dataset classes with the required running time. Note that the running time increases proportionally with the input size.

Figure 8 explains system complexity with respect to the applied datasets. The proposed meta-model reduces the number of features by selecting only the affected and sufficient features. In addition, in the training phase, the meta-model system selects the results of the best-predicted classifiers to be used in the testing phase.
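The growth in running time can be observed empirically by timing a stand-in O(N²) routine at increasing input sizes; the workload below is illustrative only, not the proposed system's code:

```python
import time

def quadratic_work(records):
    """Stand-in O(N^2) step: compare every pair of records (illustrative)."""
    hits = 0
    for a in records:
        for b in records:
            hits += a == b
    return hits

for n in (500, 1000, 2000):
    data = list(range(n))
    start = time.perf_counter()
    quadratic_work(data)
    elapsed = time.perf_counter() - start
    print(f"N={n:5d}  time={elapsed:.4f}s")  # roughly 4x per doubling of N
```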

**Figure 7.** Final measurement matrix when applying meta-model system.

**Figure 8.** Big O notation idea for four datasets.

### 3 Analysis Results and Comparison with Other Related Studies

The first stage is very important to clean the datasets and remove all their problems before passing to the FS stage (chi-square). In this stage, each dataset's classes pass through an analysis procedure to check and choose the feature subset most effective for the final results: 20 features for NSL\_KDD, 30 for UNSW\_NB15, 35 for CIC\_IDS17, and 38 for SCE\_CIC\_IDS18. Afterward, the ML and voting-technique stages make each classifier work independently and then aggregate them by applying the average voting technique to return the best classifier result.

The proposal is assessed and compared with previous systems by accuracy, FAR, DR, and the number of selected features. Table 3 demonstrates that the meta-model outperforms other similar studies, achieving 99% accuracy for training and 90.1% for testing.

4 Challenges

The experimental results indicate that the proposed NIDS, based on a meta-model (ML) with a DT as the voting technique, achieves its main objective: building a secure system able to distinguish malicious/suspicious traffic activities. The proposed meta-model proves sufficient and effective at detecting intrusions and suspicious traffic activities; however, some limitations have come into view that are recommended to other researchers. These include the following constraints:



**Table 3.** Results comparison with other studies.

### **4. Conclusions**

In a nutshell, it was discovered that existing IDSs are still ineffectual despite intentionally utilizing a range of ML techniques to increase their performance, principally as a result of their susceptibility to the anticipated 6G wireless paradigm and the rapidly evolving sophisticated threats. The meta-model system initiated a new IDS mechanism applicable to unbalanced/high-dimensional network traffic with a low DR, given the needed ML classifiers and voting mechanisms. The proposed system's complexity was reduced by applying Chi-Square to select (20, 30, 35, and 38) features for NSL KDD, UNSW NB15, CIC IDS17, and SCE CIC IDS18, respectively, acquiring the ideal subset of the best features and reducing dimensionality. The experimental results of the meta-model achieve high accuracies reaching 99% for all datasets, with low FAR values of 0.002, 0.004, 0.0013, and 0.0021 for NSL KDD, UNSW NB15, CIC IDS17, and SCE CIC IDS18, respectively. Other findings are concisely displayed in the results comparison table. The suggested method also outperformed current classification methods, significantly increasing the IDS's competitive edge over other strategies. Despite the system's benefits, further work is still required to make it capable of handling potential threats from future infrequent traffic.

**Author Contributions:** Conceptualization, H.W.O. and D.N.M.; methodology, D.N.M. and H.W.O.; software, D.N.M.; validation, H.W.O., D.N.M. and H.A.-R.; formal analysis, D.N.M. and H.W.O.; resources, D.N.M.; data curation, D.N.M.; writing—original draft preparation, H.W.O. and D.N.M.; writing—review and editing, H.W.O.; visualization, H.W.O. and D.N.M.; supervision, H.A.-R.; project administration, H.W.O. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Data Availability Statement:** NSL\_KDD, and UNSW\_NB15 Dataset free downloaded from the link: http://www.di.uniba.it/~andresini/datasets.html, accessed on 18 February 2022. CICIDS2017 Dataset free downloaded from the link: http://205.174.165.80/CICDataset/CIC-IDS-2017/Dataset/, accessed on 24 June 2022, and SCE\_CIC\_IDS18Dataset free downloaded from the link: https://www. unb.ca/cic/datasets/ids-2018.html, accessed on 12 January 2022.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

### *Article* **Generalized Code-Abiding Countermeasure**

**Pierre-Antoine Tissot \*, Lilian Bossuet and Vincent Grosso**

CNRS Laboratoire Hubert Curien UMR 5516, 42000 Saint-Etienne, France

**\*** Correspondence: pierre.antoine.tissot@univ-st-etienne.fr

**Abstract:** The widely used countermeasures against fault attacks are based on spatial, temporal, or information redundancy. This type of solution is very efficient, but it can be very expensive in terms of implementation cost. Thus, trying to propose a secure and efficient countermeasure for a lightweight cipher is a hard challenge, as the goal of a lightweight cipher is to be the lightest possible. This paper considers information redundancy based on parity bit code, with code-abiding transformations of the operations. This error detection code, with the code-abiding notion added, is very efficient against single fault injection and has a small overcost. The solution is tested on the LED lightweight cipher to measure its overhead. Moreover, a bitslice version of the cipher is used with the parity bit code applied to be robust against all the single-word fault injections. The challenge is to adapt the cipher functions in a way in which the parity bit is always considered, but without considering a heavy implementation. The advantage of our solution is that this countermeasure leads to a 100% fault coverage, with a reasonable overhead.

**Keywords:** fault attack; error detection; code abiding; overcost; bitslice cipher

### **1. Introduction**

Cryptographic implementations are prone to physical attacks. Physical attacks take advantage of the physical properties of a device while it runs a cryptographic algorithm in order to break its security. The most popular physical attacks are fault attacks [1] (taking advantage of the circuit's sensitivity to perturbations) and side-channel attacks [2] (taking advantage of the circuit's leakage). This work focuses on fault attacks. The principle of fault attacks is to use means such as laser injection or clock glitching to inject faults during an encryption and to extract information by analyzing the circuit's behavior after the injection.

To counter fault attacks, various countermeasures have been developed, using mainly redundancy [3–7]. Redundancy allows one to create multiple information sources, and these multiple sources are compared at the end of the computation to detect fault injection. Redundancy can be applied at three different levels: temporal, spatial, and informational. Temporal redundancy is based on the multiple encryptions of a plaintext by the same physical cipher (same circuit) and on the comparison between the resulting ciphertexts. Spatial redundancy is based on the multiple encryptions of a plaintext by different physical ciphers (different circuits). Moreover, additional information can be added to the plaintext to create an information redundancy. This information is data-dependent and is used to detect whether a fault is present. In all the cases, the potential leakages of the cipher are more numerous, and side-channel attacks (SCA) thus become more efficient [8]. Therefore, when designing a countermeasure against fault attacks, the designer should also take into account the vulnerabilities that the countermeasure introduces with regard to a side-channel adversary. The objective, in that case, is to make the overcost of the countermeasure as small as possible, especially when the countermeasure is implemented on a lightweight cipher.

**Citation:** Tissot, P.-A.; Bossuet, L.; Grosso, V. Generalized Code-Abiding Countermeasure. *Electronics* **2023**, *12*, 976. https://doi.org/10.3390/ electronics12040976

Academic Editors: Tao Huang, Shihao Yan, Guanglin Zhang, Li Sun, Tsz Hon Yuen, YoHan Park and Changhoon Lee

Received: 27 January 2023 Revised: 13 February 2023 Accepted: 14 February 2023 Published: 15 February 2023

**Copyright:** © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

### *1.1. Related Work*

Simon et al. [9] presented an error detection solution that barely increases the SCA vulnerability. However, this solution is restrictive for any designer who would prefer to apply code abiding to an existing cipher, and more particularly, to word-oriented block ciphers. Our goal is then to generalize the code-abiding method to any existing or new word-oriented block cipher.

Bertoni et al. [10] used the parity bit code to detect faults injected on AES. They described a modification of the algorithm with the addition of a parity bit matrix, whereas our objective is to use a bitslice version of an algorithm to add a bitwise countermeasure to a word-oriented cipher. Moreover, the S-box used in [10], with half of its entries set to 00..001, is efficient in software, but a hardware implementation would require a large number of logic gates. Finally, the protected cipher of [10] is robust against 1-bit fault injection, whereas our solution prevents any 1-word fault injection.

Lac et al. [11] used an internal redundancy countermeasure: every data block is duplicated *k* times and surrounded by *n* reference blocks. The *k* copies allow the detection of up to *k* fault injections by comparing the results. Moreover, the reference blocks would detect a fault even if it affects all the copies of the data block. Indeed, reference blocks are known plaintext/ciphertext pairs, and the ciphered reference blocks are checked to detect an injected fault. The blocks are randomly distributed in the register. Thus, for each data block, we have *k* + *n* blocks of overhead. In our paper, the solution adds 1 bit for each data block; the overhead is then much lighter in our solution.

### *1.2. Contributions*

Our first contribution is the exhibition of a fault injection, realized under precise conditions during a computation of a Friet operation [9], that results in an undetected error. These conditions are presented, together with two countermeasures that can be applied to allow the detection of the error.

The second contribution is the application of the code-abiding method to an existing cipher, with an example on the lightweight LED cipher [12]. The countermeasure is designed to obtain the smallest possible overcost. The secured solution presented in our work is roughly 25% more expensive than the original LED implementation, as only one parity bit is added for each 4-bit nibble. Our work thus focuses on the cost optimization of the countermeasure, in terms of the number of gates and memory space needed, as well as power consumption.

This work should allow implementations to be robust against a single fault injection, while optimizing the overcost brought by the countermeasure.

### **2. Background**

In this section, we briefly introduce notions on coding theory that are useful for the countermeasure presented. We also recall the operation of LED block cipher [12] on which we apply our countermeasure as a proof of efficiency of our method.

### *2.1. Error Detection*

The solutions presented in this paper use the code-abiding concept introduced in [9]. This solution is based on computation over data encoded with an error detection code. In the following, we give the goal and the principle of error detection, which is a set of techniques that makes it possible to detect errors during the transmission of information.

**Definition 1** (Error detection code)**.** *Let E be a set and* C ⊂ *E. We denote* C̄ = *E* \ C*.* C *is an error detection code if and only if:*

$$\forall c \in \mathcal{C}, \ \forall e \in \bar{\mathcal{C}} : \ c + e \in \bar{\mathcal{C}}.$$

*In this case,* + *is the addition operator, according to the set E.*

An error detection code allows one to divide a set into two different subsets with a minimal Hamming distance between a word of a subset and a word from the other.

**Definition 2** (Parity bit)**.** *Let x be an n-bit word, and denote xi the i-th bit of x, so that x* = *xn*−1||*xn*−2||...||*x*1||*x*0*, where* || *is the concatenation operator. The parity bit xp is the XOR of all the xi: xp* = *xn*−1 ⊕ *xn*−2 ⊕ ... ⊕ *x*1 ⊕ *x*0*. We use even parity in our case, so the XOR of all the bits (including the parity bit) is 0. Its purpose is to detect an odd number of faults in the output.*

**Example 1** (Parity bit)**.** *Let x* = 011010*. The parity bit xp* = 0 ⊕ 1 ⊕ 1 ⊕ 0 ⊕ 1 ⊕ 0 = 1*.*

The parity bit method is the error detection scheme that is used during this work. The two subsets are composed by the words with an even parity for the first one and the words with an odd parity for the second one.

**Definition 3** (Check function)**.** *The* CheckFunction *applied to a word verifies its parity characteristic. The function returns a Boolean with* TRUE *when an even parity is verified and* FALSE *when odd parity is verified.*

**Example 2** (Check function)**.** *With x* = 011010 *and its parity bit xp* = 1 *from Example 1,* CheckFunction(0110101) *returns* TRUE*, since* 0 ⊕ 1 ⊕ 1 ⊕ 0 ⊕ 1 ⊕ 0 ⊕ 1 = 0*.*


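As an illustration, a minimal Python sketch of the parity bit and the check function (the function names are ours, not from the paper):

```python
def parity_bit(x: int, n: int) -> int:
    """Even-parity bit of an n-bit word: the XOR of all its bits."""
    p = 0
    for i in range(n):
        p ^= (x >> i) & 1
    return p

def check_function(word: int, n: int) -> bool:
    """CheckFunction: True when the (n+1)-bit word (data plus its
    parity bit) has even overall parity."""
    return parity_bit(word, n + 1) == 0

# Example 1: x = 0b011010 has parity bit 1, and the encoded word checks out.
x = 0b011010
xp = parity_bit(x, 6)          # 1
encoded = (x << 1) | xp        # append the parity bit
assert xp == 1
assert check_function(encoded, 6)
```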
### *2.2. Code Abiding*

We now want to implement the error detection scheme into an encryption algorithm. We thus need to use functions that keep the parity characteristic of the words. With this intention, we use the code-abiding notion, introduced in [9]. In that work, the idea was to build permutations over the space *E*. In order to detect faults, the permutation built must respect the separation of the space. In other words, the permutation over *E* can be seen as two permutations, one over C, the other over C̄. The separation between the two spaces allows the detection of every single fault injection.

**Definition 4** (Code abiding function)**.** *f is a* C *code abiding function if and only if:*

$$\forall x \in E : \ x \in \mathcal{C} \iff f(x) \in \mathcal{C}.$$
The algorithm has to be composed of code abiding functions to keep the parity property of the words and to propagate an injected error until the detection of the fault.

### *2.3. LED Cipher*

The LED Cipher, presented in [12], is a lightweight block cipher. Its purpose is to offer a very small silicon footprint in comparison with other block ciphers, as well as to be secure against related-key attacks by using AES-like security proofs.

This cipher is a 64-bit block cipher mostly used with 64-bit and 128-bit keys. However, any length between 64 bits and 128 bits can be used if the length is divisible by four; in this sense, 80-bit keys are also often used. In our work, we focus on the 64-bit key length. However, the results presented are valid for any key length and are not limited to the LED cipher: the code-abiding solution can be added to any block cipher.

A 64-bit state *St* is conceptually divided into sixteen 4-bit nibbles (*St* = *st*0||*st*1|| ... ||*st*15) and arranged in a square array, as described in Matrix *state*.

$$state = \begin{pmatrix} st\_0 & st\_1 & st\_2 & st\_3 \\ st\_4 & st\_5 & st\_6 & st\_7 \\ st\_8 & st\_9 & st\_{10} & st\_{11} \\ st\_{12} & st\_{13} & st\_{14} & st\_{15} \end{pmatrix}.$$

Using the same process, the key *K* is divided into subkeys *ki* (Matrix *K*).

$$K = \begin{pmatrix} k\_0 & k\_1 & k\_2 & k\_3 \\ k\_4 & k\_5 & k\_6 & k\_7 \\ k\_8 & k\_9 & k\_{10} & k\_{11} \\ k\_{12} & k\_{13} & k\_{14} & k\_{15} \end{pmatrix}.$$

The cipher process is the combination of two operations: **AddRoundKey** and **step** (see Figure 1). The **step** operation is computed *s* times while the **AddRoundKey** is computed *s* + 1 times. This value depends on the key length: *s* = 8 for a 64-bit key and *s* = 12 for a 128-bit key.

**Figure 1.** Representation of the LED encryption.

The **step** operation is composed of four rounds, each composed of four operations: AddConstants, SubCells, ShiftRows, and MixColumnsSerial (see Figure 2), while the **AddRoundKey** operation combines the state and the subkeys using XOR.

AddConstants. Six bits, *rc*5, *rc*4, *rc*3, *rc*2, *rc*1, and *rc*0 (initialized to zero), are shifted to the left (*rc*5 = *rc*4; ... ; *rc*1 = *rc*0), and the new *rc*0 is set to *rc*5 ⊕ *rc*4 ⊕ 1, computed from the values before the shift. This update is done each round before using the constant. Moreover, the key size (written in its bit form *ks*7||*ks*6||*ks*5||*ks*4||*ks*3||*ks*2||*ks*1||*ks*0) is used to create the constant. The values are then combined into a round constant (see Matrix *constant*), and this constant is added (using bitwise exclusive or) to the state.

$$
\begin{split}
\textit{constant} &= \begin{pmatrix}
0 \oplus (ks\_7 || ks\_6 || ks\_5 || ks\_4) & (rc\_5 || rc\_4 || rc\_3) & 0 & 0 \\
1 \oplus (ks\_7 || ks\_6 || ks\_5 || ks\_4) & (rc\_5 || rc\_4 || rc\_3) & 0 & 0 \\
2 \oplus (ks\_3 || ks\_2 || ks\_1 || ks\_0) & (rc\_5 || rc\_4 || rc\_3) & 0 & 0 \\
3 \oplus (ks\_3 || ks\_2 || ks\_1 || ks\_0) & (rc\_5 || rc\_4 || rc\_3) & 0 & 0
\end{pmatrix}
\end{split}
$$
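The round-constant update above is a small 6-bit LFSR. A minimal Python sketch (assuming, as described, that the feedback bit is computed from the values before the shift):

```python
def next_rc(rc: int) -> int:
    """One AddConstants update of the 6-bit state rc5..rc0: shift left
    by one position and insert rc5 XOR rc4 XOR 1 (old values) as rc0."""
    rc5 = (rc >> 5) & 1
    rc4 = (rc >> 4) & 1
    return ((rc << 1) & 0x3F) | (rc5 ^ rc4 ^ 1)

# Starting from zero, the first constants generated are 01, 03, 07, 0F, 1F, 3E.
rc, seq = 0, []
for _ in range(6):
    rc = next_rc(rc)
    seq.append(rc)
assert seq == [0x01, 0x03, 0x07, 0x0F, 0x1F, 0x3E]
```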

SubCells. The actual state is substituted by the new state using the PRESENT S-box presented in Table 1. This function adds some confusion and non-linearity during the process.

**Table 1.** PRESENT S-box.

| *x* | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | A | B | C | D | E | F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S[*x*] | C | 5 | 6 | B | 9 | 0 | A | D | 3 | E | F | 8 | 4 | 7 | 1 | 2 |
ShiftRows. The rows of the state are rotated: row *i* is rotated *i* positions to the left.

MixColumnsSerial. The state array is post-multiplied by the matrix *M* (see Matrix *M*). For the sake of efficiency, we use the matrix *A* (see Matrix *A*) with *A*<sup>4</sup> = *M* and post-multiply it four times to the state.

$$M = \begin{pmatrix} 4 & 1 & 2 & 2 \\ 8 & 6 & 5 & 6 \\ B & E & A & 9 \\ 2 & 2 & F & B \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \\ 4 & 1 & 2 & 2 \end{pmatrix}.$$
After the *s* **step** and *s* + 1 **AddRoundKey**, the state becomes the ciphertext.

Another approach of the cipher is its bitslice version [13], and this approach brings some important properties for this work. Let us suppose that the machine used has 64-bit length registers. Then, the 64-bit state is stored in a single register. The bitslice transformation of the cipher stores the state in 64 registers, each containing 1 bit of data. This approach allows us to have a bit-oriented cipher, rather than a word-oriented one. This is very important for the implementation of the parity scheme.

Another advantage of the bitslice version is the parallel encryptions. In the same conditions as the previous point, instead of using 64 registers containing only 1 bit of useful data, we can encrypt *n* plaintexts in parallel and then store 64 *n*-bit words of useful data. As the bitslice version is bitwise, the computations on the different states cannot interfere. It is this method that enables the detection of any 1-word fault injection. Indeed, as every machine word is seen as the concatenation of a single bit of *n* states, and as any 1-bit injection in a state would be detected, up to a 1-word fault injection can be detected here.
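A minimal Python sketch of this bitslice transformation (our own helper names), packing bit *i* of each of *n* parallel states into machine word *i*:

```python
def bitslice(states, width=64):
    """Pack n states of `width` bits into `width` words: word i holds
    bit i of every state (state j lands in bit j of each word)."""
    words = [0] * width
    for i in range(width):
        for j, s in enumerate(states):
            words[i] |= ((s >> i) & 1) << j
    return words

def unbitslice(words, n):
    """Inverse transformation: rebuild the n states from the words."""
    states = [0] * n
    for i, w in enumerate(words):
        for j in range(n):
            states[j] |= ((w >> j) & 1) << i
    return states

states = [0x0123456789ABCDEF, 0xFEDCBA9876543210, 0xFFFFFFFFFFFFFFFF]
assert unbitslice(bitslice(states), len(states)) == states
```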

The state transformation is presented in Figure 3, with a register of the machine in blue and a state of the cipher in red.

**Figure 3.** Bitslice transformation of the state

### **3. Error Compensation Issue**

In this section, we exhibit a potential fault attack against Friet [9]. Indeed, in particular scenarios, we show that an injected fault can create several errors that compensate during the parity check, so the error is not detected. The scenario has a small probability of success, depending on the attack model and some requirements about implementation characteristics. We then present a countermeasure to prevent this kind of attack in a strong model where the adversary can inject the fault of their choice at the position and time they choose.

### *3.1. Issue Example*

We bring out the vulnerability with an example, and we generalize it next. We assume that the attacker can add a fault to one value during the computation.

Let the injected fault be 0xFF..FF = 2<sup>128</sup> − 1, as Friet manipulates 128-bit data. For the sake of simplicity, we call *a*, *b*, *c*, and *d* the inputs of the *μ*2 operation and *a*′, *b*′, *c*′, and *d*′ its outputs. During these operations, the parity equation followed is *d* = *a* ⊕ *b* ⊕ *c*; this equation is checked to ensure that no fault is injected. We inject the additive fault into the word *c* during the *μ*2 operation, after the rotated word *c* is added to *a* and before it is added to *b*. The fault injection is illustrated in Figure 4, with the red line showing the modification.

**Figure 4.** Round of FRIET-P.

When such a fault is injected, the outputs of two branches are modified: the second and the third. We denote *b*′ and *c*′ as the second and third words of the output of the faulty *μ*2 operation. Then, we have two equations:

*b*′ = *b* ⊕ ((*c* ⊕ 0xFF..FF) ≪ 80) = *b* ⊕ (*c* ≪ 80) ⊕ (0xFF..FF ≪ 80) = *b* ⊕ (*c* ≪ 80) ⊕ 0xFF..FF

*c*′ = *c* ⊕ 0xFF..FF.

At the output of the faulty *μ*2, we have (*a*′, *b*′, *c*′, *d*′) = (*a* ⊕ (*c* ≪ 80), *b* ⊕ (*c* ≪ 80) ⊕ 0xFF..FF, *c* ⊕ 0xFF..FF, *d*). The check subroutine then does not detect the injected fault, since the faults on the two branches cancel out when a XOR is applied over the values.

In the previous example, we saw that the fault 0xFF..FF is not detected because it remains unchanged when shifted by 80 bits (the shift of *μ*2). However, this is not the only fault that is not detected with this shift: every fault that remains unchanged under the 80-bit shift has this property. With Algorithm 1, we can identify all the undetected faults. Indeed, we begin with the word 1, shift it repeatedly by 80 bits, and count the iterations until the word comes back to its initial value. The cycle length found is 8, and the number of cycles is then *word_length*/*cycle_length* = 128/8 = 16. This means that every fault composed of 8 identical concatenated 16-bit words is not detected. Thus, we have 2<sup>16</sup> − 1 undetected faults (the value 0x00..00 is not a fault) over 2<sup>128</sup> different faults; in terms of probability, the probability of an undetected fault is around 2<sup>−112</sup>. As this probability is very tiny, it is difficult to identify such a weakness with random faults. Moreover, the undetected fault is a 128-bit injection; when the registers are strictly smaller than 128 bits, the necessary fault would affect two registers, so two faults would be needed. This constraint places the error outside of the scope of this study.

**Algorithm 1** Find the length of a cycle.


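Following the description above, a minimal Python sketch of this cycle-length search (rotating the word 1 by 80 bits over 128-bit words until it returns to its initial value):

```python
def rotl(x, r, n=128):
    """Rotate the n-bit word x left by r bits."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)

def cycle_length(r=80, n=128):
    """Length of the cycle of the r-bit rotation on an n-bit word,
    starting from the word 1 (Algorithm 1)."""
    x = rotl(1, r, n)
    length = 1
    while x != 1:
        x = rotl(x, r, n)
        length += 1
    return length

# cycle_length() == 8, so there are 128 // 8 = 16 cycles: every fault made
# of eight identical 16-bit words is invariant under the 80-bit rotation.
```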
The same analysis can be done for *μ*1, and it is easy to see that the only undetected fault is the all-1 fault. For *χ*, the bitwise AND between two branches makes the non-detection of a fault probabilistic, depending on the data in the second branch.

### *3.2. Countermeasures*

We assume that the registers are wide enough to ensure that the undetected faults are still within the limits of the study. An obvious solution to this issue is to increase the cycle length to limit the number of undetected faults. With a shift of 1 bit, the cycle length is maximal, with a value of 128: only the words composed of 128 identical 1-bit words are undetected. However, the fault 0xFF..FF remains undetected (0x00..00 is still not a fault), and we would have to modify the former operation to implement our solution.

Another solution must be found to detect all the faults without changing the cryptographic primitives of the cipher: we copy every variable used more than once and check that the copies are equal. A Boolean flag is used to express the error detection (the flag takes the value 0 when an error is detected). With this principle, any fault injected during an operation only affects one copy and is detected before the copies are used. Algorithm 2 presents the copies and the checks on the operation *μ*2 of the FRIET-P round and shows the overcost of this solution in comparison with the classical FRIET-P presented in Algorithm 3.

The overcost of the countermeasure lies in the three copies of the value *c* and the comparison of these three copies. This solution is used in the rest of the paper to avoid undetected fault injections.


**Algorithm 2** Protected *μ*2 operation of the FRIET-P round.

```
Require: Four 128-bit words a, b, c and d
Ensure: Four 128-bit words a′, b′, c′ and d′ computed by the protected μ2 operation
  c0 ← c
  c1 ← c
  c2 ← c
  flag ← flag & (c0 == c1) & (c0 == c2)
  a′ ← a ⊕ (c0 ≪ 80)
  b′ ← b ⊕ (c1 ≪ 80)
  c′ ← c2
  d′ ← d
  return (a′, b′, c′, d′)
```

### **Algorithm 3** Classical *μ*<sup>2</sup> operation of the FRIET-P round.

```
Require: Four 128-bit words a, b, c and d
Ensure: Four 128-bit words a′, b′, c′ and d′ computed by the original μ2 operation
  a′ ← a ⊕ (c ≪ 80)
  b′ ← b ⊕ (c ≪ 80)
  c′ ← c
  d′ ← d
  return (a′, b′, c′, d′)
```
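To make the compensation concrete, a small Python sketch (our own modeling of the classical *μ*2, with the additive fault hitting *c* between the two branch additions) shows that a rotation-invariant fault passes the parity check while another fault is caught:

```python
MASK = (1 << 128) - 1

def rotl128(x, r):
    """Rotate a 128-bit word left by r bits."""
    return ((x << r) | (x >> (128 - r))) & MASK

def mu2_with_fault(a, b, c, d, fault):
    """Classical mu2, with an additive fault hitting c after the a-branch
    addition but before the b-branch addition."""
    a2 = a ^ rotl128(c, 80)     # the a-branch uses the clean c
    c ^= fault                  # fault injection
    b2 = b ^ rotl128(c, 80)     # the b-branch uses the faulty c
    return a2, b2, c, d

a, b, c = 0x0123456789ABCDEF, 0x1122334455667788, 0x99AABBCCDDEEFF00
d = a ^ b ^ c                   # parity limb: d = a xor b xor c

# 0xFF..FF is invariant under the 80-bit rotation: the two errors cancel.
a2, b2, c2, d2 = mu2_with_fault(a, b, c, d, MASK)
assert d2 == a2 ^ b2 ^ c2       # parity check passes: fault undetected

# A single-bit fault is not rotation-invariant: the parity check fails.
a2, b2, c2, d2 = mu2_with_fault(a, b, c, d, 1)
assert d2 != a2 ^ b2 ^ c2       # fault detected
```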

### **4. Code Abiding on LED**

In this section, we present a generic method to apply code-abiding countermeasures to word-oriented block ciphers and illustrate this technique on the LED cipher [12]. Word-oriented ciphers are often implemented with tables (S-boxes for the substitution layer and multiplicative tables for the diffusion layer) and with XOR operations on words in the same column. We then need to consider error detection codes at the word, column, and state levels.

The basic code is defined at word level and is simply extended to the column and state levels. Indeed, the columns and the state are only concatenations of the words. Thus, if we have a code C of parameters [*n*, *k*, *d*] at word level, the concatenation of *l* words is a code C′ of parameters [*ln*, *lk*, *d*] at column level.

The principle of code-abiding protection is to apply permutations to the different codes. Since we always apply the permutation to the full state, we have two cases: either we are in the code and stay in the code, or we are outside the code and stay outside the code. Since we target only one fault injection, at most one set change can occur, and, due to our construction, any single fault injection forces us to change from a word of the code to a word outside the code. The last property is obtained thanks to the bitslice representation and to the check of non-modification when we use the same variable in different places. (Note that we can also hope for security, and fault detection, for multiple random faults with high probability, thanks to the parallelism we use.) We next present, in more detail, the adaptation made for each operation.

### *4.1. State Modification*

Let *S* be the 64-bit state of the unprotected LED cipher. In order to detect faults, we need to add a redundant part. In our case, we select the 5-bit parity check code, so we add a parity bit for each 4-bit nibble of the state. Indeed, if we denote *Si* the *i*-th bit of *S*, we have *S*64+*i* = *S*4×*i* ⊕ *S*4×*i*+1 ⊕ *S*4×*i*+2 ⊕ *S*4×*i*+3, where *S*4×*i*, *S*4×*i*+1, *S*4×*i*+2, *S*4×*i*+3 are the bits of the nibble *i*. Eventually, we have an 80-bit state composed of 64 data bits and 16 parity bits.

In Section 2, we presented the bitslice version of LED. This is the version used in this work; we thus add the parity property by adding 16 registers. Figure 5 illustrates this step, with a register of the machine in blue, the data bits of a state in red, and the parity bits of the red state in pink. The code-abiding notion is more bit-oriented than word-oriented, and the bitslice approach thus allows us to use it on a classical word-oriented cipher.

These transformation functions are summarized in the following Algorithms 4 and 5.



```
Require: Bitslice state S
Ensure: State S with parity bits
  for i in range(16) do
     S64+i ← S4×i ⊕ S4×i+1 ⊕ S4×i+2 ⊕ S4×i+3
  return S
```

**Figure 5.** State after Bitslice and Parity transformations.
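A minimal Python sketch of this parity extension, with the state modeled as a list of 80 registers (the last 16 holding the parity bits):

```python
import random

def add_parity(S):
    """Extend a 64-register bitslice state with 16 parity registers:
    register 64+i is the XOR of the four registers of nibble i."""
    assert len(S) == 64
    return S + [S[4*i] ^ S[4*i+1] ^ S[4*i+2] ^ S[4*i+3] for i in range(16)]

# Each register holds one bit of n parallel states; here n = 8.
S = [random.getrandbits(8) for _ in range(64)]
T = add_parity(S)
assert len(T) == 80
assert all(T[64+i] == T[4*i] ^ T[4*i+1] ^ T[4*i+2] ^ T[4*i+3] for i in range(16))
```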

### *4.2. Key and Constant*

We assume that the key and the constant are stored in an encoded manner.

We copy the key and the constant at the beginning of the computation, use the copy for the whole computation, and, at the end, check that the copy used stayed unchanged. Thus, any change in the key during the encryption is detected. Since the attacker can only inject one fault, a modification of the key is detected. An adversary who modifies the key could try to inject a fault at each key addition; however, by using a copy checked at the end, and thanks to the absence of a key schedule in LED-64, the attacker cannot use this method for multiple fault injections.

The only remaining method would be to modify the stored key. However, LED is known for its resistance against related-key attacks, and thus no exploitable information can be obtained by the attacker.

If a fault is injected on the key, the XOR operation with the state propagates the error on the state until the parity check of the state.

### *4.3. AddConstant*

We calculate the constant presented in Section 2. This constant is a 64-bit value that we transform into 80 *n*-bit values (bitslice + parity transformations), which are added to the state using the XOR operation. If *n* encryptions are performed in parallel, the constant must fit the *n*-bit length of the registers, so each of the 80 bits has to be duplicated *n* times. This is illustrated in Algorithm 6 (0xFF..FF is composed of *n*/4 F digits).

### **Algorithm 6** Constant *c* bitsliced and duplicated.

```
Require: 64-bit constant c
Ensure: 80 n-bit (with duplication of the bit) constants ci with parity and bitslice transformations
  for i in range(64) do
     ci ← ((c ≫ (63 − i)) & 1) × 0xFF..FF
  return ci
```

If a fault is injected on the constant, the error is propagated on the state until the check parity function.
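The duplication step can be sketched in Python as follows (only the 64 data bits are shown; the parity bits are handled the same way):

```python
def duplicate_constant(c, n=8):
    """Spread each bit of the 64-bit constant c into an n-bit register:
    bit i becomes 0x00..00 or 0xFF..FF (n ones), as in Algorithm 6."""
    ones = (1 << n) - 1
    return [((c >> (63 - i)) & 1) * ones for i in range(64)]

regs = duplicate_constant(0x8000000000000001, n=8)
assert regs[0] == 0xFF    # MSB of c is 1: all-ones register
assert regs[1] == 0x00    # next bit is 0: all-zeros register
assert regs[63] == 0xFF   # LSB of c is 1
```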

### *4.4. ShiftRows*

This function is the same operation as the former ShiftRows operation. The parity bits are shifted together with the nibble from which they have been computed: the bit *S*64+*i* is rotated ⌊*i*/4⌋ positions to the left. This is illustrated in Figure 6.

**Figure 6.** ShiftRows on the bitslice parity state.

During this operation, a fault can be injected on the state and stays on it until its detection. Moreover, as the state is only shifted, its value remains the same, so a fault injected before the operation is propagated to the output.

### *4.5. SubCells*

The substitution operation brings confusion and non-linearity to the encryption. It is thus a critical function of the block cipher construction. The extension from the code C to the code C′ requires us to represent the 4-bit S-box by a 5-bit S-box, and this projection leaves a choice of the 5-bit S-box.

We present, in Section 5, a way to construct the protected S-box. Here, we present the results on the PRESENT S-box, represented by Table 3. However, we use an alternative form of the S-box composed only of logic gates: the algebraic normal form. This form gives five equations, where *xi* is the *i*-th bit of the input and *yi* the *i*-th bit of the output (*x*0..*x*3 are the data bits and *x*4 is the parity bit).

```
y0 = x3x2x1 ⊕ x3x2x0 ⊕ x3x1x0 ⊕ x3 ⊕ x2x1 ⊕ x2 ⊕ x0 ⊕ 1
y1 = x3x2x0 ⊕ x3x2 ⊕ x3x1x0 ⊕ x3x0 ⊕ x2x0 ⊕ x1 ⊕ x0 ⊕ 1
y2 = x3x2x1 ⊕ x3x2x0 ⊕ x3x1x0 ⊕ x2x0 ⊕ x2 ⊕ x1x0 ⊕ x0
y3 = x4x3x2x1x0 ⊕ x4x3x1x0 ⊕ x3x2x1x0 ⊕ x3x1x0 ⊕ x3 ⊕ x2x1 ⊕ x1 ⊕ x0
y4 = x4x3x2x1x0 ⊕ x4x3x1x0 ⊕ x4 ⊕ x3x2x1x0 ⊕ x3x2x0 ⊕ x3x2 ⊕ x3x0 ⊕ x3 ⊕ x2 ⊕ x1x0 ⊕ x1 ⊕ x0
```
This function is presented in Algorithm 7. We create copies of the values used more than once to avoid the error compensation presented in Section 3; as the bit *x*4 is used only in the last two equations, this bit is copied only twice. Indeed, the output *S*4×*i*+*m* only depends on the copies *xm*·, so a fault injected on a copy only affects one output, and the parity characteristic allows the error detection.

If a fault is injected before or during the function, the separation of the codes in the S-box representation keeps the word out of the code C, and the fault is propagated.

### *4.6. MixColumnsSerial*

This operation is composed of the four post-multiplications with the matrix *A* (see Section 2). The state is decomposed into four columns of four 5-bit nibbles each; these nibbles are the 4-bit data and the associated parity bit. In our operation, only a multiplication by two is used (a multiplication by four is just two multiplications by two). Algorithm 8 shows the multiplication by two: it is a shift of the bits and a XOR with the LSB of the data word on the second bit of the nibble. The computation on the parity bit is thus only a XOR with this LSB.

Algorithm 9 presents the state divided into columns and the multiplication with the matrix *A*. As in the SubCells function, we create a copy of each element used more than once to avoid error compensation.

With the same observations as for the previous operations, if a fault is injected before or during the MixColumnsSerial operation, this error is propagated through the operation on the state.

All the LED functions are converted into code abiding functions to keep the parity characteristic of the state and to allow the fault injection detection. The next section focuses on the 5-bit representation of a 4-bit S-box.

**Algorithm 7** SubCells function.

```
Require: State S, i, flag
Ensure: State S after the SubCells operation and the detection flag
  for j in range(5) do
     xj0 ← S4×i+0
     xj1 ← S4×i+1
     xj2 ← S4×i+2
     xj3 ← S4×i+3
     if j > 2 then
        xj4 ← S64+i
  for j in range(5) do
     flag ← flag & (x0j == x1j) & (x0j == x2j) & (x0j == x3j) & (x0j == x4j)
  S4×i+0 ← x03x02x01 ⊕ x03x02x00 ⊕ x03x01x00 ⊕ x03 ⊕ x02x01 ⊕ x02 ⊕ x00 ⊕ 1
  S4×i+1 ← x13x12x10 ⊕ x13x12 ⊕ x13x11x10 ⊕ x13x10 ⊕ x12x10 ⊕ x11 ⊕ x10 ⊕ 1
  S4×i+2 ← x23x22x21 ⊕ x23x22x20 ⊕ x23x21x20 ⊕ x22x20 ⊕ x22 ⊕ x21x20 ⊕ x20
  S4×i+3 ← x34x33x32x31x30 ⊕ x34x33x31x30 ⊕ x33x32x31x30 ⊕ x33x31x30 ⊕ x33 ⊕ x32x31 ⊕ x31 ⊕ x30
  S64+i ← x44x43x42x41x40 ⊕ x44x43x41x40 ⊕ x44 ⊕ x43x42x41x40 ⊕ x43x42x40 ⊕ x43x42 ⊕ x43x40 ⊕ x43 ⊕ x42 ⊕ x41x40 ⊕ x41 ⊕ x40
```

**Algorithm 8** Multiplication by 2.

```
Require: Nibble nibble
Ensure: Nibble nibble × 2
  function mc2(nibble)
     nib30 ← nibble[3]
     nib31 ← nibble[3]
     nib32 ← nibble[3]
     flag ← flag & (nib30 == nib31) & (nib30 == nib32)
     nibble[0], nibble[1], nibble[2], nibble[3], nibble[4] ←
        nib30, nibble[0] ⊕ nib31, nibble[1], nibble[2], nibble[4] ⊕ nib32
```
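A minimal Python sketch of this multiplication by two on a nibble-with-parity (the nibble is modeled as a list of five bits, with the parity bit last, following Algorithm 8), checking that the parity characteristic is preserved:

```python
from itertools import product

def mc2(nibble):
    """Multiply a nibble (4 data bits + 1 parity bit) by two, as in
    Algorithm 8: shift the data bits and fold bit 3 back into bit 1
    and into the parity bit."""
    b0, b1, b2, b3, p = nibble
    return [b3, b0 ^ b3, b1, b2, p ^ b3]

# The parity bit still equals the XOR of the data bits after mc2.
for bits in product([0, 1], repeat=4):
    nib = list(bits) + [bits[0] ^ bits[1] ^ bits[2] ^ bits[3]]
    out = mc2(nib)
    assert out[4] == out[0] ^ out[1] ^ out[2] ^ out[3]
```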

**Algorithm 9** MixColumnsSerial function.

```
Require: Column col composed of four 5-bit nibbles
Ensure: Column col after post-multiplication with the matrix M
  function MixSingleColumn(col)
     nibble[0] ← [col[0], col[1], col[2], col[3], col[16]]
     nibble[1] ← [col[4], col[5], col[6], col[7], col[17]]
     nibble[2] ← [col[8], col[9], col[10], col[11], col[18]]
     nibble[3] ← [col[12], col[13], col[14], col[15], col[19]]
     for i in range(4) do
        nibble[0], nibble[1], nibble[2], nibble[3] ← nibble[1], nibble[2], nibble[3],
           mc2(mc2(nibble[0])) ⊕ nibble[1] ⊕ mc2(nibble[2]) ⊕ mc2(nibble[3])

Require: State RS
Ensure: State RS after the MixColumnsSerial operation
  MixSingleColumn(col0)
  MixSingleColumn(col1)
  MixSingleColumn(col2)
  MixSingleColumn(col3)
```

### **5. 5-Bit Representation of a 4-Bit S-Box**

In the protected version of LED, the SubCells function uses a 5-bit representation of the PRESENT S-box. This section presents how to create a 5-bit representation from a 4-bit permutation and which representation is the best in terms of cost optimization.

In the last section, the SubCells function required a 5-bit representation of the 4-bit PRESENT S-box. The former 4-bit S-box must remain the same, with the parity bit added at the end of the words. Indeed, the 5-bit table is already half filled with the words of even parity (see Table 2), which leaves 16<sup>16</sup> candidates to represent a 4-bit S-box. We must find a way to compare one candidate with another.

Only the S-boxes that correspond to permutations are considered (each output has one and only one related input). Indeed, the parity code used is the 5-bit parity code C = [5, 4, 2], but, as we want to consider this code at the state level, the resulting code C′ = [80, 64, 2] is selected. C′ is only a concatenation of 16 codes C. This concatenation brings the constraint of permutation on the S-boxes.


**Table 2.** 5-bit S-box derived from PRESENT to fill.

### *5.1. Score Function*

To compare the candidates, a score must be attributed to each S-box, and the best score among the candidates is selected. In this work, we focus on the implementation cost: the score of a candidate is the number of logic gates needed to construct the S-box. The algebraic normal form (ANF) of the S-box is used to count the number of AND and XOR gates. With the score function presented in Algorithm 10, the best 5-bit representation is the S-box with the lowest number of logic gates. The next subsection applies the score function to every 5-bit representation of a 4-bit S-box.

**Algorithm 10** Score of an S-box S.

```
Require: S-box S
Ensure: Score of S (number of logical gates in the ANF)
  function score(S)
     anf ← ANF(S)
     return count(⊕) + count(∧)
```
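Algorithm 10 can be made concrete with a small Python sketch (an illustration, not the paper's implementation). The ANF of each output bit is obtained with a Möbius transform over GF(2); the gate-counting convention assumed here is that a degree-d monomial costs d − 1 AND gates and that summing m monomials costs m − 1 XOR gates:

```python
def anf_coeffs(truth_table, n):
    """Möbius transform over GF(2): truth table -> ANF monomial coefficients."""
    coeffs = list(truth_table)
    for i in range(n):
        for x in range(1 << n):
            if x & (1 << i):
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    return coeffs

def score(sbox, n):
    """Count AND and XOR gates needed to evaluate the ANF of every output bit.
    Assumed convention: a degree-d monomial needs d - 1 AND gates, and summing
    m non-zero monomials of one output bit needs m - 1 XOR gates."""
    gates = 0
    for bit in range(n):  # one Boolean function per output bit
        tt = [(sbox[x] >> bit) & 1 for x in range(1 << n)]
        monomials = [x for x, c in enumerate(anf_coeffs(tt, n)) if c]
        gates += sum(max(bin(m).count("1") - 1, 0) for m in monomials)  # ANDs
        gates += max(len(monomials) - 1, 0)                             # XORs
    return gates

# The standard PRESENT 4-bit S-box:
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
```

A sanity check: the identity S-box has one degree-1 monomial per output bit, so its score is 0 under this convention.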
### *5.2. Exhaustive List*

To fill the 5-bit S-box, a candidate must be selected among all 16! permutations. The obvious way to choose the best S-box is to score every candidate and keep the one with the lowest score. This process is summarized in Algorithm 11 and is the most precise way to find the lowest score: we would obtain the score of each function and then select the best one according to the implementation cost criterion. However, it requires iterating over all 16! permutations, which can take a very long time. A new solution, based on directly constructing the 5-bit representation, can be just as effective and is much easier to carry out.


**Algorithm 11** Selection of the permutation with the lowest score.

**Require:** List of all the 5-bit permutations derived from a 4-bit S-box PermutationLIST **Ensure:** Permutation with the lowest score and its score

```
function score_selection(PermutationLIST)
   low_score ← 1000
   for S ∈ PermutationLIST do
      s ← score(S)
      if s < low_score then
         low_score ← s
         selected ← S
   return low_score, selected
```
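Algorithm 11 is a plain minimum search. A minimal Python sketch (illustrative; the score function is abstracted as a parameter, and the toy candidate set and dummy cost below are hypothetical stand-ins for the 16! real candidates):

```python
from itertools import permutations

def score_selection(candidates, score):
    """Return (lowest score, candidate reaching it) over an iterable of S-boxes."""
    low_score, selected = float("inf"), None
    for sbox in candidates:
        s = score(sbox)
        if s < low_score:
            low_score, selected = s, sbox
    return low_score, selected

# Toy demonstration on 3-element permutations with a dummy cost
# (number of fixed points) instead of the 16! real candidates:
best, perm = score_selection(permutations(range(3)),
                             score=lambda p: sum(p[i] == i for i in range(3)))
```

The same function applied to all 5-bit permutations with the ANF gate count as `score` realizes the exhaustive search of Algorithm 11.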
### *5.3. Construction*

A new selection method is introduced, based on a construction approach instead of an exhaustive one. In this paragraph, an *even* word denotes a word that satisfies the parity characteristic, and an *odd* word one that does not. Every even word is only 1 bit away from an odd word, and the LSB separates an even word from an odd one (0x18 and 0x19 are only 1 bit away from each other, and this bit is the LSB). Each even input is substituted by an even output, and each odd input by an odd output. The 5-bit S-box is constructed with the following rule: an odd input is substituted by the odd word 1 bit away from the even output linked to the even input 1 bit away from the odd input. Indeed, each even input/output pair has an odd input/output pair 1 bit away. This construction is detailed in Algorithm 12. With this method, the representation of PRESENT shown in Table 3 is obtained, and it consists of 62 logical gates. Several S-boxes (found with an exhaustive search) with good cryptographic properties were tested, and none has an ANF constructed with fewer than 94 logical gates (the biggest one required 124 logical gates). We now have to test the robustness of the protected cipher.

**Algorithm 12** Construction of a code abiding 5-bit representation from a 4-bit S-box.

**Require:** S-box S half-filled **Ensure:** S-box S full-filled

```
for i in range(32) do
   if i is odd then
      S[i] ← S[i ⊕ 1] ⊕ 1
```
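Combining the half-filling of Table 2 with Algorithm 12, the full 5-bit S-box can be built in a few lines. The Python sketch below is illustrative; it assumes the parity bit is appended as the least significant bit, and uses the standard published PRESENT S-box values:

```python
def parity(x: int) -> int:
    """XOR of all bits of x (1 if x has odd Hamming weight)."""
    return bin(x).count("1") & 1

# The standard PRESENT 4-bit S-box:
PRESENT = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

# Half-fill: every even-parity input (nibble || parity bit) maps to the
# even-parity encoding of the 4-bit output.
S5 = [None] * 32
for x in range(16):
    S5[(x << 1) | parity(x)] = (PRESENT[x] << 1) | parity(PRESENT[x])

# Algorithm 12: each odd word is substituted by the odd word 1 bit away
# from the output of the even word 1 bit away from the input.
for i in range(32):
    if S5[i] is None:        # i has odd parity
        S5[i] = S5[i ^ 1] ^ 1
```

By construction, the result is a permutation of the 32 words that preserves the parity of each input, and its even-parity half restricts to the original PRESENT S-box.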

**Table 3.** 5-bit code abiding representation constructed from PRESENT.


### **6. Experimental Results**

This section presents the various tests done on the protected LED to determine its robustness against fault injection.

### *6.1. Robustness*

To test the robustness of the protected LED cipher, three scenarios are tested. The detection of an injected fault simply sets a flag variable to 0. During the tests, the fault injection is simulated, so there is no case where a fault fails to create an error.

**Scenario 1**: A bit of the state is toggled at a random place of the state and at a random moment of the encryption. This bit flip changes the parity characteristic of the nibble it belongs to. Thanks to the code abiding properties of the functions used during the encryption, the error persists until the parity check function and is thus always detected.

**Scenario 2**: A bit of the key or of the constant of the AddConstant function is toggled at a random place and a random round of the encryption. As the XOR operation is code abiding, the fault is transmitted from the constant to the state and persists until the parity check. The error is thus always detected.

**Scenario 3**: A fault is injected on data used more than once during a function at a random place and a random round of the encryption. The copies done before the use of the data are then not equal, and the test sets the flag to 0. The fault is thus always detected.

In each scenario, the fault is always detected; the code abiding solution is therefore robust against 1-bit fault attacks.

In all the scenarios, 1,000,000 faults were injected, and the countermeasure (combining the code abiding property with copies of the elements used more than once) always led to a fault detection. The code abiding solution is thus also robust against 1-word fault attacks. The results are presented in Table 4.
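Scenario 1 can be reproduced with a small simulation. The following Python sketch (illustrative, not the paper's test harness) encodes a 16-word state with even-parity words, toggles one random bit, and checks that the parity check always raises a detection:

```python
import random

def parity(x: int) -> int:
    """XOR of all bits of x (1 if x has odd Hamming weight)."""
    return bin(x).count("1") & 1

def parity_check(state) -> int:
    """Flag stays 1 only if every 5-bit word still has even parity."""
    flag = 1
    for w in state:
        if parity(w):
            flag = 0
    return flag

def inject_bit_fault(state):
    """Scenario 1: toggle one random bit of one random word of the state."""
    i = random.randrange(len(state))
    state[i] ^= 1 << random.randrange(5)

random.seed(0)
trials, detected = 1000, 0
for _ in range(trials):
    # A fresh state of 16 even-parity words (nibble || parity bit):
    state = [(x << 1) | parity(x) for x in range(16)]
    inject_bit_fault(state)
    detected += parity_check(state) == 0

# Any single bit flip changes the Hamming-weight parity of exactly one word:
assert detected == trials
```

This is exactly why the 1-bit fault of Scenario 1 is always caught: a single toggle can never preserve the parity of the word it hits.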

**Table 4.** Robustness results of the secured implementation.



### *6.2. Overcost of the Countermeasure*

Adding the parity scheme to the LED cipher has a cost. Indeed, we convert an encryption algorithm working on 4-bit words into one working on 5-bit words, so the new round functions are more expensive than the former ones. Moreover, our *n* states are 80 bits long instead of 64 bits (we encrypt *n* plaintexts in parallel, and in the tests we fix *n* = 64). Our implementation therefore occupies more memory, and one secure encryption takes longer than an unprotected one. We differentiate several implementations: the *classical* implementation refers to the software implementation using lookup tables; *bitslice* is the bitslice version of LED without any protection; the *code abiding* implementation adds the parity bit during the encryption; and the *code abiding + copies* implementation combines the code abiding properties with copies of the values used more than once. The costs are summarized in Table 5. The compiler used was GNU GCC without any optimization, running on an Intel Core i5 CPU. The results are presented as ratios to give a better understanding of the overcost of the countermeasures from one implementation to another. They must be put into perspective, as the classical implementation encrypts only one plaintext at a time, whereas the other implementations can encrypt up to 64 plaintexts at the same time (on a 64-bit machine). The overcost of the code abiding countermeasure is then better than expected: in terms of time, a 25% rise was expected (25% more bits are computed), whereas only 12% is measured. With the copies countermeasure, however, the overcost reaches 79%.


**Table 5.** Implementation results and cost comparison of the encryptions.

Moreover, another comparison on each round function shows precisely where the countermeasure has the biggest impact (see Table 6). The functions whose cost grows most from the classical implementation to the other ones are clearly SubCells, as the function no longer uses any lookup table, and AddConstant, as the constant used must be transformed into a bitslice and parity constant. However, as mentioned before, it is more meaningful to compare the bitslice versions, as they encrypt the same number of plaintexts and are based on the same principles. With these comparisons, the biggest overcost lies in the MixColumns function, due to all the copies introduced.


**Table 6.** Implementation results and cost comparison of the round functions.

### **7. Conclusions**

The principle used in this work to prevent fault injections is to detect them using an error detecting code, the parity bit code. This code relies on a redundancy of the information contained in a word. The parity bit code used is the 5-bit parity code, with 4 data bits and 1 parity bit. This method allows us to detect a 1-bit fault injection on a value during an operation.

This work highlights an issue induced by error compensation. Indeed, depending on the operation performed, an error injected on a value can propagate into several computed outputs, and with the parity bit code this error may compensate itself across its multiple occurrences. The first step is thus to present the conditions on the fault and on the operation under which compensation occurs, and then to propose a countermeasure to this error compensation, which relies on copying the values used more than once and checking the copies for equality.

In addition to this first measure, a method is presented to apply the code abiding notion to word-oriented ciphers. An example on the LED cipher shows the transformations of the state and the round functions needed to include the parity bit code in the operations. A protected version of the existing LED cipher is thus created. Its robustness against 1-bit fault injection is tested, and the results validate its security. Moreover, with the bitslice method, the robustness extends to 1-word fault injection detection.

The next step is to extend this method into a generic one, so as to bring code abiding to new cryptographic primitives. A critical operation is the S-box: projecting it into a larger space to add the parity bit yields many candidates. A way to differentiate them is to give each a score based on its implementation cost and to select the cheapest S-box.

Finally, future work could focus on applying the code abiding method to a larger cipher, such as AES, rather than lightweight ciphers, as well as on evaluating the overcost of the countermeasure compared to other error detecting solutions. Moreover, 1-bit error detection has its limitations [14], and work on detecting and correcting multiple faults would be interesting.

**Author Contributions:** Conceptualization, P.-A.T. and V.G.; methodology, P.-A.T.; software, P.-A.T.; validation, P.-A.T.; investigation, P.-A.T.; writing—original draft preparation, P.-A.T.; writing—review and editing, P.-A.T., L.B. and V.G.; visualization, P.-A.T.; supervision, L.B. and V.G.; project administration, L.B.; funding acquisition, L.B. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by the French Agence Nationale de la Recherche under grant ANR-22-CE39-0008 (project PROPHY).

**Institutional Review Board Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **Abbreviations**

The following abbreviations are used in this manuscript:

CA Code Abiding
CA + Copies Code Abiding with copies included

### **References**


**Disclaimer/Publisher's Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

