**Trustworthiness in Mobile Cyber Physical Systems**

Editors

**Kyungtae Kang Junggab Son Hyo-Joong Suh**

MDPI • Basel • Beijing • Wuhan • Barcelona • Belgrade • Manchester • Tokyo • Cluj • Tianjin

*Editors* Kyungtae Kang Hanyang University Korea

Junggab Son Kennesaw State University USA

Hyo-Joong Suh The Catholic University of Korea Korea

*Editorial Office* MDPI St. Alban-Anlage 66 4052 Basel, Switzerland

This is a reprint of articles from the Special Issue published online in the open access journal *Applied Sciences* (ISSN 2076-3417) (available at: https://www.mdpi.com/journal/applsci/special_issues/Trustworthiness_Mobile_CPS).

For citation purposes, cite each article independently as indicated on the article page online and as indicated below:

LastName, A.A.; LastName, B.B.; LastName, C.C. Article Title. *Journal Name* **Year**, *Volume Number*, Page Range.

**ISBN 978-3-0365-1086-6 (Hbk) ISBN 978-3-0365-1087-3 (PDF)**

© 2021 by the authors. Articles in this book are Open Access and distributed under the Creative Commons Attribution (CC BY) license, which allows users to download, copy and build upon published articles, as long as the author and publisher are properly credited, which ensures maximum dissemination and a wider impact of our publications.

The book as a whole is distributed by MDPI under the terms and conditions of the Creative Commons license CC BY-NC-ND.



## **About the Editors**

**Hyo-Joong Suh** (Professor) is currently a professor at the School of Computer Science and Information Engineering, the Catholic University of Korea. He received his BS and MS degrees from Seoul National University in 1992 and 1994, respectively. He completed his PhD degree at the Department of Computer Engineering of Seoul National University in 2000. He is an expert in embedded and mobile systems, with extensive experience in scalable computer and wireless/mobile systems. His research extends from memory hierarchy optimization during MS and PhD research to the prototyping of various mobile devices from several communication companies as a professional service. His current research interest focuses on human behavior computing with personal identification using various sensors.

**Junggab Son** (Assistant Professor) is currently an Assistant Professor in the Department of Computer Science, College of Computing and Software Engineering, Kennesaw State University (KSU), Marietta, GA, USA. He was a limited-term Assistant Professor from January to May 2018 and was a research fellow/part-time assistant professor from October 2016 to December 2017 in the Department of Computer Science, KSU. Before joining KSU, he was a postdoctoral research associate in the Department of Mathematics and Physics, North Carolina Central University, Durham, NC, USA, from September 2014 to September 2016. He received his Ph.D. degree (August 2014) and M.S. degree (February 2011) in computer science and engineering from Hanyang University, Seoul, South Korea. He received his B.S. degree (February 2009) in computer science and engineering from Hanyang University, Ansan, South Korea. His research interests include applied cryptography, security, and privacy issues in significant applications, including cloud computing (Fog/Edge Computing), the internet of things (Future Internet), vehicular ad hoc networks, social network services, and bioinformatics.

**Kyungtae Kang** (Professor) received a B.S. degree in computer science and engineering, followed by M.S. and Ph.D. degrees in electrical engineering and computer science, from Seoul National University, Seoul, Korea, in 1999, 2001, and 2007, respectively. From 2008 to 2010, he was a postdoctoral research associate at the University of Illinois at Urbana-Champaign, IL, USA. In 2011, he joined the Department of Computer Science and Engineering at Hanyang University, where he is currently a tenured professor. His research interests lie primarily in systems, including operating systems, mobile systems, distributed systems, and real-time embedded systems. His recent research interest is in the interdisciplinary area of cyber-physical systems.

## *Editorial* **Trustworthiness in Mobile Cyber-Physical Systems**

**Hyo-Joong Suh <sup>1</sup>, Junggab Son <sup>2</sup> and Kyungtae Kang <sup>3,</sup>\***


#### **1. Introduction**

As they continue to become faster and cheaper, devices with enhanced computing and communication capabilities are increasingly incorporated into diverse objects and structures in the physical environment. Harnessing these capabilities will provide the basis for applications offering enormous societal impact and economic benefit, linking the cyber world of computing and communications with the physical world. Such applications are called cyber-physical systems (CPSs). It is evident that as direct interactions between real-world entities (including human activities) and cyber systems become more commonplace, the trustworthiness of such systems will become an increasingly important issue. Here, we use the term system trustworthiness in a broad sense to describe systems that demonstrate reliable functionality and are worthy of user confidence, such that they guarantee continuous service in response to internal errors or external attacks [1].

While CPSs traditionally involve static equipment and stable networks, the development of increasingly pervasive mobile devices has drawn considerable attention to mobile CPSs (MCPSs). By exploiting the advantages of CPSs through mobile devices, such as the iPhone and Android phones, with their increasing processing power, range of sensors, and pervasive cellular connections, MCPSs provide expanded applicability, including access to networks comprising multiple mobile devices, such as vehicle networks. Owing to the instability of mobile networks and the variable computing power of individual mobile devices, many studies have addressed various aspects of the efficient cooperation and performance of MCPSs. In particular, the timeliness of data transfer is essential because delays and failures due to bottlenecks stemming from variable network environments can adversely affect the entire system.

The objective of this Special Issue is to contribute to the advancement of research on a wide variety of topics involved in the development of modern and future trustworthy MCPSs, including design, modeling, verification and validation, dependability, resilience, security, safety, and run-time resource optimization. It is imperative to address the issues that are critical to the mobility of MCPSs, report significant advances in the underlying science, and discuss the challenges facing the development and implementation of specific MCPS applications, including those associated with aerospace, autonomous automotive systems, automatic pilot avionics, smart grids, and distributed robotics. Such applications will empower the true vision of MCPSs, driving the evolution of human interactions with the physical world. Moreover, technologies utilizing CPSs will emerge as key drivers in the development of a future autonomous and smart-connected world.

As a side note, we focus on methods for integrating MCPSs with artificial intelligence (AI) without compromising the trustworthiness of the system. AI-enabled CPSs combine computational capabilities with the ability to control and sense physical space. For example, the behavior of autonomous CPSs, such as self-driving cars and autonomous drones in open environments, is often determined by AI and machine learning algorithms. However, the use of data-driven deep learning techniques for perception and control in autonomous CPSs has raised concerns regarding the safety and robustness of autonomous systems. When operating in a physical environment, the unexpected action of AI-enabled CPSs can inflict critical damage on the surrounding environment, including the potential endangerment of humans. Therefore, AI-enabled MCPSs should satisfy stringent regulations regarding their trustworthiness. Although sophisticated testing plays an important role in ensuring the safety and robustness of such systems, the complexity of modern autonomous CPSs means that evaluating trustworthiness via testing alone is insufficient. Formal verification reduces the burden on the testing process by ruling out large classes of errant behaviors at the design stage. Nevertheless, the introduction of a standard methodology for developing formal methods for autonomous AI-enabled CPSs is essential.

**Citation:** Suh, H.-J.; Son, J.; Kang, K. Trustworthiness in Mobile Cyber-Physical Systems. *Appl. Sci.* **2021**, *11*, 1676. https://doi.org/10.3390/app11041676

Received: 8 February 2021 Accepted: 9 February 2021 Published: 13 February 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **2. Review of Issue Contents**

This Special Issue presents nine original papers covering the latest advances and technologies involved in the design of reliable, resilient, secure, and intelligent MCPSs. Moreover, each paper contributes research that offers insights regarding trustworthiness in MCPSs.

Artificial intelligence models, especially deep neural networks such as convolutional neural networks (CNNs), tend to have many learning parameters, which makes their integration into small embedded CPSs, such as mobile phones, challenging. In response to this issue, Lee et al. [2] suggest a new model compression framework based on weight pruning and knowledge distillation with adversarial training, thereby producing compact CNN architectures that remain robust against adversarially perturbed inputs. Furthermore, the authors provide a training algorithm based on the proximal gradient method, which is more memory-efficient than the popular compression approaches based on the alternating direction method of multipliers (ADMM) and is, therefore, more suitable for AI-enabled MCPSs.

In [3], Kim et al. propose two novel data quality measures suitable for large-scale high-dimensional data. As low-quality data can degrade prediction accuracy and introduce inference bias, measuring data quality is an important first step in successful AI applications. In MCPSs, AI models often require regular updates, while inference bias is difficult to detect at runtime; therefore, a data quality check is essential. This study also proposes efficient algorithms based on random projections and bootstrapping, enabling the suggested measures to be computed for large-scale and high-dimensional data, thus representing a departure from existing data quality measures.

Automotive systems are typical examples of CPSs in which embedded software is the main element controlling the mechanical components of the vehicle. Internet-connected software components can fall victim to security attacks at any time, and the controller area network (CAN), an in-vehicle network connecting individual electronic control units (ECUs), can serve as a breach point that compromises vehicle safety.

MAuth-CAN [4] is a new CAN authentication technique that protects ECUs from attack messages by means of a centralized node called an authenticator. It is secure against masquerade attacks by a compromised node and protects the authenticator node from bus-off attacks, which can temporarily force an ECU to leave the CAN. However, the use of a central node causes an additional authentication delay. Thus, in accordance with standards such as ISO 26262, the efficacy of MAuth-CAN must be formally verified before it can be used in commercial vehicles.

Cho et al. [5] present a formal proof that MAuth-CAN is consistently resistant to message flooding and bus-off attacks and provide formal CAN models at various levels, which can be used to analyze CAN applications. Via model checking, the complicated behavior of CAN at the media access control level of the data link layer connecting to MAuth-CAN was checked exhaustively to prove its resilience and sustainability under such attacks. These results can be used to obtain safety certificates from regulatory authorities, while the methodology and the CAN models can be used to secure safety certificates regarding CAN applications.

Public key encryption with keyword search (PEKS) enables users to search over encrypted data that has been outsourced to an untrusted server. Unfortunately, updates to the outsourced data may cause information leakage by exploiting the queries previously submitted in PEKS. Yoon et al. [6] address this by proposing a novel forward private PEKS scheme based on Software Guard Extensions (SGX), a trusted execution environment provided by Intel. By utilizing SGX, the proposed scheme achieves substantial performance improvements compared with prior work. Owing to the readiness with which a trusted platform such as SGX can be integrated with many current CPSs, this research also has implications for security enhancements in CPS environments.

Event-based systems (EBSs) are prevalent in MCPS applications owing to their communication model, which uses implicit invocation and concurrency between components. However, the non-determinism of EBSs during event processing can introduce inherent security vulnerabilities into the system. Many types of attack can incapacitate and/or damage a target EBS by exploiting this event-based communication model. To minimize the security risks to EBSs, the security flaws of such systems, the relationships between these flaws, and feasible techniques for dealing with each flaw must be determined. However, existing security flaw taxonomies do not appropriately reflect the inherent security issues of EBSs. Therefore, Lee et al. [7] introduced a new taxonomy that defines and classifies the inherent security flaws of EBSs, which can serve as a basis for resolving their specific security problems. Moreover, the authors correlated their taxonomy with security attacks designed to target specific flaws and identified existing solutions for the prevention of such attacks.

In [8], Ali et al. describe an energy minimization technique for mixed-criticality real-time scheduling on a single-core system. The main contribution of the proposed technique is that it allows the processor frequency to be controlled dynamically depending on the system criticality mode. Through a series of simulations, they demonstrated and analyzed the effects caused by both low- and high-criticality modes in power-aware mixed-criticality systems. As safety and power awareness are both issues for MCPSs, this study offers valuable insights for power-aware safety-critical CPSs.

Safety and efficiency provide the focus in [9], in which Kwon et al. propose a system that dynamically controls the all-red signal length based on the driving characteristics of vehicles identified as red-light runners (RLRs) to improve the overall safety and efficiency of intersections in road networks. The proposed system uses a multi-channel deep convolutional neural network (MC-DCNN) to enable the online detection and classification of RLRs, which can be defined using clustering results acquired via dynamic time warping (DTW) and hierarchical clustering analysis (HCA). For dynamic all-red signal control, the proposed system uses a multi-level regression model to estimate the necessary all-red signal extension time more accurately, thereby improving the overall safety of intersection traffic as well as the efficiency of the traffic flow.
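To make the DTW step concrete, the following is a minimal Python sketch of the classic dynamic-programming form of dynamic time warping for two one-dimensional sequences. It is a generic textbook formulation rather than the implementation used in [9]; the function name `dtw_distance` and the sample speed profiles are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (O(len(a) * len(b)))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])  # local distance between the two samples
            # extend the cheapest of: skip in a, skip in b, or match both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical speed profiles: the second is a time-stretched copy of the first.
profile_a = [1.0, 2.0, 3.0]
profile_b = [1.0, 2.0, 2.0, 3.0]
distance = dtw_distance(profile_a, profile_b)
```

Because DTW tolerates local stretching along the time axis, two profiles with the same shape but different timing obtain a small distance, which is what makes it a natural input to a subsequent clustering step.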

By contrast, the study conducted by Oh et al. [10] concerns real-time data transmission to mobile equipment used by groups of workers, termed a mobile sink group (MSG). Rapid and reliable data delivery is vital to the efficient operation of such groups, whose collaborative projects often involve multiple pieces of equipment and in which miscommunication could result in an industrial accident. The authors proposed a real-time data delivery mechanism based on a virtual grid structure to support MSGs. The main idea is to determine the farthest distance and calculate the minimum real-time data transmission speed required.

First, the proposed scheme models the MSG as a single center point and radius, and defines the end-to-end distance based on the member sink located furthest from the source node. Thus, the source node can calculate the transmission speed, which is maintained during the data transmission. The data transmission process is divided into two main phases: the main forwarding phase, which passes through the center of the mobile sinks from the source node, and the branch forwarding phase at the branch point, which receives data via the main forwarding phase. In addition, even if some mobile sinks deviate from the initial radius owing to environmental factors associated with MCPSs, the connection of the sinks is ensured through the inner/outer agent concept. Thus, the proposed scheme can deliver data to all member sinks in a timely manner and is superior to existing schemes in terms of real-time communication for MSGs.
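The speed calculation described above can be illustrated with a short Python sketch. It assumes two-dimensional coordinates, a delivery deadline in seconds, and a worst case in which a member sink lies on the group boundary directly behind the center as seen from the source; these assumptions and the name `required_speed` are ours and are not taken from [10].

```python
import math

def required_speed(source, center, radius, deadline_s):
    """Minimum delivery speed (distance units per second) needed to reach the
    farthest possible member sink of the group before the deadline."""
    # Assumed worst case: the sink sits on the far side of the group circle.
    farthest = math.dist(source, center) + radius
    return farthest / deadline_s

# Hypothetical scenario: group centered 50 units away, radius 10, 2 s deadline.
speed = required_speed(source=(0.0, 0.0), center=(30.0, 40.0), radius=10.0, deadline_s=2.0)
```

Fixing the speed for the farthest member means that every nearer member sink receives the data within the same deadline.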

Finally, in [11], Choi et al. address an important system optimization problem faced by automotive control systems. More specifically, a control application based on AUTOSAR (AUTomotive Open System Architecture) [12] is assumed, whereby fine granular schedule entities (i.e., runnables) are used to compose a control application. For this purpose, the authors propose a Lagrange multiplier-based runnable period optimization method that maximizes the level of system control, which is useful for the development of future MCPSs, where design optimization is a fundamental consideration.

#### **3. Conclusions**

This Special Issue presents new and innovative research addressing some of the many scientific challenges associated with improving the trustworthiness of MCPSs. We emphasize the need for a better understanding of the security and reliability of MCPSs as well as the impacts of AI, and demonstrate procedures for solving the adverse effects caused by these impacts. As such, the studies contained within this volume provide a valuable basis for the protection and promotion of resilient MCPSs.

**Author Contributions:** Conceptualization, H.-J.S., J.S. and K.K.; methodology, H.-J.S. and K.K.; validation, J.S.; investigation, K.K.; writing—original draft preparation, K.K.; writing—review and editing, H.-J.S.; supervision, J.S. and K.K.; funding acquisition, H.-J.S. and K.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the National Research Foundation of Korea (2016R1D1A1B01006716) and the Catholic University of Korea, Research Fund, 2020. This research was also supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korean government (Ministry of Science and ICT) (No. 2020-0-01343, Artificial Intelligence Convergence Research Center (Hanyang University ERICA)).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** This issue would not have been possible without the help of a variety of talented authors, professional reviewers, and the dedicated editorial team of Applied Sciences. First, we express our gratitude to the authors for their excellent contributions to this Special Issue on trustworthiness in mobile cyber-physical systems. We are also grateful to all the reviewers for their time and effort in examining these papers, and for their valuable comments and constructive suggestions. Finally, we appreciate the advice and support of the editorial team of Applied Sciences for their help in the publication process. We hope that this Special Issue will serve as a valuable reference for academicians, scientists, engineers, and practitioners working toward the design and implementation of trustworthy mobile cyber-physical systems.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Robust CNN Compression Framework for Security-Sensitive Embedded Systems**

**Jeonghyun Lee and Sangkyun Lee \***

School of Cybersecurity, Korea University, Seoul 02841, Korea; nomar0107@korea.ac.kr

**\*** Correspondence: sangkyun@korea.ac.kr

**Abstract:** Convolutional neural networks (CNNs) have achieved tremendous success in solving complex classification problems. Motivated by this success, various compression methods have been proposed for downsizing CNNs to deploy them on resource-constrained embedded systems. However, a new type of vulnerability of compressed CNNs, known as adversarial examples, has been discovered recently, which is critical for security-sensitive systems because adversarial examples can cause CNNs to malfunction and can be crafted easily in many cases. In this paper, we propose a compression framework to produce compressed CNNs robust against such adversarial examples. To achieve this goal, our framework uses both pruning and knowledge distillation with adversarial training. We formulate our framework as an optimization problem and provide a solution algorithm based on the proximal gradient method, which is more memory-efficient than the popular ADMM-based compression approaches. In experiments, we show that our framework can improve the trade-off between adversarial robustness and compression rate compared to the existing state-of-the-art adversarial pruning approach.

**Keywords:** model compression; adversarial robustness; weight pruning; adversarial training; distillation; embedded system; secure AI

**Citation:** Lee, J.; Lee, S. Robust CNN Compression Framework for Security-Sensitive Embedded Systems. *Appl. Sci.* **2021**, *11*, 1093. https://doi.org/10.3390/app11031093

Academic Editor: Kyungtae Kang Received: 31 October 2020 Accepted: 22 January 2021 Published: 25 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

In the past few years, convolutional neural networks (CNNs) have achieved great success in many applications, including image classification and object detection. Despite this success, the excessively large number of learning parameters and the vulnerability to adversarial examples [1–8] make it difficult to deploy CNNs, especially in resource-constrained environments such as smartphones, automobiles, and wearable devices. To overcome the former drawback, various model compression methods have been proposed, many of which are based on weight pruning [9–17]. Weight pruning generates sparse learning weights by solving an optimization problem with sparsity constraints on the weights, and then the actual compression is accomplished by removing zero weights from a trained model. Although the approach is quite simple, state-of-the-art weight pruning methods [16,17] achieve a high compression rate with little drop in accuracy.

On the other hand, it has been reported that even state-of-the-art CNNs are vulnerable to adversarial attacks [1–8]. Adversarial attacks use perturbed inputs that cause misclassification even though the modification is almost imperceptibly small. Such perturbations can be easily produced by exploiting the gradient information of the target neural network [1,4,6]. Furthermore, some works show that an adversary can even generate adversarial examples without knowing anything about the target neural network [5]. Adversarial training [1,6] has been proposed as a countermeasure to adversarial attacks, bringing robustness to neural networks against adversarial inputs. This method trains a classifier not only with training examples but also with adversarial examples generated actively by the defender for known types of adversarial perturbations. In particular, adversarial training based on the projected gradient descent attack [6] is known to provide high robustness against the first-order adversary [1,4,6]. However, it has been shown that adversarial training requires a significantly large capacity of the neural network to achieve high accuracy on both original and adversarial examples [6].

Recently, the vulnerability of compressed neural networks has been raised as an issue [18]. As shown by Madry et al. [6], the adversarial robustness of compressed neural networks is hard to achieve due to their lack of architectural capacity. This prevents compressed neural networks from being deployed in trust-sensitive domains. Despite the seriousness of this problem, only a few methods have been proposed [19,20]. One notable technique is to consider adversarial robustness and model compression at the same time. Ye et al. [19] and Gui et al. [20] formulated an optimization problem by combining adversarial training with pruning and solved it with the alternating direction method of multipliers (ADMM) framework. These works demonstrated that considering weight pruning and adversarial training concurrently can yield a better trade-off between robustness and compression rate than considering them separately. However, the ADMM framework requires two auxiliary tensors, each of which has the same size as the learning parameter tensor of a CNN; this leads to a heavy memory burden in a resource-constrained environment. In this paper, we show that the joint optimization of pruning and adversarial training can be solved more memory-efficiently using the proximal gradient method (PGM) without any auxiliary tensors.

Furthermore, we found that consistently providing information about the pretrained original network during adversarial training can improve the robustness of the resulting compressed network. With this intuition, we propose a novel robust pruning framework that jointly uses pruning and knowledge distillation [21] within the adversarial training procedure. Knowledge distillation is a technique to transfer the information of a network (teacher) to another network (student) by minimizing the gap between the SoftMax outputs of the two networks. In our framework, we use a pretrained original network as the teacher and provide its SoftMax output to a student network being compressed. We summarize our contribution as follows:


#### **2. Related Works**

#### *2.1. Adversarial Attacks*

Adversarial attacks try to find allowable perturbations that change the prediction result of the target network. In the image classification domain, the set of allowable perturbations is generally defined by bounding the $\ell_p$ norm of the perturbation to satisfy an imperceptibility constraint. Such perturbations can be generated by exploiting information about the target network. According to the amount of this information, adversarial attacks are categorized into black-box and white-box attacks. A black-box attack assumes a weak adversary who does not have any information about the target model. In this situation, the adversary must rely on query access for chosen input data [5] or the transferability of adversarial examples [2,3]. In a white-box setting, an adversary can access the details of the target model such as the structure, the parameters, the training dataset, etc. Based on this strong assumption, most white-box attack methods [1,4,6] exploit the first-order information of the target model to generate sophisticated perturbations. In this paper, we focus on white-box attacks because it is important to study such attacks to implement effective defenses.
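As a minimal illustration of a first-order white-box attack under an $\ell_\infty$ bound, the sketch below applies a single signed-gradient step in the style of the fast gradient sign method [1]. For brevity it assumes a two-feature logistic-regression classifier instead of a CNN; the weights, the input, and the budget `eps` are hypothetical values chosen only for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under the logistic model sigmoid(w.x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def input_gradient(w, b, x, y):
    """Gradient of the binary cross-entropy loss with respect to the input x."""
    p = predict(w, b, x)
    return [(p - y) * wi for wi in w]  # d(loss)/d(logit) = p - y, d(logit)/dx = w

def fgsm(w, b, x, y, eps):
    """Move every coordinate by eps in the direction that increases the loss,
    so the perturbation has l_inf norm at most eps."""
    grad = input_gradient(w, b, x, y)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]

# Hypothetical white-box setting: the adversary knows w and b exactly.
w, b = [2.0, -1.0], 0.0
x, y = [0.3, 0.1], 1
x_adv = fgsm(w, b, x, y, eps=0.3)
```

In this toy setting the clean input is assigned to class 1, whereas the perturbed input falls on the other side of the decision boundary although no coordinate moved by more than `eps`.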

#### *2.2. Adversarial Training*

Adversarial training is a simple and intuitive learning strategy to enhance the robustness of a neural network against adversarial attacks. It generates adversarial examples using a first-order white-box attack [1,4,6] while training a neural network so that the network will correctly classify not only the training examples but also the generated examples. Adversarial training with a single-step attack such as the fast gradient sign method (FGSM) [1] is known to suffer from so-called label leaking [23] caused by the correlation between perturbation and true label. To prevent label leaking and to generate strong adversarial examples, Madry et al. [6] proposed projected gradient descent (PGD) attack-based adversarial training.
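The following Python sketch illustrates the structure of PGD-based adversarial training on a toy problem. It is a simplification under stated assumptions: a logistic-regression model stands in for the CNN, the attack starts from the clean input without a random start, and the dataset, step sizes, and function names are hypothetical.

```python
import math

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def pgd_attack(w, b, x, y, eps, step, iters):
    """Multi-step l_inf attack: repeated signed-gradient ascent on the loss,
    projecting back into the eps-ball around the clean input after each step."""
    x_adv = list(x)
    for _ in range(iters):
        p = predict(w, b, x_adv)
        grad = [(p - y) * wi for wi in w]  # d(loss)/d(input)
        x_adv = [xa + step * ((g > 0) - (g < 0)) for xa, g in zip(x_adv, grad)]
        x_adv = [min(xi + eps, max(xi - eps, xa)) for xi, xa in zip(x, x_adv)]
    return x_adv

def adversarial_train(data, dim, eps=0.2, lr=0.5, epochs=200):
    """SGD on the cross-entropy loss of both clean and PGD-perturbed examples."""
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = pgd_attack(w, b, x, y, eps, step=eps / 4, iters=8)
            for xin in (x, x_adv):
                err = predict(w, b, xin) - y  # d(loss)/d(logit)
                w = [wi - lr * err * xi for wi, xi in zip(w, xin)]
                b -= lr * err
    return w, b

# Hypothetical, linearly separable toy dataset with labels 0 and 1.
data = [([1.0, 1.0], 1), ([0.9, 1.2], 1), ([-1.0, -1.0], 0), ([-1.1, -0.8], 0)]
w, b = adversarial_train(data, dim=2)
```

The inner loop plays the role of the adversary and the outer loop that of the defender, which mirrors the idea of training on examples that the defender actively generates.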

#### *2.3. Weight Pruning*

Weight pruning is a model compression technique that sets unimportant learning weights to zero, resulting in sparse weights, and thereby removes redundant connections or components from a neural network. According to the unit of pruning, weight pruning is categorized into element-wise pruning and filter-wise pruning.

In their early stage, pruning methods focused on element-wise pruning, which generates irregular sparsity patterns. To set the values of redundant weights to zero, element-wise pruning [9] usually measures the importance of weights by their absolute values. Han et al. [10] showed that this simple pruning process can be effectively combined with weight quantization and Huffman coding to achieve further compression.

Filter-wise pruning is attracting more interest since it is better suited to GPU acceleration as well as to compressing convolution filters in CNNs. Some early works prune the filters of a CNN by measuring their importance by the $\ell_2$ norm [13] or by the number of effects on the activation map [12]. Based on these works, several advanced filter pruning methods [14–17,24] have been proposed by varying the ways of measuring the importance of each filter and the composition of the pruning procedure.
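The two pruning granularities can be contrasted with a small Python sketch. It assumes that weights are given as plain lists and that each filter is flattened into a list of its entries; the helper names and the sample values are hypothetical, and real pruning methods typically interleave such steps with retraining.

```python
import math

def prune_elementwise(weights, sparsity):
    """Zero out the given fraction of weights with the smallest absolute values."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    drop = set(sorted(range(len(weights)), key=lambda i: abs(weights[i]))[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def prune_filterwise(filters, keep):
    """Keep the `keep` filters with the largest l2 norm; zero out the others entirely."""
    norms = [math.sqrt(sum(w * w for w in f)) for f in filters]
    kept = set(sorted(range(len(filters)), key=lambda i: norms[i], reverse=True)[:keep])
    return [list(f) if i in kept else [0.0] * len(f) for i, f in enumerate(filters)]

sparse_weights = prune_elementwise([0.8, -0.05, 0.3, 0.01, -0.6, 0.02], sparsity=0.5)
sparse_filters = prune_filterwise([[0.5, -0.5], [0.01, 0.02], [1.0, 0.0]], keep=2)
```

Element-wise pruning leaves zeros scattered irregularly through the weight list, whereas filter-wise pruning removes whole filters, so the remaining computation stays dense and regular.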

#### *2.4. Knowledge Distillation*

The main idea of knowledge distillation [21] is to transfer the knowledge of a trained teacher network to a student network by training the student network using the input and the SoftMax output of the teacher. In its early stage, it was usually applied for model compression, achieved by transferring the knowledge of an over-parameterized teacher model to a smaller student model. Bucila et al. [25] first used this strategy with unlabeled synthesized data to transfer the knowledge of a large ensemble teacher. Hinton et al. [21] formally defined the knowledge distillation loss with temperature and showed that distillation is effective for transferring knowledge with the original training dataset.
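A distillation loss with temperature can be sketched in a few lines of Python. The version below uses the cross-entropy between the temperature-softened teacher and student SoftMax outputs, which is one common way of writing the loss; the logits and the temperature value are hypothetical, and any weighting against a hard-label loss is omitted.

```python
import math

def softmax(logits, temperature=1.0):
    """SoftMax with temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Cross-entropy between the softened teacher and student outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Hypothetical three-class logits.
teacher = [5.0, 1.0, 0.0]
loss_matching = distillation_loss([5.0, 1.0, 0.0], teacher)
loss_mismatched = distillation_loss([0.0, 0.0, 5.0], teacher)
```

The loss is smallest when the student reproduces the teacher's softened distribution, so minimizing it pulls the student's SoftMax output toward that of the teacher.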

Distillation can also be used as a defense against adversarial examples. Defensive distillation [22] achieves adversarial robustness by applying distillation to student and teacher models that have the same structure. However, it has been shown that the defense can be easily broken [4].

Many methods have been proposed to improve the effectiveness of distillation. Distillation with boundary support samples [26] tries to improve the generalization performance of a student model by conducting the distillation with the adversarial examples near the decision boundary. Distillation with teacher assistant [27] fills the gap between student and teacher models by using intermediate models called teacher assistants.

#### *2.5. Adversarially Robust Model Compression*

To preserve the robustness of the compressed model, adversarial pruning, which combines the ideas of adversarial training and pruning, can be applied in most cases. Ye et al. [19] and Gui et al. [20] formulated an objective that includes both adversarial training and sparsity constraints, and showed that applying adversarial training and pruning concurrently yields better robustness than applying them separately. Xie et al. [28] used blind adversarial training [29] during adversarial pruning, which generates adversarial examples dynamically during adversarial training to reduce the sensitivity to the budget of adversarial examples. Madaan et al. [30] proposed a new pruning criterion to reduce the vulnerability of the latent space, represented by the difference between the activation map of an adversarial example and that of its original input.

Some works also considered the adversarial robustness of different types of compression to pruning. Bernhard et al. [31] observed that the change of adversarial robustness according to the different levels of quantization. Lin et al. [32] proposed a defensive quantization method that reduced the sensitivity to the input of the neural network. Goldblum et al. [33] used knowledge distillation to transfer the robustness of an over-parameterized model to a predefined smaller model.

#### **3. Methods**

The main objective of our suggested method is to preserve the adversarial robustness of CNNs during the pruning procedure. An adversarially robust CNN should demonstrate high generalization performance on both original and adversarial inputs. One existing approach to generate such a CNN is adversarial pruning, which is the combination of adversarial training and pruning. However, adversarial pruning alone is not enough to achieve this goal: the decision boundary of the original network quickly collapses during the initial stage of the pruning procedure because of the decrease in network capacity, which results in a large decrease in generalization performance on the original inputs. To solve this problem, we propose a novel robust pruning framework that combines adversarial pruning with knowledge distillation. With this combination, we can consistently provide information about the decision boundary of the original network during adversarial pruning.

In this section, we first describe our definition of the adversary, then formulate our entire framework as a single optimization problem, and show that it can be solved efficiently by the proximal gradient method without using any auxiliary tensors.

#### *3.1. The Attack Model*

Before describing our proposed method, we first elaborate on the attack model. For this purpose, let us define the SoftMax output of a CNN with weight parameter $w \in \mathbb{R}^p$ as $f(\cdot; w)$. Let the data pairs $\{(x\_i, y\_i)\}\_{i=1}^{n}$ be a training dataset. Here, $x\_i \in \mathbb{R}^d$ is an input and $y\_i \in \{0, 1\}^k$ is the corresponding one-hot encoded true label. Then, the training procedure of the CNN can be described as the following optimization problem:

$$w^\* \in \operatorname\*{arg\,min}\_{w \in \mathbb{R}^p} \frac{1}{n} \sum\_{i=1}^n \mathcal{L}(f(x\_i; w), y\_i). \tag{1}$$

Here, $\mathcal{L}$ is the cross-entropy loss [34] that indicates the gap between the SoftMax output and the true label. For given discrete probability distributions $p$ and $q$, the cross-entropy loss is defined as follows:

$$\mathcal{L}(q, p) = -\sum\_{k} p\_k \log q\_k.$$
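As a small worked instance of this definition, the sketch below evaluates the cross-entropy for a hypothetical three-class SoftMax output and a one-hot label. The numbers and the `eps` guard against $\log 0$ are illustrative assumptions, not values taken from the experiments.

```python
import math


def cross_entropy(q, p, eps=1e-12):
    """Cross-entropy L(q, p) = -sum_k p_k * log(q_k).

    q is the predicted distribution (e.g., a SoftMax output) and p is the
    target distribution. eps only guards against log(0); it is an
    implementation detail, not part of the definition.
    """
    return -sum(pk * math.log(max(qk, eps)) for qk, pk in zip(q, p))


softmax_output = [0.7, 0.2, 0.1]  # hypothetical prediction over 3 classes
one_hot_label = [1.0, 0.0, 0.0]   # the true class is class 0

# With a one-hot label only the true-class term survives: -log(0.7).
loss = cross_entropy(softmax_output, one_hot_label)
```

With a one-hot label, the loss reduces to the negative log-probability assigned to the true class, which is why driving that probability down (as the untargeted attack does) drives the loss up.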

The objective of the adversary is to change the prediction result of the trained CNN by adding an imperceptible perturbation to the input image, which can be generated by either a targeted or an untargeted attack. In the targeted attack, the adversary generates a perturbation that minimizes the cross-entropy between the SoftMax output and a predefined target label that is different from the true label. Given an input data pair $(x, y)$ and a target label $y\_t$, the targeted attack can be described as follows:

$$\min\_{\delta \in \Lambda} \mathcal{L}(f(x + \delta; w^\*), y\_t), \quad \text{such that } y\_t \neq y. \tag{2}$$

Since the effectiveness of the targeted adversarial attack varies depending on the chosen target label, most of the robust pruning literature [19,20,30] focuses on the untargeted attack when experimenting with adversarial examples, and we take the same approach. In the untargeted adversarial attack, we generate adversarial examples by maximizing the cross-entropy between the SoftMax output and the true label:

$$\max\_{\delta \in \Lambda} \mathcal{L}(f(\mathbf{x} + \delta; w^\*), y). \tag{3}$$

Also, we suppose a white-box setting where the adversary has full knowledge about the target CNN. In this case, the adversary can solve (2) and (3) by exploiting the gradient of the target CNN.

#### *3.2. Adversarial Pruning with Distillation*

Adversarial training is a type of robust optimization procedure which can be stated by the following min-max problem:

$$w\_{den}^{\*} \in \arg\min\_{w \in \mathbb{R}^p} \frac{1}{n} \sum\_{i=1}^n \max\_{\delta \in \Lambda} \mathcal{L}(f(x\_i + \delta; w), y\_i). \tag{4}$$

To solve the inner maximization problem of (4), we consider the projected gradient descent (PGD) attack method [6] with an $\ell\_\infty$-norm feasible set. For a given data pair $(x, y)$, the PGD attack is defined as follows:

$$\mathbf{x}^{t+1} = \Pi\_{\mathcal{B}(\mathbf{x}, \boldsymbol{\varepsilon})} (\mathbf{x}^t + \mathbf{a} \cdot \text{sgn}(\nabla\_{\mathbf{x}^t} \mathcal{L}(f(\mathbf{x}^t + \boldsymbol{\delta}; \mathbf{w}), \mathbf{y}))). \tag{5}$$

Here, $\Pi\_{\mathcal{B}(x, \varepsilon)}$ is a projection operation onto the $\ell\_\infty$-norm ball around $x$, defined as $\mathcal{B}(x, \varepsilon) := \{x + \delta : \|\delta\|\_\infty \le \varepsilon\}$. Let us note that uniformly distributed random noise is added to $x$ in the initial stage of the PGD attack to prevent the label leaking problem [23]. The solution of (4), which we denote as $w\_{den}^{\*}$, is generally non-sparse since there is no sparsity constraint in this optimization problem. By adding a sparse regularization term to (4), we can obtain the objective of adversarial pruning,

$$w\_{\text{spar}}^{\*} \in \arg\min\_{w \in \mathbb{R}^{p}} \frac{1}{n} \sum\_{i=1}^{n} \max\_{\delta \in \Lambda} \mathcal{L}(f(x\_{i} + \delta; w), y\_{i}) + \lambda \|w\|\_{0}, \tag{6}$$
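The PGD iteration with a random start can be sketched on a toy model as follows. This is a minimal illustration, assuming a hypothetical two-class linear SoftMax classifier (`W`, `b`) whose input gradient is available in closed form; the function names, the step size, and the use of the standard sign-gradient update with coordinate-wise clipping as the $\ell\_\infty$ projection are our assumptions, not the authors' released implementation.

```python
import math
import random


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def loss_and_input_grad(x, y, W, b):
    """Cross-entropy of a linear SoftMax classifier and its gradient w.r.t. x."""
    logits = [sum(wkj * xj for wkj, xj in zip(row, x)) + bk for row, bk in zip(W, b)]
    q = softmax(logits)
    loss = -sum(yk * math.log(qk) for qk, yk in zip(q, y) if yk > 0)
    # d loss / d logits = q - y (y sums to 1); chain rule through W gives d loss / d x.
    residual = [qk - yk for qk, yk in zip(q, y)]
    grad = [sum(residual[k] * W[k][j] for k in range(len(W))) for j in range(len(x))]
    return loss, grad


def sign(v):
    return (v > 0) - (v < 0)


def pgd_attack(x, y, W, b, eps, step, iters, rng):
    """Untargeted l_inf PGD: ascend the loss, then project back onto B(x, eps)."""
    # Random start inside the ball, as described in the text.
    x_adv = [xj + rng.uniform(-eps, eps) for xj in x]
    for _ in range(iters):
        _, grad = loss_and_input_grad(x_adv, y, W, b)
        x_adv = [xa + step * sign(g) for xa, g in zip(x_adv, grad)]
        # Projection onto the l_inf ball: clip each coordinate to [x_j - eps, x_j + eps].
        x_adv = [min(max(xa, xj - eps), xj + eps) for xa, xj in zip(x_adv, x)]
    return x_adv


# Hypothetical toy problem: identity weights, input clearly classified as class 0.
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
x_clean = [1.0, 0.0]
y_true = [1.0, 0.0]
epsilon = 0.3

x_adv = pgd_attack(x_clean, y_true, W, b, eps=epsilon, step=0.1, iters=10,
                   rng=random.Random(0))
clean_loss, _ = loss_and_input_grad(x_clean, y_true, W, b)
adv_loss, _ = loss_and_input_grad(x_adv, y_true, W, b)
```

In this toy case the gradient sign is the same at every point, so the attack ends at the corner of the ball that lowers the true-class logit and raises the other one. For a real CNN the gradient would come from backpropagation rather than a closed form.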

where *λ* > 0 is a hyperparameter to control the sparsity of *w*.

Generally, the solution of (1), denoted by $w^\*$, is used as the initial weights for solving (6). Here, our question is how to effectively preserve the accuracy of $w^\*$ on original inputs during the adversarial pruning procedure. The accuracy on the original inputs drops substantially during adversarial pruning since the one-hot encoded label $y\_i$ in (6) does not contain any information about the decision boundary of $w^\*$.

To consistently provide the information of $w^\*$ during pruning, we combine the knowledge distillation idea with adversarial pruning. In our method, the pretrained network works as a teacher and provides its SoftMax output $f^t(\cdot; w^\*)$ on the original input during the adversarial pruning procedure. The proposed objective is formulated as follows:

$$\min\_{w\in\mathbb{R}^{p}}\frac{1}{n}\sum\_{i=1}^{n}(1-\alpha)\mathcal{L}(f(x\_{i}+\delta;w),y\_{i}) + \alpha t^{2}\mathcal{L}(f^{t}(x\_{i}+\delta;w),f^{t}(x\_{i};w^{\*})) + \lambda\|w\|\_{0}.\tag{7}$$

Here, $\delta$ is the solution of (3), $\alpha$ is the weight that balances the two cross-entropy terms, and $t$ is the distillation temperature [21]. The factor $t^2$ is multiplied in front of the second term to prevent the gradient from shrinking [21]. The second term in (7) is the distillation loss, which is the cross-entropy between the SoftMax output of the currently pruned model $f(\cdot; w)$ and that of the teacher model $f(\cdot; w^\*)$. The overall formulation of (7) can be interpreted as a linear combination of the adversarial pruning loss (6) and the distillation loss. By solving (7), we can obtain a sparse but robust solution that approximates the decision boundary of $w^\*$. Our framework can be extended to filter pruning by replacing the third (regularizer) term with the number of non-zero filters as follows:

$$\begin{aligned} &\min\_{w\in\mathbb{R}^{p}} \frac{1}{n} \sum\_{i=1}^{n} (1-\alpha) \mathcal{L}(f(x\_{i}+\delta;w), y\_{i}) + \alpha t^{2} \mathcal{L}(f^{t}(x\_{i}+\delta;w), f^{t}(x\_{i};w^{\*})) \\ &\quad + \lambda \sum\_{g=1}^{G} \mathbf{1}\left[\|w\_{g}\|\_{2} \neq 0\right]. \end{aligned}$$

Here, $G$ is the number of filters and $w\_g$ is the weight vector of the $g$th filter.
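To make the two data-dependent terms of the proposed objective concrete, the following sketch evaluates them for a single example. It assumes that $f^t$ denotes a SoftMax computed on logits divided by the temperature $t$, following [21]; the logit values and the function names (`apd_loss` and its arguments) are hypothetical, and the sparsity regularizer is left out because it is handled separately by the optimizer.

```python
import math


def softmax(logits, temperature=1.0):
    scaled = [v / temperature for v in logits]
    m = max(scaled)
    exps = [math.exp(v - m) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(q, p):
    # L(q, p) = -sum_k p_k log q_k; terms with p_k = 0 contribute nothing.
    return -sum(pk * math.log(qk) for qk, pk in zip(q, p) if pk > 0)


def apd_loss(student_adv_logits, teacher_clean_logits, one_hot, alpha, t):
    """Per-example loss: (1 - alpha) * hard-label term + alpha * t^2 * distillation term.

    student_adv_logits: logits of the pruned model on the adversarial input x + delta.
    teacher_clean_logits: logits of the pretrained teacher on the original input x.
    """
    hard = cross_entropy(softmax(student_adv_logits), one_hot)
    soft = cross_entropy(softmax(student_adv_logits, t),
                         softmax(teacher_clean_logits, t))
    return (1.0 - alpha) * hard + alpha * t * t * soft


# Hypothetical logits for a 3-class problem.
student_logits = [2.0, 1.0, 0.1]
teacher_logits = [4.0, 1.0, 0.5]
label = [1.0, 0.0, 0.0]

loss_mixed = apd_loss(student_logits, teacher_logits, label, alpha=0.5, t=10.0)
```

Setting `alpha=0` recovers the hard-label adversarial loss alone, while `alpha=1` trains the pruned model purely against the teacher's softened output on the clean input.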

#### *3.3. Optimization*

Most adversarial pruning approaches, for example, Ye et al. [19] and Gui et al. [20], use the alternating direction method of multipliers (ADMM) to solve the resulting optimization problem. However, by construction, ADMM requires two additional tensors besides the learning weights during optimization, which can be prohibitive in a resource-constrained environment with limited memory. Here, we suggest another algorithm, based on the proximal gradient method, to solve our proposed optimization problem (7) without such auxiliary tensors. For simplicity, we denote the linear combination of the two cross-entropy losses in (7) by $\mathcal{L}\_{\text{APD}}$:

$$\mathcal{L}\_{\text{APD}}(w) = \frac{1}{n} \sum\_{i=1}^{n} (1 - \alpha) \mathcal{L}(f(x\_{i} + \delta; w), y\_{i}) + \alpha t^{2} \mathcal{L}(f^{t}(x\_{i} + \delta; w), f^{t}(x\_{i}; w^{\*})). \tag{8}$$

Here, APD stands for adversarial pruning with distillation. Then we can rewrite (7) as

$$\min\_{w \in \mathbb{R}^p} \mathcal{L}\_{\text{APD}}(w) + \lambda \|w\|\_0. \tag{9}$$

By applying to $\mathcal{L}\_{\text{APD}}$ in (9) a second-order Taylor approximation around $w\_k$, with the Hessian approximated as $\nabla^2 \mathcal{L}\_{\text{APD}}(w\_k) \approx \frac{1}{\eta\_k} I\_{p \times p}$ for some $\eta\_k > 0$, we obtain the following formulation:

$$\mathcal{L}\_{\text{APD}}(w) \approx \mathcal{L}\_{\text{APD}}(w\_k) + \nabla \mathcal{L}\_{\text{APD}}(w\_k)^\top (w - w\_k) + \frac{1}{2\eta\_k} ||w - w\_k||^2.$$

Here, $I\_{p \times p}$ denotes the identity matrix of shape $p \times p$. Based on this successive approximation result, the weight update can be formulated as follows:

$$w\_{k+1} = \underset{w \in \mathbb{R}^p}{\operatorname{arg\,min}} \mathcal{L}\_{\text{APD}}(w\_k) + \nabla \mathcal{L}\_{\text{APD}}(w\_k)^\top (w - w\_k) + \frac{1}{2\eta\_k} \|w - w\_k\|^2 + \lambda \|w\|\_{0}.$$

By removing the terms of the above weight update equation that do not depend on $w$, we can obtain

$$w\_{k+1} = \underset{w \in \mathbb{R}^p}{\arg\min} \frac{1}{2\eta\_k} \left(2\eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k)^\top w + \|w\|^2 - 2w^\top w\_k\right) + \lambda \|w\|\_0.$$

We can rewrite the above equation as follows:

$$w\_{k+1} = \underset{w \in \mathbb{R}^p}{\arg\min} \frac{1}{2\eta\_k} \left( \|w\|^2 - 2w^\top \left(w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k) \right) \right) + \lambda \|w\|\_0.$$

By adding the constant $\|w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k)\|^2$ inside the parentheses, which does not depend on $w$ and therefore leaves the minimizer unchanged, we can obtain

$$w\_{k+1} = \underset{w \in \mathbb{R}^{p}}{\arg\min} \frac{1}{2\eta\_{k}} \left( \|w\|^{2} - 2w^{\top} \left( w\_{k} - \eta\_{k} \nabla \mathcal{L}\_{\text{APD}}(w\_{k})\right) + \left\|w\_{k} - \eta\_{k} \nabla \mathcal{L}\_{\text{APD}}(w\_{k})\right\|^{2} \right) + \lambda\|w\|\_{0}.$$

Then, we can get the following equation:

$$w\_{k+1} = \underset{w \in \mathbb{R}^p}{\arg\min} \frac{1}{2\eta\_k} \|w - (w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k))\|^2 + \lambda \|w\|\_{0}.$$

This is exactly the form of the proximal operator, which is described as

$$w\_{k+1} = \text{prox}\_{\eta\_k \lambda \|\cdot\|\_{0}} (w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k)).$$

For each element, the proximal operator with the $\ell\_0$ regularization term can be computed as

$$(w\_{k+1})\_i = \begin{cases} (w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k))\_{i}, & |(w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k))\_{i}| > \sqrt{2\eta\_k\lambda} \\ 0, & |(w\_k - \eta\_k \nabla \mathcal{L}\_{\text{APD}}(w\_k))\_{i}| \le \sqrt{2\eta\_k\lambda} \end{cases}$$

It is simply a thresholding operation that sets every updated weight parameter whose magnitude is at most $\sqrt{2\eta\_k\lambda}$ to zero: in the element-wise problem above, keeping a non-zero element costs $\lambda$, whereas zeroing it costs its squared value divided by $2\eta\_k$. Let us note that by controlling the value of $\lambda$, we can explicitly manipulate the sparsity of the network. The entire process of our method is described in Algorithm 1.
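One proximal gradient step can be sketched as a gradient step followed by element-wise hard thresholding. The threshold used below, $\sqrt{2\eta\lambda}$, is what exactly minimizing the per-element objective $\frac{1}{2\eta}(w\_i - z\_i)^2 + \lambda \mathbf{1}[w\_i \neq 0]$ gives; the function names and the numeric values are hypothetical, and the gradient is supplied by hand rather than computed from a network.

```python
import math


def prox_l0(z, eta, lam):
    """Element-wise minimizer of (1 / (2*eta)) * (w - z)^2 + lam * 1[w != 0]."""
    threshold = math.sqrt(2.0 * eta * lam)
    return [zi if abs(zi) > threshold else 0.0 for zi in z]


def proximal_gradient_step(w, grad, eta, lam):
    # Gradient step on the smooth loss, then the l0 proximal operator.
    z = [wi - eta * gi for wi, gi in zip(w, grad)]
    return prox_l0(z, eta, lam)


def prox_objective(w, z, eta, lam):
    """Objective that the proximal operator minimizes, used here as a sanity check."""
    quadratic = sum((wi - zi) ** 2 for wi, zi in zip(w, z)) / (2.0 * eta)
    nonzeros = sum(1 for wi in w if wi != 0.0)
    return quadratic + lam * nonzeros


# Hypothetical weights and gradient of the smooth loss at those weights.
weights = [0.6, -0.05, 0.3, -1.0]
gradient = [1.0, 0.0, 1.0, 0.0]
eta, lam = 0.1, 0.1  # threshold = sqrt(0.02), roughly 0.141

updated = proximal_gradient_step(weights, gradient, eta, lam)
```

No auxiliary tensor of the size of the weights is needed beyond the gradient itself, which is the memory argument made in the text. A larger `lam` raises the threshold and therefore zeroes more weights.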


#### **4. Experiments**

To demonstrate that our method improves the adversarial robustness of the pruned network, we applied our method to three popular CNNs: LeNet [35] with the MNIST dataset, and VGG16 [36] and ResNet18 [37] with the CIFAR10 dataset [38]. The MNIST dataset consists of 28 × 28 grayscale images, with 60,000 training images and 10,000 test images. The CIFAR10 dataset has 32 × 32 color images, with 50,000 training images and 10,000 test images. As in Han et al. [10], we used the term "compression rate" to indicate the ratio of the number of zeros to the number of entire weight parameters in a CNN. We denoted the test accuracy on the original images as "original accuracy" and the test accuracy on the adversarial images as "adversarial accuracy". As in other literature [19,20,33], we consider the robustness of a model to be improved when both the original accuracy and the adversarial accuracy are improved. Otherwise, we consider a model with a higher mean value of the original and adversarial accuracy to be more robust. Given the time spent on adversarial training for the large networks, we set the number of iterations of the projected gradient descent (PGD) attack to 5 for the adversarial training of VGG16 and ResNet18. In this case, we evaluated the adversarial accuracy against both 10 iterations of the PGD attack (denoted by PGD10) and 5 iterations of the PGD attack (denoted by PGD5). For the rest of the PGD attack parameters, we followed Ye et al. [19]; these settings are strong enough to make the adversarial accuracy of the naturally trained LeNet, VGG16, and ResNet18 close to zero. The implementation of our method is available as open source (https://github.com/JEONGHYUN-LEE/APD).
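The robustness comparison rule stated above (both accuracies improved, otherwise the higher mean) can be written as a short helper. This is only an illustration of the rule: the function name and the accuracy pairs are hypothetical and are not results from the tables.

```python
def more_robust(model_a, model_b):
    """Compare two models given as (original_accuracy, adversarial_accuracy) in percent.

    Returns "a", "b", or "tie" following the rule in the text: a model that is
    better on both accuracies wins outright; otherwise the higher mean wins.
    """
    a_orig, a_adv = model_a
    b_orig, b_adv = model_b
    if a_orig > b_orig and a_adv > b_adv:
        return "a"
    if b_orig > a_orig and b_adv > a_adv:
        return "b"
    mean_a = (a_orig + a_adv) / 2.0
    mean_b = (b_orig + b_adv) / 2.0
    if mean_a == mean_b:
        return "tie"
    return "a" if mean_a > mean_b else "b"


# Hypothetical accuracy pairs: model A trades original accuracy for robustness.
verdict = more_robust((90.0, 60.0), (95.0, 40.0))  # means are 75.0 and 67.5
```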

#### *4.1. The Effect of Knowledge Distillation*

We compared the result of adversarial pruning (denoted by AP) (6) and our method (denoted by APD) (7) to show the effectiveness of the knowledge distillation, for both element-wise pruning and filter pruning. In this comparison, we set the value of *α* in (7) to 1 to maximize the effect of the SoftMax output of the teacher network. Also, we set the temperature *t* of the knowledge distillation to 10 for the MNIST dataset, and 100 for the CIFAR10 dataset for a similar reason.

#### 4.1.1. Element-Wise Pruning

Generally, element-wise pruning [9,10] can achieve higher sparsity with only a small accuracy drop compared to filter pruning [11–15]. Therefore, we tested element-wise pruning at relatively high compression rates (×2, ×3, ×4) compared to filter pruning [39]. As in Ye et al. [19], we applied the same sparsity to every convolution layer in the target neural network. For instance, if the compression rate of a given network is set to ×2, we set the fraction of zero weights in every layer of this network to 0.5. With this pruning scheme, we compared the element-wise pruning result of our method (7) with adversarial pruning (6). Both methods were optimized with proximal gradient descent. With this comparison, we demonstrated how much improvement was achieved by the knowledge distillation in our method. The results on MNIST and CIFAR10 are summarized in Tables 1 and 2, respectively.

The popular small network LeNet [35] is enough to achieve a high accuracy on the MNIST dataset. Our baseline LeNet, trained by the original training process, achieves an original accuracy of 99.34% and an adversarial accuracy of 0%. With LeNet, our method (APD) showed a large improvement in both original accuracy and adversarial accuracy over adversarial pruning (AP). At the compression rate of ×2, APD improved the original accuracy by 1.01% and the adversarial accuracy by 2.28% over AP. At the relatively high compression rates of ×3 and ×4, APD achieved a larger improvement in both original accuracy and adversarial accuracy. In particular, the improvement in the adversarial accuracy achieved by APD at the compression rates of ×3 and ×4 was over 20%. Compared to the baseline performance, APD achieved the compression rate of ×4 with an adversarial accuracy of 94.25% while reducing the original accuracy by about 1%.

We also applied APD and AP to two CNNs, VGG16 [36] and ResNet18 [37], with the CIFAR10 dataset. Achieving high adversarial robustness on the CIFAR10 dataset is more challenging since it requires a higher architectural capacity of the CNN than the MNIST dataset does. Our baseline VGG16 achieved an original accuracy of 92.99% and an adversarial accuracy of 0%. Despite the difficulty, APD showed an improvement with VGG16 at all compression rates. For instance, at the compression rate of ×4, APD improved the original accuracy by 0.88% and the adversarial accuracy against both PGD5 and PGD10 by more than 1% over AP. Though ResNet18 consists of fewer parameters than VGG16 (11 M vs. 138 M), the generalization performance of ResNet18 on the CIFAR10 dataset is higher than that of VGG16. The baseline ResNet18 showed an original accuracy of 94.40% and an adversarial accuracy of 0.03%. With ResNet18, APD improved the original accuracy and the adversarial accuracy against both PGD5 and PGD10 by more than 2% over AP at all compression rates. Based on those results, we can conclude that consistently providing the SoftMax output of the baseline CNN through knowledge distillation improves the adversarial robustness of the element-wise pruning solution.
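The per-layer sparsity setting can be illustrated with a small sketch. It assumes that a compression rate of ×c corresponds to a zero fraction of 1 − 1/c, which matches the ×2 → 0.5 example in the text but is otherwise our reading. It also reaches the target fraction by zeroing the smallest-magnitude weights in a layer, a common simplification that stands in for choosing the proximal threshold; the function names and weights are hypothetical.

```python
def zero_fraction_from_rate(rate):
    """Fraction of zero weights implied by a compression rate of x`rate` (assumed 1 - 1/rate)."""
    return 1.0 - 1.0 / rate


def prune_layer(weights, zero_fraction):
    """Zero out the smallest-magnitude weights of one layer (given as a flat list)."""
    n_zero = int(round(zero_fraction * len(weights)))
    # Indices sorted by increasing magnitude; the first n_zero are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned


# Hypothetical flattened layer, pruned at a compression rate of x2.
layer = [0.1, -0.4, 0.3, -0.2]
pruned_layer = prune_layer(layer, zero_fraction_from_rate(2))
```

Applying the same `zero_fraction` to every convolution layer reproduces the uniform per-layer scheme described above.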


**Table 1.** Summary of element-wise pruning results of APD (ours) and AP on MNIST.


**Table 2.** Summary of element-wise pruning results of APD (ours) and AP on CIFAR10.

#### 4.1.2. Filter Pruning

Filter pruning [11–15] generates sparsity patterns that are better suited to GPU acceleration than those of element-wise pruning [9,10]. However, the sparsity that filter pruning can achieve is often lower than that of element-wise pruning [39]. Therefore, we used smaller compression rates (×1.5, ×2, and ×2.5) than those used for element-wise pruning. As with element-wise pruning, we set the same sparsity for each convolution layer. We compared our method (APD) with adversarial pruning (AP) to show the effectiveness of knowledge distillation for filter pruning. The results on MNIST and CIFAR10 are summarized in Tables 3 and 4, respectively.
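At the filter level, the same idea operates on whole filters instead of individual weights. The sketch below ranks filters by their $\ell\_2$ norm and zeroes the weakest ones until a target fraction is reached; ranking by norm is our assumption of a natural group-wise analogue of hard thresholding, and the function names and filter values are hypothetical.

```python
import math


def filter_norms(filters):
    """l2 norm of each filter, where each filter is a flat list of weights."""
    return [math.sqrt(sum(v * v for v in f)) for f in filters]


def prune_filters(filters, zero_fraction):
    """Zero out entire filters with the smallest l2 norms."""
    norms = filter_norms(filters)
    n_zero = int(round(zero_fraction * len(filters)))
    order = sorted(range(len(filters)), key=lambda g: norms[g])
    dropped = set(order[:n_zero])
    return [[0.0] * len(f) if g in dropped else list(f)
            for g, f in enumerate(filters)]


def count_nonzero_filters(filters):
    # This is the quantity penalized by the filter-level regularizer.
    return sum(1 for n in filter_norms(filters) if n != 0.0)


# Hypothetical layer with four 2-weight filters; norms are 5, ~0.14, 1, ~0.71.
layer_filters = [[3.0, 4.0], [0.1, 0.1], [1.0, 0.0], [0.5, 0.5]]
pruned_filters = prune_filters(layer_filters, zero_fraction=0.5)
```

Because whole filters are removed, the resulting layer can be stored and executed as a smaller dense layer, which is why this pattern suits GPU acceleration.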

With LeNet, APD improved both original accuracy and adversarial accuracy at all compression rates. For instance, at the largest compression rate of ×2.5, APD improved the original accuracy by 0.36% and the adversarial accuracy by 1.44%. The improvement in the original accuracy tends to be smaller than the improvement in the adversarial accuracy since the original accuracy is already close to that of the baseline network. APD also showed an improvement in both accuracy measures on the CIFAR10 dataset. With VGG16, APD improved the original accuracy significantly at high compression rates. For instance, at the compression rate of ×2.5, the original accuracy improved by 5.23%. The adversarial accuracy against both PGD5 and PGD10 attacks was also improved by APD. At the compression rate of ×2.5, the adversarial accuracy increased by 2.09% against PGD5 attacks and 0.6% against PGD10 attacks. With ResNet18, APD also showed a consistent improvement in both original accuracy and adversarial accuracy at all compression rates. For instance, at the largest compression rate of ×2.5, APD improved the original accuracy by about 2% and the adversarial accuracy by about 1% against both PGD5 and PGD10. Those results imply that the knowledge distillation in our method improves the adversarial robustness of the filter pruning solution.


**Table 3.** Summary of filter-wise pruning results of APD (ours) and AP on MNIST.


**Table 4.** Summary of filter-wise pruning results of APD (ours) and AP on CIFAR10.

#### *4.2. The Convergence Behavior*

To investigate the effect of knowledge distillation on the convergence behavior of adversarial pruning, we traced both the original accuracy and the adversarial accuracy of AP and APD at every epoch. The results at epoch 0 indicate the initial performance of the currently pruned model, where the weight parameters were initialized with the baseline model. We focused on the original accuracy in the early stage of the optimization to show how well APD preserved the original accuracy of the baseline model during adversarial pruning.

#### 4.2.1. Element-Wise Pruning

We traced both the original accuracy and the adversarial accuracy of AP and APD with the element-wise pruning scheme at the compression rates of ×2, ×3, and ×4. The results are shown in Figure 1. Let us note that the adversarial accuracy is measured against PGD10. APD achieved a significant improvement in the original accuracy in the early stage of optimization with LeNet, VGG16, and ResNet18. With LeNet, the original accuracy of AP fell below 20% in the first epoch, whereas the original accuracy of APD was maintained above 90% across the entire optimization process. With VGG16, the original accuracy of both AP and APD dropped in the first epoch. However, the decrease in the original accuracy in the first epoch was smaller for APD than for AP. For instance, at the compression rate of ×4, the original accuracy of APD in the first epoch was higher than that of AP by about 20%. Moreover, with LeNet and VGG16, APD improved the convergence behavior of both original accuracy and adversarial accuracy compared to AP. For instance, at the compression rate of ×3 with VGG16, APD required only 40 epochs for the average of the original accuracy and the adversarial accuracy to reach 61.00% (the maximum average value achieved by AP), whereas AP required 46 epochs to achieve that. With ResNet18, APD reduced the drop in original accuracy in the first epoch by about 10% at all compression rates, though the improvement in the convergence behavior of both original accuracy and adversarial accuracy was smaller than that for the other networks.

#### 4.2.2. Filter Pruning

We also traced both the original accuracy and the adversarial accuracy of AP and APD with the filter pruning scheme at the compression rates of ×1.5, ×2, and ×2.5. The results are shown in Figure 2. APD improved the overall convergence behavior of filter pruning. With LeNet, APD reduced the drop in the original accuracy in the first epoch by about 5%. With VGG16, the improvement in the first epoch was more significant. For instance, at the compression rate of ×1.5, APD reduced the drop in the original accuracy in the first epoch by about 20%. Mitigating the drop in original accuracy in the first epoch led to an improvement in the overall convergence behavior. For instance, at the compression rate of ×1.5 with LeNet, APD required 49 epochs for the average of the original accuracy and the adversarial accuracy to reach 96.63% (the maximum average value achieved by AP), whereas AP required 86 epochs to achieve that. At the compression rate of ×1.5 with VGG16, APD required 33 epochs for the average of both accuracies to reach 54.46% (the maximum average value achieved by AP), whereas AP required 59 epochs to achieve that. With ResNet18, APD also reduced the drop in original accuracy in the initial stage of pruning, but the amount of improvement decreased at high compression rates.

#### *4.3. Comparison with the State-of-the-Art Methods*

To show the relative benefit of our method (denoted as APD) compared to other state-of-the-art methods, we also compared APD to Defensive Distillation [22] (denoted as DD), Filter Pruning via Geometric Median [15] (denoted as FPGM), and Ye et al. [19]. The results are summarized in Table 5.

**Figure 1.** The original accuracy and the adversarial accuracy of AP and APD (ours) with respect to the epoch of the element-wise pruning procedure for (**a**) LeNet, (**b**) VGG16, and (**c**) ResNet18. The left of each row is the result in the compression rate of ×2, the middle of each row is the result in the compression rate of ×3, and the right side of each row is the result in compression rate of ×4. The blue line means the original accuracy and the red line indicates the adversarial accuracy. The solid line is the result of APD and the dashed line is the result of AP.


**Table 5.** Summary of filter-wise pruning results of APD (ours) and other state-of-the-art methods.

DD is a well-known defense strategy that generates a robust model by using knowledge distillation. It trains a teacher model with a high temperature value in a modified SoftMax output and then applies knowledge distillation to a student model whose architecture is the same as that of the teacher model. We compared the original accuracy and the adversarial accuracy of APD and DD with LeNet in the compression rate of ×2. For DD, we set the temperature *t* as 40 and the number of epochs as 100. In comparison, APD showed about 6% higher original accuracy and 10% higher adversarial accuracy than DD.

FPGM is a SOTA filter pruning method that effectively prunes redundant filters by measuring the Geometric Median [40] of each filter. To show that pruning alone is not enough to generate sparse but robust solutions, we compared our pruned VGG16 at the compression rate of ×1.5 to FPGM's pruned VGG16 at the compression rate of ×1.3. APD showed 26.45% higher adversarial accuracy and 12.12% lower original accuracy compared to FPGM. The mean value of the original and the adversarial accuracy is 61.82% for APD and 54.65% for FPGM. This result demonstrates that a model generated by pruning alone is vulnerable to adversarial attack.

Ye et al. is a SOTA robust pruning method. To solve the adversarial pruning problem (6) using the alternating direction method of multipliers (ADMM), the method introduces two additional tensors for auxiliary parameters and Lagrangian multipliers. Each of those two tensors has exactly the same size as the weight parameters; therefore, during the optimization procedure, the method requires additional memory equal to twice the memory required to store the weight parameters. On the other hand, APD solves our optimization problem (7) with proximal gradient descent, which does not require any auxiliary tensor. We compared the results of APD and Ye et al. with LeNet and ResNet18. VGG16 was excluded from this comparison since the exact values of the original accuracy and the adversarial accuracy with VGG16 are not available in the original paper of Ye et al. We set the compression rates to ×2, ×4, and ×8 for LeNet, and ×2 for ResNet18. With LeNet, APD slightly improved both original accuracy and adversarial accuracy over Ye et al. at all compression rates. With ResNet18, APD improved the original accuracy by 0.26% and the adversarial accuracy by 0.03% compared to Ye et al. The adversarial robustness of APD appears to be similar to that of Ye et al.; however, APD requires far less memory than Ye et al. and therefore will be more suitable for generating robust models in memory-constrained environments, as we discuss in the next section.

**Figure 2.** The original accuracy and the adversarial accuracy of AP and APD (ours) with respect to the epoch of the filter pruning procedure for (**a**) LeNet, (**b**) VGG16, and (**c**) ResNet18. The left of each row is the result in the compression rate of ×1.5, the middle of each row is the result in the compression rate of ×2, and the right side of each row is the result in compression rate of ×2.5. The blue line means the original accuracy and the red line indicates the adversarial accuracy. The solid line is the result of APD and the dashed line is the result of AP.

#### *4.4. Computational and Space Complexity*

To show the computational and memory efficiency of APD in comparison to other methods, here we provide a short analysis without big-O notation. The dominant part of the training procedure of CNNs in terms of computational complexity is the forward and backward operations. For a given network and input data, we denote the amount of computation for a forward pass as $F$ and the amount of computation for a backward pass as $B$. In addition, we suppose that the number of iterations for training the given network is $I\_T$ and the number of iterations for generating an adversarial example is $I\_A$. Then, the computational complexity of most pruning methods, such as FPGM, is $I\_T \times (F + B)$. DD contains additional forward operations for generating the SoftMax output of the teacher network, resulting in $I\_T \times (2F + B)$. A relatively large increase in computational complexity for APD and Ye et al. is inevitable since adversarial training requires an iterative adversarial attack at every iteration. Considering this, the computational complexity of Ye et al. is $I\_T \times (F + B + I\_A \times F)$, whereas APD requires $I\_T \times (2F + B + I\_A \times F)$ since it contains both adversarial training and knowledge distillation.

On the other hand, the dominant part of the space complexity of the training procedure is the number of learning parameters. To describe the space complexity, let us denote the number of weights of the given network as $P$. FPGM requires no additional parameters, and therefore its space complexity is $P$. The space complexity of DD and APD is $2P$ since they require a teacher and a student network to perform knowledge distillation. Ye et al. requires two additional tensors of parameters for ADMM, resulting in a larger space complexity of $3P$. Compared to Ye et al., the analysis shows that APD requires far less memory at the cost of an additional forward step.
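The cost expressions above can be tabulated directly. The sketch below simply encodes the formulas from this section; the function names and the unit costs plugged in for $F$, $B$, $I\_T$, $I\_A$, and $P$ are hypothetical placeholders chosen to make the arithmetic easy to follow.

```python
def training_cost(method, F, B, I_T, I_A):
    """Total forward/backward cost of training, using the formulas in the text."""
    per_iteration = {
        "FPGM": F + B,
        "DD": 2 * F + B,                 # extra forward pass through the teacher
        "Ye et al.": F + B + I_A * F,    # adversarial example generation
        "APD": 2 * F + B + I_A * F,      # teacher forward + adversarial examples
    }
    return I_T * per_iteration[method]


def parameter_memory(method, P):
    """Number of stored parameters during training, in units of weights."""
    factor = {"FPGM": 1, "DD": 2, "APD": 2, "Ye et al.": 3}
    return factor[method] * P


# Hypothetical unit costs: F = 1, B = 2, 10 training iterations, 5 attack iterations.
costs = {m: training_cost(m, F=1, B=2, I_T=10, I_A=5)
         for m in ("FPGM", "DD", "Ye et al.", "APD")}
memory = {m: parameter_memory(m, P=1000)
          for m in ("FPGM", "DD", "Ye et al.", "APD")}
```

With these placeholder values, APD costs one extra forward pass per iteration compared with Ye et al. (90 versus 80 units) while storing 2,000 rather than 3,000 parameters.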

#### *4.5. Effectiveness of Knowledge Distillation on Other Attack Methods*

To test our method against other adversarial attacks, we evaluated the adversarial accuracy of our PGD-based trained LeNet (MNIST) against the Fast Gradient Sign Method (FGSM) attack [1] and the Carlini–Wagner (CW) $\ell\_2$ attack [4]. For the FGSM attack, we set the attack radius to 0.3. For the CW attack, we used an $\ell\_2$-bounded perturbation and set the maximum number of iterations to 1000. The baseline LeNet showed an original accuracy of 99.41% and an adversarial accuracy of 1.08% against FGSM and 0.48% against CW. The results are described in Table 6. APD showed higher original accuracy and higher adversarial accuracy against both FGSM and CW $\ell\_2$ attacks than AP at all compression rates. In particular, the improvement in the adversarial accuracy against the CW $\ell\_2$ attack is significant. Those results imply that our PGD-based approach is also effective against other attack methods.



#### **5. Conclusions**

The adversarial robustness of compressed CNNs is essential for deploying them in real-world embedded systems. In this paper, we proposed a robust model compression framework for CNNs. Our framework uses knowledge distillation to improve the result of the existing adversarial pruning approach. In several experiments, our framework showed a significant improvement in the trade-off between the compression rate and the adversarial robustness on two datasets, MNIST and CIFAR10. We found that the amount of improvement of our framework tends to decrease at high compression rates. We suspect that this phenomenon is due to the large gap in architectural capacity between the teacher network and the student network. We hope that this phenomenon will be mitigated in future work.

**Author Contributions:** conceptualization, J.L. and S.L.; methodology, J.L. and S.L.; validation, J.L.; writing—original draft preparation, J.L.; writing—review and editing, S.L.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1B07051383), and by the MSIT (Ministry of Science and ICT), Korea, under the ITRC(Information Technology Research Center) support program (IITP-2020-0-01749) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).

**Data Availability Statement:** The data presented in this study are openly available: MNIST (http://yann.lecun.com/exdb/mnist/) and CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Model Checking Resiliency and Sustainability of In-Vehicle Network for Real-Time Authenticity**

**Jin Hyun Kim 1,\*, Hyo Jin Jo 2,\* and Insup Lee <sup>3</sup>**


#### **Featured Application: MAuth-CAN is a new CAN authentication mechanism, and the proposed CAN model and verification techniques are useful to analyze timing properties of CAN applications.**

**Abstract:** The Controller Area Network (CAN) is the most common network system in automotive systems. However, the standardized design of a CAN protocol does not consider security issues, so it is vulnerable to various security attacks from internal and external electronic devices. Recently, in-vehicle network is often connected to external network systems, including the Internet, and can result in an unwarranted third-party application becoming an attack point. Message Authentication CAN (MAuth-CAN) is a new centralized authentication for CAN system, where two dual-CAN controllers are utilized to process message authentication. MAuth-CAN is designed to provide an authentication mechanism as well as provide resilience to a message flooding attack and sustainably protect against a bus-off attack. This paper presents formal techniques to guarantee critical timing properties of MAuth-CAN, based on model checking, which can be also used for safety certificates of vehicle components, such as ISO 26262. Using model checking, we prove sufficient conditions that MAuth-CAN is resilient and sustainable against message flooding and bus-off attacks and provide two formal models of MAuth-CAN in timed automata that are applicable for formal analysis of other applications running on CAN bus. In addition, we discuss that the results of model checking of those properties are consistent with the experiment results of MAuth-CAN implementation.

**Keywords:** controller area network bus; authentication; authenticity; resiliency; sustainability; formal verification; model checking; in-vehicle network

#### **1. Introduction**

Advanced digital control technology provides more convenience, safety, and predictability to automotive systems. Recently, many vehicles not only make use of local sensors but also cooperate with other vehicles and infrastructures, such as the Intelligent Transport System (ITS). For instance, Right-turn Collision Caution (RtCC), cooperating with infrastructure, can alert drivers to a hidden risk when they are about to make a right turn. ITS monitors oncoming vehicles and pedestrians around intersections or corners with poor visibility where a vehicle would make a right turn. It cooperates with the vehicle via road-to-vehicle communication, and information on potential approaching risks is also conveyed by vehicle-to-vehicle communication; when necessary, the driver is warned of the approaching risk with audio and visual alerts. The infrastructure uses a dedicated ITS frequency of 760 MHz for road-to-vehicle and vehicle-to-vehicle communication to gather information that cannot be obtained by vehicle sensors. In addition, various features using network communication, such as Communication Radar Cruise Control, Red Light Caution, and Emergency Vehicle Notification, also help drivers anticipate approaching risky situations so that their driving is safe and predictable.

**Citation:** Kim, J.H.; Jo, H.J.; Lee, I. Model Checking Resiliency and Sustainability of In-Vehicle Network for Real-Time Authenticity. *Appl. Sci.* **2021**, *11*, 1068. https://doi.org/10.3390/app11031068

Academic Editor: Kyungtae Kang Received: 13 December 2020 Accepted: 6 January 2021 Published: 25 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

However, the vehicle connecting to open networks can be vulnerable to security attacks. For instance, many studies such as [1–6] have shown that the adversary is able to easily access an in-vehicle network from the outside and control vehicles. Once an adversary compromises an ECU, it disguises itself as a normal node, breaches to other ECUs, and controls and disrupts normal driving function. In addition, DoS (Denial of Service) is also one of the most common attacks that exhausts data processing and communication resources.

To prevent masquerade attacks on a vehicle network, the most popular defense technologies are intrusion detection systems (IDS) and authentication systems. Most proposed IDS techniques are not, however, fast enough to stop the attack, i.e., the adversary can compromise the vehicle system before the IDS detects the attack [7–13]. To address these issues, many authentication protocols have been studied, such as [14–20]. These works can be classified into two categories: authentication using group keys [14–17] and authentication using pairwise keys [18–20]. If CAN uses a group key for message authentication, the group key can be exploited once any node using it is compromised. With pairwise keys, the CAN bus can be overloaded by authentication tags (e.g., message authentication codes) if it uses a basic pairwise key-based authentication method in which every destination node requires a unique authentication tag to verify a CAN message. Thus, the works [18–20] adopt centralized node-based authentication to deal with the overload issue.

However, this centralized node-based authentication has two problems. First, the authentication by a centralized node can be delayed by DoS attack on the centralized node. Second, in case of the centralized authentication, the authenticator could miss a message if it is too slow to process every message. Thus, it should be guaranteed that the authentication is complete on time no matter how often the adversary sends an attack message.

To address the above problems, Jo et al. proposed a new authentication protocol named MAuth-CAN (Message Authentication-CAN) in [21]. MAuth-CAN uses an ECU node dedicated to authenticating each message on the CAN bus by using pairwise keys. To share the authentication result with the other ECUs, the authenticator uses an authentication-fail report (AFR) message. The AFR message is transmitted, alerting the other nodes, only when a message fails authentication, which minimizes the communication overhead caused by centralized authentication. In addition, Jo et al. address Bus-off Attacks (BoAs) by applying their centralized message authentication to dual-CAN controllers. Under an adversary's BoA on the authenticator, the AFR message from the authenticator can also be destroyed by the adversary, resulting in consecutive transmission errors. If the transmission error count of the authenticator exceeds a threshold, the authenticator is forced to leave the CAN bus for a while and must reset to recover its connection to the CAN bus. Jo et al. adopt dual-CAN controllers so that the authenticator is more sustainable under BoA. Jo et al. [21] also showed that (1) MAuth-CAN is robust against the masquerade attack and BoA, (2) it requires approximately 46% less CAN bandwidth than a comparable protocol [19], and (3) it can be applied without modifying existing CAN controllers.

However, they have not provided proofs of the timing-related properties of MAuth-CAN, which could serve as security proofs and as evidence for practical use in real applications. For instance, it should be proved that no adversary message is accepted by any node while authentication is in progress under a DoS attack. This property is related to the timeout for AFR, which delays message communication. Thus, it is necessary to check whether this timeout is bounded in order to check whether the authentication delay meets the maximum acceptable communication delay.

In this paper, we show that the authentication of MAuth-CAN is resilient enough to prevent a masquerade attack for the given timing constraints and is sustainable under a DoS attack. In addition, we prove that the timeout for authentication can be bounded with respect to the message transmission time. In this paper, we apply formal methods of model checking to prove the timing properties of MAuth-CAN. We build formal models of CAN and MAuth-CAN using timed automata and perform model-checking to verify the critical timing properties of MAuth-CAN using UPPAAL SMC and UPPAAL MC. We present two formal models of CAN and MAuth-CAN. The first model abstracts MAuth-CAN by a producer-consumer model in terms of authenticator and attacker, so that it is proved that the authenticator in terms of a consumer addresses all attack messages from the attacker in terms of a producer. The second model details CAN in the level of MAC frame of the data link layer, so that the model of MAuth-CAN is shown to be valid in the data-link layer of CAN networking.

This paper presents sufficient conditions to ensure:


The above conditions are relevant to (1) the size of reception queue of authenticator's CAN controller, (2) the relation between authentication time and CAN bus transmission time, and (3) the number of CAN controllers of the authenticator.

This paper presents the following three contributions:


The rest of the paper is organized as follows: Section 2 discusses the related work. Section 3 presents the background theory of this work. Section 4 overviews MAuth-CAN, a centralized CAN authentication, two attack scenarios i.e., masquerade attack and BoA attack, and MAuth-CAN's countermeasure to those attacks. Section 5 shows formal proof of our proposed sufficient conditions for MAuth-CAN resiliency to a masquerade attack and sustainability to BoA attack, using symbolic and statistical model checking techniques. Section 6 presents more results from the implementation of MAuth-CAN. In Section 7, we conclude this paper with the potential future work.

#### **2. Related Work**

In 2010, Koscher et al. were the first to demonstrate attacks on in-vehicle network using a real vehicle [1]. They introduced the CARSHARK tool, which makes it easy for an adversary to analyze and inject attack packets on in-vehicle network, i.e., CAN bus. After the first vehicle attack, many studies included new attack surfaces on an in-vehicle network [2–6]. To deal with these cyber-attacks on in-vehicle network, intrusion detection systems [7–13] and message authentication protocols [5,14–21] were studied.

In the work of [7–10], the transmission frequencies or sequences of CAN packets were used to detect the CAN traffic abnormality caused by in-vehicle network attacks. Recently, deep neural network (DNN) model-based intrusion detection systems that take transmission frequencies or sequences of CAN packets as input values have been proposed in [11,12]. However, these studies [7–12] cannot detect masquerade attacks by a compromised ECU because the compromised node can mimic the transmission frequencies or sequences of CAN packets to bypass intrusion detection algorithms.

To handle masquerade attacks, Cho et al. proposed an ECU clock-based intrusion detection system [13]. In this study, the clock skew of each ECU is profiled as a hardware fingerprint, which is unique to every ECU, and this inimitable value is used to identify a masquerading ECU. However, this approach cannot deal with masquerade attacks that use aperiodic CAN messages generated by aperiodic vehicle operations, such as auto-parking, lane keeping aid (LKA), and adaptive cruise control (ACC) functions. Furthermore, this clock-based intrusion detection can be defeated by the clock emulation attack proposed in [22].

To address the limitations of existing intrusion detection systems, message authentication protocols for in-vehicle networks have been designed. In general, message authentication protocols can be divided into two categories: group key-based authentication [14–17] and central node-based authentication [18–21]. In the group key-based authentication studies [14–17], one group key shared by all ECUs is used to generate authentication tags such as message authentication codes. However, these studies also cannot handle masquerade attacks because the group key can be accessed by a compromised node. In light of this, centralized node-based authentication studies have been presented in [18–21] for handling masquerade attacks by compromised ECUs. Since centralized node-based authentication does not share one group key among all ECUs, a compromised ECU cannot access the authentication keys stored in other ECUs. However, the methods in [18,19] cannot be applied to legacy vehicles because either the CAN controller must be modified to include new functions that do not follow the CAN standard, or they incur network overhead that exceeds the maximum capacity of the CAN bus. Furthermore, the protocol in [20] has the limitation that several bytes of a CAN message are not included in the authentication value generation process.

To handle these issues of authentication protocols, Jo et al. presented an authentication report-based message authentication [21]. This protocol does not incur network overhead nor require CAN controller modification, but there is a message authentication delay caused by an authentication report message. Even though the work of [21] evaluated the authentication delay by using CAN development boards, there is no formal analysis about the delay which could affect real-time operations of vehicles. Thus, this paper puts the authentication delay of [21] into formal analysis using UPPAAL SMC and UPPAAL MC.

In addition, we performed several Arduino-based authentication tests, which are related to the what-if analysis and robustness checking defined in [23]. These tests measure the authentication delay of [21] in worst-case scenarios to show that the delay is bounded within a certain amount of time even under DoS attacks on CAN, such as message flooding and bus-off attacks.

#### **3. Preliminaries**

In this section, we outline our approach and give an overview of our formal techniques (model checking and UPPAAL) and of CAN communication, before introducing MAuth-CAN in the following section.

#### *3.1. Our Approach*

The CAN authentication in MAuth-CAN meets two goals: (1) No receiver can open any message that does not go through a centralized authentication of MAuth-CAN, (2) The CAN controller for message authentication is never enforced to leave CAN bus by consecutive and numerous transmission errors by intention.

In this paper, we show why MAuth-CAN never fails to meet the above goals. To simplify the goals, we present sufficient conditions as theorems that must be satisfied to meet them (Section 4.4) and then prove them by model checking (Section 5). We use a high-level model of MAuth-CAN that highlights the reaction of the authenticator to attack messages (Section 5.1). Then, using model checking, we prove that the authenticator of MAuth-CAN passes no attack message without verification, even under consecutive attacks, if the sufficient conditions in Theorem 1, consisting of Lemma 1 and Lemma 2, are satisfied. We also present a low-level formal model of MAuth-CAN at the MAC level of the data link layer of CAN, which is detailed enough to reflect the actual behavior of CAN. This model ensures that our verification is practical enough to provide valid proofs of the security of MAuth-CAN (Section 5.2). Then, we prove Theorem 2 by proving Lemma 3, which is the essential property of MAuth-CAN assumed by Theorem 2. Finally, we show that our verification results are consistent with the actual implementation of MAuth-CAN (Section 6).

In the following subsection, we present our verification technique, model checking using UPPAAL.

#### *3.2. Model Checking*

Model checking is a rigorous verification method that provides a mathematical proof of a given property of a system. It accepts a system model and the properties that the model should satisfy. During verification, model checking explores all states of the system by taking every symbolic computational step and exhaustively checks whether every state satisfies the given properties. Since model checking thoroughly explores all states of the system, it requires considerable time and memory. It is used to obtain guarantees of given properties of safety-critical systems by mathematical proof techniques.

In this paper, we apply UPPAAL, a model checker, to prove MAuth-CAN's properties. The UPPAAL tool suite includes various analysis techniques such as symbolic model checking, statistical model checking, and simulation. Symbolic model checking in UPPAAL accepts timed automata (TA) [24] as its modeling language and uses CTL (Computation Tree Logic) for property specification. CTL in UPPAAL comprises path formulas and state formulas. A path formula consists of branch quantifiers and path quantifiers. The branch quantifiers *A* and *E* denote "all paths" and "some path", respectively. The path quantifiers □ and ♦ represent "all states" and "there exists a state", respectively.

Let φ be a state formula. A path formula over state formulas is expressed by the grammar:

$$\Phi ::= \varphi \mid A \Box \varphi \mid E \Box \varphi \mid A \Diamond \varphi \mid E \Diamond \varphi \mid \varphi_1 \to \varphi_2$$

Using such a formula, reachability, safety and liveness properties can be formulated in UPPAAL. Reachability properties are expressed by the path formula *E*♦φ, meaning that a state satisfying φ is reachable.

Safety properties are formulated by the path formula *A*□φ, which requires that φ be true in all reachable states. Meanwhile, *E*□φ denotes that there exists a maximal path along which φ is always true. A maximal path is a path that is either infinite or whose last state has no outgoing transitions [25]. Liveness properties are formulated by the path formula *A*♦φ, which means that every path eventually reaches a state satisfying φ, i.e., φ is eventually satisfied. A useful formula is the leads-to, or response, property, written *A*□(φ → *A*♦ψ). It means that whenever φ holds, ψ must eventually hold [25]. For instance, whenever a message is sent, it should eventually be acknowledged.
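To make the reachability and safety formulas concrete, the following sketch checks *E*♦φ and *A*□φ on a small, untimed, finite transition system by breadth-first search. It is only an illustration under strong assumptions: UPPAAL works on symbolic (zone-based) state spaces of timed automata, and the three-state system and function names below are hypothetical.

```python
from collections import deque

def reachable(init, succ):
    """Return all states reachable from init in the successor map succ."""
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        for nxt in succ.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

def check_exists_eventually(init, succ, phi):
    """E<> phi: some reachable state satisfies phi."""
    return any(phi(s) for s in reachable(init, succ))

def check_always_globally(init, succ, phi):
    """A[] phi: every reachable state satisfies phi."""
    return all(phi(s) for s in reachable(init, succ))

# Hypothetical system: a message is sent, acknowledged, and the sender idles again.
succ = {"idle": ["sent"], "sent": ["acked"], "acked": ["idle"]}

can_be_acked = check_exists_eventually("idle", succ, lambda s: s == "acked")
never_error = check_always_globally("idle", succ, lambda s: s != "error")
always_idle = check_always_globally("idle", succ, lambda s: s == "idle")
```

Here `can_be_acked` and `never_error` hold, while `always_idle` does not, because the states `sent` and `acked` are reachable.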

UPPAAL SMC accepts a network of stochastic timed automata (NSTA). A model of network of timed automata in UPPAAL is redefined by a network of stochastic timed automata where the non-determinism of behavior in a timed automata model is refined by a probability distribution, so that the property for a given model is characterized by a probability that an event happens or a property holds.

The specification language of UPPAAL SMC is based on Metric Interval Temporal Logic [26]. For an NSTA *M*, P<sub>*M*</sub>(φ) denotes the probability that a random run of *M* satisfies φ. The problem of checking P<sub>*M*</sub>(φ) ≥ *p* (*p* ∈ [0, 1]) is undecidable. For this reason, for the sub-logic of cost-bounded reachability problems P<sub>*M*</sub>(♦<sub>*x*≤*C*</sub> *AP*) ≥ *p*, where *x* is a clock, *C* is a time bound, and *AP* is a conjunction of predicates over the state of an NSTA, UPPAAL SMC approximates the answer using simulation-based algorithms [27]. In UPPAAL SMC, the following three types of questions can be answered:


P<sub>*M*</sub>(♦<sub>*x*≤*C*</sub> *AP*) is expressed by "Pr[<=*C*](<> *AP*)" in UPPAAL. This query omits *x* from the original formula because the global clock is used implicitly. In addition, the following two forms of queries can be used to simulate a given model:


where bound is a time bound on the simulation, Ek is an expression that would be monitored and visualized.
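The idea behind a time-bounded probability query can be sketched as a plain Monte Carlo estimate: run the model many times and count the runs in which the event occurs within the bound. The sketch below is an assumption-laden illustration, not UPPAAL SMC's own algorithm. It fixes the number of runs with the Chernoff-Hoeffding bound, and the "model" is a hypothetical single event with an exponentially distributed delay.

```python
import math
import random

def runs_needed(eps, delta):
    """Runs so that |estimate - p| <= eps with probability >= 1 - delta
    (Chernoff-Hoeffding bound for independent Bernoulli samples)."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def estimate_probability(run_once, eps=0.05, delta=0.05, seed=0):
    """Estimate the probability that run_once(rng) returns True."""
    rng = random.Random(seed)
    n = runs_needed(eps, delta)
    hits = sum(1 for _ in range(n) if run_once(rng))
    return hits / n, n

def toy_run(rng, bound=10.0, rate=0.2):
    """Hypothetical model: the event fires after an exponential delay;
    the run satisfies the property if it fires within the time bound."""
    return rng.expovariate(rate) <= bound

p_hat, n_runs = estimate_probability(toy_run)
p_exact = 1.0 - math.exp(-0.2 * 10.0)   # closed form for this toy model
```

With `eps = delta = 0.05`, the bound asks for 738 runs. For this toy model, the estimate can be compared against the closed-form value `p_exact`, which is not available for realistic models.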

In this paper, we use both UPPAAL SMC and UPPAAL MC for proving properties of MAuth-CAN and simulating our model of MAuth-CAN. In Section 5, we present the TA model of MAuth-CAN, the specification of various properties in CTL, and the verification results from UPPAAL.

#### *3.3. CAN (Controller Area Network)*

A Controller Area Network (CAN) is a de-facto standard for an in-vehicle network. Basically, once a node using CAN releases a message onto CAN bus, CAN broadcasts the message to all nodes, and the message is selectively picked up by an ECU that is one of message's destinations. Table 1 shows the structure of the CAN packet frame.

**Table 1.** CAN packet frame (Unit: bits).


Table 2 shows the individual frames of a CAN packet. Each node is given its own CAN ID, which serves as its priority on the CAN bus. If two or more nodes release messages onto the CAN bus at the same time, the one with the higher priority transmits its message. On the CAN bus, 0 (dominant bit) has priority over 1 (recessive bit). That is, the CAN controller lets 0 flow over the CAN bus rather than 1 when both are released at the same time. CAN detects various errors, such as bit, stuff, CRC, and ACK errors. Once a node on the CAN bus encounters one of these errors, the remaining nodes are informed of the error simultaneously. Each node updates one of its error counters, the Receive Error Counter (REC) or the Transmit Error Counter (TEC), according to the error type, and changes its error mode depending on the counter values. For instance, a node moves from the error-active state to the error-passive state when REC or TEC exceeds 127 (≥128). A node in the error-passive state goes to the bus-off state when TEC exceeds 255 (≥256), but REC never drives a node to the bus-off state. Once a node is in the bus-off state, it is forced to leave the CAN bus for a specific time. TEC increases and decreases at different rates: every time a transmission error happens, TEC increases by 8, whereas it decreases by 1 every time a transmission succeeds.
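The arbitration rule and the TEC bookkeeping described above can be sketched as follows. This is a simplified illustration, not a CAN controller implementation: it assumes 11-bit identifiers, models only the TEC (not the REC or the bus-off recovery sequence), and takes the error-passive and bus-off thresholds of 128 and 256 from the CAN specification. All names are hypothetical.

```python
def arbitrate(can_ids, width=11):
    """Return the identifier that wins bitwise arbitration.

    Dominant 0 overrides recessive 1, so the bus behaves like a wired-AND
    and the numerically lowest identifier wins.
    """
    contenders = set(can_ids)
    for bit in range(width - 1, -1, -1):                  # MSB is sent first
        bus = min((i >> bit) & 1 for i in contenders)     # any 0 pulls the bus to 0
        contenders = {i for i in contenders if (i >> bit) & 1 == bus}
    return contenders.pop()                               # one identifier remains


class TransmitErrorCounter:
    """TEC only: +8 per transmission error, -1 per successful transmission."""

    PASSIVE_AT = 128   # error-passive threshold (CAN specification)
    BUS_OFF_AT = 256   # bus-off threshold (CAN specification)

    def __init__(self):
        self.tec = 0

    def tx_error(self):
        self.tec += 8

    def tx_success(self):
        self.tec = max(0, self.tec - 1)

    @property
    def state(self):
        if self.tec >= self.BUS_OFF_AT:
            return "bus-off"
        if self.tec >= self.PASSIVE_AT:
            return "error-passive"
        return "error-active"


def errors_until(threshold):
    """Consecutive transmission errors needed to raise a fresh TEC to threshold."""
    counter, count = TransmitErrorCounter(), 0
    while counter.tec < threshold:
        counter.tx_error()
        count += 1
    return count
```

Starting from zero, 16 consecutive transmission errors reach the error-passive threshold and 32 reach bus-off, while each successful transmission recovers only 1. This asymmetry is what a bus-off attack exploits.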





#### **4. MAuth-CAN**

This section gives an overview of MAuth-CAN, a new CAN authentication technology, and its properties for protection against the masquerade attack and BoA. In addition, we formulate properties for model checking of MAuth-CAN. Prior to the description of the MAuth-CAN protocol and models, Table 2 defines symbols and variables for the formal descriptions.

#### *4.1. System and Adversary Assumptions*

In this subsection, we provide the assumptions for CAN and adversary (attacker), in particular, their capability for defense and attack.

#### 4.1.1. System Assumptions

First, the system dedicates an ECU with two CAN controllers, the authenticator, to CAN message authentication. The dedicated ECU is assumed to be assigned the highest-priority CANID, which is open to anyone. Second, we assume that it is possible to compute the maximum acceptable communication delay for a given application running and communicating over CAN. Third, we assume that the authenticator ECU is very hard for the attacker to compromise, so that we need not treat CAN as a victim of a single point of failure (SPF), where all nodes lose a specific security guarantee once one node is compromised. This assumption can be achieved by applying lightweight tamper-resistant hardware such as SMART [28] and TrustLite [29] to the authenticator.

#### 4.1.2. Adversary Assumptions

An adversary is subject to the following assumptions. First, an adversary can reach a node responsible for driving control and cause harmful driving consequences. Second, two or more adversary nodes cannot perform DDoS (Distributed DoS), i.e., the attacker cannot compromise more than one ECU node. Third, the information in CAN messages transmitted on the CAN bus, such as the source address and data, can be fabricated or forged by the adversary node. Fourth, the highest-priority ID can be exploited by an adversary. Fifth, each ECU has a different CAN controller and a different number of message reception queues from the others. Sixth, each ECU is equipped with one message buffer each for transmission and reception.

#### *4.2. Attack Scenarios*

#### 4.2.1. Masquerade Attack

An adversary fabricates CAN messages with a normal CANID so that vehicle driving is illegally controlled by adversary's control messages. For example, the compromised ECU can transmit a CAN message using the CANID of an ECU related to the engine to control the vehicle's speed. According to [5], the CANID of 0x43F was transmitted by a compromised node to actuate the vehicle's engine.

#### 4.2.2. Denial of Service Leading to Bus-Off

Figure 1 describes a scenario of an adversary's Denial of Service attack leading to bus-off. The adversary performs the DoS attack with consecutive attack messages, in particular while the authenticator needs to broadcast an authentication-fail report. When the adversary and the authenticator attempt to send messages with the same identifier simultaneously, the AFR and the attack message collide. As a result, a transmission error occurs and increases the TEC of both transmitting nodes. If the TEC of the authenticator goes beyond the threshold of Passive Error mode, it has fewer chances to transmit messages than the attacker. A crafty attacker can prolong this situation until the authenticator is forced off the CAN bus.

**Figure 1.** Bus-off attack scenarios from a single-point adversary (IFS: Interframe space, Suspend: Suspend transmission, RX mode: Reception mode, Active: Error active state, Passive: Error passive state, C#1: Controller #1, C#2: Controller #2).

#### *4.3. Countermeasures of MAuth-CAN*

To protect from the above attacks, MAuth-CAN performs the authentication using dual-CAN controllers as shown in Figure 2.

**Figure 2.** New authenticator model with dual CAN controllers.

As the authenticator uses two CAN controllers, the reception and transmission queues are doubled. Each controller has its own transmission error counter (TEC) and receive error counter (REC); thus, the dual-CAN controllers provide two TECs and two RECs for authentication. In particular, TEC is the decisive variable that determines whether a CAN controller is expelled from the CAN bus.

#### 4.3.1. Countermeasure to Masquerade Attack

To avoid masquerade attacks, MAuth-CAN authenticates every single message on the CAN bus, as shown in Figure 3. When an ECU transmits a message on the CAN bus, every CAN controller takes the message into its reception queue but delays reading it until the authentication of the message is done. A CAN controller keeps a new message in its reception queue for TB time units, as shown in Figure 3. The controller reads a new message when TB expires (Pass scenario in Figure 3). If a message does not pass authentication, the CAN authenticator creates and broadcasts an error report, i.e., an authentication-fail report (AFR), and the ECUs discard the message (Fail scenario in Figure 3). MAuth-CAN sets TB to 4 × Ttx under the assumption that the transmission time is always greater than the authentication time. In this paper, we present model checking results proving that the condition and the assumptions of the MAuth-CAN authenticator are sufficient to protect the CAN system against masquerade attacks. The details of the AFR message are given in [21].
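The receiver-side rule described above (hold a message for TB = 4 × Ttx, read it when TB expires, discard it if an AFR arrives first) can be sketched as below. This is an illustrative sketch rather than the implementation of [21]: the class and method names are hypothetical, and the sequence number `seq` used to match an AFR to a held message is an assumption, since the AFR format is specified in [21] and not here.

```python
class HoldingReceiver:
    """Hold-and-release rule of a receiving ECU (illustrative sketch)."""

    def __init__(self, t_tx):
        self.t_b = 4 * t_tx      # TB = 4 x Ttx, as stated in the text
        self.pending = {}        # seq -> arrival time (seq is a hypothetical tag)

    def on_message(self, seq, now):
        """Queue an incoming message without reading it yet."""
        self.pending[seq] = now

    def on_afr(self, seq):
        """An AFR reports failed authentication: discard the held message."""
        self.pending.pop(seq, None)

    def release(self, now):
        """Return the messages whose hold time expired without an AFR."""
        ready = sorted(s for s, t in self.pending.items() if now - t >= self.t_b)
        for s in ready:
            del self.pending[s]
        return ready


# Usage with integer time units and Ttx = 1, so TB = 4.
receiver = HoldingReceiver(t_tx=1)
receiver.on_message(seq=1, now=0)    # legitimate message
receiver.on_message(seq=2, now=1)    # attack message
receiver.on_afr(seq=2)               # authenticator reports the failure
early = receiver.release(now=3)      # TB has not expired for message 1 yet
on_time = receiver.release(now=4)    # message 1 is read, message 2 was dropped
```

The sketch makes the role of the timeout visible: a message is accepted only because no AFR arrived within TB, which is why the bound on AFR delivery time matters.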

**Figure 3.** Basics of MAuth-CAN.

#### 4.3.2. Countermeasure to DoS and Bus-Off Attacks

Figure 4 shows a scenario that MAuth-CAN performs CAN message authentication under BoA, and consequently sends all AFR messages to ECUs when a message cannot pass the authentication.


**Figure 4.** MAuth-CAN resistant against flooding.

#### **AFR Flooding**

Every time an unauthenticated message comes in, the authenticator instantiates and broadcasts an AFR. Ideally, if the authenticator always dominates the CAN bus over any other node, including the attacker, every ECU under masquerade attack must receive the AFR message within 4 × Ttx time units according to Figure 4. So that the attacker cannot infer anything from AFR messages or reuse previous AFR messages, the authenticator uses a reversed hash chain. An AFR message consists of two packets. Thus, the AFR message for the first adversary message can reach all nodes within 3 × Ttx time units, but the AFR message for the second adversary message can be delayed by the first AFR message. For this reason, the waiting time of an ECU for authentication needs to be 4 × Ttx time units.

#### **BO Avoidance**

Both the authenticator and the attacker attempt to dominate the CAN bus at the same time if the highest-priority CANID is open. If they send messages simultaneously with the same ID, a transmission error occurs and increases the TEC of the message senders, here the authenticator and the attacker. If the TEC of either the authenticator or the attacker goes over the threshold of Active Error mode, that node transits to Passive Error mode. When the TEC goes over the limit of Passive Error mode, the ECU in Passive Error mode is forced to leave the CAN bus for a while.

To avoid this situation, the authenticator is equipped with two CAN controllers, each with its own TEC. Consequently, the two CAN controllers of the authenticator cannot both be forced into Passive Error mode at the same time.

MAuth-CAN is resilient to masquerade attacks if the authenticator never misses a message that it must verify. It is also sustainable under BoA because the attacker cannot send adversary messages faster than the authenticator, which uses the two TECs of its dual-CAN controllers.

#### *4.4. Sufficient Conditions for MAuth-CAN Resiliency and Sustainability*

CAN is resilient to masquerade attacks if the authenticator manages to inspect every single message. It is also sustainable if the authentication is never disabled by BoA. MAuth-CAN achieves these two goals by introducing a new authenticator equipped with two CAN controllers. In this paper, we show that MAuth-CAN achieves the two goals through the following properties and prove them using model checking.

**Theorem 1.** *If the MAuth-CAN authenticator uses a reception queue of size 2 for incoming new messages and the authentication time is always less than the message transmission time, it never fails to transmit AFR messages, and ECUs need to wait for an AFR message no longer than 4* × *Ttx*.

Theorem 1 emphasizes the size of the reception queue of the authenticator's CAN controller and the relation between the authentication time and the transmission time. The size of the CAN controller's reception queue is relevant to the resiliency of the MAuth-CAN authenticator. The relation between the authentication time and the transmission time is relevant to the waiting time of ECUs for AFR messages.
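The intuition for why the relation between authentication time and transmission time matters can be illustrated with a simple producer-consumer calculation. The sketch below is a deliberately coarse abstraction and not the UPPAAL model of this paper: it assumes back-to-back attack messages arriving every Ttx, a single authenticator that handles one message at a time in a fixed time, and it ignores AFR transmission and bus contention. The function name and the numbers are hypothetical.

```python
def max_backlog(n_msgs, t_tx, t_auth):
    """Largest number of messages held by the authenticator at any arrival.

    Messages finish arriving at t_tx, 2*t_tx, ... (flooding); each one is
    authenticated in t_auth, one at a time, in arrival order.
    """
    arrivals = [k * t_tx for k in range(1, n_msgs + 1)]
    finish, free_at = [], 0
    for arrival in arrivals:
        start = max(arrival, free_at)        # wait until the authenticator is free
        free_at = start + t_auth
        finish.append(free_at)
    worst = 0
    for i, arrival in enumerate(arrivals):
        # Messages received so far whose authentication is still unfinished.
        backlog = sum(1 for j in range(i + 1) if finish[j] > arrival)
        worst = max(worst, backlog)
    return worst


bounded = max_backlog(n_msgs=1000, t_tx=10, t_auth=9)    # authentication faster than transmission
growing_10 = max_backlog(n_msgs=10, t_tx=10, t_auth=15)  # authentication slower than transmission
growing_100 = max_backlog(n_msgs=100, t_tx=10, t_auth=15)
```

When authentication is faster than transmission, the backlog stays constant no matter how long the flooding lasts. When it is slower, the backlog grows with the number of attack messages, so no fixed queue size would suffice.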

#### **Theorem 2.** *MAuth-CAN authenticator is sustainable under BoA if it uses two CAN controllers*.

The sustainability of MAuth-CAN in Theorem 2 means that the MAuth-CAN authenticator is never forced off the CAN bus. To prove Theorem 2, we focus on the TEC of the authenticator's CAN controllers because, if the TEC of a CAN controller goes over the threshold of Passive Error mode, the CAN controller is forced to leave the CAN bus for a while. Thus, we will show that the TEC of the authenticator's CAN controllers never goes over the threshold of Passive Error mode even if that of the attacker's CAN controller does. In the next section, we prove the above two theorems using model checking techniques.

#### **5. Formal Analysis of MAuth-CAN**

In this section, we present two formal models of CAN authentication in TA: An abstract CAN networking model and a detailed ECU model. The first model, the CAN networking model, captures the interlocking between three components: the authenticator, CAN bus, and ECUs. It focuses on verification of Theorem 1. The second model, the ECU model, details the behaviors of ECUs and bus at a bit-wise level so that the analysis can be done at a lower level. It focuses on verification of Theorem 2.

#### *5.1. Model Checking Analysis of Theorem 1*

To avoid the complexity of formal analysis, we abstract interaction between CAN components, as shown in Figure 5.

**Figure 5.** CAN networking model. (**a**) This figure shows the state transitions of the authenticator in MAuth-CAN. (**b**) It shows the state transitions of an ECU in data transmission. (**c**) It shows the state transitions of the CAN bus for data transmission.

The model of CAN interaction comprises the authenticator, the attacker, and the CAN bus. The attacker model has the same behavior as normal ECUs, but the authenticator in our model responds to a message from the attacker by broadcasting AFR messages. We do not include the behavior of the CAN controllers of message receiver nodes in our model, since the authenticator model has the same behavior as a CAN message receiver and we focus on the resiliency of MAuth-CAN's authentication, which must handle every attack message.

The authenticator model in Figure 5a waits for any message on the CAN bus. When the authenticator reads a message (RxMsg) from the CAN bus, it moves to the location Authenticate to process authentication. If the message passes the authentication, the authenticator returns to the Idle state. Otherwise, it enters the SendRep state and sends the AFR message for the unauthenticated message once the CAN bus is available (CANBUS = FREE). While it is performing authentication, the authenticator keeps any incoming message in its reception queue (EnQ). If the queue is empty, the authenticator returns to the initial location Idle. Otherwise, it returns to Authenticate and authenticates the next queued message.
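The state transitions described for Figure 5a can be sketched in Python. This is a minimal, untimed illustration and not the UPPAAL model itself: the class and method names, the `is_authentic` callback, and the assumption that the queue capacity also counts the message currently under authentication are illustrative choices that do not come from the model.

```python
from collections import deque
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    AUTHENTICATE = auto()
    SEND_REP = auto()


class Authenticator:
    """Untimed toy version of the authenticator automaton (Figure 5a)."""

    def __init__(self, capacity, is_authentic):
        self.capacity = capacity          # assumed: includes the message being checked
        self.is_authentic = is_authentic  # hypothetical verification callback
        self.queue = deque()              # reception queue (EnQ); head is being checked
        self.state = State.IDLE
        self.sent_afr = []                # messages for which an AFR was broadcast
        self.missed = []                  # messages lost because the queue was full

    def rx_msg(self, msg):
        """RxMsg event: a message was read from the CAN bus."""
        if len(self.queue) >= self.capacity:
            self.missed.append(msg)
            return
        self.queue.append(msg)
        if self.state is State.IDLE:
            self.state = State.AUTHENTICATE

    def finish_authentication(self):
        """Authentication of the head message has completed."""
        if self.state is not State.AUTHENTICATE:
            return
        if self.is_authentic(self.queue[0]):
            self.queue.popleft()
            self._advance()
        else:
            self.state = State.SEND_REP

    def bus_free(self):
        """CANBUS = FREE: broadcast the pending AFR, if any."""
        if self.state is State.SEND_REP:
            self.sent_afr.append(self.queue.popleft())
            self._advance()

    def _advance(self):
        # Return to Idle when the queue is empty, otherwise check the next message.
        self.state = State.AUTHENTICATE if self.queue else State.IDLE
```

With `capacity=1`, a second message arriving during authentication ends up in `missed`, whereas with `capacity=2` it is queued and checked next, which mirrors the qualitative difference between the two queue sizes discussed in this section.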

The CAN controller model of an attacker in Figure 5b is simpler than the authenticator. If the CAN controller has a message to send and CAN bus is available, it just sends it through CAN bus. Notice that it returns to the initial location when the transmission of the message is acknowledged by CAN bus through the event RxMsg.

The CAN bus model in Figure 5c controls the permission for a CAN controller to access the CAN bus. Initially, it allows any controller to use the CAN bus by setting CANBUS:=FREE. When the CAN bus receives a message via TxMsg, it locks the bus by setting CANBUS:=OCCUPIED, which prohibits any other node from using the CAN bus, moves to the Transmit location, and notifies the nodes of the message transmission. Finally, the CAN bus returns to the initial location Idle and unlocks the bus by setting CANBUS:=FREE.

Based on the CAN networking model in Figure 5, Figure 6 captures the CAN behavior models in TA. It also comprises three models: the authenticator, the ECU, and the CAN bus. Four TA processes are instantiated for simulation and verification: two authenticator processes from the authenticator model, one CAN attacker process from the ECU model, and one CAN bus process from the CAN bus model.

(**a**) CAN authentication using two buffers of the incoming queue in TA.

**Figure 6.** Simulation of CAN networking in TA. (**a**) This figure shows two UPPAAL processes of message authenticator that individually process authentication of incoming message. (**b**) This figure shows an UPPAAL process of an ECU that continuously sends messages, simulating message flooding attack. (**c**) This figure shows an UPPAAL process of a CAN bus that simulates the message transmission in the synchronization with the sender ECU and the receiver ECU.

As shown in Figure 6a, two authenticator processes are instantiated from the authenticator model to capture the message queuing behavior of a reception queue of size 2. The model particularly highlights the concurrent behavior of the CAN controller's authentication, reception, and message transmission. When a new message arrives, the authenticator's CAN controller checks the message and sends AFR messages for unauthenticated messages. While the authenticator is sending an AFR, it can simultaneously receive another new message, because the reception queue and the sending queue of a CAN controller are separate. However, only one of the two processes can perform authentication at a time. The authenticator processes, ECUAuth\_Q1 and ECUAuth\_Q2, in Figure 6a have an invariant over the Authenticate location, which limits the authentication time to a specific time bound AUTH\_TIME. The authenticator process leaves the Authenticate location after AUTH\_TIME expires and moves to the TxRep location to send one of the AFR messages. In this interaction model, canstat represents the status of the CAN bus. The authenticator process can send the AFR message through the CAN bus when no node occupies it, that is, when the canstat value of CANBus1 is set to true (1). When the TA authenticator enters the RepeatAFRMsg location, it broadcasts two consecutive packets for one AFR message.

The ECU process in ECUTx1\_Q1 of Figure 6b may send any attack message (txMsg[canid][attkid]) at any time if CAN bus is available. If the reception of the attack message is acknowledged by the authenticator i.e., rxMsg[canid][attkid] is received, it may send another message.

The CAN bus in CANBus1 of Figure 6c manages the permission to use the CAN bus using canstat. If the CAN bus process receives a message from an ECU or the authenticator, it sets canstat to false (0), after which no CAN controller can occupy the CAN bus. The transmission of messages is captured with the clock x and the invariant TRX\_TIME over the Transmit location. The CAN bus process stays in the Transmit location for TRX\_TIME time units and then leaves it, synchronizing on the channel rxMsg and setting canstat to true (1). In particular, the CAN bus model is designed to count the number of attack messages using the function checkAttk(). The number is denoted by AttkMsgCnt. If AttkMsgCnt keeps increasing, the authenticator is failing to check the attack messages. If AttkMsgCnt stays below a specific number, namely the reception queue size of the authenticator, the authenticator succeeds in authenticating every attack message.

The MAuth-CAN authenticator must not let any message pass without verification, meaning that no ECU should read an unauthenticated message. All CAN controllers on ECUs temporarily store any incoming message in the reception queue during authentication and postpone reading it until a predefined authentication time ends. However, when the AFR message arrives within the predefined authentication time, the CAN controller considers that the message in the reception queue has failed the authentication and discards it. For these reasons, it is crucial to characterize the AFR waiting time TB, i.e., the duration that an ECU waits for an AFR message. Also, the CAN authenticator must be capable of verifying consecutive adversary messages and transmitting AFR messages within the predefined authentication time so as to protect every ECU from adversary messages. For the authenticator, we prove the following lemma in order to characterize a CAN controller that can protect against adversary messages in any form:

**Lemma 1.** *If the CAN controller of the authenticator is given a reception queue of size 2 and the authentication time is less than the transmission time, it can always verify every new message and no ECU misses an AFR*.

In order to prove Lemma 1, we model-check the CAN controller model of the authenticator and check the number of delayed AFR messages in the transmission queue. If the variable AttkMsgCnt is not larger than the size of the reception queue, we can say that the authenticator has no remaining AFR to send. The following two CTL properties are checked by UPPAAL MC, and the verification results are shown in Table 3:

#### **CTL-Property-1: A[] not deadlock** (1)

#### **CTL-Property-2: A[] AttkMsgCnt** *≤* **IQSizeAUTH** (2)


**Table 3.** Setting of the model checking for Lemma 1 and model-checking results.

CTL-Property-1 specifies that the system never reaches a deadlock, i.e., a state in which no progress can be made. CTL-Property-2 states that AttkMsgCnt is never larger than the maximum reception queue size of the authenticator's CAN controller.

To prove Lemma 1, we use six different configurations in Table 3, in which the maximum reception queue size of the authenticator, the authentication time, and the transmission time are varied. The maximum reception queue size is either one or two. The transmission time and the authentication time are varied in such a way that Tauth ∼ Ttx, where ∼ ∈ {<, =, >}.

In Table 3, Tauth and Ttx denote the authentication processing time and the transmission time, respectively. The results show that CAN authentication needs a reception queue of size no more than two if the authentication time is less than the transmission time, so that no message passes without verification.

To validate our models, we simulate the model using statistical model checking technique and the following query:

#### **Sim-Property-1: simulate [***≤***100;1] CANBus1.srvCANID** (3)

This query states which CANID preempts CAN bus over time.

Figure 7 shows a simulation of CAN authentication with different reception queue sizes. The *x*-axis represents the time and the *y*-axis represents the identifier of the CAN controller that succeeds in transmitting a message. Thus, Figure 7 shows which CAN controller transmits messages over time. The attacker's CANID is 2 and the authenticator's is 1.

(**a**) Simulation of CAN authentication with a single buffer (size 1) of the incoming queue of CAN.

(**b**) Simulation of CAN authentication with two buffers (size 2) of the incoming queue of CAN.

**Figure 7.** Simulation of CAN authentication nodes. (**a**) The attack messages overwhelm the CAN bus through a message flooding attack and block all AFR messages from the authenticator. (**b**) Two consecutive AFR messages for each attack message are transmitted on the CAN bus without failure, and the attack messages cannot dominate the CAN bus.

Figure 7a shows the case where the authenticator's CAN controller uses a reception queue with a single buffer, so it cannot handle more than one message. The first two messages are adversary messages sent by the attacker. The third and fourth messages are the AFR messages sent by the authenticator after addressing the first adversary message. Note that the authenticator's CAN controller succeeds in sending the AFR messages for the first adversary message, but not for the rest of the adversary messages.

Meanwhile, the plot in Figure 7b shows the different behavior of CAN authentication when the authenticator's CAN controller uses a reception queue of size 2. In Figure 7b, the first two transmissions are made by the attacker. The second adversary message can be transmitted while the authenticator is checking the first adversary message. However, the four transmissions following the first two adversary messages are AFR messages sent by the authenticator. This is consistent with Figure 4: two attack messages can be transmitted consecutively over the CAN bus, but AFR messages follow those attack messages. Thereafter, AFR messages are successfully sent following every attack message. No AFR message is delayed by an adversary message, so no adversary message is accepted by ECUs, thanks to the AFR messages.

In order to compute the minimum TB, we present Lemma 2:

**Lemma 2.** *If the CAN authenticator manages to address all consecutive attack messages, TB need not be longer than 4* × *Ttx*.

In our model, for a given authentication time, denoted by Tauth, and transmission time, denoted by Ttx, we can measure the maximum communication delay using the clocks ECUAuth\_Q1.x and ECUAuth\_Q2.x at the locations ECUAuth\_Q1.TxAck and ECUAuth\_Q2.TxAck. For Tauth = 1 and Ttx = 2, we check the worst-case time for the AFR message to reach all ECUs. We use the following queries:

#### **CTL-Property-3: sup{ECUAuth\_Q1.TxAck}: ECUAuth\_Q1.x** (4)

#### **CTL-Property-4: sup{ECUAuth\_Q2.TxAck}: ECUAuth\_Q2.x** (5)

"sup{expr}: list" in UPPAAL MC returns the maximum value of variables in "list". That is, the expression in list is evaluated only on the states that satisfy expr (a state predicate) that acts like an observation.

Model checking shows that the worst-case response time of the AFR is always 8 (4 × Ttx). That is consistent with the illustration in Figure 4, thus we can conclude that 4 × Ttx is the minimum TB.

From the model-checking results for CTL-Property-1, 2, 3, and 4, we prove Lemma 1 and Lemma 2. Consequently, Theorem 1 follows from the proofs of Lemma 1 and Lemma 2.
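The ECU-side rule that relies on this bound (hold an incoming message, discard it if an AFR arrives within TB, accept it otherwise) can be sketched as follows. This is only an illustrative sketch: the class name, the `msg_id` key by which an AFR names the rejected message, and the abstract time units are assumptions that are not taken from the model.

```python
class EcuReceiver:
    """Toy ECU reception logic with an AFR waiting time T_B = 4 * T_tx."""

    def __init__(self, t_tx):
        self.timeout = 4 * t_tx   # waiting time T_B from Lemma 2
        self.pending = {}         # msg_id -> (arrival_time, payload); msg_id is hypothetical
        self.accepted = []        # payloads handed to the application

    def on_message(self, msg_id, payload, now):
        # Store the message temporarily instead of reading it right away.
        self.pending[msg_id] = (now, payload)

    def on_afr(self, msg_id, now):
        # An AFR within the waiting time means authentication failed: discard.
        entry = self.pending.get(msg_id)
        if entry is not None and now - entry[0] <= self.timeout:
            del self.pending[msg_id]

    def tick(self, now):
        # Accept every message whose waiting time has fully elapsed.
        expired = [m for m, (t, _) in self.pending.items() if now - t > self.timeout]
        for msg_id in expired:
            self.accepted.append(self.pending.pop(msg_id)[1])
```

With Ttx = 2 as in the queries above, a message that arrives at time 0 is still held at time 8, so an AFR arriving exactly at the worst-case response time of 8 still causes it to be discarded.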

In this section, we show that MAuth-CAN is resilient to masquerade attacks using consecutive adversary messages if the authenticator reads incoming messages using a reception queue of size 2 and the authentication time is less than the message transmission time. In particular, it is shown that the two consecutive AFR messages sent by the authenticator can prevent the flooding of adversary messages by preempting the CAN bus. However, this holds only if the AFR messages are successfully transmitted to the other nodes.

In the next section, we show that MAuth-CAN is sustainable under BoA even if the CAN priority is not secure and the attacker can use the highest CAN priority.

#### *5.2. Model Checking Analysis of Theorem 2*

Recall the scenario in which BoA forces the CAN controller of the authenticator to leave the CAN bus for a while. When the authenticator tries to transmit AFR messages, the attacker causes a transmission error: the attacker, using the same CAN priority, begins transmitting the attack message at the same time as the authenticator begins its message transmission. The two messages then collide, resulting in a transmission error. When the repeated transmission errors accumulate up to a specific count, the CAN system removes the attacker and the authenticator from the CAN bus for a while. The CAN authentication should be designed to be sustainable against this BoA.

In order to capture such a complicated situation, we present a more concrete and detailed model of the CAN controller and bus in TA. Our CAN controller model in TA captures the detailed behavior of the CAN controller based on the CAN protocol in Figure 8. We capture the CAN controller's behavior at a bit-wise level, in the same way that the simple protocol in Figure 9 is captured by a TA. In Figure 9, a frame consists of a specific number of bits, and the TA captures the behavior of such a frame with the same number of time units. Here, we do not consider the semantics of bits and focus on the bit-wise timing behavior of the protocol.

**Figure 8.** CAN-Frame in base format in bit levels [30].

**Figure 9.** Modeling a dummy protocol into TA. (**a**) This figure gives an example of a simple protocol in packet frames. (**b**) This figure captures the simple protocol into a TA model in a bit-wise level.

When a protocol moves from one frame to the next, the TA captures this as a transition from one location to another. Basically, a frame is captured by a location in our TA model, where the TA stays for as many time units as the frame has bits. For example, the 1-bit Start frame in Figure 9 is captured by the Start location, where the TA stays for 1 time unit. A specific event occurring in a frame can be captured by an event that triggers a transition leaving the location representing the frame. For example, if a Data frame in Figure 9 encounters an error and needs re-transmission, the TA captures this with a transition returning from the Data location to the Start location.

Figures 10 and 11 show the TA models of the CAN bus and the CAN controller at the MAC level of the data link layer. The CAN controller has three modes: receiving mode, transmission mode, and error handling mode. The receiving mode consists of the receiving location (Rxing) and the error handling locations (RxErrRep, RxErrStuffing, RxErrDelimite). The transmission mode is composed of multiple transmissions of different frames, as shown in Figure 11. When a CAN controller needs to transmit data, the CAN controller model at the location Rxing in Figure 10 checks whether the CAN bus is available by checking the condition variable canstat. The SOF frame of Figure 8 is modeled by the invariant x ≤ SOF over the location StartTrans and the guard x==SOF on the transition leaving StartTrans in Figure 11. Note that the Arbitration Field frame requires the CAN controller to interact with the CAN bus for bus arbitration, and this scheduling responsibility is placed upon the CAN bus, so the Arbitration Field frame is modeled by the location CANArbitration in the CAN bus model of Figure 10. The last bit of the Arbitration Field and the first 2 bits of the Control Field are abstracted together by the location DestControl. When more than one CAN controller attempts to transmit a frame simultaneously, a transmission error may occur in the CAN controller and the bus. The transmission error is captured by the unguarded transition leaving the location Txing in Figure 11. Because this transition may be taken non-deterministically, our TA model of the CAN controller can enter the transmission error (handling) status at any time. When a transmission error occurs, the CAN controller is put into Active Error mode, Passive Error mode, or Reset. If the TEC (Transmission Error Counter) is less than 128, the CAN controller goes to Active Error mode and an Active Error frame is transmitted on the bus. If the TEC is greater than 127 and not greater than 255, the CAN controller is led to Passive Error mode and a Passive Error frame is transmitted on the bus.

**Figure 10.** CAN bus model in TA.

**Figure 11.** A CAN controller model in TA.

A CAN controller in Passive Error mode is penalized in that its transmissions are delayed more than those of a CAN controller in Active Error or Normal mode. This situation is captured by our TA model of the CAN controller, in which the CAN controller in Passive Error mode must stay in the SuspTrans location for 8 time units. When the TEC of a CAN controller is greater than 255, the controller enters the Bus Off state, where no frames can be transmitted by the controller [30]. We capture the Bus Off situation with the Reset location in Figure 11, where the CAN controller stays for a while without being able to send any message.
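The TEC thresholds described above can be summarized in a small Python sketch. The mode label `ERR_PAS` follows the identifier used in the model, whereas `ERR_ACT`, `BUS_OFF`, and the function names are hypothetical. The increment of 8 per transmit error is the value defined by the CAN standard and is an assumption here, since it is not stated in this section.

```python
TX_ERROR_INCREMENT = 8  # assumed from the CAN standard, not stated in this section


def error_mode(tec):
    """Map a Transmission Error Counter value to the controller's error mode."""
    if tec < 128:
        return "ERR_ACT"   # Active Error mode (hypothetical label)
    if tec <= 255:
        return "ERR_PAS"   # Passive Error mode
    return "BUS_OFF"       # Bus Off, modeled by the Reset location (hypothetical label)


def consecutive_tx_errors_to_exceed(limit, tec=0):
    """Count back-to-back transmit errors needed until TEC is greater than limit."""
    count = 0
    while tec <= limit:
        tec += TX_ERROR_INCREMENT
        count += 1
    return count
```

Under these assumptions, a controller starting from TEC = 0 leaves Active Error mode after `consecutive_tx_errors_to_exceed(127)` consecutive transmit errors and reaches Bus Off after `consecutive_tx_errors_to_exceed(255)`, which gives a rough sense of how quickly repeated collisions can silence a single controller.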

Now, we present the formal verification results of model checking for MAuth-CAN under BoA. In order to prove Theorem 2, i.e., that the CAN controller in charge of the authentication is sustainable under BoA, we need to verify that our dual CAN controllers can never be put into Passive Error mode together when the attacker continually causes transmission errors. We introduce Lemma 3 as follows:

**Lemma 3.** *The dual CAN controllers of the MAuth-CAN authenticator are never put into Passive Error mode together at the same time when the attacker is in Passive Error mode*.

We prove Lemma 3 by model checking as follows. In order to reduce the state space of our models, the TA authenticator model and the TA attacker model in Figures 12 and 13 are derived from the CAN controller model of Figure 11 so that they terminate the analysis when one of them goes into Passive Error mode. That is, when either the authenticator controller or the attacker controller goes to Passive Error mode first, the analysis is over. We verify that the two CAN controllers for authentication never go to Passive Error mode at the same time. In this way, our model checking with the modified CAN controller models suffers less from the state-explosion problem. We use the following queries to prove Lemma 3:

#### **CTL-Property-5: A[] not deadlock** (6)

#### **CTL-Property-6: A[] CANContAttk3.errMod==ERR\_PAS imply (CANContAuth1.errMod != ERR\_PAS or CANContAuth2.errMod != ERR\_PAS)** (7)

#### **CTL-Property-7: A[] (CANContAuth1.errMod != ERR\_PAS or CANContAuth2.errMod != ERR\_PAS)** (8)

**Figure 12.** An ECU controller model of the authenticator in TA.

CTL-Property-5 specifies that the system model should never be in a deadlock, in which every process stops running. We use it to check that our TA model is valid for model checking. CTL-Property-6 specifies that the error modes (errMod) of the two authenticator controllers (CANContAuth1 and CANContAuth2) are never both Passive Error mode (ERR\_PAS) at the same time when the error mode (errMod) of the attacker (CANContAttk3) happens to be Passive Error mode (ERR\_PAS). Similarly, CTL-Property-7 is used to check whether both of them can fall into Passive Error mode at all.

**Figure 13.** An ECU controller model of the attacker in TA.

Figure 14 shows that the properties above are met by our model, implying that Lemma 3 is proved by the model checking of the CTL properties. By proving Lemma 3, we conclude that Theorem 2 is proved and that our authentication using dual CAN controllers is resilient to BoA even when the attacker can exploit the highest priority of the CAN controller.

**Figure 14.** Model checking results for MAuth-CAN's resiliency to BoA.

Our model checking environment is as follows:


#### **6. Implementation and Experiments**

In this section, the implementation and experimental results of MAuth-CAN are provided to check whether Theorems 1 and 2, proved in the formal analysis, are applicable to a CAN testbed with real CAN traffic and message authentication. In the experiment, we adopt the BLAKE2S algorithm in keyed mode to implement the message authentication code, which is used to generate authentication tags for CAN messages and report messages. BLAKE2S is a cryptographic hash function that is faster than Keccak (SHA3) in software implementations. For the security proof of BLAKE2S in keyed mode, we refer to [31]. We tested the implemented source code on a Raspberry Pi 3 Model B and an Arduino Zero, which play the roles of the authenticator and the normal ECUs, respectively.
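As an illustration of keyed BLAKE2S used as a message authentication code, the following sketch relies on Python's standard `hashlib.blake2s`. It is not the code run on the testbed: the 8-byte tag length, the way the CAN identifier and data field are concatenated, and the function names are assumptions made only for this example.

```python
import hashlib
import hmac

TAG_BYTES = 8  # assumed tag length; the actual tag size is not given in this section


def make_tag(key: bytes, can_id: int, data: bytes) -> bytes:
    """Compute a keyed BLAKE2s tag over an (assumed) encoding of ID and data."""
    message = can_id.to_bytes(2, "big") + data  # hypothetical message layout
    return hashlib.blake2s(message, key=key, digest_size=TAG_BYTES).digest()


def verify_tag(key: bytes, can_id: int, data: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(key, can_id, data), tag)
```

A sender would attach `make_tag(key, can_id, data)` to its message, and the authenticator would call `verify_tag` with the shared key; a `False` result corresponds to the case in which an AFR message is generated.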

#### *6.1. Message Authentication Time*

When an ECU transmits a CAN message, it always generates a message authentication tag (i.e., a MAC value). The CAN message is then verified by the authenticator, and a report message (an AFR message) is generated if the verification fails. The normal ECUs verify an AFR message, to see whether it was transmitted by the authenticator, only when they receive one. We tested each operation one hundred times; the average computation time and the corresponding standard deviation for the individual cryptographic operations are presented in Table 4.


**Table 4.** Individual operations of MAuth-CAN (*μ*s).

#### *6.2. Reception Time of an AFR Message*

We evaluate the reception time of an AFR message under the following two attacks: message flooding attack and BoA.

#### 6.2.1. Reception Time of an AFR Message under Message Flooding Attacks

As shown in Table 4, the sum of $T^{M}_{Auth}$ and $T^{R}_{Gen}$ is approximately 56.4 μs, which is less than the transmission time of an AFR message, i.e., $444\ \mu\text{s} = \frac{Packet\_Size}{Bus\_Speed} = \frac{2 \times 111\ \text{bits}}{500{,}000\ \text{bits/s}}$ (111 bits is the size of a CAN data frame with an 8-byte data field if the bit-stuffing rule of the CAN standard is ignored). Since the time to authenticate a CAN message and to generate a report message is less than the transmission time of the report message, the authenticator can authenticate all CAN messages without its message queue growing. Thus, every report message for an invalid CAN message can be transmitted successfully within a bounded time of 4 × the transmission time, as described in Theorem 1. According to our implementation result, the worst-case report reception time under flooding attacks is approximately 1012 μs, as shown in Figure 15a.
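The timing figures above follow from simple arithmetic, which the sketch below reproduces. The frame size of 111 bits, the bus speed of 500,000 bits/s, and the measured 56.4 μs are taken from the text; the function and constant names are illustrative only.

```python
FRAME_BITS = 111          # CAN data frame with 8 data bytes, bit stuffing ignored
BUS_SPEED_BPS = 500_000   # bus speed in bits per second
AUTH_PLUS_GEN_US = 56.4   # measured authentication + report generation time


def tx_time_us(frames, frame_bits=FRAME_BITS, bus_speed_bps=BUS_SPEED_BPS):
    """Transmission time in microseconds for a number of back-to-back frames."""
    return frames * frame_bits * 1_000_000 / bus_speed_bps


afr_time_us = tx_time_us(2)      # an AFR message consists of two packets
bound_us = tx_time_us(4)         # 4 x T_tx bound from Theorem 1
condition_holds = AUTH_PLUS_GEN_US < afr_time_us
```

Here `afr_time_us` evaluates to 444 μs and `bound_us` to 888 μs, and `condition_holds` confirms that the measured processing time stays below the AFR transmission time.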

**Figure 15.** The reception time of an AFR message under attacks. (**a**) This figure shows the AFR reception time under message flooding. Note that the maximum time of AFRs is 1012 μs. (**b**) This figure shows the AFR reception time under BoA where the malicious ECU creates the bit-error at the FIRST bit position in the AFR message. (**c**) This figure shows the AFR reception time under BoA where the malicious ECU creates the bit-error at the LAST bit position in the AFR message.

The reception time shown in Figure 15a is slightly larger than $888\ \mu\text{s} = \frac{Packet\_Size}{Bus\_Speed} = \frac{4 \times 111\ \text{bits}}{500{,}000\ \text{bits/s}}$, which is the theoretical transmission time of four CAN packets. The reason is that the experimental time is affected by the bit stuffing rule for synchronization of the CAN bus and by the time measurement error originating from the Arduino UNO. In order to maintain synchronization of the CAN bus, a bit stuffing rule is defined in the CAN standard. In this rule, a bit of the opposite value is inserted after every five consecutive bits of the same value. For example, if six consecutive dominant bits, 000000, are transmitted by the host controller of an ECU, the CAN controller of the ECU adds one recessive bit after the first five consecutive dominant bits, resulting in 0000010. This additional bit is automatically removed by the CAN controllers of the receiver ECUs.
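The bit stuffing rule can be illustrated with a short Python sketch that operates on strings of `0` and `1`. It applies the rule to an arbitrary bit string and is meant only as an illustration; the function names are hypothetical, and no claim is made about which frame fields a real controller stuffs.

```python
def stuff(bits: str) -> str:
    """Insert a bit of opposite value after every five equal consecutive bits."""
    out = []
    last, run = None, 0
    for b in bits:
        if b == last:
            run += 1
        else:
            last, run = b, 1
        out.append(b)
        if run == 5:
            stuffed = "1" if b == "0" else "0"
            out.append(stuffed)
            # The stuffed bit itself starts a new run of length one.
            last, run = stuffed, 1
    return "".join(out)


def destuff(bits: str) -> str:
    """Remove the stuffed bits again, as a receiving controller would."""
    out = []
    last, run = None, 0
    skip_next = False
    for b in bits:
        if skip_next:
            # This is a stuffed bit: drop it, but let it start a new run.
            last, run, skip_next = b, 1, False
            continue
        if b == last:
            run += 1
        else:
            last, run = b, 1
        out.append(b)
        if run == 5:
            skip_next = True
    return "".join(out)
```

For the example in the text, `stuff("000000")` yields `0000010`, and `destuff` restores the original six dominant bits. Each stuffed bit lengthens the frame by one bit time, which is one reason the measured reception time exceeds the theoretical 888 μs.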

#### 6.2.2. Reception Time of an AFR Message under BoA

BoA on the authenticator causes the transmission delay of an AFR message. In general, the continuous BoA can permanently interfere with the communication from an ECU. However, since the authenticator of MAuth-CAN has two CAN controllers, it is possible for the authenticator to put a malicious ECU that performs the BoA into Passive Error mode which allows the transmission of an AFR message from the authenticator.

The time it takes for the malicious node performing BoA to enter the error passive state varies depending on the attack bit position of the BoA (i.e., the bit-error position in the data field of an AFR message). If the malicious node creates the first bit-error at the first bit position in the data field of an AFR message, the reception time of an AFR message is approximately 2355 ms, as shown in Figure 15b. On the other hand, to maximize the transmission delay of an AFR message, the malicious node can create a bit-error at the last bit position (i.e., the 64th bit position in the data field) of an AFR message. In this worst case, the reception time of an AFR message is approximately 4495 ms, as shown in Figure 15c.

Through this experiment, we validate Theorem 2 by showing that, once the malicious node performing BoA on the authenticator enters Passive Error mode, the authenticator with dual CAN controllers can transmit an AFR message within a bounded time, with a worst-case time of 4495 ms.

#### **7. Conclusions**

CAN is the most common in-vehicle network system. Recently developed automobiles are equipped with numerous ECUs. An ECU on the CAN bus can become a victim of security attacks that pose critical risks to vehicle safety. In particular, when the infotainment system of an untrusted third-party vendor shares the CAN bus with driving control systems, the security risk escalates dramatically.

MAuth-CAN is a centralized authentication mechanism for CAN. In MAuth-CAN, the response timing is critical, since a timeout serves as the indication that a message has passed authentication: ECUs accept a new message stored in their temporary queue when the timeout expires. MAuth-CAN utilizes two CAN controllers as a fault-tolerance mechanism so that it continues to function under message flooding and bus-off attacks.

This paper presents the formal proofs of resiliency and sustainability of MAuth-CAN authentication against message flooding and bus-off attacks where timing is critical to maintain such properties. Also, this paper shows how model checking, a formal verification technique, works for safety and security certificates of in-vehicle network. In this paper, we present a novel CAN model in a formal model, which captures CAN's timing behavior in MAC level of the data-link layer and can thus be used for verification of safety properties of other CAN applications. Using this CAN model, we perform formal verification for the sufficient conditions of those properties of MAuth-CAN.

In conclusion, we show that the MAuth-CAN authenticator is sufficiently resilient and sustainable against those two kinds of attacks if it can handle two consecutive attack messages, the authentication time is less than the message transmission time, and it uses two CAN controllers. We also conclude that 4 × Ttx is the minimum and sufficient length of the timeout for ECUs to open incoming messages that have passed MAuth-CAN authentication. The experimental results from the implementation of MAuth-CAN are consistent with the propositions and conditions we have shown in this paper.

**Author Contributions:** Conceptualization, J.H.K., H.J.J. and I.L.; methodology, J.H.K.; validation, I.L. and H.J.J.; writing—original draft preparation, J.H.K. and H.J.J.; writing—review and editing, I.L.; supervision, I.L.; project administration, I.L.; funding acquisition, I.L., J.H.K. and H.J.J. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported in part by NRF-2020R1A2C1014855, NRF-2018R1C1B5086261, and ONR N00014-17-1-2012 and ONR N00014-20-1-2744.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data sharing not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data**

**Hyeongmin Cho and Sangkyun Lee \***

School of Cybersecurity, Korea University, Seoul 02841, Korea; whgudals159@korea.ac.kr **\*** Correspondence: sangkyun@korea.ac.kr

**Abstract:** Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many datasets are being disclosed and published online. From a data consumer or manager point of view, measuring data quality is an important first step in the learning process. We need to determine which datasets to use, update, and maintain. However, not many practical ways to measure data quality are available today, especially when it comes to large-scale high-dimensional data, such as images and videos. This paper proposes two data quality measures that can compute class separability and in-class variability, the two important aspects of data quality, for a given dataset. Classical data quality measures tend to focus only on class separability; however, we suggest that in-class variability is another important data quality factor. We provide efficient algorithms to compute our quality measures based on random projections and bootstrapping with statistical benefits on large-scale high-dimensional data. In experiments, we show that our measures are compatible with classical measures on small-scale data and can be computed much more efficiently on large-scale high-dimensional datasets.

**Keywords:** data quality; large-scale; high-dimensionality; linear discriminant analysis; random projection; bootstrapping

**Citation:** Cho, H.; Lee, S. Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data. *Appl. Sci.* **2021**, *11*, 472. https://doi.org/10.3390/app11020472

Received: 31 October 2020 Accepted: 4 January 2021 Published: 6 January 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

We are witnessing the success of machine learning in various research and application areas, such as vision inspection, energy consumption estimation, and autonomous driving, just to name a few. One major contributor to this success is that datasets are continuously accumulated and openly published in several domains. Low data quality is very likely to cause inferior prediction performance of machine learning models, and therefore measuring data quality is an indispensable step in a machine learning process. Especially in real-time and mission-critical cyber-physical systems, defining appropriate data quality measures is critical, since the low generalization performance of a deployed model can result in system malfunction and possibly catastrophic damage to the physical world. Despite this importance, there exist only a few works on measuring data quality, and most of them are hard to evaluate on large-scale high-dimensional data due to their computational complexity.

A popular early work on data quality measures includes Ho and Basu [1], proposing 12 types of quality measures which are simple but powerful enough to address different aspects of data quality. These measures have limitations, however, in that it is difficult to compute them for large-scale high-dimensional and multi-class datasets. Baumgartner and Somorjai [2] proposed a quality measure designed for high-dimensional biomedical datasets; however, it does not work efficiently on large-scale data. Recently, Branchaud-Charron et al. [3] proposed a quality measure for high-dimensional data using spectral clustering. Although this measure is adequate for large-scale high-dimensional data, it requires an embedding network which involves a large amount of computation time for training.

In this paper, we propose three new quality measures called *Msep*, *Mvar*, and *Mvari*, and their computation methods that overcome the above limitations. Our approach is inspired by Fisher's linear discriminant analysis (LDA) [4], which is mathematically well-defined for finding a feature subspace that maximizes class separability. Our computation method makes use of two techniques from statistics, random projection [5] and bootstrapping [6], to compute the measures efficiently for large-scale high-dimensional data.

The contributions of our paper are summarized as follows:


#### **2. Related Work**

In general, the quality of data can be measured by the Kolmogorov complexity, which is also known as the descriptive complexity or algorithmic entropy [7]. However, the Kolmogorov complexity is not computationally feasible; instead, an approximation is used in practice. To our knowledge, there are three main approaches for approximating the Kolmogorov complexity: descriptor-based, classifier-based, and graph-based approaches. We describe these three main categories below.

#### *2.1. Descriptor-Based Approaches*

Ho and Basu [1] proposed simple but powerful quality measures based on descriptors. They proposed 12 quality measures, namely *F*1, *F*2, *F*3, *L*1, *L*2, *L*3, *N*1, *N*2, *N*3, *N*4, *T*1, and *T*2. The *F* measures represent the amount of feature overlap. In particular, *F*1 measures the maximum Fisher's ratio, which represents the maximum discriminative power among the features. *F*2 represents the volume of the overlap region in the two class-conditional distributions. *F*3 captures the ratio of overlapping features using the maximum and minimum values. The *L* measures are for the linear separability of classes. *L*1 is the minimum error of linear programming (LP), *L*2 is the error rate of a linear classifier obtained by LP, and *L*3 is the error rate of a linear classifier after feature interpolation. The *N* measures represent mixture identifiability, the distinguishability of the data points belonging to two different classes. *N*1 represents the ratio of nodes connected to different classes in the minimum spanning tree of all data points. *N*2 is the ratio of the average intra-class distance to the average inter-class distance. *N*3 is the leave-one-out error rate of the nearest neighbor classifier (1NN). *N*4 is the error rate of 1NN after feature interpolation. The *T* measures represent the topological characteristics of a dataset. *T*1 represents the number of hyperspheres adjacent to features of other classes. *T*2 is the average number of data points per dimension. These quality measures can capture various aspects of data quality; however, they are designed for binary classification and are not applicable to multi-class problems. Furthermore, quality measures such as *N*1, *N*2, and *N*3 require a large amount of computation time on large-scale high-dimensional data.
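To make two of these descriptors concrete, the following is a minimal NumPy sketch of an *F*1-style and an *N*3-style score for a two-class dataset. The function names and the toy data are hypothetical, and the formulas follow the commonly cited definitions (the largest per-feature ratio of squared mean difference to summed class variances, and the leave-one-out 1NN error rate); it is not the implementation used in [1].

```python
import numpy as np

def f1_max_fisher_ratio(X, y):
    """F1-style score: largest per-feature Fisher ratio for a two-class dataset."""
    a, b = X[y == 0], X[y == 1]
    # Per-feature ratio of squared mean difference to summed class variances.
    ratio = (a.mean(axis=0) - b.mean(axis=0)) ** 2 / (a.var(axis=0) + b.var(axis=0))
    return float(ratio.max())

def n3_loo_1nn_error(X, y):
    """N3-style score: leave-one-out error rate of a 1-nearest-neighbor classifier."""
    sq = (X ** 2).sum(axis=1)
    # Full m x m squared-distance matrix: quadratic in the number of points.
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dist, np.inf)  # a point may not be its own neighbor
    nearest = dist.argmin(axis=1)
    return float(np.mean(y[nearest] != y))

# Hypothetical toy data: two tight, well-separated clusters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(10.0, 0.1, (30, 2))])
y = np.repeat([0, 1], 30)
f1_score = f1_max_fisher_ratio(X, y)
n3_score = n3_loo_1nn_error(X, y)
```

The *m* × *m* distance matrix in the second function illustrates why neighbor-based measures become expensive as the number of data points grows.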

Baumgartner and Somorjai [2] proposed a quality measure for high-dimensional but small biomedical datasets. They used singular value decomposition (SVD) with time complexity $\mathcal{O}(\min(m^2 n, mn^2))$, where $m$ is the number of data points and $n$ is the number of features. Thus, it is computationally demanding to calculate their measure for datasets with large $m$ and $n$, such as recent image datasets.

There are other descriptor-based approaches, for example for meta learning [8,9], for classifier recommendation [10], and for synthetic data generation [11]. However, only a small number of data points in a low dimensional space have been considered in these works.

#### *2.2. Graph-Based Approaches*

Branchaud-Charron et al. [3] proposed a graph-based quality measure using spectral clustering. First, they compute a probabilistic divergence-based *K* × *K* class similarity matrix *S*, where *K* is the number of classes. Then, an adjacency matrix *W* is computed from *S*. The quality measure is defined as a cumulative sum of the eigenvalue gaps, called the cumulative spectral gradient (*CSG*), which represents the minimum cutting cost of *S*. The authors also used a convolutional neural network-based autoencoder and t-SNE [12,13] to find an embedding that can represent data points (images in their case) well. Although the method is designed for high-dimensional data, it requires training a good embedding network to perform well.

Duin and Pękalska [14] proposed a quality measure based on a dissimilarity matrix of data points. Since calculating the dissimilarity matrix is a time-consuming process, the method is not adequate for large-scale high-dimensional data.

#### *2.3. Classifier-Based Approaches*

Li et al. [15] proposed a classifier-based quality measure called an intrinsic dimension, which is the minimum number of solutions for certain problems. For example, in a neural network, the intrinsic dimension is the minimum number of parameters to reach the desired prediction performance.

The method has a benefit that it can be applied to many different types of data as long as one has trainable classifiers for the data; however, it often incurs high computation cost since it needs to change the number of classifier parameters iteratively during data quality evaluation.

Overall, the existing data quality measures are mostly designed for binary classification in low-dimensional spaces with a small number of data points. Due to their computational complexity, they tend to consume a large amount of time when applied to large-scale high-dimensional data. In addition, the existing measures tend to focus only on the inter-class aspects of data quality. In this paper, we propose two new data quality measures suitable for large-scale high-dimensional data that resolve the above-mentioned issues.

#### **3. Methods**

In this section, we formally describe our data quality measures. We focus on multi-class classification tasks where each data point is associated with a class label out of *c* categories (*c* ≥ 2). Our measures are created by adapting ideas from Fisher's LDA [4]. Fisher's LDA is a dimensionality reduction technique that finds a projection matrix maximizing the between-class variance and minimizing the within-class variance at the same time. Motivated by this idea, we propose two types of data quality measures: class separability *Msep* and in-class variability *Mvar* and *Mvari*. For efficient handling of large-scale high-dimensional data, we also propose techniques to reduce both computation and memory requirements by taking advantage of two statistical methods, bootstrapping [6] and random projection [5].

#### *3.1. Fisher's LDA*

The objective of Fisher's LDA [4] is to find the feature subspace which maximizes the linear separability of a given dataset. Fisher's LDA achieves the objective by minimizing the within-class variance and maximizing the between-class variance simultaneously.

To describe Fisher's LDA formally, let us consider an input matrix $X \in \mathbb{R}^{m \times n}$, where $m$ is the number of data points, $n$ is the input dimension, and $\mathbf{x}_{i,j} \in \mathbb{R}^n$ is the $j$-th data point in the $i$-th class. The within-class scatter matrix $S_w \in \mathbb{R}^{n \times n}$ is defined as follows:

$$S_w = \sum_{i=1}^{c} \sum_{j=1}^{m_i} (\mathbf{x}_{i,j} - \overline{\mathbf{x}}_i)(\mathbf{x}_{i,j} - \overline{\mathbf{x}}_i)^T.$$

Here, $c$ is the number of classes, $m_i$ is the number of data points in the $i$-th class, and $\overline{\mathbf{x}}_i \in \mathbb{R}^n$ is the mean of the data points in the $i$-th class. This formulation can be interpreted as the sum of class-wise scatter matrices. A small determinant of $S_w$ indicates that data points in the same class are densely concentrated in a narrow region, which may lead to high class separability.

Next, the between-class scatter matrix $S_b \in \mathbb{R}^{n \times n}$ is defined as follows:

$$S_b = \sum_{i=1}^{c} m_i (\overline{\mathbf{x}}_i - \overline{\mathbf{x}})(\overline{\mathbf{x}}_i - \overline{\mathbf{x}})^T,$$

where $\overline{\mathbf{x}}$ is the mean of all data points in the given dataset. A large determinant of $S_b$ indicates that the mean vector $\overline{\mathbf{x}}_i$ of each class is far from $\overline{\mathbf{x}}$, another condition hinting at high class separability.

Using these two matrices, we can describe the objective of Fisher's LDA as follows:

$$\Phi_{lda} \in \arg\max_{\Phi} \frac{|\Phi^T S_b \Phi|}{|\Phi^T S_w \Phi|}. \tag{1}$$

Here, $\Phi_{lda} \in \mathbb{R}^{n \times d}$ is the projection matrix, where $d$ is the dimension of the feature subspace (in general, we choose $d \ll n$). The column vectors of the projection matrix $\Phi_{lda}$ are the axes of the feature subspace that maximize the class separability. The term in the objective function is also known as Fisher's criterion. By projecting $X$ onto these axes, we obtain a $d$-dimensional projection $X' \in \mathbb{R}^{m \times d}$ of the original data:

$$X' = X\Phi_{lda}.$$

In general, if *Sw* is an invertible matrix, we can calculate the projection matrix which maximizes the objective of the Fisher's LDA by eigenvalue decomposition.
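As an illustration of the definitions above, the following NumPy sketch builds the two scatter matrices and obtains a projection from the eigenvectors of $S_w^{-1} S_b$. It assumes that $S_w$ is invertible; the function names and the synthetic two-class data are hypothetical and serve only as an example.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return the within-class (S_w) and between-class (S_b) scatter matrices."""
    n = X.shape[1]
    overall_mean = X.mean(axis=0)
    S_w = np.zeros((n, n))
    S_b = np.zeros((n, n))
    for label in np.unique(y):
        X_i = X[y == label]
        mean_i = X_i.mean(axis=0)
        centered = X_i - mean_i
        S_w += centered.T @ centered                 # class-wise scatter
        diff = mean_i - overall_mean
        S_b += X_i.shape[0] * np.outer(diff, diff)   # weighted by class size m_i
    return S_w, S_b

def lda_projection(X, y, d):
    """Columns are the top-d eigenvectors of S_w^{-1} S_b (S_w assumed invertible)."""
    S_w, S_b = scatter_matrices(X, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w, S_b))
    order = np.argsort(eigvals.real)[::-1][:d]
    return eigvecs[:, order].real

# Hypothetical data: two Gaussian classes in four dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 4)), rng.normal(3.0, 1.0, (50, 4))])
y = np.repeat([0, 1], 50)
Phi = lda_projection(X, y, d=1)
X_proj = X @ Phi  # d-dimensional projection X' = X Phi
```

A useful sanity check is that $S_w + S_b$ equals the total scatter matrix of the centered data.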

#### *3.2. Proposed Data Quality Measures*

Motivated by the ideas in Fisher's LDA, we propose two types of new data quality measures: *Msep* (class separability), *Mvar* and *Mvari* (in-class variability).

#### 3.2.1. Class Separability

Our first data quality measure tries to capture the class separability of a dataset by combining the within-class variance and the between-class variance, similarly to Fisher's LDA (1), but in a way that is more efficient to compute for large-scale high-dimensional data and comparable across datasets.

We start by creating normalized versions of the matrices $S_w$ and $S_b$ in Fisher's LDA (1) so that they are not affected by the different numbers of examples in classes ($m_i$ is the number of examples in the $i$-th class) across different datasets. The normalized versions are denoted by $\hat{S}_w$ and $\hat{S}_b$:

$$\hat{S}_{w} := \sum_{i=1}^{c} \frac{1}{m_{i}} \sum_{j=1}^{m_{i}} (\mathbf{x}_{i,j} - \overline{\mathbf{x}}_{i})(\mathbf{x}_{i,j} - \overline{\mathbf{x}}_{i})^{T}, \qquad \hat{S}_{b} := \sum_{i=1}^{c} \frac{m_{i}}{\sum_{j=1}^{c} m_{j}} (\overline{\mathbf{x}}_{i} - \overline{\mathbf{x}})(\overline{\mathbf{x}}_{i} - \overline{\mathbf{x}})^{T}. \tag{2}$$

Considering the determinants of these $n \times n$ matrices as in the original Fisher's LDA would be too costly for high-dimensional data where $n$ is large, since the time complexity of computing the determinants is nearly proportional to $n^3$. Instead, we consider the direction of maximum linear separation $v \in \mathbb{R}^n$ that maximizes the ratio of the between-class variance to the within-class variance after projection onto the vector. Using this vector, we define our first data quality measure *Msep* for class separability as follows:

$$M_{sep} := \max_{v \in \mathbb{R}^n : \|v\| = 1} \frac{|v^T \hat{S}_b v|}{|v^T \hat{S}_w v|}. \tag{3}$$

This formulation is almost the same as (1) in Fisher's LDA, except that (1) finds the projection matrix $\Phi_{lda}$ that maximizes Fisher's criterion, while in (3) we focus on finding the maximum value of Fisher's criterion itself. Unlike Fisher's criterion, our measure *Msep* is comparable across datasets with different numbers of classes and examples due to normalization, so it can be used to check the relative difficulty of linear classification.

Solving (3) directly is prohibitive for a large $n$, as in the original LDA. If $\hat{S}_w$ is invertible, we can calculate the vector that maximizes *Msep* using simple linear algebra. To find the vector $v$ that maximizes the ratio in (3), we first differentiate it with respect to $v$ and set the derivative to zero, which gives:

$$(v^T \hat{S}_b v)\, \hat{S}_w v = (v^T \hat{S}_w v)\, \hat{S}_b v.$$

This leads us to the following generalized eigenvalue problem:

$$\hat{S}_w^{-1} \hat{S}_b v = \lambda v, \tag{4}$$

where $\lambda = \frac{v^T \hat{S}_b v}{v^T \hat{S}_w v}$ can be thought of as an eigenvalue of the matrix $\hat{S}_w^{-1} \hat{S}_b$. The maximizer $v$ is the eigenvector corresponding to the largest eigenvalue of $\hat{S}_w^{-1} \hat{S}_b$, which can be found rather efficiently by the Lanczos algorithm [16]. However, the overall time complexity can be up to $\mathcal{O}(n^3)$, which makes it difficult to calculate the optimal vector for high-dimensional data, such as images. In Section 3.3, we provide an efficient algorithm to compute *Msep* using random projection.
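For small data dimensions, the exact value in (3) can be obtained directly from the eigenvalue problem (4). The sketch below is one possible way to do this with a dense NumPy eigensolver rather than the Lanczos algorithm; it assumes $\hat{S}_w$ is invertible, and the function names and the three-class toy data are hypothetical.

```python
import numpy as np

def normalized_scatter(X, y):
    """Normalized within- and between-class scatter matrices of Eq. (2)."""
    m, n = X.shape
    overall_mean = X.mean(axis=0)
    S_w_hat = np.zeros((n, n))
    S_b_hat = np.zeros((n, n))
    for label in np.unique(y):
        X_i = X[y == label]
        m_i = X_i.shape[0]
        centered = X_i - X_i.mean(axis=0)
        S_w_hat += centered.T @ centered / m_i
        diff = X_i.mean(axis=0) - overall_mean
        S_b_hat += (m_i / m) * np.outer(diff, diff)
    return S_w_hat, S_b_hat

def m_sep_exact(X, y):
    """Largest eigenvalue of S_w_hat^{-1} S_b_hat and its unit eigenvector."""
    S_w_hat, S_b_hat = normalized_scatter(X, y)
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_w_hat, S_b_hat))
    top = int(np.argmax(eigvals.real))
    v = eigvecs[:, top].real
    return float(eigvals[top].real), v / np.linalg.norm(v)

# Hypothetical data: three Gaussian classes in three dimensions.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
X = np.vstack([rng.normal(center, 1.0, (100, 3)) for center in centers])
y = np.repeat([0, 1, 2], 100)
m_sep, v_star = m_sep_exact(X, y)
```

Because the returned value is the maximum of the ratio in (3), the ratio evaluated at any other unit vector cannot exceed it, which is the property the random-projection estimate relies on.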

#### 3.2.2. In-Class Variability

Our second data quality measure gauges the in-class variability. Figure 1 shows one of the motivating examples to consider in-class variability for data quality. In the figure, we have two photos of the Bongeunsa temple in Seoul, Korea, taken by the same photographer. The photographer had been asked to take photos of Korean objects from several different angles, and it turned out that quite a few of the photos were taken in only marginal angle differences. Since the data creation was a government-funded project providing data for developing object recognition systems in academia and industry, low data variability was definitely an issue.

**Figure 1.** An example of low in-class variability: similar images in the same class. The images show the Bongeunsa temple in Seoul, Korea. (Source: Korean Type Object Image AI Training Dataset at http://www.aihub.or.kr/aidata/132, National Information Society Agency.)

Here, we define two types of in-class variability measures: the overall in-class variability of a given dataset, *Mvar*, and the in-class variability of the *i*-th class, *Mvari*. First, the overall in-class variability *Mvar* tries to capture the minimum variance of the data points after projection onto any direction, based on the matrix $\hat{S}_w$ defined in (2):

$$M_{var} := \min_{v \in \mathbb{R}^n : \|v\| = 1} \frac{1}{c \cdot n}\, v^T \hat{S}_w v,$$

where $c$ is the number of classes and $n$ is the dimension of the data. Unlike for class separability, we add the normalization factors $c$ and $n$, since the value of $\hat{S}_w$ is affected by the number of classes and the data dimension.

Second, the class-wise in-class variability *Mvari* is based on the sample covariance matrix of each class:

$$\hat{S}_{w_i} := \frac{1}{m_i} \sum_{j=1}^{m_i} (\mathbf{x}_{i,j} - \overline{\mathbf{x}}_i)(\mathbf{x}_{i,j} - \overline{\mathbf{x}}_i)^T,$$

where $m_i$ is the number of data points in the $i$-th class. The class-wise in-class variability measure *Mvari* is defined as follows:

$$M_{var_i} := \min_{v \in \mathbb{R}^n : \|v\| = 1} \frac{1}{c \cdot n}\, v^T \hat{S}_{w_i} v.$$

The normalization factors $c$ and $n$ are required for the same reason as in *Mvar*. The measure *Mvari* represents the smallest variance of the data points in the same class after being projected onto any direction.

In fact, *Mvar* and *Mvari* are the smallest eigenvalues of $\frac{1}{c \cdot n}\hat{S}_w$ and $\frac{1}{c \cdot n}\hat{S}_{w_i}$, respectively, which can be computed, for instance, using the Lanczos algorithm [16] on their inverses with $\mathcal{O}(n^3)$ computation; this is prohibitive for large data dimensions $n$. We discuss a more efficient way to estimate these values in the next section, which can be computed alongside our first data quality measure without significant extra cost.
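The eigenvalue characterization can be checked on small data with a dense symmetric eigensolver. The following sketch uses `numpy.linalg.eigvalsh`, which returns eigenvalues in ascending order, instead of the Lanczos algorithm; the function names and the toy data (one class with a nearly collapsed third coordinate) are hypothetical.

```python
import numpy as np

def class_covariances(X, y):
    """Sample covariance matrix (1/m_i normalization) of each class."""
    covs = {}
    for label in np.unique(y):
        X_i = X[y == label]
        centered = X_i - X_i.mean(axis=0)
        covs[label] = centered.T @ centered / X_i.shape[0]
    return covs

def m_var_exact(X, y):
    """Exact overall and class-wise in-class variability via smallest eigenvalues."""
    covs = class_covariances(X, y)
    c, n = len(covs), X.shape[1]
    S_w_hat = sum(covs.values())  # normalized within-class scatter of Eq. (2)
    m_var = float(np.linalg.eigvalsh(S_w_hat)[0]) / (c * n)
    m_var_i = {label: float(np.linalg.eigvalsh(S)[0]) / (c * n)
               for label, S in covs.items()}
    return m_var, m_var_i

# Hypothetical data: class 1 has almost no variation along the third axis.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(200, 3)),
               rng.normal(size=(200, 3)) * np.array([1.0, 1.0, 0.01])])
y = np.repeat([0, 1], 200)
m_var, m_var_i = m_var_exact(X, y)
```

In this toy setting the class with the collapsed coordinate receives a much smaller class-wise score, which is the kind of low-diversity signal the measure is meant to expose.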

Using *Mvar* and *Mvari*, we can analyze the variety or redundancy in a given dataset. For instance, very small values of *Mvar* and *Mvari* would indicate a low-diversity issue where data points in the investigated class are mostly alike. On the other hand, overly large values of *Mvar* and *Mvari* may indicate a noise issue in the class, possibly including incorrect labeling. The difference between *Mvar* and *Mvari* is that *Mvar* aggregates the diversity information of all classes, whereas *Mvari* represents the diversity information of a specific class. Since *Mvar* aggregates the variability of the data points in each class, we can use it for comparing the in-class variability between datasets. On the other hand, we can use *Mvari* for dataset analysis; i.e., the data points of a specific class with a smaller *Mvari* than other classes may cause low generalization performance. We will discuss more details in Section 4.

#### *3.3. Methods for Efficient Computation*

One of the key properties required for data quality measures is that they should be computable within a reasonable amount of time and computation resources, since the amount and the dimension of data keep increasing as new advanced sensing technologies become available. In this section, we describe how we avoid large time and memory complexity when computing our suggested data quality measures.

#### 3.3.1. Random Projection

Random projection [5] is a dimension reduction technique that transforms an $n$-dimensional vector into a $k$-dimensional vector ($k \ll n$) while preserving the critical information of the original vector. The idea behind random projection is the Johnson-Lindenstrauss lemma [17]. That is, for any vectors $\mathbf{x}, \mathbf{x}' \in X$ from a set of $m$ vectors $X \subset \mathbb{R}^n$ and for $\epsilon \in (0, 1)$, there exists a linear mapping $f : \mathbb{R}^n \to \mathbb{R}^k$ such that the pairwise distances of vectors are almost preserved after projection in the sense that:

$$(1 - \epsilon) \|\mathbf{x} - \mathbf{x}'\|_2^2 \le \|f(\mathbf{x}) - f(\mathbf{x}')\|_2^2 \le (1 + \epsilon) \|\mathbf{x} - \mathbf{x}'\|_2^2,$$

where $k > 8 \ln(m)/\epsilon^2$. It is known that when the original dimension $n$ is large, a random projection matrix $P \in \mathbb{R}^{k \times n}$ can serve as the feature mapping $f$ in the lemma, since random vectors in $\mathbb{R}^n$ tend to be orthogonal to each other as $n$ increases [18].
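The distance-preservation property can be observed empirically with a small experiment. The sketch below draws a Gaussian random matrix scaled by $1/\sqrt{k}$ (a common construction, assumed here rather than taken from the text) and compares squared pairwise distances before and after projection; all sizes and the value of $\epsilon$ are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, eps = 50, 2000, 0.3
k = int(np.ceil(8 * np.log(m) / eps**2)) + 1  # target dimension with k > 8 ln(m) / eps^2

X = rng.normal(size=(m, n))                   # m points in n dimensions
P = rng.normal(size=(k, n)) / np.sqrt(k)      # scaled Gaussian random projection matrix
Z = X @ P.T                                   # projected points in k dimensions

def pairwise_sq_dists(A):
    """Squared Euclidean distances for all unordered pairs of rows of A."""
    sq = (A ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * A @ A.T
    return D[np.triu_indices(len(A), k=1)]

# Ratio of projected to original squared distance for every pair.
ratio = pairwise_sq_dists(Z) / pairwise_sq_dists(X)
print(f"k = {k}, min ratio = {ratio.min():.3f}, max ratio = {ratio.max():.3f}")
```

For a random matrix the bound holds with high probability rather than with certainty, so the printed ratios should be read as an empirical illustration and not as a proof of the lemma.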

Motivated by the above phenomenon, we use random projection to find a vector that approximately maximizes (3) instead of computing an eigenvalue decomposition to solve (4). The idea is that if the number of random vectors is sufficiently large, the maximum value of Fisher's criterion calculated by random projection can approximate the behavior of the true solution.

Furthermore, random projection makes it unnecessary to explicitly store $\hat{S}_w$ and $\hat{S}_b$, since we can simply compute the denominator and numerator of (3) as follows:

$$w^T \hat{S}_w w = \sum_{i=1}^c \frac{1}{m_i} \sum_{j=1}^{m_i} w^T (\mathbf{x}_{i,j} - \overline{\mathbf{x}}_i)(\mathbf{x}_{i,j} - \overline{\mathbf{x}}_i)^T w,$$

$$w^T \hat{S}_b w = \sum_{i=1}^c \frac{m_i}{\sum_{j=1}^c m_j} w^T (\overline{\mathbf{x}}_i - \overline{\mathbf{x}})(\overline{\mathbf{x}}_i - \overline{\mathbf{x}})^T w,$$

where $w$ is a random unit vector drawn from $\mathcal{N}(0, 1)$. This technique is critical for dealing with high-dimensional data, such as images, in a memory-efficient way. In our experiments, ten random projection vectors were sufficient in most cases to accurately estimate our quality measures.
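Since $w^T(\mathbf{x} - \overline{\mathbf{x}}_i)$ is a scalar, both quadratic forms reduce to variances and squared mean differences of the one-dimensional projections $Xw$. The following sketch illustrates this; the function name and toy data are hypothetical, and the random unit vector is assumed to be a normalized standard Gaussian sample.

```python
import numpy as np

def projected_scatter(X, y, w):
    """Return (w^T S_w_hat w, w^T S_b_hat w) without forming any n x n matrix."""
    z = X @ w                      # one-dimensional projections of all data points
    overall_mean = z.mean()
    m = z.shape[0]
    within, between = 0.0, 0.0
    for label in np.unique(y):
        z_i = z[y == label]
        within += z_i.var()        # (1/m_i) * sum_j (z_ij - mean_i)^2
        between += (z_i.shape[0] / m) * (z_i.mean() - overall_mean) ** 2
    return float(within), float(between)

# Hypothetical data: two Gaussian classes in 50 dimensions.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 50)), rng.normal(1.0, 1.0, (100, 50))])
y = np.repeat([0, 1], 100)

# Estimate M_sep as the best ratio over ten random unit vectors.
estimates = []
for _ in range(10):
    w = rng.normal(size=X.shape[1])
    w /= np.linalg.norm(w)
    within, between = projected_scatter(X, y, w)
    estimates.append(between / within)
m_sep_estimate = max(estimates)
```

Only the projected values are kept in memory, so the cost per random vector is linear in the number of data points and in the dimension.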

#### 3.3.2. Bootstrapping

Bootstrapping [6] is a sampling-based technique that estimates a statistic of the population from a small amount of data using sampling with replacement. For instance, bootstrapping can be used to estimate the mean and the variance of a statistic from an unknown population. Let $s_i$ be a statistic of interest that is calculated from a randomly drawn sample of an unknown population. The mean and variance of the statistic can be estimated as follows:

$$\hat{\mu} = \frac{1}{B} \sum_{i=1}^{B} s_i, \qquad \hat{\sigma}^2 = \frac{1}{B} \sum_{i=1}^{B} s_i^2 - \left(\frac{1}{B} \sum_{i=1}^{B} s_i\right)^2,$$

where $B$ is the number of bootstrap samples, $\hat{\mu}$ is the mean estimate of the statistic, and $\hat{\sigma}^2$ is the variability of the estimate. By using a small $B$, we can reduce the number of data points to be considered at once. We found that $B = 100$ with each bootstrap sample being 25% of the given dataset in size worked well overall in our experiments. We summarize the above procedure in Algorithm 1. (The implementation is available here: https://github.com/Hyeongmin-Cho/Efficient-Data-Quality-Measures-for-High-Dimensional-Classification-Data.)
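The two estimators can be written in a few lines. The sketch below is a generic illustration, not the authors' implementation: the function name, the choice of the median as the statistic, and the synthetic data are hypothetical, while the defaults $B = 100$ and a 25% sample ratio follow the values quoted in the text.

```python
import numpy as np

def bootstrap_estimate(data, statistic, B=100, ratio=0.25, seed=0):
    """Bootstrap mean and variance of `statistic` over B resamples."""
    rng = np.random.default_rng(seed)
    size = max(1, int(ratio * len(data)))
    # Each bootstrap sample is drawn with replacement from the data.
    s = np.array([statistic(data[rng.integers(0, len(data), size)])
                  for _ in range(B)])
    mu_hat = s.mean()
    sigma2_hat = (s ** 2).mean() - mu_hat ** 2
    return float(mu_hat), float(sigma2_hat)

# Hypothetical population: Gaussian with mean 5 and standard deviation 2.
data = np.random.default_rng(1).normal(5.0, 2.0, 10_000)
mu_hat, sigma2_hat = bootstrap_estimate(data, np.median)
```

Each resample touches only a quarter of the data, which is what keeps the per-iteration memory footprint small for large datasets.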

**Algorithm 1:** Algorithm of class separability and in-class variability.

**Result:** $M_{sep}$, $M_{var}$, and $M_{var_i}$ scores

**Dataset** = $\{(x_1, y_1), \ldots, (x_m, y_m)\}$

**Args** = the number of bootstrap samples $B$, the ratio $R$ of each bootstrap sample to the given dataset, the number of random vectors $n_v$ used in each sample, an array $A_{var}$ storing the values of overall in-class variability, an array $A_{var_i}$ storing the values of class-wise in-class variability, and an array $A_{sep}$ storing the values of class separability. The symbol $\leftarrow$ stands for variable assignment.

- $i \leftarrow 1$
- $A_{var} \leftarrow \{\}$
- $A_{var_i} \leftarrow \{\}$
- $A_{sep} \leftarrow \{\}$
- **while** $i \le B$ **do**
  - $j \leftarrow 1$
  - $i \leftarrow i + 1$
  - Calculate the number of classes $c$ and the data dimension $n$ from the dataset
  - Sample with replacement from the dataset using stratified sampling with ratio $R$
  - Standardize the sampled dataset
  - **while** $j \le n_v$ **do**
    - $j \leftarrow j + 1$
    - $w \leftarrow$ a unit vector drawn from $\mathcal{N}(0, 1)$
    - Compute $w^T \hat{S}_{w_j,1} w, \ldots, w^T \hat{S}_{w_j,c} w$
    - $w^T \hat{S}_{w_j} w \leftarrow Sum(\{w^T \hat{S}_{w_j,1} w, \ldots, w^T \hat{S}_{w_j,c} w\})$
    - $M_{var-j} \leftarrow w^T \hat{S}_{w_j} w$
    - Compute $w^T \hat{S}_{b_j} w$
    - $M_{sep-j} \leftarrow (w^T \hat{S}_{b_j} w) / (w^T \hat{S}_{w_j} w)$
  - **end**
  - $A_{var_i}$.insert($\{\frac{1}{c \cdot n} \min(w^T \hat{S}_{w_1,1} w, \ldots, w^T \hat{S}_{w_{n_v},1} w), \ldots, \frac{1}{c \cdot n} \min(w^T \hat{S}_{w_1,c} w, \ldots, w^T \hat{S}_{w_{n_v},c} w)\}$)
  - $A_{var}$.insert($\frac{1}{c \cdot n} \min(M_{var-1}, \ldots, M_{var-n_v})$)
  - $A_{sep}$.insert($\max(M_{sep-1}, \ldots, M_{sep-n_v})$)
- **end**
- $M_{sep} \leftarrow Mean(A_{sep})$
- $M_{var} \leftarrow Mean(A_{var})$
- $M_{var_i} \leftarrow ClassWiseMean(A_{var_i})$
- **Return** $M_{sep}$, $M_{var}$, $M_{var_i}$
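The steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading of the pseudocode, not the authors' released implementation: the function name is hypothetical, "standardize" is assumed to mean per-feature z-scoring of each bootstrap sample, and the random unit vectors are assumed to be normalized standard Gaussian samples.

```python
import numpy as np

def quality_measures(X, y, B=100, R=0.25, n_v=10, seed=0):
    """Sketch of Algorithm 1; returns (M_sep, M_var, array of M_var_i per class)."""
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    c, n = len(classes), X.shape[1]
    A_sep, A_var, A_var_i = [], [], []
    for _ in range(B):
        # Stratified sampling with replacement: a fraction R of every class.
        parts = []
        for k in classes:
            members = np.flatnonzero(y == k)
            size = max(2, int(R * len(members)))
            parts.append(rng.choice(members, size=size, replace=True))
        idx = np.concatenate(parts)
        Xs, ys = X[idx], y[idx]
        # Standardize each feature (assumed meaning of "standardize").
        std = Xs.std(axis=0)
        Xs = (Xs - Xs.mean(axis=0)) / np.where(std > 0, std, 1.0)
        # Draw n_v random unit vectors (columns of W) and project the sample.
        W = rng.normal(size=(n, n_v))
        W /= np.linalg.norm(W, axis=0)
        Z = Xs @ W                                               # shape (m_s, n_v)
        overall = Z.mean(axis=0)
        within = np.stack([Z[ys == k].var(axis=0) for k in classes])   # (c, n_v)
        means = np.stack([Z[ys == k].mean(axis=0) for k in classes])   # (c, n_v)
        weights = np.array([np.mean(ys == k) for k in classes])        # m_i / m
        between = (weights[:, None] * (means - overall) ** 2).sum(axis=0)
        total_within = within.sum(axis=0)                        # w^T S_w_hat w
        A_sep.append((between / total_within).max())
        A_var.append(total_within.min() / (c * n))
        A_var_i.append(within.min(axis=1) / (c * n))
    return float(np.mean(A_sep)), float(np.mean(A_var)), np.mean(A_var_i, axis=0)

# Hypothetical data: two Gaussian blobs whose means differ by `shift`.
rng = np.random.default_rng(1)

def two_blobs(shift):
    X = np.vstack([rng.normal(0.0, 1.0, (400, 20)), rng.normal(shift, 1.0, (400, 20))])
    return X, np.repeat([0, 1], 400)

sep_far, var_far, var_i_far = quality_measures(*two_blobs(5.0), B=20)
sep_near, var_near, var_i_near = quality_measures(*two_blobs(0.1), B=20)
```

On these toy blobs the separability score should be larger for the widely separated pair than for the nearly overlapping pair.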

#### **4. Experiment Results**

In this section, we show that our method can efficiently evaluate the data quality of large-scale high-dimensional datasets.

To verify the representative performance of *Msep* for class separability, we calculated the correlation between the accuracy of chosen classifiers and *Msep*. Classifiers used in our experiments are as follows: a perceptron, a multi-layer perceptron with one hidden layer and LeakyReLU (denoted by MLP-1), and a multi-layer perceptron with two hidden layers and LeakyReLU (denoted by MLP-2). To simplify the experiments, we trained the models with the following settings: 30 epochs, a batch size of 100, a learning rate of 0.002, the Adam optimizer, and the cross-entropy loss function. Additionally, we fixed the hyperparameters of Algorithm 1 as *B* = 100, *R* = 0.25, and *nv* = 10 since there was no big difference in performance when larger hyperparameter values were used.

For comparison with other quality measures, we chose *F*1, *N*1, and *N*3 from Ho and Basu [1] and *CSG* from Branchaud-Charron et al. [3]. Here, *N*1, *N*3, and *CSG* are known to be highly correlated with the test accuracy of classifiers [3]. *F*1 is similar to our *Msep* in its basic idea. Other quality measures suggested in Ho and Basu [1] showed very similar characteristics to *F*1, *N*1, *N*3, and *CSG* and are therefore not included in the results.

#### *4.1. Datasets*

To evaluate the representative performance of *Msep* for class separability, we used various image datasets that are high-dimensional and popular in mobile applications. We chose ten benchmark image datasets for our experiments: MNIST, notMNIST, CIFAR10, Linnaeus, STL10, SVHN, ImageNet-1, ImageNet-2, ImageNet-3, and ImageNet-4. MNIST [19] consists of ten handwritten digits from 0 to 9. The dataset contains 60,000 training and 10,000 test data points. We sampled 10,000 data points from the training data for model training and measuring the quality, and we sampled 2500 data points from the test data for assessing the model accuracy. The notMNIST [20] dataset is quite similar to MNIST, containing English letters from A to J in various fonts. It has 13,106 training and 5618 test samples. We sampled the data in the same way as MNIST. Linnaeus [21] consists of five classes: berry, bird, dog, flower, and others. Although the dataset is available in various image sizes, we chose 32 × 32 to reduce the computation time of *N*1, *N*3, and *CSG*. CIFAR10 [22] is for object recognition with ten general object classes. It consists of 50,000 training and 10,000 test data points. We sampled the CIFAR10 dataset in the same way as MNIST. STL10 [23] is also for object recognition with ten classes, and it has 92 × 92 images; we resized the images to 32 × 32 to reduce the computation time of *N*1, *N*3, and *CSG*. The dataset consists of 5000 training and 8000 test data points. We combined these two sets into a single dataset and then sampled 10,000 data points from the combined set for model training and measuring quality. We also sampled 2500 data points from the combined set for assessing prediction model accuracy if necessary. SVHN [24] consists of street view house number images. The dataset contains 73,200 data points. We sampled 10,000 data points for model training and measuring the quality, and we sampled 2500 data points for assessing the model accuracy. ImageNet-1, ImageNet-2, ImageNet-3, and ImageNet-4 are subsets of the Tiny ImageNet dataset [25]. The Tiny ImageNet dataset contains 200 classes, and each class has 500 images. Each subset consists of ten randomly selected classes of the Tiny ImageNet dataset (5000 data points in total). We used 4500 data points for model training and measuring the quality and 500 data points for assessing the model accuracy.

We summarize the details of the datasets in Table 1. The accuracy values in Table 1 are calculated from the MLP-2 model, since it showed good overall performance compared to the perceptron and the MLP-1 models.


**Table 1.** Details of the datasets used in our experiments. The accuracy is from MLP-2, and *M* represents the total number of data used for training and evaluation.

#### *4.2. Representation Performance of the Class Separability Measure Msep*

Here, we show experimentally how well our first quality measure *Msep* represents class separability, compared to simple but popular classifiers and the existing data quality measures.

#### 4.2.1. Correlation with Classifier Accuracy

To demonstrate how well *Msep* represents the class separability of given datasets, we computed the absolute values of the Pearson correlation and the Spearman rank correlation between each of the quality measures *Msep*, *N*1, *N*3, *F*1, and *CSG* and the prediction accuracy of three classification models: perceptron, MLP-1, and MLP-2. Table 2 summarizes the results.

In the case of the perceptron, *Msep* achieves a Pearson correlation similar to that of *N*1 and *N*3, which have the highest correlation with classifier accuracy, while requiring the shortest computation time. Furthermore, *Msep* and *F*1 have the highest Spearman rank correlation. This is because *Msep* and *F*1 measure linear separability, which is essentially the information captured by a linear classifier, the perceptron in our case. In the case of MLP-1 and MLP-2, *Msep* also shows a sufficiently high correlation with classification accuracy, although its Pearson correlation is slightly lower than in the case of the perceptron. On the other hand, *CSG* does not seem to have noticeable benefits considering its computation time. This is because *CSG* depends on an embedding network, which requires a large amount of training time.

In summary, the results show that our measure *Msep* can capture the separability of data as well as the existing data quality measures, while reducing computation time significantly.
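The correlation analysis described above can be reproduced with a few lines of NumPy. In the sketch below, the helper names are hypothetical, the Spearman computation assumes there are no tied values, and the two arrays are made-up placeholder numbers used only to show the mechanics; they are not results from this paper.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])

def ranks(values):
    """Rank of each entry (0 = smallest); assumes no ties."""
    return np.argsort(np.argsort(values))

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# Placeholder values for illustration only (one entry per dataset).
measure = np.array([0.80, 0.30, 0.50, 0.10, 0.90])
accuracy = np.array([0.95, 0.60, 0.70, 0.40, 0.97])

abs_pearson = abs(pearson(measure, accuracy))
abs_spearman = abs(spearman(measure, accuracy))
```

Taking absolute values makes measures that decrease with accuracy (for example, error-rate-based ones) directly comparable with measures that increase with it.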

**Table 2.** The absolute Pearson and Spearman rank correlation between the quality measures and the accuracy of three classifiers on the ten image datasets (MNIST, CIFAR10, notMNIST, Linnaeus, STL10, SVHN, ImageNet-1, ImageNet-2, ImageNet-3, and ImageNet-4). The computation time of our method *Msep* is the fastest.


#### 4.2.2. Correlation with Other Quality Measures

In order to check if our suggested data quality measure *Msep* is compatible with the existing ones in quality, and therefore ours can be a faster alternative to the existing data quality measures, we computed the Pearson correlation between *F*1, *N*1, *N*3, *CSG*, and *Msep*. The results are summarized in Table 3. Our measure *Msep* showed a high correlation with all four existing measures *F*1, *N*1, *N*3, and *CSG*, indicating that *Msep* is able to capture the data quality information represented by *F*1, *N*1, *N*3, and *CSG*.


**Table 3.** The absolute Pearson correlation between *Msep* and other quality measures.

#### 4.2.3. Computation Time

As mentioned above, our quality measure *Msep* represents the class separability well while being much faster to compute than *F*1, *N*1, *N*3, and *CSG*. Here, we show how the computation time changes with the data dimension and the sample size, in order to show that our suggested data quality measure can be used in many big-data situations.

The computation time according to the data dimension is shown in Figure 2 and Table 4. Across all dimensions, our measure *Msep* was on average 3.8 times faster than *F*1, 13.1 times faster than *N*1, 25.9 times faster than *N*3, and 17.7 times faster than *CSG*. Since *N*1, *N*3, and *CSG* have to calculate the MST, train a 1NN classifier, and train embedding networks, respectively, it is inevitable that they take a large amount of computation time (see more details in Sections 2.1 and 2.2). On the other hand, since *Msep* uses random projection and bootstrapping to avoid the eigenvalue decomposition problem and to deal with big-data situations, the computation time of *Msep* is the shortest in all cases.

**Figure 2.** Data dimension vs. computation time (CIFAR10).

**Table 4.** Data dimension vs. computation time (CIFAR10) in detail (the values in the table represent seconds).


Figure 3 and Table 5 show how computation time changes for various sample sizes. Our measure *Msep* was on average 2.8 times faster than *F*1, 47.0 times faster than *N*1, 94.5 times faster than *N*3, and 41.6 times faster than *CSG*. The computation time of *N*1 and *N*3 increases steeply with the sample size, which makes them unsuitable for large-scale high-dimensional datasets.

All the above results show that our measure *Msep* is suitable for the big-data situations and compatible with other well-accepted data quality measures.

**Figure 3.** Sample size vs. computation time (CIFAR10).


**Table 5.** Sample size vs. computation time (CIFAR10) in detail (the values in the table represent seconds).

#### 4.2.4. Comparison to Exact Computation

In Section 3.2, we proposed to use random projections and bootstrapping for fast approximation of the solution of (4), which can be computed exactly as an eigenvalue. Here, we compare the values of *Msep* obtained with the proposed approximate computation (denoted by "Approx") and with the exact computation (denoted by "Exact") using an eigensolver in the Python scipy package. Since we use only Gaussian random vectors for projection, they are unlikely to match the true eigenvectors; therefore, the approximated quantity differs from the exact value. However, we found that the approximate quantities correlate well with the exact values, as indicated in Table 6, and can therefore be used for fast comparison of the data quality of high-dimensional large-scale datasets.
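The idea of replacing an eigenvalue computation with random projections and bootstrapping can be sketched as follows. Since (4) is not restated here, the sketch assumes a Fisher-type ratio of between-class to within-class variance along a direction. That assumption, together with all function names and parameter values, is illustrative and should not be read as the exact definition of *Msep*.

```python
import random

def projected_ratio(points, labels, w):
    """Between-class over within-class variation of the data projected onto w."""
    proj = [sum(wi * xi for wi, xi in zip(w, x)) for x in points]
    overall_mean = sum(proj) / len(proj)
    by_class = {}
    for value, label in zip(proj, labels):
        by_class.setdefault(label, []).append(value)
    between, within = 0.0, 0.0
    for values in by_class.values():
        class_mean = sum(values) / len(values)
        between += len(values) * (class_mean - overall_mean) ** 2
        within += sum((v - class_mean) ** 2 for v in values)
    return between / within if within > 0 else float("inf")

def approx_separability(points, labels, n_proj=50, n_boot=5, sample=100, seed=0):
    """Average, over bootstrap samples, of the best ratio among random directions."""
    rng = random.Random(seed)
    dim = len(points[0])
    estimates = []
    for _ in range(n_boot):
        # Bootstrap: draw a small sample with replacement.
        idx = [rng.randrange(len(points)) for _ in range(sample)]
        xs = [points[i] for i in idx]
        ys = [labels[i] for i in idx]
        best = 0.0
        for _ in range(n_proj):
            # Gaussian random direction instead of an exact eigenvector.
            w = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            best = max(best, projected_ratio(xs, ys, w))
        estimates.append(best)
    return sum(estimates) / len(estimates)

def two_blobs(gap, n=200, dim=10, seed=1):
    """Toy two-class data; class 1 is shifted by `gap` along the first axis."""
    rng = random.Random(seed)
    points, labels = [], []
    for i in range(n):
        label = i % 2
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        x[0] += gap * label
        points.append(x)
        labels.append(label)
    return points, labels

far = approx_separability(*two_blobs(gap=8.0))
near = approx_separability(*two_blobs(gap=0.5))
```

Because random directions rarely coincide with the optimal one, the estimate generally falls below the exact maximum, yet it still orders the well-separated toy dataset above the overlapping one, which is the property the comparison in this subsection relies on.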

#### *4.3. Class-Wise In-Class Variability Measure, MVari*

In fact, many of the existing data quality measures are designed to measure the difficulty of classification for a given dataset. However, we believe that the in-class variability of data must be considered as another important factor of data quality. One example to show the importance and usefulness of our in-class variability measure *Mvari* is the generalization performance of a classifier.


**Table 6.** Comparison of Exact and Approx values and their correlations. Pearson and Spearman are the Pearson and Spearman rank correlation between the Exact and Approx.

The generalization performance of the learning model is an important consideration especially in mission-critical AI-augmented systems. There are many possible reasons causing low generalization, and overfitting is one of the troublemakers. Although we have techniques to alleviate overfitting, e.g., checking the learning curve, regularization [26,27], and ensemble learning [28], it is critical to check if there is an issue in data to begin with which may lead to any inductive bias. For example, a very small value of *Mvari* in a class compared to the others would indicate a lack of variability in the class, which can lead to low generalization due to, e.g., unchecked input noise, background signal, object occlusion, and angle/brightness/contrast differences during training. On the other hand, the overly large *Mvari* may indicate outliers or even mislabeled data points likely to incur unwanted inductive bias in training.

To show the importance and usefulness of in-class variability, we created a degraded version of CIFAR10 (denoted by degraded-CIFAR10) by reducing the variability of a specific class. The degraded-CIFAR10 is created by the following procedure. First, we chose an image of the deer class in the training data and then selected the nine images most similar to it in angular distance. Figure 4 shows the resulting ten images, which have similar backgrounds and shapes. Next, we created 1000 images by sampling with replacement from the ten images while adding random Gaussian noise with zero mean and unit variance, and we replaced the original deer class data with the sampled degraded deer class data.
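The degradation procedure above can be sketched on toy data. The sketch treats each image as a flat list of pixel values; the function names, the toy image size, and the random stand-in images are assumptions for illustration, and no actual CIFAR10 data is loaded.

```python
import math
import random

def angular_distance(a, b):
    """Angle (in radians) between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)

def degrade_class(images, anchor_index=0, n_similar=9, n_out=1000,
                  noise_std=1.0, seed=0):
    """Build a low-variability class from one anchor image and its neighbours."""
    rng = random.Random(seed)
    anchor = images[anchor_index]
    # Rank all other images of the class by angular distance to the anchor.
    others = [i for i in range(len(images)) if i != anchor_index]
    others.sort(key=lambda i: angular_distance(anchor, images[i]))
    pool = [anchor] + [images[i] for i in others[:n_similar]]
    # Sample with replacement from the pool and add zero-mean Gaussian noise.
    degraded = []
    for _ in range(n_out):
        base = rng.choice(pool)
        degraded.append([p + rng.gauss(0.0, noise_std) for p in base])
    return pool, degraded

# Toy stand-in for the images of one class: 50 "images" of 12 pixels each.
toy_rng = random.Random(42)
toy_class = [[toy_rng.uniform(1.0, 255.0) for _ in range(12)] for _ in range(50)]
pool, degraded_class = degrade_class(toy_class)
```

The resulting class contains many near-copies of ten images, which is exactly the kind of low in-class variability that *Mvari* is intended to expose.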

Table 7 shows that the value of *Mvari* is significantly smaller on the degraded deer class than on the other classes. That is, it can capture small in-class variability. In contrast, Table 8 shows that the existing quality measures *F*1, *N*1, *N*3, and *CSG* may not be enough to signify the degradation of the dataset. As we can see, all quality measures indicate that class separability increased in degraded-CIFAR10 compared to the original CIFAR10; however, the test accuracy from MLP-2 decreased. This is because the reduction in in-class variability is very likely to decrease the generalization performance. Therefore, class separability measures can deliver incorrect information regarding data quality in terms of in-class variability, which can be a critical problem for generating a trustworthy dataset or training a trustworthy model.

**Figure 4.** Ten similar selected images in the deer class on degraded-CIFAR10. Images with high similarity were selected using cosine similarity.

**Table 7.** Our in-class variability measure for the degraded-CIFAR10 dataset. A class with a smaller value than other classes has a lower variability.


**Table 8.** Quality measures on the degraded-CIFAR10 dataset. The existing quality measures *F*1, *N*1, *N*3, and *CSG* only capture the class separability and fail to capture the degradation. Lower values of *N*1, *N*3, and *CSG* represent higher class separability, whereas lower values of *F*1 represent lower class separability. The test accuracy is from MLP-2 trained with original and degraded-CIFAR10, respectively, and tested on the original CIFAR10 test data.


As we showed above, a small value of *Mvari* for a specific class indicates that similar images exist in the investigated class, which can lead to low generalization performance of classifiers. Suppose we have generated a dataset for an autonomous driving object classification task. Various quality measures indicate that the dataset has high class separability, and the training accuracy is also high. Therefore, one may expect high generalization performance. Unfortunately, the exact opposite can happen. If the variability in a specific class is small, as in the degraded-CIFAR10 example above, high generalization performance cannot be expected. For instance, if a car with new colors and new shapes that were never seen during training is given as an input to the model, the probability of properly classifying the car will be low. This example indicates that in-class variability plays an important role in data quality evaluation.

#### *4.4. Quality Ranking Using MSep and MVar*

As we mentioned before, quality measures *Msep* and *Mvar* can be compared among different datasets. The class separability *Msep* represents the relative difficulty of linear classification, and the overall in-class variability *Mvar* represents the average variability of data points in classes.

Figure 5 shows a data quality comparison plot of datasets in our experiments. The direction towards the lower-left corner indicates lower class separability and lower in-class variability, and the upper-right direction is for higher class separability and higher in-class variability. According to the plot, the MNIST and the notMNIST dataset show very high linear separability compared to other datasets, indicating that their classification might be easier than the other datasets. The SVHN dataset is at the lower-left corner, indicating low linear separability and possible redundancy issues (this could be just the reflection of the fact that many SVHN images contain changing digits but the same backgrounds). The four ImageNet datasets, Linnaeus, CIFAR10 and STL10 have similar class separability and in-class variability values. This appears to be understandable considering their similar data construction designed for object recognition.

**Figure 5.** Data quality plot using the two proposed quality measures.

#### **5. Conclusions**

In this paper, we proposed data quality measures *Msep*, *Mvar*, and *Mvari*. Our measures are estimated using random projection and bootstrapping and can therefore be applied efficiently to large-scale high-dimensional data. We showed that *Msep* can be used as a good alternative to the existing data quality measures capturing class separability, while reducing their computational overhead significantly. In addition, *Mvar* and *Mvari* measure in-class variability, which is another important factor to avoid unwanted inductive bias in trained models.

**Author Contributions:** Conceptualization, H.C. and S.L.; methodology, H.C. and S.L.; validation, H.C.; writing-original draft preparation, H.C.; writing-review and editing, S.L.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2018R1D1A1B07051383), and also by the MSIT (Ministry of Science and ICT), Korea, under the ITRC (Information Technology Research Center) support program (IITP-2020-0-01749) supervised by the IITP (Institute of Information & Communications Technology Planning & Evaluation).

**Data Availability Statement:** The data presented in this study are openly available: MNIST (http://yann.lecun.com/exdb/mnist/), notMNIST (https://www.kaggle.com/lubaroli/notmnist), Linnaeus (http://chaladze.com/l5/), CIFAR10 (https://www.cs.toronto.edu/~kriz/cifar.html), STL10 (https://ai.stanford.edu/~acoates/stl10/), SVHN (http://ufldl.stanford.edu/housenumbers/), and ImageNet (https://www.kaggle.com/c/tiny-imagenet).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


## *Article* **SPEKS: Forward Private SGX-Based Public Key Encryption with Keyword Search**

**Hyundo Yoon 1, Soojung Moon 2, Youngki Kim 2, Changhee Hahn 1, Wonjun Lee <sup>2</sup> and Junbeom Hur 1,\***


Received: 15 September 2020; Accepted: 2 November 2020; Published: 5 November 2020

**Abstract:** Public key encryption with keyword search (PEKS) enables users to search over encrypted data outsourced to an untrusted server. Unfortunately, updates to the outsourced data may incur information leakage by exploiting the previously submitted queries. Prior works addressed this issue by means of forward privacy, but most of them suffer from significant performance degradation. In this paper, we present a novel forward private PEKS scheme leveraging Software Guard Extension (SGX), a trusted execution environment provided by Intel. The proposed scheme presents substantial performance improvements over prior work. Specifically, we reduce the query processing cost from *O*(*n*) to *O*(1), where *n* is the number of encrypted data. According to our performance analysis, the overall computation time is reduced by 80% on average. Lastly, we provide a formal security definition of SGX-based forward private PEKS, as well as a rigorous security proof of the proposed scheme.

**Keywords:** searchable encryption; PEKS; forward privacy; trusted execution environment; SGX

#### **1. Introduction**

Data outsourcing to cloud service providers is beneficial in terms of data management, but raises data security and privacy concerns. Encrypting data prior to outsourcing may solve the data privacy problems. However, it inevitably complicates, or sometimes hinders important data management operations such as searches over the outsourced data. Public key encryption with keyword search (PEKS) solves this dilemma, in which data senders are allowed to encrypt data using a public key such that the ciphertexts are searchable only by a data receiver whose secret key is associated with the public key [1].

Unfortunately, previous PEKS schemes are vulnerable to query leakage attacks. For example, in file injection attacks [2], an adversarial data sender generates maliciously crafted files of his choice, encrypts them with the public key of a data receiver, and then outsources them to the cloud storage. Then, the adversary observes file access patterns by monitoring which files are returned in response to queries submitted by a specific receiver, thereby leaking the receiver's queries.

As a countermeasure, forward private PEKS schemes have been proposed, which guarantee that past search queries cannot be used for newly inserted files [3]. Unfortunately, the previous schemes are unsuitable for the multi-user environment, where multiple data senders exist for each receiver, which is the widespread setting in cloud-based applications (in this study, a multi-receiver environment is not considered; thus, henceforth, a multi-user environment implies only a multi-sender environment). Thus, designing a forward private PEKS scheme that securely supports the multi-user setting in a scalable way is one of the challenging and important goals in the PEKS literature.

Moreover, prior works suffer from high communication overhead that degrades practicality. For example, Zhang et al. [4] achieved forward privacy by means of key revocation, which incurs costly key management tasks to distribute key update messages every time a query is processed. Zeng et al. [5] proposed a scheme to guarantee forward privacy without such key revocations. However, the scheme depends on computationally expensive cryptographic primitives, which incur unacceptable computation costs in practice. Furthermore, the query size of their scheme depends on the time periods, leading to significant communication overheads.

To design secure and efficient schemes, one may utilize Trusted Execution Environments (TEEs) such as Intel Software Guard Extension (SGX) [6–10]. TEE provides memory isolation such that it loads data or code from (untrusted) main memory to the (trusted) isolated memory area, or enclave. Since TEE can protect data and processes from operating systems or hypervisors using enclaves, it can guarantee confidentiality and integrity of them even when operating systems or hypervisors are compromised. Recently, Amjad et al. [11] introduced an SGX-supported dynamic searchable symmetric encryption scheme that is forward private. However, it is also not applicable to the multi-user settings.

In this paper, we propose SPEKS, a forward private SGX-based public key encryption with keyword search scheme. To the best of our knowledge, it is the first SGX-based PEKS that achieves forward privacy in a multi-user setting. The proposed scheme uses a search counter to achieve forward privacy by unlinking the current data status from the previous queries. Specifically, both the data receiver and the cloud server share the same search counter, which is updated at each data update. Since the current data is encrypted using the latest search counter, the previous queries cannot be associated with subsequently updated data. Thus, forward privacy is guaranteed in the proposed scheme. In addition, SPEKS significantly outperforms prior works [4,5,12] and preserves forward privacy against a stronger attack model by utilizing Intel SGX.

The contributions of this work are as follows:


#### **2. Background**

Intel Software Guard Extension (SGX) is used in our construction of forward secure searchable encryption. SGX is an extension of the x86 instruction set architecture (ISA) introduced with the 6th generation Intel Skylake processors. SGX provides Memory Isolation, Enclave Page Cache, and Software Attestation, which are the major functionalities that we rely on to construct our scheme. In this section, we briefly introduce the SGX structures and basic PEKS algorithms upon which our scheme is built.

#### *2.1. Intel Software Guard Extensions (SGX)*

**Memory Isolation.** An SGX platform can be divided into untrusted and trusted parts. Enclaves are the trusted parts: private regions of physical memory whose contents are protected. The memory space of an enclave is isolated from any process outside the enclave itself, including processes running at higher privilege levels. Thus, any access to the enclave memory by other software, such as a privileged operating system, firmware, a hypervisor, or code in system management mode (SMM), is disallowed.

The enclave memory is mapped into the virtual memory of the untrusted part, and the untrusted part is executed as an ordinary process within the virtual address space. The mapping of the enclave memory is crucial, because it enables the enclave to access the host process's entire virtual memory. However, the host process is only allowed to call the enclave through a defined interface. In addition, the enclave's code and data are encrypted when they reside in the untrusted part of the memory. When loaded into the enclave, on the other hand, they are decrypted on the fly within the CPU. Thus, the processor protects the code from being examined by other processes, which are treated as potentially hostile.

**Enclave Page Cache (EPC).** The enclave page cache (EPC) is the memory area where the enclave code and data are placed. The EPC is encrypted by the Memory Encryption Engine (MEE), so external reads on the memory bus can only observe encrypted data. For the EPC, a fixed amount of the system's main memory, limited to 128 MB, is allocated to store enclaves and related metadata. Since the dedicated memory is shared between the enclave itself and the related metadata, the enclave is able to use 96 MB on average [13]. Because of this memory limitation, enclaves sometimes need to swap pages when dealing with data whose size exceeds the dedicated memory. The SGX memory is reserved statically during the boot phase and remains reserved throughout the runtime of the system. If there are multiple enclaves, the memory is dynamically managed by the OS and allocated to each enclave. When page swapping occurs, the key generated at boot time is used for both encryption and decryption of the page, so the confidentiality and integrity of the swapped-out pages can be guaranteed.

**Software Attestation [14].** SGX supports a software attestation feature that verifies the validity of locally or remotely created enclaves. The enclave measurement, which is taken over the initial code loaded when the enclave is created, is used to verify the correctness of the enclave. The SGX attestation functionality assures that the measurements are authentic and associated with the benign enclave. For local attestation, the EREPORT and EGETKEY instructions are used to generate the signed report and to verify it at the target enclave. For remote attestation, the signature is provided by the Quoting Enclave (QE), a component of SGX. Before generating the signature, the QE only accepts measurements from the hardware itself, which ensures that only a legitimate enclave is measured.

#### *2.2. Public Key Encryption with Keyword Search (PEKS)*

Boneh et al. [1] first introduced the notion of public key encryption with keyword search (PEKS). Compared to symmetric searchable encryption, PEKS has better performance in data sharing. Abdalla et al. [15] give a generic framework of PEKS and show how to obtain public key based searchable encryption from anonymous identity-based encryption. In the PEKS scheme, a data receiver provides a gateway with a trapdoor function for keywords. Then, a data sender uses the data receiver's public key to encrypt a keyword and sends the ciphertext to the server or the gateway. The latter applies the search or test function to the search token and the ciphertext. When the keywords within the search token and the ciphertext match, the search or test function returns 1; otherwise, it returns 0. The scheme is proven to be secure in the standard model, but under the condition that the number of malicious clients is smaller than a specified value.

#### **3. SPEKS Overview and Definitions**

In this section, we describe a high level design of our SPEKS scheme, and show the search processes over encrypted data by users. Then, we define algorithms for SPEKS scheme and security model.

#### *3.1. Overview*

The high level design of our scheme is shown in Figure 1. In the proposed scheme, there are three system entities: the data receiver (DR), the data sender (DS), and the SGX-enabled cloud server (CS).

During the setup phase, a DR initially generates his private key *SKu*, public key *PKu*, and symmetric key *Ku*. Using the SGX attestation protocol for enclave authentication, the DR establishes a secure channel with the enclave within the CS (shown in step ① in Figure 1). After establishing the secure connection, the DR provisions *SKu* and *Ku* into the enclave (step ①). The enclave stores the two provisioned keys for future processes. When the enclave is unloaded or rebooted, the provisioned keys can be securely stored in the local memory.

For the *PEKS* algorithm, a DS gets the search counter of the DR, encrypts data that includes a predefined keyword together with the search counter, and uploads the encrypted data to the server (see steps ② and ③). Then, using the *Trapdoor* algorithm, the DR generates a search query, encrypts it with the symmetric key *Ku*, and transfers it to the enclave within the CS (step ④). The enclave now holds the provisioned keys required for decryption and the search query from the DR. For the *Search* algorithm, the data record is loaded into the enclave (step ⑤). If the data record size is greater than the EPC, the records are split into smaller pieces, and partial records are loaded into the enclave over multiple rounds. The enclave decrypts the search query using the symmetric key and searches for the matching record (step ⑥). Finally, if there is a matching result, the enclave returns the result to the DR (step ⑦).
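The splitting of records into pieces that fit the EPC can be sketched as a simple batching loop. The 96 MB figure is the average usable size quoted in Section 2.1; the fixed per-record size, the function name `epc_batches`, and the placeholder records are assumptions for illustration, and the actual enclave calls are not modeled.

```python
def epc_batches(records, record_size, usable_epc_bytes=96 * 1024 * 1024):
    """Yield consecutive slices of `records` that each fit in the usable EPC.

    `record_size` is an assumed fixed size in bytes of one encrypted record.
    """
    per_batch = max(1, usable_epc_bytes // record_size)
    for start in range(0, len(records), per_batch):
        yield records[start:start + per_batch]

# Placeholders standing in for 1000 encrypted records of 1 MiB each.
records = list(range(1000))
batches = list(epc_batches(records, record_size=1024 * 1024))

# Each batch would be loaded into the enclave and searched in turn.
batch_sizes = [len(batch) for batch in batches]
```

Keeping each batch within the usable EPC avoids the page swapping described in Section 2.1, at the cost of several enclave entries per search.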

**Figure 1.** High level design overview.

#### *3.2. Algorithms and Security Definitions*

SPEKS consists of four polynomial-time algorithms: *SPEKS* = (*Setup*(1*λ*, *λ*), *PEKS*(*PKu*, *w*, *u*, *F*), *Trapdoor*(*Ku*, *w*,*sc*[*u*]), *Search*(*SKu*, R, *tw*)).

**Definition 1.** *(SPEKS). A secure SPEKS is a tuple of four polynomial-time algorithms (Setup, PEKS, Trapdoor, Search) as follows:*


**Definition 2.** *(Correctness) Let* D *denote a SPEKS scheme consisting of the four algorithms described in Definition 1. For any correctly generated key pair* (*PK*, *SK*) *and symmetric key K of the data receiver, and for any keyword w, Search*(*SKu*, R, *tw*) = 1 *holds with probability* 1*, where ciphertext* R ← *PEKS*(*PKu*, *w*, *u*, *F*) *and tw* ← *Trapdoor*(*Ku*, *w*,*sc*[*u*])*.*

We define our security model based on the three-step framework introduced in [16]. The first step is to formulate a leakage, i.e., an upper bound on the information that an adversary may gather from the protocol. The second step is to define the **Real**A(*λ*) and **Ideal**A,<sup>S</sup> (*λ*) games for an adaptive adversary A and a polynomial-time simulator S. **Real**A(*λ*) is the actual protocol, and **Ideal**A,<sup>S</sup> (*λ*) is a simulation of the real game produced by S using only the formulated leakage. An adaptive adversary can use information learned in previously executed protocols for its subsequent queries. The third step is to prove that a scheme is CKA2-secure by showing that A can distinguish the outputs of the two games only with probability negligibly close to 0. In that case, A does not learn anything more than the leakage stated in the first step.

Similar to the scheme introduced by Fuhry et al. [13], our scheme has an additional transaction between the cloud server and the trusted hardware, SGX. This additional transaction can be monitored by the adversary; therefore, we extended the original security model to hardware-security model. L*hw* denotes the leakage on the CKA2-HW-security.

**Definition 3.** *(CKA2-HW-security). Let* D *denote a SPEKS scheme consisting of the four algorithms described in Definition 1.* A *is a stateful passive adversary, and* S *is a stateful simulator that gets the leakage functions* L*PEKS and* L*hw. Two probabilistic experiments* **Real**A(*λ*) *and* **Ideal**A,<sup>S</sup> (*λ*) *are described as follows.*


We claim that D is (L*PEKS*,L*hw*)-secure against adaptive chosen-keyword attacks if for any probabilistic, polynomial-time algorithms A, there exists a probabilistic, polynomial-time S such that:

$$\left| \Pr\left[ \mathsf{Real}_{\mathcal{A}}(\lambda) = 1 \right] - \Pr\left[ \mathsf{Ideal}_{\mathcal{A},\mathcal{S}}(\lambda) = 1 \right] \right| \le \mathsf{negl}\left(\lambda\right).$$

#### **4. Construction**

#### *4.1. Cryptographic Primitive*

Let *EncPKE* and *DecPKE* refer to IND-CPA secure public key encryption and decryption algorithm respectively, and *EncSKE* and *DecSKE* refer to IND-CPA secure symmetric key encryption and decryption algorithm respectively. The proposed scheme is constructed based on these symmetric and public key encryption/decryption algorithms.

#### *4.2. Provisioning*

For key sharing between the enclave and the data receiver, the data receiver provisions his private key and symmetric key to the enclave. Since the keys should not be revealed to any untrusted entity, a secure connection between the data receiver and the enclave should be established using the attestation feature of SGX. During the creation of the enclave, the key pair (*skE*, *pkE*) is created. The hardware random number generator (rdrand [17]) available in current CPUs can provide sufficient randomness for the key generation in practice.

Subsequently, the enclave sends the created *pkE* to the quoting enclave (QE). The QE creates the signature *σQE*(*ME pkE*) over the measurement of the initial memory content of the enclave, *ME*, and the public key *pkE*. Using Intel's public key, the data receiver verifies the signature *σQE*(*ME pkE*) against *ME* and *pkE*. The data receiver is now able to encrypt *SKu* and *Ku* with *pkE* and send them back to the enclave. As a result, the enclave and the data receiver share *SKu* and *Ku*, which they use for secure communication.

#### *4.3. Algorithms*

The proposed forward secure searchable encryption scheme with keyword search (SPEKS) consists of the following four algorithms: (Setup, PEKS, Trapdoor, Search).

Algorithm 1 gives a formal description of Setup of our SPEKS scheme. In the (*PKu*, *SKu*, *Ku*) ← *Setup*(1*λ*, *λ*) algorithm, a data receiver (DR) generates *PKu*, *SKu*, and *Ku*. Next, the DR provisions *SKu* and *Ku* to the enclave within the cloud server, *EnclaveCS*, through a secure channel. The secure channel can be established by the attestation feature provided by Intel SGX as explained in Section 4.2.

Next, Algorithm 2 provides a formal description of PEKS. In R ← *PEKS*(*PKu*, *w*, *u*, *F*), where F denotes a set of data, a data sender (DS) first requests search counter *sc*[*u*] from the cloud server (CS). Using the retrieved *sc*[*u*], the DS runs *EncPKE*(*PKu*,(*w*,*sc*[*u*])) and generates searchable ciphertext *ct*. A record R consists of three components (*d*, *ind*, *ct*), where *d* refers to the data, *ind* refers to the index, and *ct* refers to the searchable ciphertext. The generated record R is sent to the CS.

Algorithm 3 describes the Trapdoor algorithm of our scheme. In the algorithm, the DR creates a search token with the keyword. Specifically, in the execution of *tw* ← *Trapdoor*(*Ku*, *w*,*sc*[*u*]), the DR uses the search counter *sc*[*u*] and the keyword to generate the search token. Then, using symmetric key encryption (SKE), the DR encrypts the search token with the symmetric key *Ku*. The encrypted search token *tw* is then transferred to Enclave*CS*. After transferring the search token, the DR increments his or her own search counter by 1.

Algorithm 4 describes the Search algorithm of our scheme. In Algorithm 4, Enclave*CS* checks whether the search token *tw* matches R. Enclave*CS* runs *DecSKE* (*Ku*, *tw*) and retrieves the queried keyword *w*′ and search counter *sc*′. Next, using the key *SKu*, the keyword *w* and search counter *sc* are retrieved from the ciphertext *cti*. When they match, the *indi* from R is returned, *sc*[*u*] is incremented by one, and *F* with the returned *indi* is returned to the DR; otherwise, ⊥ is returned.


```
Algorithm 2: R ← PEKS(PKu, w, u, F)

DS:
  Request the search counter of u from the CS
CS:
  Return sc[u] to DS
DS:
  for i = 1 to |F| do
    cti ← EncPKE(PKu, (w, sc[u]))
    R ← R ∪ {(di, indi, cti)}
  end for
  Transfer R to CS
CS:
  for i = 1 to |R| do
    ED ← ED ∪ {(di, indi, cti)}
  end for
```

**Algorithm 3:** *tw* ← *Trapdoor*(*Ku*, *w*,*sc*[*u*])

**DR:**

*τw* ← (*w*,*sc*[*u*])

*tw* ← *EncSKE*(*Ku*, *τw*)

Transfer *tw* to *EnclaveCS*

*sc*[*u*] ← *sc*[*u*] + 1

**Algorithm 4:** (*F*/ ⊥) ← *Search*(*SKu*, R, *tw*)

```
CS:
  EnclaveCS:
  (w', sc') ← DecSKE(Ku, tw)
  for i = 1 to sc' do
    (w, sc) ← DecPKE(SKu, cti)
    if (w = w') and (i = sc) then
       return indi
    end if
  end for
  sc[u] ← sc[u] + 1
  Return F to DR if match; else, ⊥
```
#### **5. Analysis**

#### *5.1. Security Analysis*

We will prove the security of our scheme by defining the leakage functions related to access pattern. Then, we will explain how our SPEKS scheme guarantees the forward privacy.

We define two leakage functions: L*PEKS* (*sc*) and L*hw* (*sc*, R). L*PEKS* (*sc*) function outputs a record R given the search counter *sc*, which consists of encrypted data, indices, and searchable ciphertexts. Given *sc* and R, L*hw* (*sc*, R) function outputs the access pattern P (*sc*, R).

The access pattern P (*sc*, R) contains information about the search counter *sc* and the records R, which are stored in the untrusted memory region of the server. When the data receiver requests *sc*, the server can see which value is returned. R also leaks which record is sent to the enclave or the data receiver. In our analysis, we further use the values access pattern Δ (*sc*, R) [13], which describes the pointers to the result values, i.e., the pointers to the record with index *ind*.

**Theorem 1.** *(Security). The SPEKS construction is* (L*PEKS*,L*hw*)*-secure.*

**Proof.** We consider a polynomial-time simulator S for which a probabilistic, polynomial-time adversary A can distinguish between **Real**A(*λ*) and **Ideal**A,<sup>S</sup> (*λ*) only with negligible probability.


The adversary A cannot distinguish accesses in **Real**A(*λ*) from simulated accesses because the results are delivered deterministically: they are consistent across different requests made for the same keyword. Since Δ (*sc*, R) is explicit, the number of result pointers matches and the pointers are also consistent. The pointed values are indistinguishable, because those values are encrypted with an IND-CCA secure scheme.

**Forward Privacy**. In order to guarantee forward privacy, the past search queries should not be directly associated with the updated files. In our SPEKS scheme, we use a search counter that is supposed to be updated after each search. The search queries are generated only with private key and the search counter. Since the ciphertexts are created with the current newly updated search counters, past queries generated with past search counter values cannot match with newly updated files. Therefore, forward privacy is guaranteed in our proposed scheme.
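The counter-based argument can be illustrated with a toy simulation. The `Sealed` wrapper below is not encryption; it only models which key is needed to read a payload. The matching rule (a token issued at counter value *sc*′ matches records whose embedded counter is at most *sc*′) is one reading of Algorithm 4 and is an assumption, as are all class and function names. The receiver's counter is assumed to stay in sync with the server's.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sealed:
    """Stand-in for a ciphertext. NOT real encryption: it only records
    which key is required to read the payload."""
    key_id: str
    payload: tuple

def seal(key_id, payload):
    return Sealed(key_id, payload)

def unseal(key_id, box):
    if box.key_id != key_id:
        raise PermissionError("wrong key")
    return box.payload

PKE_KEY = "receiver-u-keypair"    # stands in for (PKu, SKu)
SKE_KEY = "receiver-u-symmetric"  # stands in for Ku

def trapdoor(keyword, sc):
    """DR side: token over (w, sc[u]) under the symmetric key."""
    return seal(SKE_KEY, (keyword, sc))

class ToyCloudServer:
    def __init__(self):
        self.sc = 0        # search counter sc[u] of the single receiver u
        self.records = []  # list of (ind, ct)

    def upload(self, ind, keyword):
        """DS side of PEKS: bind the keyword to the current counter."""
        self.records.append((ind, seal(PKE_KEY, (keyword, self.sc))))

    def match(self, token):
        """Enclave-side matching, without touching the counter."""
        w_query, sc_query = unseal(SKE_KEY, token)
        hits = []
        for ind, ct in self.records:
            w, sc = unseal(PKE_KEY, ct)
            if w == w_query and sc <= sc_query:  # assumed matching rule
                hits.append(ind)
        return hits

    def search(self, token):
        """A legitimate search: match, then advance the counter."""
        hits = self.match(token)
        self.sc += 1
        return hits

server = ToyCloudServer()
server.upload("file-1", "flu")
old_token = trapdoor("flu", server.sc)
first = server.search(old_token)      # counter advances afterwards
server.upload("file-2", "flu")        # bound to the new counter value
replay = server.match(old_token)      # old token re-applied after the update
fresh = server.search(trapdoor("flu", server.sc))
```

In this toy run the replayed token still finds only the record that existed when it was issued, while a token issued with the current counter finds both records.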

#### *5.2. Performance Analysis*

In this section, we analyze the performance of our scheme and provide a comparative analysis with previous forward secure public key encryption with keyword search (FS-PEKS) schemes such as Zeng et al.'s [5] and Kim et al.'s [12] schemes.

Our experiment is run on a system equipped with an Intel(R) Core(TM) i7-9700K CPU at 3.60 GHz and 16 GB of DDR4 RAM. The operating system is 64-bit Ubuntu 18.04.4 LTS with SGX enabled. Our scheme is implemented with Intel's Software Guard Extensions (SGX) Software Development Kit (SDK).

When considering the multi-user environment, it is important to evaluate the initial costs for key setup and key management. In order for general symmetric searchable encryption (SSE) schemes to support a multi-user environment, they need to set up the key multiple times in proportion to the number of users. For instance, if we define *u* as the number of data senders and |*DH*| as the initial cost of Diffie–Hellman key exchange protocol between a data sender and a data receiver, then the initial cost for the key setup is

$$\text{(initial key setup cost)} = u \cdot |DH|.$$

However, the proposed scheme does not require a key exchange protocol for each data sender, so the key setup cost remains constant.

Furthermore, since the data receiver also needs to manage each key corresponding to each data sender, storage overhead for storing the key is also increased in proportion to the number of data senders in the previous schemes. For the key management, SSE schemes require a data receiver to store all of the keys set up for data senders. If |*K*| refers to the size of a key, the overall storage overhead is

$$\text{(storage cost)} = u \cdot |K|.$$

In contrast, our scheme does not require a data receiver to store a separate key for each sender. Therefore, our scheme has constant storage overhead for key management regardless of the number of data senders, which shows its high scalability in the multi-sender environment.
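As a small worked example of the two cost formulas above, the following sketch evaluates the linear SSE costs for a few values of *u*. The key size and the per-exchange cost unit are hypothetical values chosen only for illustration; they are not measurements from the paper.

```python
def sse_key_setup_cost(num_senders: int, dh_cost: float) -> float:
    """Initial key setup cost u * |DH| with one key exchange per sender."""
    return num_senders * dh_cost


def sse_key_storage(num_senders: int, key_size_bytes: int) -> int:
    """Receiver-side key storage u * |K| when one key is kept per sender."""
    return num_senders * key_size_bytes


# Hypothetical parameters: 32-byte keys, one abstract cost unit per exchange.
KEY_SIZE_BYTES = 32
DH_COST_UNITS = 1.0

for u in (1, 10, 1000):
    setup = sse_key_setup_cost(u, DH_COST_UNITS)
    storage = sse_key_storage(u, KEY_SIZE_BYTES)
    print(f"u={u:5d}  setup={setup:7.1f} units  storage={storage:6d} bytes")
```

Both quantities grow linearly in *u* under these assumptions, whereas the corresponding costs in the proposed scheme are described as constant.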

*Computation overhead*: As shown in Figure 2, the proposed scheme has a lower computation cost than Kim et al.'s and Zeng et al.'s schemes. Specifically, for the *PEKS* algorithm, Kim et al.'s scheme takes 3.958 ms and Zeng et al.'s takes 8.123 ms, whereas our scheme takes just 0.0919 ms. Next, for generating a search token using the *Trapdoor* algorithm, our scheme takes 0.02 ms, which is constant regardless of the search count. In Kim et al.'s scheme [12], however, a single search token takes 4.85 ms, and the computational cost increases in proportion to the search counter. Zeng et al.'s scheme takes 12.11 ms to run *Trapdoor*, which is orders of magnitude slower than ours. In addition, for the *Search* algorithm, as shown in Table 1, Zeng et al.'s scheme [5] has the same complexity as our scheme in terms of search time. However, when measuring the actual computation time in practice, our *Search* algorithm takes 0.0436 ms on average, while the computation cost of the search process in [5] depends on a set of encoded time periods, which causes unnecessary computational overhead in some cases. The *Search* algorithm in Kim et al.'s scheme takes 0.863 ms, as shown in Figure 2.
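To make the relative gaps in the reported timings concrete, the following sketch divides each competing scheme's time by ours. It uses only the millisecond values quoted in the paragraph above; the Kim et al. *Trapdoor* value is the reported cost of a single token, and Zeng et al.'s *Search* is omitted because the text gives no single figure for it. The names `REPORTED_MS` and `slowdown_factors` are illustrative.

```python
# Timings in milliseconds, copied from the figures reported in the text.
REPORTED_MS = {
    "PEKS": {"ours": 0.0919, "Kim et al.": 3.958, "Zeng et al.": 8.123},
    "Trapdoor": {"ours": 0.02, "Kim et al.": 4.85, "Zeng et al.": 12.11},
    "Search": {"ours": 0.0436, "Kim et al.": 0.863},
}


def slowdown_factors(timings):
    """For each algorithm, divide every other scheme's time by ours."""
    factors = {}
    for algorithm, by_scheme in timings.items():
        ours = by_scheme["ours"]
        factors[algorithm] = {
            scheme: ms / ours
            for scheme, ms in by_scheme.items()
            if scheme != "ours"
        }
    return factors


FACTORS = slowdown_factors(REPORTED_MS)
for algorithm, by_scheme in FACTORS.items():
    for scheme, factor in by_scheme.items():
        print(f"{algorithm:8s} {scheme:12s} {factor:7.1f}x slower than ours")
```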

This reduction in computation cost stems from the use of a trusted execution environment, specifically Intel SGX in our scheme. Previous PEKS schemes [5,12] are built on pairing-based cryptographic operations, which lead to high computation overhead. For instance, in the *Trapdoor* algorithm, our proposed scheme uses AES-GCM, which is included in the SGX SDK, to encrypt and decrypt the search query. Compared to pairing-based operations, AES-GCM is a much more efficient cryptographic primitive. In addition, for the *Search* algorithm, previous schemes use pairing-based cryptography to search over ciphertexts. RSA or elliptic curve cryptography cannot be used in such software-only schemes, because comparing ciphertexts for search is not possible. However, the trusted execution environment provided by SGX enables the search process to run over plaintext while still guaranteeing data privacy.

*Communication overhead*: Like the previous revocation-based FS-PEKS scheme [4], most search counter-based schemes, including Kim et al.'s scheme [12], generate multiple search tokens. Specifically, since public and private keys are revoked in each search phase, a revocation-based FS-PEKS scheme needs to create multiple search tokens for previous ciphertexts, and the size of the token depends on the number of searches made beforehand. Such overhead becomes prohibitive as searches accumulate. For instance, after 1000 searches, the data receiver needs to generate 1000 search tokens. Moreover, the revocation-based approach requires sending re-encrypted data, incurring additional communication overhead. Likewise, Kim et al.'s scheme [12] generates more search tokens as the search counter increases, as shown in Figure 3. Therefore, the communication overhead related to the query size depends on the search counter. As shown in Table 1, the query size is *O*(1) in our scheme because it requires only a constant number of search tokens regardless of the search counter. Zeng et al.'s scheme, on the other hand, does not adopt a counter-based mechanism but still creates multiple search tokens for the search operations. As shown in Table 1, the communication overhead related to query size in Zeng et al.'s scheme depends on the encoded time period. Unlike these previous schemes, our scheme generates only a single token regardless of the search counter, and is thus more scalable in practice.
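The difference between a query size that grows with the search counter and a constant one can be sketched as follows. The linear model (one token per counter value) is an assumption that mirrors the "1000 searches, 1000 tokens" example in the text, and the 32-byte token size is hypothetical.

```python
def tokens_per_query_counter_based(search_counter: int) -> int:
    """Assumed linear model: one token per counter value reached so far."""
    return search_counter


def tokens_per_query_single(search_counter: int) -> int:
    """Constant model: one token regardless of the counter."""
    return 1


# Hypothetical token size, used only to turn token counts into bytes.
TOKEN_SIZE_BYTES = 32

for sc in (1, 10, 1000):
    linear = tokens_per_query_counter_based(sc) * TOKEN_SIZE_BYTES
    constant = tokens_per_query_single(sc) * TOKEN_SIZE_BYTES
    print(f"sc={sc:5d}  counter-based={linear:6d} B  single-token={constant:3d} B")
```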

**Figure 2.** Computational cost of each algorithm.

**Figure 3.** Communication cost of each scheme.

Overall, the proposed scheme significantly outperforms previous FS-PEKS schemes and achieves better security by exploiting a trusted execution environment, specifically Intel SGX.

**Table 1.** Efficiency comparison of public key encryption with keyword search (PEKS) schemes. (|*sc*| refers to the value of search counter, *nd* refers to the number of data, |*id*| refers to size of identifier, |*S*| denotes number of searches made, |*T*| denotes a set of encoded time period.)


#### **6. Related Work**

In this section, we introduce the previous searchable encryption schemes and trusted execution environment (TEE).

#### *6.1. Searchable Encryption*

After the first searchable encryption (SE) scheme was proposed by Song et al. [18], SE has been continuously studied to extend its functionality. Generally, existing SE schemes can be classified into two types: searchable symmetric encryption (SSE) and public key encryption with keyword search (PEKS). By utilizing symmetric key primitives [18–21], SSE schemes are generally more efficient than PEKS schemes. However, SSE schemes are not suitable for environments with multiple data senders. PEKS schemes, based on public key primitives [22,23], were first introduced by Boneh et al. [1], and they suit environments with multiple data senders because of their efficient key management. In PEKS schemes, a data sender generally generates a searchable ciphertext with a specific user's public key. Then, the data receiver creates search queries and retrieves the data with the secret key.

#### *6.2. TEE Based Implementations*

Fisch et al. [24] first introduced a functional encryption scheme using Intel SGX and formally defined the security model. Since this first adoption of Intel SGX, many studies have constructed encryption schemes on the Intel SGX platform. Fuhry et al. [13] used Intel SGX to design HardIDX, an encrypted database index. The search operation is implemented inside the enclave, but HardIDX does not support the update operation. In addition, ZeroTrace [25] proposed generic, efficient ORAM primitives using Intel SGX, and Oblix [26] was designed for oblivious search. In Oblix, the update process is designed to minimize the leakage of the access pattern and the result size of searches. Harnessing a TEE such as Intel SGX as a building block for constructing SE schemes is an effective way to increase their efficiency and security in practice.

#### **7. Conclusions**

In this paper, we proposed a public key encryption with keyword search scheme that guarantees forward privacy using Intel SGX. We formally defined a security model for the proposed TEE-based scheme. Compared with the previous schemes, our scheme shows significantly higher efficiency because it generates a single search token regardless of conditions, whereas the previous schemes require multiple search tokens. Furthermore, the proposed scheme requires significantly less computation time for creating indices, generating search tokens, and searching.

Our scheme considers only a multi-sender environment. Extending it to the multi-receiver environment is another important and challenging issue. In addition, preserving resilience against de-synchronization attacks is an important open problem in most cryptographic protocols and algorithms based on shared secret information such as an IV (initialization vector) or counter information. Since most forward-secure PEKS schemes are built on counter values, designing an efficient countermeasure against de-synchronization attacks on the shared counter value is an important and challenging topic for future work in the PEKS literature.

**Author Contributions:** Conceptualization, H.Y. and C.H.; methodology, H.Y. and J.H.; validation, H.Y., J.H. and W.L.; formal analysis, H.Y. and C.H.; investigation, H.Y., S.M., and Y.K.; data curation, S.M. and Y.K.; writing—original draft preparation, H.Y.; writing—review and editing, H.Y., C.H., and J.H.; visualization, H.Y., S.M., and Y.K.; supervision, J.H. and W.L.; project administration, H.Y., J.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported as part of Military Crypto Research Center (UD170109ED) funded by Defense Acquisition Program Administration (DAPA) and Agency for Defense Development (ADD).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **A Taxonomy for Security Flaws in Event-Based Systems**

**Youn Kyu Lee <sup>1</sup> and Dohoon Kim <sup>2,</sup>\***


Received: 1 September 2020; Accepted: 14 October 2020; Published: 20 October 2020

**Abstract:** Event-based systems (EBSs) are prevalent in various systems including mobile cyber physical systems (MCPSs), Internet of Things (IoT) applications, mobile applications, and web applications, because of their particular communication model that uses implicit invocation and concurrency between components. However, an EBS's non-determinism in event processing can introduce inherent security vulnerabilities into the system. Multiple types of attacks can incapacitate and damage a target EBS by exploiting this event-based communication model. To minimize the risk of security threats in EBSs, security efforts must determine the types of security flaws in the system, the relationships between the flaws, and feasible techniques for dealing with each flaw. However, existing security flaw taxonomies do not appropriately reflect the security issues that originate from an EBS's characteristics. In this paper, we introduce a new taxonomy that defines and classifies the particular types of inherent security flaws in an EBS, which can serve as a basis for resolving its specific security problems. We also correlate our taxonomy with security attacks that can exploit each flaw and identify existing solutions that can be applied to prevent such attacks. We demonstrate that our taxonomy handles particular aspects of EBSs not covered by existing taxonomies.

**Keywords:** security taxonomies; event-based systems; mobile cyber physical systems; security flaws

#### **1. Introduction**

Event-based systems (EBSs) developed by using message-oriented middleware (MOM) platforms [1] have been widely used in mobile cyber physical systems (MCPSs) as well as a wide range of applications including Internet of Things (IoT) [2–5], financial markets, logistics, and web apps [6], including those that directly interface with users (e.g., Android apps [7]). In the case of MCPSs, for example, since they integrate distributed entities including computational, communication, and physical components [8], event-based architecture has been considered an appropriate mechanism for their implementation [8–11]. MCPSs' inherent heterogeneity and integration of multiple processes make event-based architecture a relevant approach for their modeling and application [12–15]. Specifically, EBSs are highly scalable, easily evolvable, and loosely coupled, which makes them especially suitable for highly heterogeneous distributed systems [16–21].

EBSs' popular attributes stem from their communication model. For example, in EBSs, components (interchangeably referred to as "event-clients" or "event-agents") invoke each other implicitly by publishing event messages (simply referred to as "events") instead of directly calling other components via explicit references. Accordingly, the components may not know the consumers of the events they publish, nor the producers of the events they consume. Although this communication mechanism provides several advantages, its operation is based on non-determinism in event processing, which exposes EBSs to security threats such as event spoofing, interception, and eavesdropping [22–24] (called event attacks). To minimize the risk of such threats on EBSs, security efforts are required.

When working on software security, developers or administrators need to determine the types of security flaws that exist in the system, the relative importance of each flaw, and the types of techniques that can be employed to handle each flaw. A security flaw taxonomy (an ordered system that indicates the natural relationships of security flaws [25,26]) can provide a basis for developers to make better decisions in securing their target software system. Over the past three decades, many such lists and taxonomies of security problems have been studied [25–38]. However, despite the prevalence of EBSs, the systematic identification and classification of EBSs' security flaws have not yet been extensively studied. Existing security flaw taxonomies do not adequately reflect the security issues that originate from EBSs' characteristics or that have been found in recent types of EBSs such as Android (Android is a mobile operating system (OS), but it has also been studied as a particular type of EBS because it supports an event-based communication model. In this research, we consider Android not only as an OS, but also as a software system spanning middleware to applications. We will discuss the details in Section 2.2). Because EBSs have particular attributes that general software systems do not have (e.g., implicit invocation in event communication), the existing lists or taxonomies are not directly applicable for securing EBSs. Therefore, it is necessary to first systematically identify and classify EBSs' fundamental security flaws in order to address the vulnerabilities in the system.

In this paper, we introduce a new taxonomy that classifies the security flaws within EBSs [22,39–47]. Built upon previously identified security flaws present in general software systems [25], our taxonomy classifies particular types of inherent flaws in EBSs, and is distinguished from the existing taxonomies because (1) it clarifies and classifies the inherently present security flaws in EBSs, (2) it covers all types of security flaws in the EBSs domain that have been identified so far, and (3) it considers different types of EBSs configurations (e.g., commercial or open-source MOM platforms). We also correlate our taxonomy with security attacks that can exploit these flaws and existing solutions that are applicable to preventing corresponding attacks. We evaluated our taxonomy in terms of its coverage by comparing it with the existing security flaw taxonomies. Our taxonomy covers all types of security flaws discovered in EBSs so far and even handles additional security flaws not covered by existing taxonomies.

The remainder of the paper is structured as follows: Section 2 provides the background and definitions, Section 3 describes the methodology that we followed and the resulting taxonomy, Section 4 describes its evaluation, and Section 6 presents the conclusions.

#### **2. Background**

In this section, we clarify the underlying concepts and terminology that we will use later to describe our taxonomy in Section 3. We first provide the definitions of key concepts that our taxonomy uses. We then introduce the fundamental mechanism of EBSs and the different types of event attacks.

#### *2.1. Key Concepts*

In this paper, our use of the terms "flaw", "vulnerability", and "attack" is based on the definitions in the existing literature [25,26]. A flaw is a defect of a software system, which can result in a security violation [25]. A vulnerability is caused by at least one flaw and can be exploited by attacks. An attack refers to the techniques that an attacker uses to detect and exploit a vulnerability. Attack or vulnerability taxonomies might be useful when developers (or administrators or testers) need to clarify the ways their target system can be attacked and the parts of the system that should be protected. However, considering that a flaw is the root cause of security violations and can be masked by another part of the system, its identification is more useful for making a target system robust to security threats. Hence, in this paper, we focus on flaws, rather than attacks or vulnerabilities.

#### *2.2. Event-Based Systems*

The EBSs' popular attributes (e.g., scalability, evolvability, and low coupling [16–20]) are fundamentally enabled by their communication model. In EBSs, the components (i.e., the units of computation and data) communicate asynchronously with each other by using messages [48]. A message typically describes one or more observed events. An event is any occurrence that can be observed by a component (e.g., a change of the component's state or a change in the environment of the system) [49]. An event and its corresponding message are often conflated in literature for convenience. In this paper, the term "event" will be used to refer to these concepts broadly. A connector is an architectural element tasked with effecting and regulating interactions among the components [1]. Although there exist several connector types, in this paper, a connector will always refer to an event-based distribution connector [1] that distributes events to associated components. We will use the term "event broker" to refer to this concept broadly.

In EBSs, the components do not have explicit references to each other and are only able to invoke an event broker directly [49]. Consequently, the addition, removal, and updating of components can be achieved relatively easily during runtime [50]. A component can be an event producer or a consumer, or both. Communication between the components is processed via "source" and "sink" [51]—a source is an event interface invoked to publish events by a producer component; a sink is an event interface that an event broker invokes to transfer an event to a consumer component. When a producer publishes an event, the event broker routes the event to the appropriate consumers based on the system configuration, along with the routing and filtering policies [49]. When the event broker transfers an event to a component's sink, the component consumes the event. Each sink declares an event type and only allows the processing of events that match its declared type. In this paper, we will target the following three event types commonly used in today's EBSs [48]: (1) nominal, (2) subject-based and (3) attribute-based. Nominal event types are explicitly declared in a system's programming language and subsequently enforced at compile-time. In subject-based event typing, each event type is defined through a string value that captures an event's name. Similar subject-based event types can be organized into naming hierarchies (e.g., Weather/Country/City). In attribute-based event typing, an event type is defined through a set of attributes, where each attribute is a pair of name and value. Event types can be further defined into more specific event subtypes.
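The source, sink, and event broker roles described above, together with subject-based event typing, can be sketched as follows. This is a minimal illustration under simplifying assumptions (synchronous delivery, prefix matching on the subject hierarchy); the class `EventBroker` and its method names are hypothetical and do not correspond to any particular MOM platform.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]          # an event carries a subject plus a payload
Sink = Callable[[Event], None]     # interface the broker invokes on a consumer


class EventBroker:
    """Minimal subject-based event broker (illustrative only)."""

    def __init__(self) -> None:
        self._sinks: Dict[str, List[Sink]] = defaultdict(list)

    def subscribe(self, subject_prefix: str, sink: Sink) -> None:
        """Register a sink for a subject and everything below it."""
        self._sinks[subject_prefix].append(sink)

    def publish(self, event: Event) -> int:
        """Source interface: route the event to every matching sink."""
        subject = str(event["subject"])
        delivered = 0
        for prefix, sinks in self._sinks.items():
            if subject == prefix or subject.startswith(prefix + "/"):
                for sink in sinks:
                    sink(event)
                    delivered += 1
        return delivered


received: List[Event] = []
broker = EventBroker()

# The consumer declares interest in a node of the naming hierarchy.
broker.subscribe("Weather/Korea", received.append)

# Producers publish without holding any reference to the consumer.
broker.publish({"subject": "Weather/Korea/Seoul", "temp_c": 21})
broker.publish({"subject": "Weather/Japan/Tokyo", "temp_c": 24})
```

Only the first event reaches the consumer, because the second subject lies outside the subscribed hierarchy. Neither producer knows which components, if any, consumed its event.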

#### *2.3. Event Attacks*

Event attacks represent the security problems caused by non-determinism in an EBS's event processing encountered by developers and end-users. Event attacks abuse, incapacitate, and damage the system by exploiting event-based communication. Different types of event attacks have been identified throughout various domains, such as mobile and web apps [22,24,47,52–61]. The research to date has identified the following types of event attacks:

- *Spoofing* (A1): A malicious component can send an event that spoofs a target component to exploit the target's functionality/data [22].
- *Interception* (A2): A malicious component can intercept an event that is supposed to be sent to other components and can send back inappropriate replies to make a target component malfunction or to exploit the target's functionality/data [22,24].
- *Eavesdropping* (A3): A malicious component can eavesdrop on an event, which contains sensitive data, and is supposed to be open only for particular components [24,60].
- *Confused deputy* (A4): A malicious component can indirectly access a target component, by accessing another component that has access to the target component, to exploit the target's functionality/data [47].
- *Collusion* (A5): Two or more malicious components can collude by exchanging events to exploit the functionalities or resources of a target component [47].
- *Flooding* (A6): A malicious component can send an overwhelming amount of events that makes a target component malfunction [55].
- *Delaying* (A7): A malicious component (or event broker) can intentionally delay a series of event interactions to make a target component malfunction [54].

We have formally defined each type of event attack as listed in Table 1.
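Two of the attack types described above, spoofing (A1) and eavesdropping (A3), can be illustrated on a deliberately unprotected broker. The sketch assumes a broker that neither authenticates publishers nor restricts subscribers; the class `OpenBroker`, the subjects, and the field names are all hypothetical.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]


class OpenBroker:
    """Broker with no publisher authentication and no subscriber restrictions."""

    def __init__(self) -> None:
        self._sinks: Dict[str, List[Callable[[Event], None]]] = {}

    def subscribe(self, subject: str, sink: Callable[[Event], None]) -> None:
        self._sinks.setdefault(subject, []).append(sink)

    def publish(self, subject: str, event: Event) -> None:
        for sink in self._sinks.get(subject, []):
            sink(event)


broker = OpenBroker()
door_log: List[str] = []   # who the door component believes unlocked it
stolen: List[str] = []     # data collected by the malicious component

# Benign consumer: trusts whatever sender the event claims.
broker.subscribe("door/unlock", lambda e: door_log.append(e["claimed_sender"]))

# A3 (eavesdropping): a malicious component subscribes to a sensitive subject.
broker.subscribe("user/location", lambda e: stolen.append(e["coords"]))

# A benign producer publishes data intended for a navigation component.
broker.publish("user/location", {"claimed_sender": "gps", "coords": "37.55,126.99"})

# A1 (spoofing): a malicious component publishes an event claiming a trusted origin.
broker.publish("door/unlock", {"claimed_sender": "owner_app"})
```

Both attacks use only the ordinary `subscribe` and `publish` interfaces, so from the broker's point of view they look like regular event exchanges.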


**Table 1.** Types of Event Attacks.

: a set of components, : a victim component, : a malicious component, : sensitive functionality, : sensitive information, : an event, −→ : an event communication channel for sending an event from to , ==⇒ : an event sent from to .

As event attacks are carried out in the same manner as ordinary event exchanges and the malicious components disguise themselves as benign, it is difficult to identify and block event attacks. Preventing event attacks becomes especially challenging when it is not possible to predict which component will compromise the system (e.g., as is the case for Android and J2EE apps). For example, in Android systems, the components comprising the system differ depending on the apps installed according to each user's preferences. In such a case, as it is hard to guarantee that all components in the system are benign or safe from security threats, existing techniques that require pre-defined access distribution (e.g., role-based access control [39]) cannot be used to prevent event attacks. Although the Android system was designed to enforce permission-based access control [7], some types of event attacks can bypass the permission checks (i.e., confused deputy and collusion [47,53,61]). Putting a strict limitation on event communication may address some of these security threats, but it can reduce the flexibility and hamper the benefits of EBSs. Although developers are required to follow security policies while building a system, they tend to lack attention and make mistakes [62]. Practice has also shown that developers are often completely unaware of potential threats or underestimate the framework's capabilities, thus placing the responsibility on the end-users to protect themselves while using the system [63].

#### **3. Taxonomy**

#### *3.1. Literature Review Methodology*

To analyze the security flaws in the EBS domain, we inspected 84 publications from reputable journals and conferences [22,24,39–47,52,56,60,61,64–132]. We carefully followed the general guidelines for a systematic literature review process [133]. Specifically, we formulated our taxonomy by performing a content analysis over a set of publications. The publications were initially collected by using reliable literature search engines, such as IEEE Xplore, ACM Digital Library, Springer Link, and Google Scholar. As shown in Table 2, our search query was formed as a conjunction of the domain keywords (i.e., "distributed event-based systems", "event-based systems", "android intent", and "android event") and attribute keywords (i.e., "security vulnerability", "security attack", "security flaw", and "security error"). Specifically, our search query was defined as the following formula: ∀*d* ∈ *D* ∧ ∀*a* ∈ *A*, where *D* is the set of domain keywords and *A* is the set of attribute keywords as specified in Table 2. Note that, to cover a larger number of publications, synonyms were considered for the attribute keywords during the search process. For example, regarding "vulnerability", we also considered similar keywords such as "flaw" and "error". Because the scope of search for Android keywords is too large, in order to effectively collect the Android publications dealing with the characteristics of EBSs, we used "android event" and "android intent" as domain keywords. The selected keywords were applied to the search over the publications' titles, abstracts, and tags. To exclude outdated publications, we limited the scope of the search to those published from 2000 to 2020. Although the majority of the publications regarding EBSs were almost a decade old, we decided to keep them if they had appeared in top-tier conferences or journals with significant contributions (H5-index ≥ 20 or citation counts ≥ 50). Table 2 shows the number of initially retrieved publications (IEEE Xplore = 104, ACM Digital Library = 624, Springer Link = 1188, Google Scholar = 3078, Total = 4994) obtained by keyword-based search over the aforementioned databases. After the initial search, because the search engines in each database may have processed our queries differently, we performed a consistent keyword validation on the retrieved publications based on the same keywords (1st filtering = 2018). After the first filtering, as not all the retrieved publications fit within the scope of this research, we performed a brief review based on the title and abstract of each publication (2nd filtering = 780). Our review criteria included whether they handled security issues in EBSs. After the second filtering, we performed a detailed review of the remaining publications by inspecting whether they fit within the scope of this research. Finally, 84 publications were selected as the basis for our taxonomy.
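One way to read the query construction described above is that every domain keyword is paired with every attribute keyword. The sketch below enumerates those pairs and records the funnel counts reported in the text. The `AND` query syntax is an assumption (each search engine has its own syntax), and the names `build_queries`, `FUNNEL`, and `INITIAL_BY_SOURCE` are illustrative.

```python
from itertools import product

DOMAIN_KEYWORDS = [
    "distributed event-based systems",
    "event-based systems",
    "android intent",
    "android event",
]
ATTRIBUTE_KEYWORDS = [
    "security vulnerability",
    "security attack",
    "security flaw",
    "security error",
]


def build_queries(domain, attribute):
    """Pair every domain keyword with every attribute keyword as an AND query."""
    return [f'"{d}" AND "{a}"' for d, a in product(domain, attribute)]


queries = build_queries(DOMAIN_KEYWORDS, ATTRIBUTE_KEYWORDS)

# Counts reported in the text for the initial search and each filtering stage.
INITIAL_BY_SOURCE = {
    "IEEE Xplore": 104,
    "ACM Digital Library": 624,
    "Springer Link": 1188,
    "Google Scholar": 3078,
}
FUNNEL = {"initial": 4994, "1st filtering": 2018, "2nd filtering": 780, "selected": 84}

print(len(queries), "queries, e.g.", queries[0])
print("initial total:", sum(INITIAL_BY_SOURCE.values()))
```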


**Table 2.** Number of Collected References during Literature Review Process.

#### *3.2. Taxonomy Construction Methodology*

Although EBSs have particular attributes that general software systems do not bear, they may still inherit security issues from them. Hence, we decided to build a taxonomy upon existing taxonomies that targeted general software systems.

First, we targeted the taxonomies that classify software security flaws. The advantage of this type of taxonomy lies in the convenience of creating a common language for sharing security flaws, allowing an efficient organization of security flaws across information sources, and ultimately identifying strategies to remedy security problems, which is the final goal of this research. For example, depending on the type of flaw, developers can find applicable solutions among the existing ones and can also identify flaws that lack appropriate solutions. According to the reviews of security flaw taxonomies [37,38,134], the outdated taxonomies (i.e., those published before the year 2000) tend to be less elaborate than recently published ones [26,33], or have been adapted into the latest ones [25,27,29,37,38]. Thus, among the selected taxonomies, we filtered out those published before the year 2000. The taxonomies that only focused on implementation-level errors were also excluded so that design-oriented security flaws could be considered.

Consequently, from among the remaining set of candidates, "software security flaw taxonomy" by Weber et al. [25] was selected as the starting point to create a taxonomy, because it has been designed to adequately reflect the nature of security issues in an EBS. Weber's taxonomy classifies the flaws based on genesis (i.e., how they were introduced to the system). Specifically, this taxonomy is distinguished from others due to its major division between "intentional" and "inadvertent" flaws, which is pertinent to classifying security flaws in EBSs. As an EBS generally provides an extensible infrastructure, unintended external source code can be included in the system, which implies that a developer's intention is an important determinant for classifying an EBS's security flaws. For example, although the Android framework was not originally designed to contain security flaws, if an Android app intentionally designed as malicious is installed on the system, the system will contain "intentional" security flaws. We adapted Weber's taxonomy based on the 84 selected publications on security issues in EBSs [39–44,64–77,127] as well as on Android security issues that originated from its event-based communication [22,24,45–47,52,56,60,61,78–126,128–132]. From those publications, we first extracted the security flaws that each approach tries to address or introduces as an example. Then we clustered the flaws based on the similarity of the ways they can be exploited. Finally, we examined whether each of those flaws is related to a counterpart in Weber's taxonomy. The detailed process is as follows:

According to the existing research [40,45,79,87,105,118,121], an EBS may contain malicious code that allows different types of external access, such as a piece of code directed to an unsafe URL. These types of flaws belong in the same category as "Trapdoor" in Weber's taxonomy. Prior research has defined and introduced a particular concurrency problem that only exists in EBSs, referred to as event anomalies [81,127,128]. Weber's taxonomy does define "Concurrency" flaws, but only includes time-of-check to time-of-use (TOCTTOU) errors; therefore, we expanded the scope of their characteristics and changed the name of the category to "Inadequate Concurrency" to present a more precise definition. The existing approaches indicated that the components in an EBS may communicate via covert (i.e., non-system-standard) communication channels [47,100,119]. Although some types of "Covert Channels" flaws were defined in Weber's taxonomy, we extended them to include newly identified covert channels such as the battery and vibrator in mobile devices. Authentication issues were also identified in EBSs, in the form of permission grant and authentication in a multi-domain EBS, a particular type of EBS comprising multiple event brokers from different domains [65,80,86,90,108,118–120,135]. We extended the "Inadequate Authentication" category in Weber's taxonomy to include those authentication-related flaws. From Android apps, new types of resource leaks such as resource leaks via Wi-Fi and the SQLite database were introduced [40,45,88,104,106,114,115,126,129,132], which can be added to "Resource Leak" in Weber's taxonomy. We changed the name of the category to "Inadequate Resource Management" to define the scope more broadly. We also found that the flaws that some existing approaches try to resolve fall under "Logic/Time Bomb" in Weber's taxonomy [56,103,115].
Existing EBS research introduced flaws where multiple components collude to exploit the system [45,47,61,88,100,104,106,109,117,131]. Moreover, the majority of security attacks in EBSs are fundamentally caused by their extensible event communication channels [22,24,39–41,45,47,52,60,68,70,71,77,94,97,99,130,135]. As Weber's taxonomy does not include them, we extended the definitions of "Conspirator" and "Open Event Channels," respectively. We also added "Unsafe Events" and "Unsafe Event Interface" to include cases where those open event channels are unintentionally introduced to the system [22,24,39,41,68–70,75,77,79,84,85,87,95,102,110,123,130]. Note that, to guarantee the completeness of the taxonomy, all the flaws extracted from the existing publications were incorporated in the new taxonomy. However, of the flaws in Weber's taxonomy, we excluded those that were not introduced in the literature under review, in order to build a taxonomy specialized for EBSs.

#### *3.3. Taxonomy*

The security flaw taxonomy for EBSs is shown in Figure 1. As an EBS is a particular type of software system, it incorporates some flaws from general software systems. Note that the boxes highlighted in *red* (F1, F4, F6, F9, F10) indicate the flaws adapted from the existing ones [25] to better reflect the system's event-based characteristics, and the boxes highlighted in *blue* (F2, F5, F7, F8) indicate the flaws we added because they are specifically caused by event-based communication. Finally, the *green* box (F3) indicates a flaw whose definition remains unchanged from the existing one [25]. In particular, the dashed boxes (F2, F5, F6, F7, F8, F10) indicate the flaws that can be exploited by event attacks. It is important to note that every flaw in this taxonomy was validated by existing publications regarding the security of EBSs and Android [22,24,39–47,52,56,60,61,64–132]. In this taxonomy, a software system is defined as a combined system that comprises both application-level and framework-level elements (i.e., middleware), where an operating system is considered a sub-component of the system. As the taxonomy considers both design-level and implementation-level flaws, we use "developer" as a term that represents both system designer and programmer. Moreover, a component is defined as an architectural unit that can communicate with other components using system-defined events.

**Figure 1.** Security Flaw Taxonomy for event-based system (EBS). The *Red* boxes indicate the flaws adapted from the original taxonomy. The *Blue* boxes indicate the newly added flaws. The *Green* box indicates the flaw unchanged from the original taxonomy. The circled labels indicate the assigned number for each flaw.

The goal of this classification is to provide a basis for determining the appropriate security strategies to be used in a particular context. The taxonomy is first classified according to the developer's intention (*Intentional* and *Inadvertent*) because different security strategies can be used to reduce each type of flaw. For example, if most of the security flaws in a target EBS are inadvertently introduced, exhaustive source code reviews and testing can be utilized to reduce the flaws [26]. However, if most of the security flaws are intentionally introduced to an EBS, it would be more effective to minimize the proportion of externally developed source code in the system by restricting external components' access (e.g., restrictive installation of third-party apps on an Android system) or by incorporating more trustworthy message-oriented middleware (MOM) platforms.

*Intentional* flaws are classified as *Malicious* and *Non-Malicious*. The *Malicious* flaws indicate the flaws that were deliberately inserted. If any part of the system was incorporated from an unreliable source, it might intentionally contain the following flaws:


*Non-Malicious* flaws are the side-effects of features that were deliberately added to the system. These flaws are not recognized by developers in general, but we categorize them as intentional because they were designed into the system by essential system requirements. For example, functional requirements created without considering security requirements can lead to these flaws.


*Inadvertent* flaws indicate software bugs. Although they can be detected and removed through testing, some flaws may remain undetected and later cause problems during the operation and maintenance stages of the system. Inadvertent flaws are classified based on the parts where the flaws reside. *Event-Based Communication* flaws represent the flaws that can be caused by the design or implementation of a system's event-based communication.

• F6. *Inadequate Concurrency* [22,81,127,128]: A particular form of concurrency flaw exists in EBSs, called event anomalies [81]. In general, EBSs' components randomly process the events that were received simultaneously. Specifically, if two different components simultaneously send the events that can access the same memory location (e.g., a variable containing state or data) of the target component, there is no guarantee that any one of the two events will be processed prior to the other. This flaw may allow spoofed events sent from malicious components to corrupt the victim component's memory location [81].
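The order dependence described for F6 can be illustrated with a minimal sketch. The `Component` class, its `balance` state, and the two event kinds are hypothetical names chosen for illustration only; they are not taken from the cited work [81]. The sketch enumerates every processing order of two simultaneous events and reports an anomaly when the final state depends on that order.

```python
from itertools import permutations


class Component:
    """Hypothetical EBS component whose handlers share one state variable."""

    def __init__(self):
        self.balance = 100

    def handle(self, event):
        kind, amount = event
        if kind == "set":      # overwrites the shared state
            self.balance = amount
        elif kind == "add":    # read-modify-write on the same state
            self.balance += amount


def final_states(events):
    """Return the final state reached under every possible processing order."""
    outcomes = {}
    for order in permutations(events):
        component = Component()
        for event in order:
            component.handle(event)
        outcomes[order] = component.balance
    return outcomes


legit = ("add", 50)     # event sent by a benign component
spoofed = ("set", 0)    # event sent by a malicious component
outcomes = final_states([legit, spoofed])

# An event anomaly exists if the final state depends on the processing order.
anomaly = len(set(outcomes.values())) > 1
```

Because the two handlers access the same variable and at least one of them writes to it, the two orders end in different states, which is the situation a static event-anomaly analysis would aim to flag.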


*System Configuration* flaws are the ones that can be caused by a system's defective configurations or deployments.


The remaining flaw in *green* box indicates a flaw inherited from Weber's taxonomy [25]: *Logic/Time Bomb* [56,103,115] flaw indicates a piece of source code designed to disrupt the system when certain conditions are satisfied.

#### *3.4. Relationship between Security Flaws and Event Attacks*

The identified security flaws in EBSs can be exploited by different types of attacks including event attacks. To effectively counter each type of event attack, we identified the relationship between the flaws and the event attacks. Then we examined existing solutions that have been proposed to protect the flaws from event attacks. In this section, we demonstrate the relationship between the flaws and event attacks, and assess existing solutions for resolving those attacks.

As discussed in Section 2.3, event attacks represent the security problems faced by developers or end-users due to an EBS's non-determinism in event processing. Recall the seven types of event attacks: *Spoofing* (A1), *Interception* (A2), *Eavesdropping* (A3), *Confused deputy* (A4), *Collusion* (A5), *Flooding* (A6), and *Delaying* (A7).

Each security flaw in an EBS can be exploited by different types of event attacks as depicted in Table 3. To protect each type of security flaw from event attacks, various solutions have been studied across different EBS platforms (e.g., OASIS [77] and Android [141]). Table 3 also presents the representative solutions that prevent event attacks from exploiting each type of security flaw.


**Table 3.** EBS Security Flaws, Event Attacks, and Existing Solutions.

As indicated in Table 3, neither security flaw F1 nor F4 is a target of event attacks. They can be resolved by general security solutions such as signature-based detection [147–149] or identification of covert channels [47]. Flaw F2 can be exploited by attack A5, but the threat can be minimized by detecting sensitive information flows between components [46,60,142] or controlling unsafe event communication between components [47,53]. Flaw F5 can be exploited by multiple types of event attacks (A1–A7). Existing research has tried to minimize the threat by encrypting events, but this requires safe key distribution between the components and additional resources that may become a burden in resource-limited environments (e.g., mobile devices) [41]. While enforcement of security policies [46,47,71,143] has also been proposed, a coarse-grained policy may fail to prevent event attacks. For flaw F6, which is vulnerable to attack A1, static analysis for event anomaly detection [81,127,128] can help developers identify and fix the flaw. Flaw F7 can be a target for attacks A2, A3, and A7. Although role-based access control and encryption of events [39,41,135] may prevent the attacks, those techniques require certain assumptions about the components engaged in event-based interactions; namely, they assume that "benign" components are known. In other words, these approaches cannot properly deal with event-related security threats when the types of components are not clearly delineated and a malicious component can behave as a legitimate one. Though existing research has focused on the detection of attacks A2 and A3 in Android apps [22,45,46,142], these approaches either target limited types of attacks or do not provide actual prevention mechanisms. Flaw F8, which is vulnerable to attacks A1, A4, and A6, can be resolved by the same solutions that are applicable to flaw F7.
Flaw F9 is exposed to all types of event attacks, because the possibility of a malicious component's existence in a system can be increased if the system's authentication mechanism is not well-defined. This threat can be minimized by validating a system's security policies [39,144]. Flaw F10, which is vulnerable to the attacks A6 and A7, can be resolved by analyzing and monitoring a system's runtime event interactions or resource usages [145,146].
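The flaw-to-attack relationships stated in this paragraph can be collected into a small lookup structure. The following is only a sketch that transcribes the relationships named in the text; the dictionary and function names are illustrative assumptions, and F3 is omitted because the paragraph does not discuss it.

```python
# Attack labels as listed in the text (A1-A7).
ATTACKS = {
    "A1": "Spoofing", "A2": "Interception", "A3": "Eavesdropping",
    "A4": "Confused deputy", "A5": "Collusion", "A6": "Flooding",
    "A7": "Delaying",
}
ALL_ATTACKS = set(ATTACKS)

# Flaw -> event attacks that can exploit it, transcribed from the paragraph.
# F3 is not discussed there, so it is deliberately left out.
EXPLOITED_BY = {
    "F1": set(),
    "F2": {"A5"},
    "F4": set(),
    "F5": set(ALL_ATTACKS),
    "F6": {"A1"},
    "F7": {"A2", "A3", "A7"},
    "F8": {"A1", "A4", "A6"},
    "F9": set(ALL_ATTACKS),
    "F10": {"A6", "A7"},
}


def flaws_exposed_to(attack):
    """Return the flaws that the given event attack can exploit, in F-number order."""
    hits = [flaw for flaw, attacks in EXPLOITED_BY.items() if attack in attacks]
    return sorted(hits, key=lambda flaw: int(flaw[1:]))


spoofing_targets = flaws_exposed_to("A1")
```

Inverting the mapping in this way makes it easy to ask which flaws must be addressed to counter one particular attack type.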

Overall, existing solutions are either prevention-type or detection-type, and each type has its limitations. As prevention-type solutions are based on the assumption that the types of components are clearly delineated, they can be coarse-grained when it is unclear how to pinpoint the benign components. Although detection-type solutions provide relatively finer-grained results for identifying the flaws vulnerable to event attacks, they suffer from inaccuracy and scalability issues in their analysis. To further secure EBSs, advanced approaches that combine detecting flaws and preventing attacks are required.

#### **4. Evaluation**

To validate our taxonomy in terms of coverage, two different types of evaluation were required: (1) completeness: if it covers all types of security flaws in EBSs; and (2) originality: if it handles particular types of security flaws not covered by existing listings or taxonomies.

Regarding the completeness of our taxonomy, as mentioned in Section 3.2, all types of flaws extracted from existing publications were incorporated in our taxonomy. We carefully collected 80 existing publications dealing with security issues in EBSs as well as Android security issues that originated from its event-based communication feature. We then derived different types of security flaws from that literature and classified them, which guarantees that our taxonomy covers all types of security flaws identified in the EBS domain so far.

To evaluate the originality of our taxonomy, we performed an analytic comparison with existing listings and taxonomies for security flaws. Among a number of studies for classifying security issues, we targeted the most cited or recently published taxonomies. To the best of our knowledge, four existing works share our taxonomy's goal of classifying security issues: Weber's [25], OWASP [36], Tsipenyuk's [29], and Linares-Vásquez's [35]. The first three taxonomies mainly target general software systems and the last one targets the Android system. Considering the fact that Android is widely used and is a particular type of EBS, we included Linares-Vásquez's taxonomy in this evaluation. Although the selected taxonomies target different types of security issues (i.e., risks, errors, and vulnerabilities), they serve the same purpose as our taxonomy in that they classify the causes of security violations. We analyzed whether each type of security issue in the selected taxonomies can be mapped to any flaw type in our taxonomy in terms of its definition. If the definitions of any two types were identical, we classified them as "*completely mapped*," and if they matched partially in broad terms, as "*partially mapped*." As each taxonomy has different levels in its classification, we correlated the security issues regardless of the levels of classification.

As mentioned in Section 3.2, out of 16 flaws in Weber's taxonomy [25], we adapted five in terms of their definition and added four related to event-based communication. We excluded ten flaws that mainly focused on implementation-level security issues in general software systems (e.g., aliasing and error handling).

Compared with the Open Web Application Security Project (OWASP) Top Ten 2017 [36], which is a list of the 10 most critical web application security risks, three risk types can be mapped to the flaws in our taxonomy (see Table 4). Specifically, "*Injection*" in the OWASP list can be partially mapped to the flaws F1 and F8 in our taxonomy. It represents an exploitation of a victim to perform unintended behaviors, which can be implemented via flaws F1 and F8. In a broad sense, "*Sensitive data exposure*" in the OWASP list can be partially mapped to the flaw F7, because an unsafe event may expose sensitive data. To be more exact, however, the flaws F7 and F8 are more specific to event-based communication. The remaining seven types of risks in the OWASP list such as "*Cross site scripting*" and "*Insecure deserialization*" are more focused on the inherent characteristics of web applications.


**Table 4.** Correlation with Existing Security Flaw Taxonomies.

◦ : partially mapped, • : completely mapped

Tsipenyuk's taxonomy [29] handles implementation-level errors that affect a system's security. It classifies seven main categories and 76 underlying errors. Among those errors, three types can be mapped to the flaws in our taxonomy (see Table 4). Specifically, both "*Command injection*" and "*Process control*" can be partially mapped to the flaws F1 and F8 in our taxonomy. They also represent the exploitation of a victim to perform unintended behaviors, which can be implemented via flaws F1 and F8. "*Unreleased resource*" can be partially mapped to the flaw F10 in our taxonomy. It represents a system's failure to release system resources, which can be caused by inadequate resource allocation. However, none of these error types consider the inherent characteristics of EBSs, such as event-based communication. The remaining 73 types of errors in Tsipenyuk's taxonomy do not correlate with the flaws in our taxonomy.

Linares-Vásquez's taxonomy [35] targets security vulnerabilities in Android, and classifies 15 main categories with 126 underlying vulnerabilities. Similar to the aforementioned taxonomies, both "*Code injection*" and "*Command injection*" in Linares-Vásquez's taxonomy can be partially mapped to the flaws F1 and F8 in our taxonomy. "*Resource management errors*" can be completely mapped to our flaw F10 in terms of its definition. Although "*Race condition*" in Linares-Vásquez's taxonomy can be partially mapped to flaw F6, it does not consider event anomalies [81]. "*Missing encryption of sensitive data*" and "*Insufficient verification of data authenticity*" can be partially mapped to flaw F7, which covers an event containing sensitive information without any particular protection. The remaining 120 types of vulnerabilities in Linares-Vásquez's taxonomy are more focused on Android-specific security issues.

Overall, although existing taxonomies for security issues handle some of the flaws in our taxonomy, most of them are only partially matched. Our taxonomy covers additional security flaws related to the inherent characteristics of EBSs, which are not covered by existing listings or taxonomies. However, it is important to note that existing taxonomies cover flaws related to general software systems that are not the focus of our taxonomy.

#### **5. Discussion**

In this paper, we analyzed security flaw patterns and trends in the existing literature, and underlined challenges that will shape the focus of future research. Our taxonomy can help engineers assess security problems in the EBSs they build. A finer-grained classification of the most common flaws or attacks is useful because system administrators need to anticipate what they will experience in their systems. It also provides a baseline for collecting and organizing security-related data, and consequently the information can help engineers strengthen their EBSs. Furthermore, our taxonomy will be useful for security practitioners to organize the problem space. Security problems are generally caused by an unexpected combination of flaws. In these cases, finer-grained distinctions between security flaws can help define a specific problem space. Our taxonomy will also be useful for researchers to develop and evaluate potential research directions. Despite significant research efforts to mitigate the security threats in EBSs, solutions targeting these types of systems are still lacking. We believe that the results of our review (see Section 3.4) will help initiate the required research in this area.

In this research, we carefully followed the general guidelines for a systematic literature review (SLR) process in order to minimize the threats to validity. Nevertheless, there exist inherent threats that require further discussion. Our SLR process includes the use of search engines and keyword construction. To maximize the completeness of our taxonomy (i.e., whether all of the appropriate literature was included), we adopted multiple search engines and employed an iterative approach for keyword construction. Furthermore, our SLR process inevitably relies on the interpretation of individual reviewers. To address any resulting bias, we additionally cross-checked the literature, such that no paper was reviewed by only a single reviewer. New variations of security flaws in EBSs may still be encountered; to mitigate this threat, our taxonomy adapts an existing classification method that has proven to be rich enough to adequately classify the characteristics of security flaws. This implies that our taxonomy can be adapted to counter new types of security flaws in EBSs.

#### **6. Conclusions**

Event-based systems (EBSs) have become popular in mobile cyber physical systems, IoT applications, mobile applications, and web applications because of their inherent advantages. However, their reliance on non-determinism in event processing can be exploited by different types of attacks (e.g., event attacks). In the light of current interest in the security threats within EBSs, we developed a novel security flaw taxonomy for EBS. Each flaw is categorized based on the common factors present among flaws, enabling a systematic approach to resolving the security problems in an EBS. We showed the correlations between each flaw and different types of attacks as well as between each flaw and the applicable existing solutions for preventing the corresponding attacks. We also demonstrated that our taxonomy covers all types of security flaws identified in EBSs so far and even handles additional security flaws not covered by existing taxonomies.

Our taxonomy will help developers determine the types of security flaws existing in their target system and decide the appropriate techniques suitable to resolve each one. In addition, our taxonomy will shed light on potential research directions for securing EBSs.

**Author Contributions:** Y.K.L. was the main researcher who initiated and organized the research reported in the paper, and all authors including Y.K.L. and D.K. were responsible for analyzing the literature and writing the paper. D.K. performed detailed writing-review and validation. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1068774) and a research grant from Seoul Women's University (2020-0451).

**Acknowledgments:** Dohoon Kim is the corresponding author of this paper.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Reducing Dynamic Power Consumption in Mixed-Critical Real-Time Systems**

#### **Ijaz Ali <sup>1</sup>, Yong-Il Jo <sup>2</sup>, Seonah Lee <sup>2,3</sup>, Wan Yeon Lee <sup>4</sup> and Kyong Hoon Kim <sup>5,</sup>\***


Received: 8 September 2020; Accepted: 12 October 2020; Published: 16 October 2020

**Abstract:** In this paper, we study the minimization of energy consumption in a mixed-criticality real-time system on a uni-core processor. Our focus is on a new scheduling scheme that decreases the frequency level in order to conserve power. Since many systems support dynamic voltage and frequency scaling, power can be saved by decreasing the system frequency. We provide a new dynamic energy minimization scheme for mixed-criticality real-time systems. Recent research has addressed power reduction in low-criticality mode, whereas the proposed scheme can reduce energy in both high-criticality and low-criticality modes. The effectiveness of the proposed scheme in reducing energy is shown through simulation results.

**Keywords:** mixed-criticality; power-aware; real-time scheduling; DVFS

#### **1. Introduction**

Real-time systems take inputs and produce outputs in a time-bound manner. Meeting deadlines is the core concept of a real-time system, and missing a deadline may cause the whole system to fail. Real-time systems have safety-critical uses such as airline command systems, which are so critical that a single failure can cause a catastrophic accident. Similarly, real-time systems are employed in satellite receivers for collecting highly important information, where failures can produce misleading data and result in a major collapse [1]. Everyday home appliances such as microwave ovens, air conditioners, electric power systems, and refrigerators can also employ real-time systems.

In a real-time system, the term mixed-criticality means that high-criticality tasks must meet their deadlines, even at the cost of missing deadlines for certain low-criticality tasks. Mixed-criticality can therefore be used as a tool for providing the different levels of assurance against failure needed by different components. In the literature, the criticality levels are identified as mission-criticality and LO-criticality (low-criticality). Mission-critical (hard real-time) failures can cause major damage, such as loss of flight control, wrong information received via a radar system, or misleading satellite data. On the other hand, LO-criticality (soft real-time) tasks are less critical and their failures are less destructive, so their deadlines can be violated occasionally.

A mixed-criticality system (*MCS*) executes in one of two modes, high-criticality or low-criticality mode [2]. Each task is described by its shortest inter-arrival time (the period, denoted by *P*), its deadline (denoted by *D*), and one worst-case execution time (*WCET*) per criticality level, denoted by *C<sub>i</sub>*(*LO*) and *C<sub>i</sub>*(*HI*). In the basic *MCS* model, the system begins in LO-criticality mode and stays in that mode as long as all jobs execute within their low-criticality computation times *C<sub>i</sub>*(*LO*). If any job executes for its *C<sub>i</sub>*(*LO*) execution time without signaling completion, the system immediately moves to high-criticality (HI-criticality) mode. In HI-criticality mode, LO-criticality jobs should not be executed, but some level of service should be maintained if at all possible, as LO-criticality tasks are still critical.

Guan, Emberson, and Pedro [3–5] consider a simple protocol for controlling when the mode changes back to low-criticality: wait until the CPU is idle, at which point the change can safely be made. Santy [6] extends this approach to produce a somewhat more efficient scheme that can also be applied to globally scheduled multiprocessor systems, in which the CPU may never reach an idle tick. For a dual-criticality system that has just shifted into HI-criticality mode, and hence executes no LO-criticality tasks, the protocol first waits until the highest-priority HI-criticality task has completed its high computation time, then waits for the next-highest-priority task, and continues until the lowest-priority job is inactive; it is then safe to reintroduce all low-criticality jobs. If a low computation bound is violated again, that is, if any job computes for more than its *C<sub>i</sub>*(*LO*) value, the protocol again drops all low-criticality jobs.
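The mode-change behavior described above (switch to HI-criticality mode when a job exhausts its LO budget without completing, drop LO-criticality jobs, and return to LO-criticality mode at an idle instant) can be sketched as follows. This is a minimal illustration, not the protocol of any cited work; the `Job` and `ModeController` classes and all of their fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    level: str           # criticality of the owning task: "LO" or "HI"
    c_lo: int            # execution budget assumed in LO mode
    executed: int = 0
    done: bool = False


class ModeController:
    """Hypothetical dual-criticality mode logic (illustrative only)."""

    def __init__(self):
        self.mode = "LO"

    def account(self, job, ticks, signalled_completion):
        """Charge executed time; switch to HI if the LO budget is exhausted."""
        job.executed += ticks
        job.done = signalled_completion
        if not job.done and job.executed >= job.c_lo:
            self.mode = "HI"

    def eligible(self, jobs):
        """In HI mode only unfinished HI-criticality jobs may run."""
        pending = [j for j in jobs if not j.done]
        if self.mode == "HI":
            return [j for j in pending if j.level == "HI"]
        return pending

    def on_idle(self):
        """Simple protocol: return to LO mode at a processor idle instant."""
        self.mode = "LO"


ctrl = ModeController()
hi = Job("control", "HI", c_lo=3)
lo = Job("logging", "LO", c_lo=2)

ctrl.account(hi, ticks=3, signalled_completion=False)  # LO budget exhausted
mode_after_overrun = ctrl.mode
names_in_hi = [j.name for j in ctrl.eligible([hi, lo])]

ctrl.on_idle()
mode_after_idle = ctrl.mode
```

The sketch implements only the simple idle-instant return; the priority-ordered return attributed to Santy [6] would need per-priority bookkeeping that is not modeled here.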

Dynamic voltage and frequency scaling (DVFS) is a commonly used technique for reducing overall energy consumption, including in large-scale data processing environments. The technique adjusts two parameters, processor voltage and processor frequency, to reduce power consumption. DVFS lowers a processor's power consumption by decreasing its operating frequency level. However, scaling down the CPU frequency delays task completion. Much of the literature has focused on reducing power consumption in embedded systems. A related technique, real-time dynamic voltage and frequency scaling (RT-DVFS), reduces power consumption for periodic and aperiodic tasks. In the RT-DVFS technique, slack time is used as a parameter for adjusting the processor speed such that task deadlines are still guaranteed.
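The trade-off between frequency, completion time, and energy can be made concrete with a small sketch. It assumes the commonly used simplified CMOS model in which supply voltage scales with frequency, so that dynamic power grows as the cube of the frequency; this model, the constant `k`, and the function names are assumptions for illustration and are not taken from this paper.

```python
def exec_time(cycles, freq):
    """Time needed to execute a fixed number of cycles at a given frequency."""
    return cycles / freq


def dynamic_energy(cycles, freq, k=1.0):
    """Energy under the assumed model P = k * f**3 (voltage scaled with f)."""
    power = k * freq ** 3
    return power * exec_time(cycles, freq)


# Same workload at full speed and at half speed (normalized frequencies).
full = dynamic_energy(cycles=100.0, freq=1.0)
half = dynamic_energy(cycles=100.0, freq=0.5)
slowdown = exec_time(100.0, 0.5) / exec_time(100.0, 1.0)
```

Under this assumed model, halving the frequency doubles the execution time but cuts the dynamic energy for the same workload to a quarter, which is why slack time is worth converting into a lower speed.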

In the proposed work, we schedule tasks on a single processor that supports variable frequency and voltage scaling. Our aim is to schedule the given jobs at CPU speeds such that all jobs meet their deadlines and energy is minimized. Little research has been done on minimizing energy in mixed-criticality (MC) real-time systems. The work in [7] presents a CPU speed degradation algorithm for a given set of mixed-criticality aperiodic real-time tasks. It formulates an optimization problem for power consumption in MC real-time systems under extended frequency scaling, and each job is then executed at the derived frequency level. We enhance this with a dynamic approach in which the frequency level is adjusted below the derived frequency level for further power reduction. The main contribution of this research is that we reduce energy in HI-criticality mode dynamically.

#### **2. Related Work and Problem Description**

The MC system was first considered for scheduling by Vestal [8], and it has since gained increasing interest in real-time scheduling. Baruah and Ekberg [9] consider the mixed-criticality system in such a way that all LO-criticality jobs are discarded when the system mode switches to HI-criticality [10–12]. In [13], the authors showed that Vestal's scheme is optimal for fixed-priority scheduling systems. In [14], the authors provided a response-time analysis of mixed-criticality tasks in order to increase the schedulability of fixed-priority tasks. In [10], the authors provided a heuristic scheduling algorithm based on the Audsley priority assignment strategy for efficient scheduling.

The Audsley approach [15] assigns priorities from the lowest to the highest level. At each priority level, the lowest-priority job from the low-criticality task set is tried first. If it is schedulable at that level, the search moves up to the next priority level; if no job is schedulable at that level, the search can be abandoned, as the task set is unschedulable. In [16], the authors considered how time-triggered scheduling tables can be produced via an initial simulation.
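The lowest-priority-first search can be sketched in a few lines. The sketch below pairs it with the classic response-time test for single-criticality, fixed-priority preemptive tasks with constrained deadlines, which is an assumption made to keep the example short; a mixed-criticality analysis would substitute its own schedulability test. Task tuples are (C, T, D), and all names are illustrative.

```python
from math import ceil


def response_time_ok(task, higher):
    """Classic response-time test for one task under the given higher-priority tasks."""
    c, _, d = task
    r = c
    while r <= d:
        nxt = c + sum(ceil(r / tj) * cj for cj, tj, _ in higher)
        if nxt == r:          # fixed point reached within the deadline
            return True
        r = nxt
    return False              # response time exceeds the deadline


def audsley(tasks):
    """Return task names ordered from lowest to highest priority, or None."""
    unassigned = dict(tasks)
    order = []
    while unassigned:
        for name, task in list(unassigned.items()):
            others = [t for n, t in unassigned.items() if n != name]
            if response_time_ok(task, others):
                order.append(name)        # task fits the current lowest level
                del unassigned[name]
                break
        else:
            return None   # no task is schedulable at this level
    return order


tasks = {"a": (1, 4, 4), "b": (2, 6, 6), "c": (3, 12, 12)}
assignment = audsley(tasks)
overloaded = audsley({"x": (3, 4, 4), "y": (3, 4, 4)})
```

The search tests at most one candidate per task at each level, so it needs a number of schedulability tests that is quadratic in the number of tasks rather than exploring every priority ordering.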

Energy-minimization techniques for a processor are generally classified into dynamic and static techniques, depending on whether the frequency is adjusted dynamically. They are also classified into continuous and discrete frequency-level schemes according to the assumption of frequency continuity. Yao et al. [17] and Aydin et al. [18] proposed static (or offline) scheduling methods to reduce energy consumption in real-time systems. Jejurikar and Gupta [19] studied energy savings for periodic real-time jobs. Gruian [20] used stochastic data to derive energy-efficient schedules. In [21], the authors minimized power consumption in periodic task scheduling for discrete frequency-level systems. In contrast, dynamic scheduling schemes adjust the CPU frequency or speed level depending on the current system load in order to fully utilize CPU slack time.

The Audsley scheme [15] assigns priorities to mixed-criticality jobs based on their criticality levels, starting from the lowest priority level. The scheduling difficulty of MC real-time systems was investigated by Baruah, who proved that the problem is NP-complete even when all jobs are released at the same time [9]. That work also investigated an optimal scheduling algorithm for MC systems that performs well in practice.

The own-criticality-based priority (OCBP) scheme for MC sporadic jobs by Li and Baruah [22] considers criticality for priority assignment. When a new job arrives in the system, a new priority is assigned to the job. In [3], the authors presented a scheduling scheme known as priority-list reuse scheduling, based on the OCBP scheduler. In [23], the authors assumed a similarly realistic energy model and presented an optimal static scheme for minimizing the energy of a multi-component system by individually adjusting the frequencies of the processor, main memory, and system bus.

The connection between the multiple-choice knapsack problem (MCKP) and dynamic voltage scaling (DVS) for periodic tasks and energy optimization was first proven by Mejia-Alvarez and Mosse [24]. Aydin et al. [18] consider a dynamic voltage and frequency scaling scheme for periodic jobs that complete before their worst-case execution times (WCETs). In [25], the authors proposed elastic scheduling for the purpose of utilizing CPUs with discrete frequencies. In [26], the authors presented a dynamic slack allocation algorithm for real-time systems that considers both energy minimization and frequency scaling overhead. The cycle conservation approach was proposed by Mei et al. [27], who suggested a novel power-aware scheduling scheme named cycle conservation DVFS for sporadic jobs. Pillai and Shin [28] proposed real-time DVS, which uses the OS's real-time scheduler and job management service to minimize power consumption while guaranteeing that deadlines are always met.
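As a simple illustration of deadline-preserving frequency selection on a processor with discrete levels, the sketch below applies the well-known utilization-based test for EDF with implicit deadlines: a normalized frequency is feasible if the total utilization at full speed does not exceed it. This is an assumed textbook-style static test, not the specific algorithm of any cited work, and the function name and sample values are illustrative.

```python
def lowest_feasible_level(tasks, levels):
    """Pick the lowest normalized frequency that keeps an EDF task set feasible.

    tasks  : list of (wcet_at_full_speed, period) with implicit deadlines
    levels : available normalized frequencies, where 1.0 is full speed
    """
    utilization = sum(c / p for c, p in tasks)
    for alpha in sorted(levels):
        if utilization <= alpha:   # scaled utilization U / alpha stays <= 1
            return alpha
    return None                    # not feasible even at the highest level


tasks = [(1, 4), (1, 5), (2, 10)]   # total utilization of 0.65 at full speed
levels = [0.5, 0.75, 1.0]
chosen = lowest_feasible_level(tasks, levels)
infeasible = lowest_feasible_level([(3, 4), (2, 4)], levels)
```

Dynamic schemes go further by lowering the frequency again at run time whenever jobs finish earlier than their worst-case execution times.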

More recently, research on power-aware mixed-criticality real-time systems has been presented in [7,29]. The technique in [7] targets a power-aware mixed-criticality system but considers only a set of jobs, with no periodic jobs; it determines the possible CPU speed degradation for MCS jobs. The algorithm in [29] minimizes the energy of power-aware mixed-criticality real-time scheduling for periodic jobs under continuous frequency scaling. It uses the earliest deadline first with virtual deadlines (EDF-VD) algorithm [11] to derive the most favorable virtual deadlines (VDs) and frequency levels of jobs, but does not adjust the derived frequency levels at run time. In [30], when a high-criticality job does not finish within its low computation time, all low-criticality jobs are terminated and the system frequency level is set to the maximum; that work therefore reduces the frequency only in low-criticality mode.
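For readers unfamiliar with EDF-VD, the following sketch shows how a deadline scaling factor is commonly derived for implicit-deadline dual-criticality task sets. It follows the utilization-based test usually given for EDF-VD and is offered as an assumption about the standard analysis, not as the frequency-aware formulation used in [29]; the tuple layout and function name are hypothetical.

```python
def edf_vd_scaling(tasks):
    """Return the virtual-deadline factor x, or None if the test fails.

    tasks: list of (period, c_lo, c_hi, level) with level "LO" or "HI".
    """
    u_lo_lo = sum(c_lo / p for p, c_lo, _, lvl in tasks if lvl == "LO")
    u_hi_lo = sum(c_lo / p for p, c_lo, _, lvl in tasks if lvl == "HI")
    u_hi_hi = sum(c_hi / p for p, _, c_hi, lvl in tasks if lvl == "HI")

    if u_lo_lo + u_hi_hi <= 1.0:
        return 1.0                 # plain EDF suffices; deadlines stay unscaled
    if u_lo_lo >= 1.0:
        return None                # LO tasks alone overload the processor
    x = u_hi_lo / (1.0 - u_lo_lo)  # shrink HI-task deadlines in LO mode
    if x * u_lo_lo + u_hi_hi <= 1.0:
        return x
    return None


tasks = [(10, 2, 6, "HI"), (10, 5, 5, "LO")]
x = edf_vd_scaling(tasks)
virtual_deadline = x * 10          # virtual deadline of the HI task in LO mode
```

In LO-criticality mode, each HI-criticality task is scheduled against its shortened virtual deadline, which reserves enough slack for it to finish its larger HI budget after a mode switch.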

In our work, we provide an efficient power-aware scheduling algorithm for MC real-time systems that adjusts the frequency level in high-criticality mode. To the best of our knowledge, this is the first work that addresses optimal energy consumption in the high-criticality mode of a mixed-criticality real-time system. The main contribution of our scheme is that we minimize energy in high-criticality mode dynamically, and we show experimental results from simulations.

#### **3. System Model**

#### *3.1. Task Model*

In this subsection, we provide an overview of the task model. In mixed-criticality real-time systems, a low-criticality periodic task releases a sequence of jobs only in low-criticality mode, while high-criticality tasks release their jobs in both high- and low-criticality modes. Thus, a mixed-criticality task *τ<sub>i</sub>* consists of four parameters: the period (*Pi*), the computation time of low-criticality jobs, *Ci*(*LO*), the computation time of high-criticality jobs, *Ci*(*HI*), and the task criticality level (*Xi*), as follows:

$$\tau_i = (C_i(LO),\ C_i(HI),\ P_i,\ X_i)$$

The task *τ<sub>i</sub>* is a periodic real-time task, so its jobs are released every *Pi* time units. The *j*-th instance, or job, of a task *τ<sub>i</sub>* is denoted as *τ<sub>i,j</sub>*. In the mixed-criticality system, tasks are categorized into low-criticality and high-criticality tasks, and the system mode is likewise divided into low-criticality and high-criticality mode. In low-criticality mode, all tasks release their jobs, and each job of task *τ<sub>i</sub>* requires a worst-case execution time of *Ci*(*LO*). In high-criticality mode, by contrast, only the high-criticality tasks release their jobs, with an execution time of *Ci*(*HI*) (*Ci*(*LO*) ≤ *Ci*(*HI*)). Each task therefore carries its criticality level *Xi*.

The mixed-criticality system is an integrated suite of hardware, middleware services, operating system, and application software that supports the execution of non-critical, mission-critical, and safety-critical functions. The system starts in low-criticality mode. However, if there is a possibility that a low-criticality job interferes with the execution of high-criticality jobs, the system criticality mode changes, and all low-criticality tasks are dropped from the system. In mixed-criticality systems, such a possibility arises when a high-criticality job does not complete within its low-criticality computation time, which is the condition for switching from low-criticality mode to high-criticality mode.

Conversely, the system returns to low-criticality mode when there is no longer a possibility of overrun. While high-criticality tasks are executing in high-criticality mode, the system switches back to low-criticality mode as soon as no task is ready in the queue [29].

Figure 1 shows an example of three mixed-criticality tasks: *τ*<sub>1</sub>(2, 2, 5, LO), *τ*<sub>2</sub>(1, 3, 6, HI), and *τ*<sub>3</sub>(2, 3, 8, HI). The system starts in low-criticality mode, where each task requires *Ci*(*LO*) execution time. Each task releases a job every *Pi* time units. The scheduling algorithm used in Figure 1 is EDF (earliest deadline first).

**Figure 1.** An example of mixed-criticality scheduling.

Let us assume that the job *τ*<sub>3,3</sub> does not complete its execution at time 19. Then, the system changes the criticality mode to high-criticality. After that, the system executes only the high-criticality tasks (*τ*<sub>2</sub> and *τ*<sub>3</sub>) with their *Ci*(*HI*) execution times, so the execution times of *τ*<sub>3,3</sub> and *τ*<sub>2,4</sub> each become 3. While the system is in high-criticality mode, all low-criticality jobs are ignored or removed from the queue. For instance, the job *τ*<sub>1,5</sub> released at time 20 is removed from the scheduling queue since it is a low-criticality job.

The system returns to low-criticality mode when there are no high-criticality jobs waiting in the scheduling queue. For example, the system returns to low-criticality mode at time 23 because no jobs are pending. After that, the system executes low-criticality jobs again as before.
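The task model above can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `MCTask`, the helper functions, and the field names are hypothetical, and the parameter order (*Ci*(*LO*), *Ci*(*HI*), *Pi*, *Xi*) is assumed from the Figure 1 example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCTask:
    """A periodic mixed-criticality task (hypothetical helper type)."""
    c_lo: float    # C_i(LO): execution time budget in low-criticality mode
    c_hi: float    # C_i(HI): execution time budget in high-criticality mode
    period: float  # P_i: release period
    level: str     # X_i: criticality level, "LO" or "HI"


def utilization(tasks, level, mode):
    """Sum C_i(mode) / P_i over the tasks whose criticality level is `level`."""
    total = 0.0
    for task in tasks:
        if task.level == level:
            budget = task.c_lo if mode == "LO" else task.c_hi
            total += budget / task.period
    return total


def active_tasks(tasks, system_mode):
    """Tasks that release jobs in the given system mode."""
    if system_mode == "LO":
        return list(tasks)  # all tasks release jobs in low mode
    return [task for task in tasks if task.level == "HI"]


# The three tasks of the Figure 1 example.
tasks = [MCTask(2, 2, 5, "LO"), MCTask(1, 3, 6, "HI"), MCTask(2, 3, 8, "HI")]
u_lo_tasks_lo_mode = utilization(tasks, "LO", "LO")  # 2/5
u_hi_tasks_lo_mode = utilization(tasks, "HI", "LO")  # 1/6 + 2/8
u_hi_tasks_hi_mode = utilization(tasks, "HI", "HI")  # 3/6 + 3/8
```

In high-criticality mode only *τ*<sub>2</sub> and *τ*<sub>3</sub> remain active, which matches the removal of *τ*<sub>1,5</sub> described above.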

#### *3.2. Power Model*

In this paper, we assume a DVFS-enabled CPU whose frequency is adjusted dynamically at run time. The number of discrete frequency levels is *m*, and the frequency levels form a set *F*.

Let us assume that a task requires *t* execution time on the CPU at its maximum frequency level. For a given frequency level *f* of the CPU, the relative speed level *s* is defined by *f* / *f*max, where *f*max is the maximum frequency level. Then, the task execution time is defined by *t*/*s*.

Since dynamic power consumption dominates the power consumption of such systems, we take only dynamic power consumption into account in this paper. Generally, the dynamic power is proportional to *f*<sup>3</sup> or *f*<sup>4</sup> for a frequency level *f*. We use Equation (1) as the energy model of a task with execution time *t* running at the relative speed level *s* [31].

$$E = \alpha \cdot \frac{t}{s} \cdot s^3 = \alpha \cdot t \cdot s^2, \tag{1}$$

where *α* is a coefficient. In this paper we assume *α* = 1 for the sake of simplicity.

Figure 2 shows a DVFS scheme for real-time task scheduling. For example, a real-time task requires 3 time units for its execution, while its deadline is 10 time units away (Figure 2a). If there is no other task, the system has 7 time units of slack before the task deadline. Thus, the task can be executed at a relative speed level of 0.3, as shown in Figure 2b. At the reduced CPU speed level, the system can reduce the power consumption without violating the task deadline.
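The arithmetic behind this example follows directly from the execution-time model *t*/*s* and Equation (1). The sketch below uses hypothetical function names and assumes *α* = 1 and a continuous speed range, which is a simplification of the discrete levels in *F*.

```python
def execution_time(t, s):
    """Execution time of a task needing t time units at full speed, run at relative speed s."""
    return t / s


def energy(t, s, alpha=1.0):
    """Energy from Equation (1): E = alpha * (t / s) * s^3 = alpha * t * s^2."""
    return alpha * t * s ** 2


def slowest_feasible_speed(t, deadline):
    """Lowest relative speed at which the task still finishes by its deadline."""
    return min(1.0, t / deadline)


s = slowest_feasible_speed(3, 10)     # relative speed for the Figure 2 example
stretched = execution_time(3, s)      # the task now occupies the full 10 time units
saving = energy(3, 1.0) - energy(3, s)  # energy saved relative to full speed
```

Under these assumptions the task runs at speed 0.3, fills the whole interval up to the deadline, and its energy drops from 3 to 0.27 in the units of Equation (1).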

**Figure 2.** Dynamic voltage scaling (DVS) for real-time tasks.

#### **4. Research Motivations**

#### *4.1. Recap of EDF-VD for Power-Aware Mixed-Criticality Real-Time Tasks*

In this subsection, we briefly describe the previous work on power-aware mixed-criticality task scheduling [29]. The base scheduling algorithm is earliest deadline first with virtual deadlines (EDF-VD), a mode-switched EDF scheduling technique developed for mixed-criticality task sets [22,32,33]. Time budgets for *HI*-criticality tasks are reserved in the *LO* mode by shortening the deadlines of *HI*-criticality tasks. Intuitively, shortening these deadlines pushes *HI*-criticality tasks to finish earlier in the *LO* mode, leaving more time until their actual deadlines to accommodate extra workload. Indeed, this form of safety preparation (i.e., shortening the deadlines of *HI*-criticality tasks in the *LO* mode) has proven effective in improving system schedulability [34].

In EDF-VD, a system-wide value *x* determines the virtual deadline *VDi* as *Pi* · *x*, where 0 < *x* ≤ 1. In order to guarantee the schedulability of task sets in both LO mode and HI mode, the value of *x* should satisfy Equations (2) and (3):

$$\frac{U_{LO}^{HI}}{x} + U_{LO}^{LO} \le 1 \tag{2}$$

$$U_{HI}^{HI} + x \cdot U_{LO}^{LO} \le 1 \tag{3}$$
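Equation (2) bounds *x* from below and Equation (3) bounds it from above, so the admissible range of *x* can be computed directly. The following is a minimal sketch of that rearrangement; the function name and the sample utilizations are hypothetical and are not taken from the paper.

```python
def feasible_x_range(u_hi_tasks_lo, u_lo_tasks_lo, u_hi_tasks_hi):
    """Return (x_min, x_max) satisfying Equations (2) and (3), or None if empty.

    u_hi_tasks_lo: U_LO^HI, utilization of HI tasks with their C_i(LO) budgets
    u_lo_tasks_lo: U_LO^LO, utilization of LO tasks with their C_i(LO) budgets
    u_hi_tasks_hi: U_HI^HI, utilization of HI tasks with their C_i(HI) budgets
    """
    if u_lo_tasks_lo >= 1.0 or u_hi_tasks_hi > 1.0:
        return None
    # Equation (2): U_LO^HI / x + U_LO^LO <= 1  =>  x >= U_LO^HI / (1 - U_LO^LO)
    x_min = u_hi_tasks_lo / (1.0 - u_lo_tasks_lo)
    # Equation (3): U_HI^HI + x * U_LO^LO <= 1  =>  x <= (1 - U_HI^HI) / U_LO^LO
    x_max = 1.0
    if u_lo_tasks_lo > 0.0:
        x_max = min(1.0, (1.0 - u_hi_tasks_hi) / u_lo_tasks_lo)
    if x_min > x_max or x_max <= 0.0:
        return None
    return x_min, x_max


schedulable = feasible_x_range(0.2, 0.3, 0.5)    # a non-empty range of x
unschedulable = feasible_x_range(0.5, 0.6, 0.9)  # no x satisfies both bounds
```

Any *x* in the returned interval keeps the task set schedulable in both modes under these two conditions; an empty interval means EDF-VD cannot guarantee the set at full speed.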

In [29], EDF-VD is adjusted in order to provide power-awareness for mixed-criticality real-time systems. The authors defined the problem of power-aware scheduling in MC systems, where the objective is to minimize power consumption while satisfying both Equations (4) and (5):

$$\sum_{\tau_i \in T_{HI}} \frac{C_i(LO) / f_{LO}^{HI}}{P_i} \cdot \frac{1}{x} + \sum_{\tau_i \in T_{LO}} \frac{C_i(LO) / f_{LO}^{LO}}{P_i} \le 1 \tag{4}$$

$$\sum_{\tau_i \in T_{HI}} \frac{C_i(HI)}{P_i} + x \cdot \sum_{\tau_i \in T_{LO}} \frac{C_i(LO) / f_{LO}^{LO}}{P_i} \le 1 \tag{5}$$

where $T_{HI}$ and $T_{LO}$ are the sets of high-criticality tasks and low-criticality tasks, respectively. In Equations (4) and (5), $f_{LO}^{LO}$ and $f_{LO}^{HI}$ indicate the optimal frequency levels of LO-criticality tasks and HI-criticality tasks in low mode, respectively. They provided an optimal solution to derive *x*, $f_{LO}^{LO}$, and $f_{LO}^{HI}$ for the formulated problem.

Table 1 shows an example of a task set. Using the method in [29], the optimal values of *x*, $f_{LO}^{LO}$, and $f_{LO}^{HI}$ are 0.56, 0.6, and 0.8, respectively. The right three columns of Table 1 show the virtual deadline and the execution time in low-criticality mode. Figure 3 shows the scheduling example of Table 1 based on EDF-VD.

**Table 1.** An example of tasks.

**Figure 3.** An example of power-aware mixed-criticality Scheduling of Table 1.

As shown in Figure 3, the high-criticality tasks *τ*<sub>1</sub> and *τ*<sub>2</sub> run at the frequency level $f_{LO}^{HI}$ in low-criticality mode, while the low-criticality tasks *τ*<sub>3</sub> and *τ*<sub>4</sub> run at $f_{LO}^{LO}$. Let us assume that *τ*<sub>2,3</sub> does not complete within *Ci*(*LO*) at time 17.25. Then, the system mode changes to high-criticality mode, so the low-criticality jobs of *τ*<sub>3</sub> and *τ*<sub>4</sub> are ignored after the mode switch event. In high-criticality mode, the frequency level is set to the maximum frequency in order to guarantee the schedulability of high-criticality tasks. The system returns to low-criticality mode after executing all high-criticality jobs.

#### *4.2. Motivations*

As discussed in the previous subsection, the previous work focused on low-criticality mode. However, we can further reduce the power in high-criticality mode without violating the schedulability. For example, we can reduce the frequency level while executing *τ*2,3 and *τ*1,4 in the high-criticality mode of Figure 3.

In order to guarantee the schedulability in both criticality modes, we need appropriate frequency levels in each mode. The main problem of this paper is to determine optimal frequency levels that consider both modes.

#### **5. The Proposed Scheme**

#### *5.1. Dynamic Power-Aware Scheme for MCS Jobs*

The proposed scheme dynamically adjusts the CPU frequency level depending on both the system mode and the task mode. The baseline frequency levels are derived from static analysis, so *x*, $f_{LO}^{LO}$, $f_{LO}^{HI}$, and $f_{HI}^{HI}$ are obtained before run time. We solve for those values in the initial step through the optimization problem below.

The power consumption, taking both high- and low-criticality modes into account, is defined by the following three equations. The unit-time power consumption in low-criticality mode is given by Equation (6), where *LCM* is the least common multiple of all periods. In Equation (6), the total power consumption during *LCM* is computed by adding the power consumption of each task *τ<sub>i</sub>* in low mode using Equation (1). The number of jobs of *τ<sub>i</sub>* is *LCM*/*Pi*. Thus, the unit-time power consumption is obtained by dividing the total sum by *LCM*.

Similarly, the unit-time power consumption in high-criticality mode is defined by Equation (7). The average unit-time power consumption is then obtained as the expected value over the two modes, as in Equation (8), where $P_{LO}$ and $P_{HI}$ denote the probabilities that the system is in low- and high-criticality mode, respectively.

$$\begin{split} \text{UIP}_{LO} &= \frac{1}{LCM} \left( \sum_{\tau_i \in T_{LO}} \frac{LCM}{P_i} \cdot \frac{C_i(LO)}{f_{LO}^{LO}} \cdot (f_{LO}^{LO})^3 + \sum_{\tau_i \in T_{HI}} \frac{LCM}{P_i} \cdot \frac{C_i(LO)}{f_{LO}^{HI}} \cdot (f_{LO}^{HI})^3 \right) \\ &= \sum_{\tau_i \in T_{LO}} \frac{C_i(LO)}{P_i} \cdot (f_{LO}^{LO})^2 + \sum_{\tau_i \in T_{HI}} \frac{C_i(LO)}{P_i} \cdot (f_{LO}^{HI})^2 \\ &= U_{LO}^{LO} \cdot (f_{LO}^{LO})^2 + U_{LO}^{HI} \cdot (f_{LO}^{HI})^2 \end{split} \tag{6}$$

$$\begin{split} \text{UIP}_{HI} &= \frac{1}{LCM} \sum_{\tau_i \in T_{HI}} \frac{LCM}{P_i} \cdot \frac{C_i(HI)}{f_{HI}^{HI}} \cdot (f_{HI}^{HI})^3 \\ &= \sum_{\tau_i \in T_{HI}} \frac{C_i(HI)}{P_i} \cdot (f_{HI}^{HI})^2 \\ &= U_{HI}^{HI} \cdot (f_{HI}^{HI})^2 \end{split} \tag{7}$$

$$\begin{aligned} \text{UAP} &= \text{UIP}_{LO} \cdot P_{LO} + \text{UIP}_{HI} \cdot P_{HI} \\ &= \left( U_{LO}^{LO} \cdot (f_{LO}^{LO})^2 + U_{LO}^{HI} \cdot (f_{LO}^{HI})^2 \right) \cdot P_{LO} + U_{HI}^{HI} \cdot (f_{HI}^{HI})^2 \cdot P_{HI} \end{aligned} \tag{8}$$
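As a quick numerical illustration of Equations (6)–(8), the sketch below evaluates the simplified closed forms. The function names and the utilization values are hypothetical (Table 1 is not reproduced here); the frequency levels 0.7, 0.8, and 0.9 and the mode probability 0.2 are the ones quoted in the example of Section 5.3.

```python
def unit_power_lo(u_lo_tasks, u_hi_tasks_lo, f_lo_lo, f_hi_lo):
    """Equation (6), last line: unit-time power in low-criticality mode."""
    return u_lo_tasks * f_lo_lo ** 2 + u_hi_tasks_lo * f_hi_lo ** 2


def unit_power_hi(u_hi_tasks_hi, f_hi_hi):
    """Equation (7), last line: unit-time power in high-criticality mode."""
    return u_hi_tasks_hi * f_hi_hi ** 2


def average_unit_power(power_lo, power_hi, p_lo, p_hi):
    """Equation (8): expected unit-time power over the two system modes."""
    return power_lo * p_lo + power_hi * p_hi


# Hypothetical utilizations: U_LO^LO = 0.3, U_LO^HI = 0.2, U_HI^HI = 0.5.
power_lo = unit_power_lo(0.3, 0.2, 0.7, 0.8)
scaled = average_unit_power(power_lo, unit_power_hi(0.5, 0.9), 0.8, 0.2)
full_speed = average_unit_power(power_lo, unit_power_hi(0.5, 1.0), 0.8, 0.2)
```

With these assumed numbers, lowering the high-mode frequency from 1.0 to 0.9 reduces the average unit-time power from 0.32 to 0.301, and the gap widens as $P_{HI}$ grows.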

For given probabilities $P_{LO}$ and $P_{HI}$, the problem of deciding the optimal frequency levels and *x* of EDF-VD is to minimize

$$\left( U_{LO}^{LO} \cdot (f_{LO}^{LO})^2 + U_{LO}^{HI} \cdot (f_{LO}^{HI})^2 \right) \cdot P_{LO} + U_{HI}^{HI} \cdot (f_{HI}^{HI})^2 \cdot P_{HI} \tag{9}$$

subject to

$$\frac{U_{LO}^{HI}}{f_{LO}^{HI}} \cdot \frac{1}{x} + \frac{U_{LO}^{LO}}{f_{LO}^{LO}} \le 1 \tag{10}$$

$$\frac{U_{HI}^{HI}}{f_{HI}^{HI}} + x \cdot \frac{U_{LO}^{LO}}{f_{LO}^{LO}} \le 1. \tag{11}$$
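With only a handful of discrete frequency levels, the problem in Equations (9)–(11) is small enough to explore by exhaustive search. The sketch below is not the solution method of the paper; it is a brute-force illustration under stated assumptions: all utilizations are positive, the frequency levels are given as relative speeds, and the function name, level set, and utilization values are hypothetical.

```python
from itertools import product


def solve_frequencies(u_lo, u_hi_lo, u_hi_hi, p_lo, p_hi, levels):
    """Exhaustively search frequency triples that satisfy (10) and (11).

    u_lo = U_LO^LO, u_hi_lo = U_LO^HI, u_hi_hi = U_HI^HI (all assumed > 0).
    Returns (cost, x, f_lo_lo, f_hi_lo, f_hi_hi) minimizing Equation (9), or None.
    """
    best = None
    for f_lo_lo, f_hi_lo, f_hi_hi in product(levels, repeat=3):
        lo_load = u_lo / f_lo_lo     # scaled LO-task utilization
        hi_load = u_hi_hi / f_hi_hi  # scaled HI-task utilization in high mode
        if lo_load >= 1.0 or hi_load > 1.0:
            continue
        x_min = (u_hi_lo / f_hi_lo) / (1.0 - lo_load)  # from constraint (10)
        x_max = min(1.0, (1.0 - hi_load) / lo_load)    # from constraint (11)
        if x_min > x_max:
            continue  # no virtual-deadline factor x works for this triple
        cost = (u_lo * f_lo_lo ** 2 + u_hi_lo * f_hi_lo ** 2) * p_lo \
            + u_hi_hi * f_hi_hi ** 2 * p_hi            # objective (9)
        if best is None or cost < best[0]:
            best = (cost, x_min, f_lo_lo, f_hi_lo, f_hi_hi)
    return best


levels = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]  # hypothetical discrete levels
best = solve_frequencies(0.3, 0.2, 0.5, 0.8, 0.2, levels)
```

Because the objective does not depend on *x*, any value in [`x_min`, `x_max`] is equally good for a given triple; the sketch simply reports the smallest one.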

The scheduling system flow in low mode is shown in Figure 4a. Each task releases jobs with *Ci*(*LO*) execution time every period. Since we use EDF-VD, the virtual deadline of a high-criticality job released at time *t* is given by *t* + *VDi*. The deadline of a low-criticality job is set as *t* + *Pi*. These new jobs wait in the ready queue.
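The deadline assignment at release time can be sketched as follows. The helper name, the job labels, and the periods are hypothetical; only the rule itself (virtual deadline *t* + *x* · *Pi* for high-criticality jobs, *t* + *Pi* for low-criticality jobs) comes from the text, and a binary heap is just one convenient way to keep the ready queue in deadline order.

```python
import heapq


def release_deadline(t, period, level, x):
    """Absolute deadline used for EDF ordering in low mode."""
    if level == "HI":
        return t + x * period  # virtual deadline t + VD_i, with VD_i = P_i * x
    return t + period          # ordinary deadline of a low-criticality job


ready = []  # ready queue ordered by (deadline, job label)
heapq.heappush(ready, (release_deadline(0, 10, "HI", 0.56), "HI job, period 10"))
heapq.heappush(ready, (release_deadline(0, 5, "LO", 0.56), "LO job, period 5"))

deadline, first_job = heapq.heappop(ready)  # the job with the earliest deadline
```

Here the low-criticality job is dispatched first because its deadline (5) precedes the shortened virtual deadline (5.6) of the high-criticality job.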

(**a**) Scheduling flow in low mode.

(**b**) Scheduling flow in high mode.

**Figure 4.** The proposed scheduling framework.

The scheduling algorithm for jobs is based on earliest deadline first, so the job with the earliest deadline is scheduled first. When a high-criticality job is dispatched, the CPU frequency level is set to $f_{LO}^{HI}$. For a low-criticality job, the frequency level is set to $f_{LO}^{LO}$.

When a high-criticality job does not complete within its low-mode execution time, the system switches to high-criticality mode. At that time, all low-criticality jobs are dropped in order to guarantee the deadlines of high-criticality tasks, as shown in Figure 4b. The system can switch back to low mode at any time when there is no pending task.

#### *5.2. DVFS Scheduling*

The notation for the scheduling algorithm is shown in Table 2. The task utilization of *τ<sup>i</sup>* is denoted as *Ui*. Each job, denoted as *Jk*, in the waiting queue is defined by (*Ck*, *Dk*) so that a job requires *Ck* execution time by the deadline *Dk*. The values are determined at the time of job release.


**Table 2.** Notations.

The proposed scheme is defined by functions that are called at certain events. The algorithms are given as pseudo-code in Algorithms 1 and 2.


When a job is released in low mode, the job is inserted into the ready queue and the task utilization is updated. Since the frequency level of a LO-criticality task is $f_{LO}^{LO}$, its utilization is determined by the equation in line 5 of Algorithm 1. A high-criticality job may require an additional *Ci*(*HI*) − *Ci*(*LO*) of execution every period, so its utilization is given by the equation in line 7. If the current system mode is high, a released low-criticality job is ignored (line 11), whereas a high-criticality job is inserted into the ready queue with *Ci*(*HI*) execution time (line 14). After the release is handled, the scheduling algorithm is called in line 17.

When the job *Ji* finishes its computation in low mode, nothing needs to be done for a low-criticality job; we only check the case *Xi* = HI, which has two outcomes. If *Ji* completes within *Ci*(*LO*), its utilization is updated (line 23). If *Ji* does not complete within *Ci*(*LO*), the system mode becomes high (line 25). In high mode, when the ready queue is empty, that is, when no high-criticality job is pending, the system mode is changed from high to low (lines 29–31).

The function *Power-aware Schedule ()* dispatches jobs using EDF (lines 36–43 of Algorithm 1). At each scheduling event, the *Frequency-Adjust ()* function is called to adjust the CPU frequency dynamically. As shown in Algorithm 2, if the system is in high-criticality mode, the frequency is set to $f_{HI}^{HI}$, the minimized frequency of high-criticality mode. Otherwise, the frequency is set to the lowest level sufficient to schedule the current jobs, so the relative speed level of the frequency is greater than or equal to the current utilization.

**Algorithm 1** Algorithm of energy minimization consumption in mixed-criticality tasks.

     1: function JOB-RELEASE(τi)
     2:   if the current system mode is Low then
     3:     Insert job Ji(Ci(LO), t + VDi) into Qready
     4:     if Xi = Low then                        ▷ Low-criticality job
     5:       Ui ← (Ci(LO) / f_LO^LO) / Pi
     6:     else
     7:       Ui ← (Ci(LO) / f_LO^HI) / Pi + ((Ci(HI) − Ci(LO)) / f_HI^HI) / Pi
     8:     end if
     9:   else                                      ▷ The current system mode is High
    10:     if Xi = Low then
    11:       Ui ← 0
    12:     else                                    ▷ Xi = High
    13:       Ui ← (Ci(HI) / f_HI^HI) / Pi
    14:       Insert job Ji(Ci(HI), t + Pi) into Qready
    15:     end if
    16:   end if
    17:   POWER-AWARE SCHEDULE( )
    18: end function
    19: function JOB-FINISH(Ji)
    20:   if the current system mode is Low then
    21:     if Xi = High then                       ▷ High-criticality job
    22:       if Ji finishes Ci(LO) completely then
    23:         Ui ← (Ci(LO) / f_LO^HI) / Pi
    24:       else
    25:         The system mode is changed to High  ▷ Mode switch to HI
    26:       end if
    27:     end if
    28:   else                                      ▷ The current system mode is High
    29:     if Qready = ∅ then
    30:       The system mode is changed from High to Low   ▷ Mode switch back to LO
    31:     end if
    32:   end if
    33:   POWER-AWARE SCHEDULE( )
    34: end function
    35: function POWER-AWARE SCHEDULE( )
    36:   if Qready ≠ ∅ then
    37:     Jk ← the job with the earliest deadline in Qready
    38:     if Jcurr = ∅ then                       ▷ CPU idle
    39:       Jcurr ← Jk
    40:     else if Dk < Dcurr then                 ▷ Preemption by EDF
    41:       Jcurr is preempted and re-inserted into Qready
    42:       Jcurr ← Jk
    43:     end if
    44:     FREQUENCY-ADJUST( )
    45:   end if
    46: end function

**Algorithm 2** Algorithm of selecting frequency.

```
 1: function FREQUENCY-ADJUST( )
 2:   if the system is in High mode then
 3:     The frequency is set as f_HI^HI.
 4:   else                          ▷ The system is in Low mode.
 5:     U ← min(Σ_{i=1}^{n} Ui, 1.0)
 6:     if Xcurr = LO then
 7:       U ← U × f_LO^LO
 8:     else
 9:       U ← U × f_LO^HI
10:     end if
11:     freq ← the minimum fi ∈ F s.t. U ≤ fi / fmax
12:     The frequency is set as freq.
13:   end if
14: end function
```
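Algorithm 2 translates almost line by line into executable code. The following Python sketch is one possible reading of the pseudo-code, not a reference implementation: the function signature, the argument names, and the example level set are hypothetical, and the maximum frequency is assumed to be the largest element of *F*.

```python
def frequency_adjust(system_mode, task_utils, curr_level,
                     f_lo_lo, f_hi_lo, f_hi_hi, levels):
    """Select the CPU frequency level following the steps of Algorithm 2.

    system_mode: "LO" or "HI"; curr_level: criticality of the running job.
    task_utils: current per-task utilizations U_i; levels: the set F.
    """
    if system_mode == "HI":
        return f_hi_hi                       # line 3: statically derived level
    u = min(sum(task_utils), 1.0)            # line 5: cap total utilization at 1
    u *= f_lo_lo if curr_level == "LO" else f_hi_lo  # lines 6-10
    f_max = max(levels)                      # assumed: f_max is the top level in F
    # line 11: the lowest level whose relative speed covers the utilization
    return min(f for f in levels if u <= f / f_max)


levels = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]      # hypothetical discrete levels
low_mode = frequency_adjust("LO", [0.3, 0.25, 0.2], "LO", 0.7, 0.8, 0.9, levels)
high_mode = frequency_adjust("HI", [0.3, 0.25, 0.2], "LO", 0.7, 0.8, 0.9, levels)
```

In the low-mode call the scaled utilization is 0.75 × 0.7 = 0.525, so the lowest sufficient level among the assumed levels is 0.6; in high mode the function returns $f_{HI}^{HI}$ regardless of the current utilization.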
#### *5.3. Example*

Let us consider the task set in Table 1 as an example. The previous work derives the optimal values of $f_{LO}^{LO}$ and $f_{LO}^{HI}$ as 0.6 and 0.8, respectively, and uses the maximum frequency level in high-criticality mode. In contrast, the proposed work derives the optimal frequency levels by solving Equation (9) under the two constraints of Equations (10) and (11). Table 3 shows those values for given probabilities of high- and low-criticality mode.

For example, for a given $P_{HI}$ = 0.2, the optimal frequency levels $f_{LO}^{LO}$, $f_{LO}^{HI}$, and $f_{HI}^{HI}$ are 0.7, 0.8, and 0.9, respectively. The scheduling example of Table 1 in the same scenario as Figure 3 is shown in Figure 5. The frequency level in high-criticality mode is set to 0.9 instead of 1.0. As shown in Table 3, the proposed work saves more energy as the probability of high-criticality mode increases.

**Table 3.** Optimal frequency levels and *x* of the example of Table 1.

**Figure 5.** An example of proposed power-aware scheduling.

#### **6. Performance Evaluation**

#### *6.1. Simulations Environment*

We conducted extensive simulations to validate the proposed idea using random power-aware mixed-criticality task sets. The simulation parameters are shown in Table 4. We used six discrete frequency levels in the system. The execution time is randomly generated from 1 to 100, and the task period is then set to meet the target utilization. The utilizations of LO- and HI-criticality jobs take the values 0.2, 0.25, 0.3, 0.35, 0.4, and 0.45. Each set contains five tasks, of which two are LO-criticality and three are HI-criticality. We generated 1000 random task sets to evaluate the energy minimization for given task sets, and simulated each task set for the least common multiple of the tasks' periods.



#### *6.2. Energy Consumption Results*

Figure 6a–d present the energy consumption for different task sets, averaged over 1000 task sets, as a function of system utilization for different probabilities. As shown in the figure, the proposed approach achieves lower energy consumption than the existing approach for the same task set. The main reason for the lower energy consumption lies in the task utilization in low- and high-criticality modes. The figure further shows that, as the probability of high-criticality mode increases, the impact on energy consumption gradually increases from 0.01 to 0.09. As shown in Figure 6c, the minimum energy consumption depends on the probability values for task utilizations U = (0.2, 0.25, 0.3, 0.35, 0.4, 0.45).

We also present the impact of the average *x* on energy minimization in Figure 7. We consider the same value of *x* for both the previous and proposed approaches. When the utilization is increased to 0.35, the proposed approach achieves a significant improvement in performance. The impact of *x* for the different probabilities is shown in Figure 7a. When the utilization is between 0.2 and 0.25, the average *x* is 0.4; when the utilization increases to 0.35, the value of *x* increases to 0.56; and when the utilization is between 0.35 and 0.4, the average value of *x* rises to 0.65. This implies that, in HI-criticality mode, the energy consumption is not affected when we increase the value of *x*.

Figure 8 shows the energy consumption as a function of the ratio *r* between low- and high-criticality computation times, for values of *r* ranging from 1.5 to 3. We vary this ratio in order to observe its effect on the scheduling of mixed-criticality tasks. As shown in Figure 8, an increasing ratio leads to an increase in the average energy consumption. When the ratio is 1.5, the average energy values for the proposed and previous approaches are 0.082 and 0.136, respectively. Similarly, when the probability is between 0.4 and 0.6, the proposed approach consumes less energy than the previous approach, as shown in Figure 8b. We conclude that an increase in the ratio leads to an increase in the average energy consumption of the mixed-criticality task sets.

**Figure 8.** The impact of ratio *r*.

Figure 9 shows the impact of different task-set compositions in mixed-criticality systems. The figure presents the average energy for sets of seven tasks with compositions 1LO/6HI, 2LO/5HI, 3LO/4HI, 4LO/3HI, 5LO/2HI, and 6LO/1HI. It is observed that the average energy, computed over 1000 task sets, increases across these compositions.

**Figure 9.** Impact of the number of low- and high-criticality tasks ($P_{HI}$ = 0.2).

In Figure 10, the average energy consumption is presented for different frequency intervals. The figure shows the effect of the frequency interval on the minimum energy consumption. We generate random task sets with utilization in the range between 0.4 and 0.5 for a sufficient number of tasks. When the frequency interval is between 0.05 and 1, the proposed approach outperforms the previous approach. Figure 10b shows that when the frequency interval is between 0.05 and 0.1, the value of *x* decreases. We conclude that the proposed approach achieves a lower value of *x* than the previous approach.

**Figure 10.** Impact of frequency intervals.

#### *6.3. Comparison Summary*

Table 5 compares the proposed scheme with the previous work. Whereas the previous work sets the maximum frequency level in high-criticality mode, the proposed scheme adjusts the level. When the probability of high-criticality mode is low, the performance of the two schemes is similar. However, the proposed work incurs more overhead for frequency scaling adjustment.



#### **7. Discussion and Conclusions**

#### *7.1. Discussion*

One issue with the proposed work is its practicality with respect to the probability of high-criticality mode. Recent works [35,36] have considered the probability of task execution times in mixed-criticality systems. In [37], the authors introduced the probabilistic confidence of a task and a system and provided a statistical scheduling algorithm. In [35,36], probabilistic scheduling algorithms are analyzed for mixed-criticality real-time systems with consideration of mode-switch probabilities.

As shown in Figure 6a, the proposed work shows similar performance to the previous work in systems with low $P_{HI}$. When the probability of high-criticality mode is extremely low (e.g., 10<sup>−8</sup>), the effect of power reduction in high-criticality mode is negligible. However, the proposed work is still useful in the following respect.

• Although the mode-switch probability of an individual task is low, the probability of a system mode switch increases with the number of tasks. Let *fi* denote the mode-switch probability of task *τ<sub>i</sub>*. Then, the probability of the system mode switch for the task set *T* is given by 1 − Π<sub>*τi*∈*T*</sub>(1 − *fi*) [35]. Figure 11 shows the probability of the system mode switch in terms of the individual task's probability and the number of tasks (*N*). Note that the x-axis in Figure 11 is log-scale. For *N* = 50, the proposed work can affect the performance once the task mode-switch probability reaches about 0.002, because the proposed work shows a performance gain where $P_{HI}$ ≥ 0.1. When the number of tasks is higher (e.g., *N* = 200), the system mode-switch probability becomes high even at a lower task mode-switch probability (e.g., *fi* = 0.001). Thus, the usefulness of the proposed work depends on the number of tasks and the tasks' mode-switch probabilities.

**Figure 11.** The probability of mode switch w.r.t. task mode switch probability and the number of tasks.
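The relation 1 − Π(1 − *fi*) is easy to check numerically. The sketch below uses a hypothetical function name and assumes, as in the discussion above, that every task has the same mode-switch probability and that task mode switches are independent.

```python
import math


def system_switch_probability(task_probs):
    """Probability that at least one task triggers a mode switch: 1 - prod(1 - f_i)."""
    return 1.0 - math.prod(1.0 - f for f in task_probs)


p_50 = system_switch_probability([0.002] * 50)    # N = 50,  f_i = 0.002
p_200 = system_switch_probability([0.001] * 200)  # N = 200, f_i = 0.001
```

Under these assumptions, 50 tasks with *fi* = 0.002 give a system mode-switch probability just under 0.1, while 200 tasks with *fi* = 0.001 already give roughly 0.18, in line with the two cases discussed above.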


#### *7.2. Concluding Remark*

In this paper, we designed a new dynamic power-aware scheduling scheme for mixed-criticality real-time tasks under frequency scaling on unicore processors. To reduce the overall average energy, we first proposed reducing the energy consumed in high-criticality mode, addressing the difficulty of minimizing power in HI-criticality mode. Furthermore, the system switches to low-criticality mode if there is idle time between high-criticality job executions.

Our simulation results show that our scheme is more efficient at reducing energy in high-criticality mode as well as in low-criticality mode. The proposed scheme outperformed the static scheme in reducing energy because the frequency scaling of the static scheme may not be optimal at run time. The results also confirm that the advantage of the proposed scheme grows as the probability of high-criticality mode increases.

We plan to investigate the proposed scheduling scheme further and extend it to multi-core processor systems. In addition, we will further analyze the probability of high-criticality mode in a range of applications and apply it to the proposed work. We will also apply the probabilistic scheduling approach to the proposed work in order to find the optimal power-aware schedule.

**Author Contributions:** I.A. and K.H.K. proposed the main idea, implemented the simulations, and wrote the draft-version manuscript. Y.-I.J. verified the simulation program and analyzed the simulation results. S.L. and W.Y.L. reviewed the manuscript. All the authors have reviewed and revised the final manuscript. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported partly by the Human Resources Development of the Korea Institute of Energy Technology Evaluation and Planning (KETEP) grant funded by the Ministry of Trade, Industry and Energy (No. 20194030202430), and by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (grant NRF-2018R1D1A1B07050093).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**




**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Dynamic All-Red Signal Control Based on Deep Neural Network Considering Red Light Runner Characteristics**

#### **Seong Kyung Kwon †, Hojin Jung † and Kyoung-Dae Kim \***

Daegu Gyeongbuk Institute of Science & Technology, Daegu 41000, Korea;

sk\_kwon@dgist.ac.kr (S.K.K.); hojinwkd@dgist.ac.kr (H.J.)

**\*** Correspondence: kkim@dgist.ac.kr

† These authors contributed equally to this work.

Received: 31 July 2020; Accepted: 24 August 2020; Published: 1 September 2020

**Abstract:** Despite recent advances in technologies for intelligent transportation systems, the safety of intersection traffic is still threatened by traffic signal violators, called Red Light Runners (RLRs). The conventional approach to ensuring intersection safety under the threat of an RLR is to extend the length of the all-red signal when an RLR is detected. Therefore, the selection of the all-red signal length is an important factor for intersection safety as well as traffic efficiency. In this paper, for better safety and efficiency of intersection traffic, we propose a framework for dynamic all-red signal control that adjusts the length of the all-red signal time according to the driving characteristics of the detected RLR. In this work, we classify RLRs into four different classes based on clustering results obtained using Dynamic Time Warping (DTW) and Hierarchical Clustering Analysis (HCA). The proposed system uses a Multi-Channel Deep Convolutional Neural Network (MC-DCNN) for online detection of RLRs and classification of the RLR class. For dynamic all-red signal control, the proposed system uses a multi-level regression model to estimate the necessary all-red signal extension time more accurately and hence improves overall intersection traffic safety as well as efficiency.

**Keywords:** Intelligent Transportation System (ITS); deep neural network; Red Light Runner (RLR); dynamic signal control; intersection safety

#### **1. Introduction**

As the traffic volume in urban areas has increased significantly over the last decades, there have been many demands and efforts to develop and deploy technologies for intelligent transportation systems in order to address issues of traffic congestion, safety, efficiency, and environmental improvement [1]. Undoubtedly, one of the most complex, dangerous, and important traffic environments on the road is the intersection, where traffic flows from different directions overlap in a common space, and which has a substantial impact on overall urban traffic efficiency and safety [2]. At intersections, traffic flows from different directions are typically coordinated through traffic light systems to prevent conflicting traffic flows from passing through the intersection simultaneously. Therefore, if a traffic participant violates the traffic rules imposed by the traffic light, the other participants in the intersection inevitably face the risk of an accident. The most representative example of such a traffic participant is the Red Light Runner (RLR) [3].

An RLR is a vehicle that passes through an intersection while the traffic light is red, ignoring the traffic signal. According to the AAA Foundation for Traffic Safety, the number of deaths from RLRs increased by 31% from 2009 to 2017. In addition, the Insurance Institute for Highway Safety (IIHS) reported in 2017 that approximately 132,000 casualties were caused by RLRs. Also, the Manual on Uniform Traffic Control Devices (MUTCD), a standard for maintaining and installing traffic control devices, provides for the control of intersection signals to reduce RLR accidents [4]. In general, intersection traffic lights consist of green, yellow, red and all-red signals. The all-red signal occurs when the intersection traffic light changes from yellow to red and from red to green, and is used to prevent accidents caused by vehicles entering the intersection on the yellow signal [5]. MUTCD proposes the construction of a system that extends the all-red signal of intersection traffic lights when an RLR is detected. The length of the all-red signal needs to be determined so that a collision caused by an RLR does not occur in the intersection. One method of determining the signal extension time is to use a statistical method that extends the signal by a constant time regardless of the current state of the vehicle. Another method sets the all-red extension to the distance to the predicted collision point divided by the current speed of the vehicle.
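The two conventional extension rules mentioned above can be written down in a few lines. The following is a minimal sketch, not an implementation from the MUTCD or from the cited works; the function names, the SI units (metres, metres per second, seconds), and the `min_speed_mps` guard are assumptions made for illustration.

```python
def fixed_extension(constant_s: float) -> float:
    """Statistical rule: always extend by the same constant time (seconds)."""
    return constant_s


def kinematic_extension(distance_to_conflict_m: float,
                        speed_mps: float,
                        min_speed_mps: float = 0.1) -> float:
    """Speed-based rule: distance to the predicted collision point / current speed.

    The guard against very small speeds is an assumption added here so that
    the division stays well defined; it is not part of the original rule.
    """
    if speed_mps < min_speed_mps:
        raise ValueError("speed too low for a constant-speed estimate")
    return distance_to_conflict_m / speed_mps


if __name__ == "__main__":
    # A vehicle 30 m from the conflict point travelling at 15 m/s.
    print(kinematic_extension(30.0, 15.0))
```

Both rules ignore how the vehicle's speed evolves after detection, which is the limitation the proposed system is meant to address.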

The current all-red signal extension system depends only on the vehicle speed at the moment when an RLR is detected. However, if the RLR does not move at a fixed speed as expected, the safety in the intersection cannot be ensured. Therefore, in this paper, we propose a framework for a dynamic all-red signal control system that determines the signal extension time according to the driving pattern of the detected RLR. In this proposed system, driving patterns of RLR vehicles are distinguished through the Multi-Channel Deep Convolutional Neural Network (MC-DCNN) [6]. Also, a multi-level regression strategy, consisting of the Hougen–Watson nonlinear regression model [7] and a quadratic polynomial regression model, is used to estimate the necessary all-red signal extension time with improved accuracy.

The structure of the paper is organized as follows. Section 2 introduces conventional RLR prediction and signal extension methodologies. An overview of the proposed system is presented in Section 3. Clustering and classification based on the characteristics of RLRs are covered in Section 4. The proposed dynamic signal control model is described in Section 5. We validate the performance of the proposed system in Sections 6 and 7. Finally, conclusions are discussed in Section 8.

#### **2. Related Works**

RLR is the act of passing through a signalized intersection while ignoring the signaling system, which threatens the traffic system. RLR is a serious problem: beyond being a minor traffic violation, it can lead to fatal traffic accidents. A collision between a violating vehicle and another vehicle legally passing through the intersection on a green traffic light is called an RLR collision. To avoid RLR-related collisions, it is important to identify factors that have a significant impact on the behavior of RLR drivers and to predict RLR likelihood in real-time [8].

Li et al. [9] proposed a connected-vehicle-based dynamic all-red extension (DARE) framework to prevent potential collisions due to RLR. The proposed method performs binary classification of RLR and Non-RLR based on non-weighted and weighted least square support vector machines (LS-SVM) using continuous trajectories measured by radar sensors. As a result, RLR and Non-RLR were classified with higher accuracy compared to other techniques based on conventional inductive loop detection. In [10], RLR prediction consists of two parts: the arrival time and the vehicle behavior when the vehicle reaches the stop line. The proposed technique is a Bayesian network (BN) probability model based on continuous trajectories collected by radar sensors for RLR prediction. Based on the vehicle's speed, acceleration, and car-following behavior, and the causality of BN, RLR prediction performance was improved. In addition, the predicted RLR probability was provided to the driving decision maker, which contributed to the improvement of traffic safety. de Goma et al. [11] proposed a camera-based RLR detection technique using a Single Shot Detector (SSD). In this study, the researchers used cameras to collect data at intersections. The proposed system achieved an RLR detection performance of 92.1% by applying a deep learning based approach. However, despite the high detection performance, the proposed technique, being camera-based, focused on the detection rather than the prediction of RLR. In [12], a random forest-based learning model was proposed to predict RLR violations. In addition, observation data and driver simulator data were used to analyze factors affecting RLR. According to the results of the proposed prediction model, the important factors for predicting RLR violations are the distance between the vehicle and the intersection, the time to intersection (TTI), and the speed of the vehicle at the yellow onset.

In order to reduce accidents at intersections, techniques to control the traffic signal when an RLR is detected have been proposed. In [13], a traffic signal countdown (CT) auxiliary device is used in order to reduce RLR. The CT-based traffic light system aims to reduce RLR by providing the driver with the remaining time of the green light. However, at the end of the green light's duration, RLR may increase as drivers accelerate through the intersection before the signal changes. Likewise, if little red-light time remains when the driver reaches the intersection, the driver may not decelerate and may enter the intersection early, so that an RLR occurs. Control of the yellow signal interval had a positive effect on the reduction of RLR [14], because it helped the driver to make a driving decision at the intersection. According to the study, the duration of the yellow signal that most effectively reduces RLR is 5 s, and when the duration of the yellow signal exceeds 5 s, RLR increases again. Since this method uses a fixed yellow signal setting, the effect is reduced if drivers get used to the yellow signal in the long term. Retting et al. [15] proposed an extension of the yellow signal and an enforcement system using a Red Light Camera (RLC). The incidence of RLR was reduced by 36% by increasing the duration of the yellow signal by 1 s. In addition, by applying an enforcement system using RLC, the RLR incidence rate was reduced by more than 96%. Collotta et al. [16] proposed a method to reduce RLR violations by dynamically allocating signal periods through a Wireless Sensor Network (WSN). The main goal is to dynamically change the green time based on the queue length, allocating a larger green time to the road with the longest queue. Experiments conducted in Philadelphia reduced RLR violations through dynamic assignment of traffic signal periods.

However, changing signal settings under the influence of RLR can lead to an asymmetric traffic assignment problem. In [17], the authors argued that two distinct problems can be formulated to address the asymmetric traffic assignment problem: first, the global optimization of signal setting and traffic assignment (GOSSTA) combined problem, and second, the local optimization of signal setting and traffic assignment (LOSSTA) combined problem. Related to these problems, Adacher et al. [18] transformed the GOSSTA problem into a surrogate continuous optimization problem via a generalized surrogate problem methodology based on an online control scheme and solved the latter using a standard gradient-based approach. On the other hand, D'Acierno et al. [19] proposed an Ant Colony Optimization (ACO) algorithm to solve LOSSTA. On real networks, the proposed ACO algorithm obtained the solution in a shorter time with the same accuracy as the conventional method of successive averages (MSA) approach [20].

Kashani et al. [21] identified driver and vehicle characteristics that affect accidents using classification and regression tree techniques based on the 2012–2016 Isfahan crash database. In this study, the tree model divided drivers into three age groups: under 22.5 years old, 22.5 to 51.5 years old, and over 51.5 years old. It also suggested improving driver education, increasing traffic fines, and banning drivers with poor driving history to reduce RLR. Fu et al. [22] proposed a step-by-step penalty strategy to prevent the re-offending of RLR vehicles. Despite the rigorous penalty strategy to reduce RLR, its effectiveness was limited. The reason is that traffic delays for other vehicles due to the potential risk of collision with RLR vehicles are not included. In addition, both unintentional and intentional RLRs are subject to the same penalties because the proposed system cannot make a clear distinction between unintentional RLR and intentional RLR. This may be unfair for unintended RLR violators.

Conventional studies have focused on the binary classification of RLR and Non-RLR, and the extension of fixed time signals. Penalties were also effective in reducing RLR. Additionally, some studies have discussed penalty policies to reduce RLR. However, excessive penalties for unintended RLR are a problem to be solved. Our proposed system performs a specific classification of RLR based on features rather than binary classification of RLR and Non-RLR. This can contribute to the classification of unintended RLRs based on the characteristics of RLRs, and is expected to positively help in constructing a stronger RLR fines system. In addition, our proposed system can contribute to the improvement of safety and efficiency of the intersection traffic system based on the dynamic all-red signal extension conforming to the specified RLR class. As discussed above, while it is still possible that the proposed dynamic all-red signal extension may cause an asymmetric traffic assignment problem, it is not the primary focus of this paper to improve the overall traffic efficiency by solving the asymmetric traffic assignment problem as done in many aforementioned related works. Instead, we focus more on improving the safety of intersection traffic by preventing accidents due to RLRs and also the efficiency of it by overcoming the problem of conventional fixed signal extension mechanisms.

#### **3. System Overview**

For better intersection safety, a dynamic all-red signal control is necessary to avoid collisions due to sudden appearances of RLRs. To address the issues with conventional fixed signal extension approach, the proposed system identifies first which incoming vehicles are likely to be RLRs and then utilizes the driving characteristics of the detected RLR to adjust the length of the all-red signal accordingly. Hence, the proposed system improves the overall safety as well as efficiency of intersection traffic.

Figure 1 shows the overall architecture of the proposed dynamic all-red signal control system. The first step of the process begins with traffic data collection from the intersection traffic environment. Traffic data to be collected includes traffic signal as well as all incoming vehicles' movement data such as each vehicle's speed, acceleration, distance to the intersection (DTI) and headway during a certain time duration. Note that, for the purpose of all-red signal length control, the system requires traffic data measured while the traffic signal is in the yellow state. The next step of the process is to identify which incoming vehicles are likely to be RLR. As shown in Figure 1, we use the MC-DCNN classifier for this purpose. The proposed MC-DCNN classifier classifies not only whether an incoming vehicle is likely to be an RLR or not but it also classifies into several different types of RLR based on the vehicle's driving characteristics if the vehicle is likely to be an RLR. Then the last step of the process is to determine the length of all-red signal extension based on the detailed classification result from the MC-DCNN. For this step, we use a multi-level regression approach consisting of the Hougen–Watson nonlinear regression and a quadratic polynomial fitting to determine the necessary all-red signal extension time. More details on each of the steps in the process are covered in the following sections.

**Figure 1.** The proposed dynamic all-red signal control architecture.

#### **4. Clustering and Classification**

In general, the intersection is considered the most complex road traffic environment. Furthermore, each vehicle on the road shows very different driving characteristics depending on the driving style or physical/mental conditions of the driver in the vehicle. Hence, the movements of vehicles approaching an intersection to cross are very different from each other and are affected by various factors. Thus, for safer intersection traffic through traffic light control, it is not enough to identify which vehicle is likely to be an RLR. To determine the length of the all-red signal appropriately, it is also necessary to identify the characteristics of the vehicle movement and determine the necessary all-red signal time for the vehicle accordingly. Our approach to addressing this issue is to utilize techniques for time-series clustering for characterization of RLRs into several clusters according to their movements. Then, the identified groups of clusters are used as labels for the generation of the traffic dataset to be used for training the MC-DCNN classifier.

Figure 2 shows the overall procedure for dataset generation. The data collected from the traffic environment includes traffic signal data, vehicle movement data, and also whether each vehicle is RLR or non-RLR. In the collected raw traffic data, RLR vehicles are not distinguished according to their characteristics. Therefore, clusters for each RLR characteristic are generated through Dynamic Time Warping (DTW) and Hierarchical Clustering Analysis (HCA) processes. After this process, a dataset for training MC-DCNN is constructed based on the traffic data together with RLR cluster labels so that each vehicle in the dataset is now labeled with a cluster ID according to its driving characteristics.

**Figure 2.** Dataset preparation process for classification.

#### *4.1. Time-Series Clustering*

Conventional studies are based on the assumption that the RLR passes through the intersection at a fixed speed. However, RLR vehicles have a variety of driving characteristics in the real world. Therefore, we adopt the clustering method to define the driving characteristics of RLR. The clustering method performs merging into one group when the similarity between data is high, and splits into another group when the similarity is low. However, driving characteristics are difficult to define with one moment of data. Therefore, driving data continuously measured over a certain period of time and a clustering method for time-series data are required. In general, the time-series clustering method consists of a representation of continuous-time trajectories in time-series form, calculation of similarity or distance measure between every pair of time-series data, and then clustering all time-series data into several groups according to the similarity measure.

At an intersection, a vehicle's speed profile changes dramatically in response to traffic signals. Vehicles with no intention of signal violation and RLR vehicles typically show different movements from the start of the yellow signal [23]. Furthermore, it is well known that the speed profile of a vehicle represents the driving pattern of the vehicle and also reflects various factors affecting the vehicle motion such as driving condition and driving style [24,25]. As an illustration of how other factors affect the speed profile of a vehicle, Figure 3 shows a comparison between driving profiles of two different RLR clusters. Figure 3a shows a pattern in which the speed and acceleration are maintained without significant change after 1 s of yellow onset. The DTI shows a decreasing pattern because the vehicle is moving toward the intersection. The headway has a value of 1, which means that there is no preceding vehicle. On the other hand, the headway shown in Figure 3b changes from 1 to 0 around 1.5 s after the yellow onset. This means that a preceding vehicle suddenly appeared in front of the vehicle from the other lane. Under its influence, the speed and acceleration of the RLR decrease rapidly and then increase as the headway increases again.

**Figure 3.** Comparison of driving profiles from two different Red Light Runner (RLR) clusters. (**a**) Speed maintenance pattern; (**b**) Acceleration after deceleration by the preceding vehicle pattern.

A similarity measure is a way to quantify the similarity between time-series data. We calculate the similarity measure based on the speed profile of each vehicle and utilize it to create clusters, as the speed profile of a vehicle is one of the representative time-series data used to distinguish RLR vehicles. The most commonly used methods for calculating the similarity measure are the Euclidean distance and DTW. The Euclidean distance is calculated between two time-series by one-to-one matching in each time slice. This technique is simple and fast, but it is limited when there is a time shift between sequences. In comparison, DTW performs one-to-many or many-to-one matching and is more robust than the Euclidean distance to time shifts between sequences [26]. Therefore, we use DTW to calculate the similarity measure between the speed profiles of various vehicles.
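The difference between one-to-one matching and DTW alignment can be illustrated with a short sketch. The code below is the textbook dynamic-programming formulation of DTW with an absolute-difference local cost; it is an illustration under these assumptions, not the authors' implementation, and the toy profiles are made up.

```python
import math


def euclidean_distance(a, b):
    """One-to-one matching: both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Accumulated cost of the best warping path between two 1-D series."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # pairwise distance of two samples
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] matched again
                                     cost[i][j - 1],      # b[j-1] matched again
                                     cost[i - 1][j - 1])  # one-to-one step
    return cost[n][m]


if __name__ == "__main__":
    # Two toy profiles with the same shape, shifted by one time step.
    profile_1 = [0, 0, 1, 2, 1, 0]
    profile_2 = [0, 1, 2, 1, 0, 0]
    print(euclidean_distance(profile_1, profile_2))  # 2.0
    print(dtw_distance(profile_1, profile_2))        # 0.0
```

The shifted profiles are penalized by the Euclidean distance but are aligned at zero cost by DTW, which is the robustness to time shifts described above.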

Once the similarity measures are calculated through DTW, clustering is performed through Hierarchical Clustering Analysis (HCA) [27]. HCA is an algorithm that performs clustering using a hierarchical tree structure. Since the number of clusters of driving characteristics of RLR cannot be pre-defined easily, we determine the number of clusters by investigating the tree structure where the difference in similarity measure calculated by DTW increases rapidly. Through the HCA process, the driving characteristics of RLR vehicles are divided into four groups which are (i) acceleration (Type A RLR), (ii) acceleration after deceleration (Type B RLR), (iii) speed maintenance (Type C RLR), and (iv) acceleration after deceleration by preceding vehicle (Type D RLR). Here, the created clusters are used for the training process of the classification model.
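As a rough illustration of how a hierarchical tree can be cut into clusters from pairwise DTW distances, the sketch below performs bottom-up merging and stops when the closest pair of clusters is farther apart than a threshold. Average linkage, the fixed-threshold stopping rule, and the toy distance matrix are assumptions made for this example; the paper does not state the linkage criterion.

```python
def agglomerative_clusters(dist, threshold):
    """Merge clusters bottom-up until the closest pair exceeds `threshold`.

    dist: symmetric matrix of pairwise distances (e.g., DTW distances).
    Returns a list of clusters, each a list of item indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def average_linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = average_linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > threshold:
            break  # remaining clusters are considered distinct
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    # Hypothetical distances: items 0-1 and items 2-3 are close to each other.
    toy = [[0, 5, 100, 100],
           [5, 0, 100, 100],
           [100, 100, 0, 8],
           [100, 100, 8, 0]]
    print(agglomerative_clusters(toy, threshold=50))  # [[0, 1], [2, 3]]
```

In this form the number of clusters is not fixed in advance; it follows from where the merge distance becomes large.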

#### *4.2. Classification*

A traditional technique for RLR detection and classification is the Support Vector Machine (SVM) [28,29]. SVM is a technique that classifies samples into two classes by obtaining a decision boundary that separates the sample points. The decision boundary separates the two classes, and the samples closest to the boundary become the support vectors. SVM classifies the binary classes by finding the decision boundary that maximizes the margin between the support vectors and the decision boundary. Multi-class SVM obtains sub-SVMs for classifying each class and performs multi-class classification based on this idea [30,31]. Recently, deep learning models with higher classification accuracy than SVM have been proposed [32]. A representative deep learning model is the Convolutional Neural Network (CNN). In general, a CNN consists of a convolutional layer, a Rectified Linear Unit (ReLU) layer, a pooling layer, and a fully-connected layer [33]. The convolutional layer extracts the features of the input, while the ReLU layer increases the non-linearity properties of the convolutional layer. The pooling layer prevents overfitting through down-sampling. Finally, scores are calculated for each class of output in the fully connected layer. However, the general CNN is an image-based model and is not designed for time-series data. Therefore, a deep learning model using time-series data as input is needed.

We use the Multi-Channel Deep Convolutional Neural Network (MC-DCNN), a signal data-based model for time-series classification [34,35]. MC-DCNN is a model that uses time-series data of each sensor as the input of multi-channels. The proposed MC-DCNN model uses the speed, acceleration, headway, and DTI as input signals. Since the driving pattern obtained from the speed profile of the vehicle is affected by various driving conditions, the headway and DTI are also selected as inputs to consider the front vehicle and the distance to the intersection. In addition, speed and acceleration are selected to analyze the driving pattern of the RLR. Since the input signal used for MC-DCNN is time-series data, a window length and a prediction time after yellow onset are also required to determine the time interval of traffic data measurement and also to determine when to perform the classification. Prediction time after yellow initiation refers to the point at which RLR is predicted after the start of the yellow signal. If the window length is 2 s and the prediction time after the onset of yellow is 3 s, 2 s of data are collected from 1 to 3 s after the yellow onset.
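The relationship between the window length and the prediction time can be made concrete with a small helper. This is a sketch only: the function names, the dictionary layout of the channels, and the 10 Hz sampling rate used in the example are assumptions, since the measurement rate is not stated here.

```python
def window_bounds(window_length_s, prediction_time_s):
    """Return (start, end) of the measurement window in seconds after yellow onset."""
    start = prediction_time_s - window_length_s
    if start < 0:
        raise ValueError("window would start before the yellow onset")
    return start, prediction_time_s


def slice_window(channels, sample_rate_hz, window_length_s, prediction_time_s):
    """Cut the same time window out of every input channel.

    channels: dict mapping a channel name (e.g., "speed") to a list of samples,
              where index 0 corresponds to the yellow onset.
    """
    start_s, end_s = window_bounds(window_length_s, prediction_time_s)
    start = round(start_s * sample_rate_hz)
    end = round(end_s * sample_rate_hz)
    return {name: series[start:end] for name, series in channels.items()}


if __name__ == "__main__":
    # Window length 2 s, prediction 3 s after yellow onset: data from 1 s to 3 s.
    print(window_bounds(2.0, 3.0))
    demo = {"speed": list(range(40)), "dti": list(range(40))}
    print(len(slice_window(demo, 10, 2.0, 3.0)["speed"]))
```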

The proposed network structure consists of two convolutional, ReLU, pooling layers and the last fully connected layer. The convolution layer is composed of a 1D convolution because the driving pattern is identified through the feature over time [36]. The last layer is a softmax, which outputs a distribution over classes. The classes are defined in five categories: Non-RLR, Type A RLR, Type B RLR, Type C RLR, and Type D RLR.
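To make the layer sequence concrete, the following pure-Python sketch runs a forward pass through two convolution/ReLU/pooling stages per input channel, followed by a fully connected layer and a softmax over five classes. It is a structural illustration under several assumptions: each channel is processed separately before concatenation, each stage has a single filter, and all kernel sizes, weights, and names are placeholders rather than trained or reported values.

```python
import math


def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation form) over time."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def relu(xs):
    return [max(0.0, x) for x in xs]


def max_pool(xs, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(xs[i:i + size]) for i in range(0, len(xs) - size + 1, size)]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def channel_features(signal, kernels):
    """Apply one convolution/ReLU/pooling stage per kernel in sequence."""
    out = signal
    for kernel in kernels:
        out = max_pool(relu(conv1d(out, kernel)))
    return out


def classify(channels, kernels_per_channel, weights, biases):
    """Concatenate per-channel features, then fully connected layer + softmax."""
    features = []
    for name in sorted(channels):
        features.extend(channel_features(channels[name], kernels_per_channel[name]))
    scores = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)
```

With four channels of 20 samples and kernels of length 3, each channel yields 3 features, so the fully connected layer in this toy setting needs 5 rows of 12 weights, one row per class (Non-RLR and Types A to D).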

#### **5. Dynamic All-Red Signal Control**

In order to dynamically control the length of the all-red signal considering the driving characteristics of RLRs, it is necessary to predict the time at which the RLR under consideration can completely pass the intersection. For this purpose, we use multi-level regression to predict the necessary time duration for the RLR to completely get out of the intersection from the moment of prediction, which we call the *intersection passing time* in the sequel. The input data for the regression model is composed of the speed, DTI, and headway of the RLR at the prediction time. As the first level of regression for prediction, we use the Hougen–Watson model in (1), one of the nonlinear regression models, to roughly estimate the intersection passing time.

$$\hat{y} = \frac{\beta_1 x_2 - x_3 / \beta_5}{1 + \beta_2 x_1 + \beta_3 x_2 + \beta_4 x_3} \tag{1}$$

where $\hat{y}$ is the predicted intersection passing time and the variables $x_1$, $x_2$, $x_3$ are the DTI, the speed, and the headway of a vehicle, respectively. $\beta_1, \cdots, \beta_5$ in (1) are parameters to be determined through regression using data. In our study, we determined these parameters by the Levenberg–Marquardt nonlinear least squares algorithm [37,38]. The Levenberg–Marquardt algorithm is a combination of two minimization methods, gradient descent and Gauss–Newton. It behaves like gradient descent when the parameters are far from the solution and like Gauss–Newton near the solution. In addition, the Levenberg–Marquardt method is more stable than the Gauss–Newton method and converges to the solution relatively quickly, so it is widely used for nonlinear least squares problems. The algorithm optimizes the model by iteratively updating the parameters to reduce the sum of squared errors between the model and the measured data.
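For readers who want to evaluate (1) directly, the sketch below implements the model and the sum-of-squares objective that a Levenberg–Marquardt fit would minimize. The parameter values and sample data in the example are arbitrary placeholders, not fitted values from this study, and the function names are hypothetical.

```python
def hougen_watson(beta, dti, speed, headway):
    """Evaluate model (1): x1 = DTI, x2 = speed, x3 = headway."""
    b1, b2, b3, b4, b5 = beta
    numerator = b1 * speed - headway / b5
    denominator = 1.0 + b2 * dti + b3 * speed + b4 * headway
    return numerator / denominator


def sum_squared_error(beta, samples):
    """Objective of the nonlinear least squares fit.

    samples: iterable of ((dti, speed, headway), measured_passing_time).
    """
    return sum((y - hougen_watson(beta, *x)) ** 2 for x, y in samples)


if __name__ == "__main__":
    beta = (1.0, 0.0, 0.0, 0.0, 1.0)          # placeholder parameters
    data = [((50.0, 10.0, 1.0), 9.0),          # made-up observations
            ((40.0, 12.0, 0.0), 11.0)]
    print(hougen_watson(beta, 50.0, 10.0, 1.0))
    print(sum_squared_error(beta, data))
```

In practice the residuals `y - hougen_watson(beta, *x)` could be handed to a library solver; for example, SciPy's `scipy.optimize.least_squares` accepts `method="lm"` for a Levenberg–Marquardt fit.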

As the prediction of the intersection passing time of an RLR through the Hougen–Watson model is only a rough estimate of the actual intersection passing time required for the RLR, the predicted time can be much shorter than necessary in some cases. This means that if the length of the all-red signal is adjusted according to this estimated intersection passing time, then vehicles from other directions may enter the intersection before the RLR completely clears the intersection. Thus, it is necessary to address the safety issue caused by using only the Hougen–Watson model in predicting the intersection passing time. For this purpose, we also use a quadratic polynomial fitting model as the second level of regression, based on the prediction results of the Hougen–Watson model, for better safety of intersection traffic. Furthermore, if the intersection passing time of an RLR is predicted without considering the driving characteristics of the RLR, the predicted intersection passing time can be too conservative in some cases. Therefore, to address this issue and improve the overall traffic efficiency, we build separate multi-level regression models according to RLR classes, as described in Section 4.2, and predict the intersection passing time of an RLR according to its RLR class. More details on this multi-level regression framework and results are given in Section 7.3.

#### **6. Traffic Simulation**

The system proposed in this paper requires data collection for clustering and classification. However, it is difficult to collect traffic data in a real environment. Therefore, we use the Vissim traffic simulator, which is widely used in transportation engineering for microscopic traffic simulation, to collect intersection traffic data and also to evaluate the performance of the proposed system.

Figure 4 shows the intersection traffic environment configured in Vissim and also shows the traffic signal phases. A standard intersection model is used, which has three input lanes and two output lanes for each ramp way. The leftmost input lane is for left turning, and the center lane is for straight traffic. The far right lane is used for both straight and also for right turning with 20% probability. The traffic signal cycle at the intersection consists of four phases. The signal duration is set to be 27 s for straight traffic and 15 s for left turning traffic according to the traditional Webster's method [39]. On the other hand, the signal duration for yellow and red in each phase are set differently according to the traffic speed based on the FHWA's Traffic Signal Timing Manual [40]. Traffic flow includes car-following and lane change motion.

**Figure 4.** Intersection traffic simulation in Vissim. (**a**) Simulated intersection traffic; (**b**) Traffic signal phase.

Since Vissim provides two different models, called the *continuous decision model* and *one decision model*, to mimic the reaction patterns of real drivers at an intersection when the traffic signal changes from green to yellow, we utilize both of these models in our simulations to generate a more realistic intersection traffic data.

In a continuous decision model, there are two options available. First, a vehicle will not brake, if even the maximum deceleration would not allow for a stop at the stop line. Second, a vehicle brakes if a vehicle cannot pass the traffic light within 2 s when continuing at its current speed rate. On the other hand, in one decision model, the decision made at the time of the yellow onset is kept until the vehicle has passed the stop line. A vehicle stops according to the following probability

$$p = \frac{1}{1 + e^{-\alpha_1 - \alpha_2 v - \alpha_3 dx}} \tag{2}$$

where $v$ is the vehicle's current speed, $dx$ is the DTI, and $\alpha_1$, $\alpha_2$, $\alpha_3$ are fitting parameters. In our simulation, we use the default values for these fitting parameters, which are $\alpha_1 = 1.59$, $\alpha_2 = 0.27$, and $\alpha_3 = -0.26$ provided in Vissim.
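The stop probability in (2) is a logistic function of speed and DTI and can be evaluated as in the sketch below. The default arguments simply repeat the parameter values quoted in the text; the function name and the units of the inputs in the example are assumptions, since the units expected by the simulator are not stated here.

```python
import math


def stop_probability(speed, dti, alpha_1=1.59, alpha_2=0.27, alpha_3=-0.26):
    """Probability from (2) that a vehicle stops under the one decision model.

    Defaults are the values quoted in the text above.
    """
    exponent = -alpha_1 - alpha_2 * speed - alpha_3 * dti
    return 1.0 / (1.0 + math.exp(exponent))


if __name__ == "__main__":
    # Example inputs only; the units are assumed, not taken from the paper.
    print(stop_probability(speed=15.0, dti=50.0))
```

Because the one decision model keeps the decision made at the yellow onset, this probability would be evaluated once per vehicle, with the speed and DTI observed at that moment.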

Figure 5 shows representative reaction patterns of traffic observed in simulation according to the two decision models. Depending on the state of a vehicle, such as its current speed and DTI at the time of yellow onset, the vehicle reacts in one of three patterns. *Go* is the case when a vehicle enters the intersection before the red signal, *Stop* is the case when a vehicle stops at the stop line on the red signal, and *RLR* is the case when a vehicle enters the intersection on the red signal [9]. Figure 5a shows the change in speed for each reaction of continuous decision traffic. The vehicle with the Go reaction does not encounter a red signal before its distance to the intersection becomes 0 m. The vehicle with the Stop reaction stops gradually, with the yellow signal starting at a distance of more than 60 m from the intersection. However, vehicles with the RLR reaction start to accelerate rapidly between about 15 and 20 m before the intersection. Figure 5b shows the speed change for each reaction of the one decision model. In the one decision model, the Go and Stop reactions are similar to those in the continuous decision model. On the other hand, one of the vehicles with the RLR reaction maintains its speed without significant change even when the yellow signal starts. Thus, for the purpose of our study in this paper, we can confirm that the intersection traffic simulated in Vissim according to the two decision models provides a close enough representation of actual intersection traffic.

**Figure 5.** Comparison of vehicle reaction pattern between continuous decision model and one decision model where colored circles in the images indicate the traffic light signals. (**a**) Examples of velocity profiles with continuous decision model; (**b**) Examples of velocity profiles with one decision model.

Table 1 shows the statistical results of 2567 vehicle data (RLR: 1710 and non-RLR: 857) collected over a 24 h simulation in Vissim. These results are obtained with vehicles whose DTI is less than 100 m at the time of yellow onset. In the case of the continuous decision model, the acceleration of RLR is relatively high compared to that of Non-RLR. This means that RLR vehicles attempt to pass the intersection faster than non-RLR vehicles. In the case of RLR in the one decision model, the acceleration is the lowest but the headway is high on average. In addition, the mean and standard deviation of acceleration are the smallest, and thus a speed-maintaining movement is observed. In the results, we can also observe that RLR vehicles of the two decision models have a higher mean speed than the other reaction patterns. In addition, the mean DTI of RLR vehicles in both decision models is larger than that of the Go reaction but smaller than that of the Stop reaction.


**Table 1.** Statistical results of traffic simulation data collected at the time of yellow onset.

#### **7. Results**

In this section, we present results of the proposed clustering, classification, and dynamic all-red signal control approach obtained through traffic simulations in Vissim.

#### *7.1. Clustering*

As described in Section 4.1, we use the DTW algorithm to measure the similarity between a pair of speed profile time-series. Figure 6 shows several examples of speed profile time-series data, selected from different clusters which are determined later through the HCA clustering process, to illustrate the effect of the DTW algorithm for optimal alignment of two time-series data and the similarity measure calculated between them. Figure 6a shows a comparison between speed profiles from Type A RLR and Type B RLR clusters. The similarity measure between these two time-series data, calculated as the accumulated pairwise Euclidean distance, is 131.26 in this case. Similarly, Figure 6b,c also show the similarity results from different RLR clusters where similarity measures calculated from the DTW algorithm are 87.85 and 71.23, respectively. On the other hand, Figure 6d shows the similarity result between a pair of speed profile time-series selected from the same cluster, which is Type B RLR in this case. For these speed profile time-series data, the similarity measure from the DTW algorithm is less than 25, which is substantially lower than the other three cases in the figure and hence clearly indicates that these two time-series are quite similar to each other in terms of their shapes while they may be in slightly different phases.

Next, to determine the number of clusters via HCA based on the similarity measures, it is necessary to choose an appropriate threshold for the similarity measure. If the threshold for cluster separation is too low, too many clusters are formed and the driving characteristics of RLRs in different clusters are not clearly distinguishable. Therefore, we investigate the hierarchical structure of clusters generated by HCA for all RLR traffic datasets and choose to separate clusters where the similarity measure suddenly increases by more than 50 in the HCA process, since the clusters formed in this way are most reasonably distinguishable in terms of their driving characteristics. As a result, four different clusters are formed for RLRs, as described in Section 4.1.
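One way to read the cluster-separation rule is as an agglomerative procedure that stops before a merge whose distance jumps by more than 50 over the previous merge. The sketch below follows that reading only. Average linkage, the function name `cluster_by_jump`, and the small distance matrix are assumptions for illustration, since the linkage criterion is not stated in this section.

```python
def cluster_by_jump(dist, jump=50.0):
    """Agglomerative clustering on a precomputed distance matrix.

    Merging stops before a merge whose distance exceeds the previous
    merge distance by more than `jump` (the first merge is compared to 0).
    """
    clusters = [[i] for i in range(len(dist))]
    previous = 0.0
    while len(clusters) > 1:
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                # Average linkage (assumed): mean pairwise distance.
                pairs = [dist[i][j] for i in clusters[p] for j in clusters[q]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d - previous > jump:
            break  # sudden increase: keep the current clusters separate
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
        previous = d
    return clusters


# Hypothetical pairwise DTW distances between four speed profiles.
dist = [
    [0.0, 10.0, 100.0, 110.0],
    [10.0, 0.0, 105.0, 120.0],
    [100.0, 105.0, 0.0, 20.0],
    [110.0, 120.0, 20.0, 0.0],
]
print(cluster_by_jump(dist))  # [[0, 1], [2, 3]]
```

In this toy example the first two merges occur at distances 10 and 20, while the next candidate merge is at 108.75, so the jump rule leaves two clusters.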

(**a**) Type A RLR (data 1) vs. Type B RLR (data 2). (**b**) Type B RLR (data 1) vs. Type D RLR (data 2).

**Figure 6.** Examples of similarity results through the Dynamic Time Warping (DTW) algorithm.

Figure 7 shows the result of clustering generated through the HCA process for all RLR traffic data collected from the Vissim simulation. Figure 7a–d show the RLR speed profiles of each cluster. As shown in the figure, the four RLR clusters exhibit different driving characteristics: an RLR in Type A keeps accelerating to cross the intersection, an RLR in Type B first decelerates and then accelerates, an RLR in Type C mostly maintains its speed, and an RLR in Type D behaves like Type B in the beginning but decelerates rapidly shortly after accelerating due to the sudden appearance of a preceding vehicle in front of it.

**Figure 7.** Clustering of RLR traffic dataset through Hierarchical Clustering Analysis (HCA).

#### *7.2. Classification*

For online classification of an incoming vehicle, that is, to predict whether the vehicle is a Non-RLR or one of the four RLR types, we use MC-DCNN as described in Section 4.2. To train the MC-DCNN model, we built a training dataset from traffic data consisting of time-series of vehicle speed, acceleration, DTI, and headway, with the cluster type determined through HCA, so that each vehicle in the training dataset is labeled as Non-RLR, Type A RLR, Type B RLR, Type C RLR, or Type D RLR. Therefore, as shown in Figure 1, the trained MC-DCNN model predicts which of these five classes an incoming vehicle belongs to.

To evaluate the classification performance of MC-DCNN, we compare its classification accuracy with that of SVM using the validation dataset. Tables 2 and 3 report the classification accuracy of SVM and MC-DCNN, respectively. In the results, the classification accuracy is 100% if the classifier classifies all five classes (Non-RLR and Type A, B, C, and D RLR) correctly. The "window size" means the time-series length of the input data, and the "prediction time after yellow onset" means the time at which a classifier performs classification after yellow onset. In the case of SVM, if the window size is 0 s (i.e., there is only one data point in the input time-series), the accuracy is below about 60% regardless of the prediction time. Table 2 also shows that the longer the input time-series, the better the classification performance. The highest accuracy appears when the window size is 3 s and the prediction time is 2.5 or 3 s after yellow onset.
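The relationship between window size and prediction time can be made concrete with a small sketch that slices one input channel. The sampling interval `dt`, the function name `extract_window`, and the sample values are assumptions for illustration. The same slice would be applied to each channel (speed, acceleration, DTI, headway).

```python
def extract_window(samples, onset_index, prediction_time, window_size, dt=0.5):
    """Return the samples covering [prediction_time - window_size, prediction_time].

    samples[onset_index] is the measurement taken at yellow onset, and
    consecutive samples are dt seconds apart (dt is an assumed value).
    """
    end = onset_index + round(prediction_time / dt)
    start = end - round(window_size / dt)
    if start < 0 or end >= len(samples):
        raise ValueError("window falls outside the recorded samples")
    return samples[start:end + 1]


# Hypothetical speed channel (m/s), sampled every 0.5 s; index 2 is yellow onset.
speed = [14.0, 14.2, 14.5, 15.0, 15.4, 15.9, 16.1, 16.4, 16.6, 16.8]

print(extract_window(speed, 2, prediction_time=1.0, window_size=0.0))  # one point
print(extract_window(speed, 2, prediction_time=1.0, window_size=1.0))  # three points
```

With a 0 s window the classifier sees a single data point, which matches the description of the shortest input above. A 3 s window at a 2.5 s prediction time reaches back before yellow onset, so the recording must start earlier than the onset.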

The classification accuracy of the MC-DCNN model is substantially better than that of SVM, especially when the window size is small. For instance, the classification accuracies of MC-DCNN with a 0 s window size are, for all prediction times, comparable to those of SVM with a 2 s window size. Also, the highest accuracy achieved by MC-DCNN with a 1 s window size is 99.9% at a 3 s prediction time, while SVM with the same window size and prediction time achieves only up to 87.5%. It is interesting to see that this 99.9% accuracy with a 1 s window size is even better than the highest classification accuracy of SVM achieved with the longest window size. This comparison shows that the MC-DCNN classification model proposed in this work can classify an incoming vehicle more accurately than SVM, even with a shorter duration of vehicle motion measurement and at a slightly earlier time after yellow onset. Furthermore, given its accurate classification performance, the proposed MC-DCNN model is expected to be applicable to improving systems that impose fines on vehicles violating traffic signals.


**Table 2.** Classification accuracy of Support Vector Machine (SVM) classifier.

**Table 3.** Classification accuracy of the Multi-Channel Deep Convolutional Neural Networks (MC-DCNN) classifier.


#### *7.3. Dynamic All-Red Signal Control*

For the safety of intersection traffic under the threat of RLRs, all-red signal extension has been proposed: when an RLR is detected, the all-red signal is extended by a *pre-fixed* time duration, which is typically less than 5 s, in order to prevent vehicles from other directions from entering the intersection. However, the fixed-time all-red signal extension may not be effective, as drivers can adapt easily to the fixed extension time. In addition, it may reduce intersection traffic efficiency if the extension time is chosen too conservatively, and it may reduce traffic safety if the extension time is too short.

To address such issues related to the fixed-time all-red signal extension approach, we incorporate the driving characteristics of RLRs to determine the necessary all-red extension time. For this purpose, we adopt a nonlinear regression model, called the Hougen–Watson model, to develop an all-red extension time prediction model based on the traffic data collected from the Vissim simulation. The Hougen–Watson model performs nonlinear fitting on the multivariate input of speed, DTI, and headway, and has the advantage of being easy to use because it is provided as a MATLAB function.
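For reference, the functional form below is the one documented for MATLAB's `hougen` function, written here as a minimal Python sketch. Which of speed, DTI, and headway is assigned to `x1`, `x2`, and `x3` is not stated in this section, so the argument names are left generic, and the parameter values in the example are hypothetical.

```python
def hougen_watson(beta, x1, x2, x3):
    """Hougen-Watson response for parameters beta = (b1, ..., b5).

    Form used by MATLAB's `hougen`:
        (b1*x2 - x3/b5) / (1 + b2*x1 + b3*x2 + b4*x3)
    The mapping of speed, DTI and headway onto x1, x2, x3 is an assumption
    left to the caller.
    """
    b1, b2, b3, b4, b5 = beta
    return (b1 * x2 - x3 / b5) / (1.0 + b2 * x1 + b3 * x2 + b4 * x3)


# Hypothetical parameters and inputs, chosen only to show the arithmetic.
beta = (2.0, 1.0, 1.0, 1.0, 2.0)
print(hougen_watson(beta, 1.0, 2.0, 2.0))  # (4 - 1) / (1 + 1 + 2 + 2) = 0.5
```

In practice the five parameters would be estimated from data by a nonlinear least-squares routine rather than set by hand.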

Figure 8 compares the actual intersection passing time calculated from the traffic data with the intersection passing time predicted by the Hougen–Watson model for all RLR traffic data. In the figure, circular points represent RLRs. For each RLR, the actual and predicted intersection passing times can be compared using the values on the vertical and horizontal axes. The diagonal line, called the *Base line* in the figure, marks where the actual and predicted times match. Thus, RLRs above the base line actually take longer than the predicted intersection passing time to completely cross the intersection. As shown in the figure, a large number of RLRs lie above the base line, so the Hougen–Watson prediction model alone is not enough to predict the necessary all-red signal extension time for all types of RLRs. To address this issue, we identified RLRs, called the *Outliers*, whose actual intersection passing times are larger than, and also maximally deviated from, their predicted intersection passing times. In Figure 8b, red circular points represent the outliers identified from the dataset and the dashed line represents the quadratic polynomial curve fitted to them. Hence, if we use the quadratic curve model on top of the Hougen–Watson model to predict the intersection passing time, the predicted time will be long enough for *most* RLRs to completely clear the intersection within the time interval, which is much safer than using the Hougen–Watson model alone.

**Figure 8.** All-red signal extension time for RLRs: Actual vs. Prediction.

However, using only one prediction model to predict intersection passing times for all types of RLRs may sometimes be too conservative. For a certain class of RLRs, the intersection passing time predicted by the model may be longer than needed. Thus, for better traffic efficiency, we develop and use different prediction models for different RLR classes to predict the intersection passing time more precisely. Figure 9 shows the prediction model of each RLR class, developed with the same framework of the Hougen–Watson model and quadratic polynomial curve fitting. With these four prediction models, one for each RLR type, it is possible to determine the necessary all-red signal extension time more effectively than with a single prediction model, once the RLR type of an incoming vehicle is correctly classified by the MC-DCNN classifier. Table 4 shows the values of the Hougen–Watson model parameters determined by the Levenberg–Marquardt algorithm for each prediction model, as well as the coefficients of the quadratic polynomial curve fitted to the outliers. For the quadratic polynomial, *p*<sub>1</sub> is the second-order coefficient, *p*<sub>2</sub> is the first-order coefficient, and *p*<sub>3</sub> is the constant term.
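The per-class use of the quadratic correction can be sketched as a lookup followed by a polynomial evaluation. This sketch assumes that the quadratic curve takes the Hougen–Watson prediction as its input, which is one reading of Figure 8. The coefficient values below are hypothetical placeholders, not the fitted values of Table 4, and the names `QUADRATIC_BY_CLASS` and `conservative_passing_time` are illustrative.

```python
# Hypothetical placeholder coefficients (p1, p2, p3) for each RLR class.
# The fitted values belong in Table 4 and are NOT reproduced here.
QUADRATIC_BY_CLASS = {
    "Type A": (0.0, 1.0, 0.5),
    "Type B": (0.25, 1.0, 0.0),
    "Type C": (0.0, 1.0, 0.25),
    "Type D": (0.5, 1.0, 0.0),
}


def conservative_passing_time(rlr_class, hw_time):
    """Evaluate p1*t^2 + p2*t + p3 for the class-specific outlier curve.

    hw_time is the passing time predicted by the Hougen-Watson model
    (assumed to be the input of the quadratic curve).
    """
    p1, p2, p3 = QUADRATIC_BY_CLASS[rlr_class]
    return p1 * hw_time ** 2 + p2 * hw_time + p3


# The class label would come from the MC-DCNN classifier.
print(conservative_passing_time("Type B", 2.0))  # 0.25*4 + 1.0*2 + 0.0 = 3.0
```

Keeping the coefficients in a table keyed by class makes the dependence on correct classification explicit: a wrong label selects a different curve.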

**Figure 9.** Prediction models of all-red signal extension time for each RLR class type.

**Table 4.** Coefficients of prediction models for all-red signal extension time. (Mixed RLR represents the prediction model shown in Figure 8 and others are corresponding to models shown in Figure 9).


To evaluate the accuracy of the proposed multi-class intersection passing time prediction framework compared to the case of using only one prediction model (the *mixed* RLR model in Table 4), we use the standard deviation of the residuals *σ*<sub>est</sub>, defined as

$$
\sigma_{est} = \sqrt{\frac{\sum (y - \hat{y})^2}{N}} \tag{3}
$$

where *N* is the number of RLRs, *y* is the actual intersection passing time of an RLR, and *ŷ* is the predicted intersection passing time of the RLR. Table 5 shows the prediction accuracy of the intersection passing time of RLRs, measured by *σ*<sub>est</sub>, for the two prediction models when the prediction time after yellow onset is 3 s. As shown in the result, the proposed multi-class model has a much smaller standard deviation of the residuals than the mixed RLR model. This result shows that the proposed model can predict more accurately the time at which an RLR will completely cross an intersection.
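Equation (3) translates directly into code. The following is a minimal sketch; the function name `residual_std` and the passing times in the example are hypothetical.

```python
import math


def residual_std(actual, predicted):
    """Standard deviation of the residuals, as in Equation (3)."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical passing times (s): every prediction is off by exactly 1 s.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(residual_std(actual, predicted))  # 1.0
```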



Once the intersection passing time of an RLR, *ŷ*, is estimated precisely, it is relatively straightforward to determine the necessary all-red signal extension time for the RLR. A simple strategy for dynamic all-red signal extension control is as follows: if the estimated time *ŷ* is greater than 1 s, which is the default all-red signal length, the length of the all-red signal is set to *ŷ*, unless *ŷ* is larger than 5 s. If the value of *ŷ* exceeds 5 s, the all-red signal length is set to 5 s according to the standard.
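The control strategy above amounts to clamping the predicted passing time between the default and the maximum all-red length. The sketch below assumes that a prediction at or below 1 s leaves the default length unchanged; the names `DEFAULT_ALL_RED`, `MAX_ALL_RED`, and `all_red_length` are illustrative.

```python
DEFAULT_ALL_RED = 1.0  # default all-red signal length in seconds
MAX_ALL_RED = 5.0      # upper limit on the all-red signal length in seconds


def all_red_length(predicted_passing_time):
    """Clamp the predicted passing time to [DEFAULT_ALL_RED, MAX_ALL_RED]."""
    return min(max(predicted_passing_time, DEFAULT_ALL_RED), MAX_ALL_RED)


for y_hat in (0.4, 3.2, 7.0):
    print(y_hat, "->", all_red_length(y_hat))
```

For the three hypothetical predictions, the resulting lengths are 1.0 s (default kept), 3.2 s (extended to the prediction), and 5.0 s (capped).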

#### **8. Conclusions**

In this paper, we proposed a system that dynamically controls the all-red signal length based on the driving characteristics of Red Light Runner (RLR) vehicles to improve overall intersection safety and efficiency. The main components of the proposed system are the Multi-Channel Deep Convolutional Neural Networks (MC-DCNN) classifier, which classifies an approaching vehicle into one of five classes according to the vehicle's driving characteristics, and the multi-level nonlinear regression model, which predicts the necessary all-red signal extension time more accurately. We used Dynamic Time Warping (DTW) and Hierarchical Clustering Analysis (HCA) to carefully determine the types of clusters to be classified via MC-DCNN so that the classes are reasonably distinguishable by their driving characteristics. As a result of this multi-step classification and regression process, we validated that the proposed system can predict the actual intersection passing time of RLRs with very small prediction error and can thereby improve both the safety and the efficiency of intersection traffic. In the future, we will build vehicle surveillance systems at some sections of real road intersections to collect real traffic data. Vehicle data synchronized with signal information will be collected, and the proposed system will be verified in a real environment. In addition, we will conduct a quantitative assessment of intersection safety and of the economic loss caused by signal extension through the analysis of traffic flow.

**Author Contributions:** Conceptualization, H.J. and K.-D.K.; methodology, software, simulation, H.J.; investigation, formal analysis, validation, simulation, writing—original draft preparation, S.K.K.; supervision, project administration, writing—review and editing, K.-D.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1F1A1059496) and the DGIST R&D Program of the Ministry of Science and ICT (20-CoE-IT-01).

**Conflicts of Interest:** The authors declare no conflict of interest.


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
