Review

A Survey of Side-Channel Leakage Assessment

1 School of Cyber Science and Engineering, Wuhan University, Wuhan 430072, China
2 Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan University, Wuhan 430072, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(16), 3461; https://doi.org/10.3390/electronics12163461
Submission received: 10 July 2023 / Revised: 9 August 2023 / Accepted: 11 August 2023 / Published: 15 August 2023
(This article belongs to the Special Issue Computer-Aided Design for Hardware Security and Trust)

Abstract: As more threatening side-channel attacks (SCAs) are being proposed, the security of cryptographic products is seriously challenged. This has prompted both academia and industry to evaluate the security of these products. The security assessment is divided into two styles: attacking-style assessment and leakage detection-style assessment. In this paper, we will focus specifically on the leakage detection-style assessment. Firstly, we divide the assessment methods into Test Vector Leakage Assessment (TVLA) and its optimizations and summarize the shortcomings of TVLA. Secondly, we categorize the various optimization schemes for overcoming these shortcomings into three groups: statistical tool optimizations, detection process optimizations, and decision strategy optimizations. We provide concise explanations of the motivations and processes behind each scheme, as well as compare their detection efficiency. Through our work, we conclude that there is no single optimal assessment scheme that can address all shortcomings of TVLA. Finally, we summarize the purposes and conditions of all leakage detection methods and provide a detection strategy for actual leakage detection. Additionally, we discuss the current development trends in leakage detection.

1. Introduction

The pervasive nature of information technology has permeated all aspects of work and life. As malicious information security incidents like “Eternal Blue” [1] and “Bvp47” [2] continue to emerge, information security has garnered significant attention. Cryptographic products are products that utilize cryptographic technology, and security is their fundamental attribute. However, various cryptanalysis technologies, including traditional cryptanalysis and SCAs, can undermine the security of these products. Traditional cryptanalysis mainly includes techniques like differential cryptanalysis [3], linear cryptanalysis [4], correlation analysis [5], etc. On the other hand, SCA techniques encompass power analysis attacks [6] (such as simple power analysis (SPA) [7], differential power analysis (DPA) [6], correlation power analysis (CPA) [8], mutual information analysis (MIA) [9], and deep learning-based SCA [10,11,12,13,14,15,16]), timing attacks [17], fault-based attacks [18], cache attacks [19], etc. Consequently, evaluating the security of cryptographic products has become a crucial task. Two popular security certification standards, namely, Common Criteria (CC) [20] and FIPS [21], have been established to assess the security of cryptographic products. These standards employ two distinct assessment methods: evaluation-style testing (also known as attacking-style assessment) and conformance-style testing (also known as leakage detection-style assessment) [22].
The attacking-style assessment mainly uses various SCA techniques to obtain key information and evaluate products against SCAs. Attacking-style assessors can directly recover the key by executing SCAs. The evaluation results facilitate the calculation of security metrics for encryption algorithms, help identify vulnerabilities within these algorithms, and provide guidance for implementing algorithm protection strategies. The attacking-style assessment offers rigorous evaluation intensity and readily interpretable evaluation results, and it holds great significance as a method for evaluating side-channel security. However, it is considered unsuitable for primary security assessments because of its reliance on SCA technologies, which require evaluators with advanced SCA expertise. This reliance increases both the time and sample complexity of the assessment, slowing it down so that it cannot keep pace with the rapid cycle of product innovation [23,24,25,26,27]. Instead, the leakage detection-style assessment has been proposed as a preliminary method to evaluate the security of cryptographic products. It primarily relies on information theory or statistical hypothesis testing to analyze side-channel measurements and determine the presence of any side-channel leakage. Its objective is to evaluate whether the device can pass the testing rather than to recover the key itself. This type of assessment can be categorized into two main types: leakage assessment based on information theory and assessment based on statistical hypothesis testing. In recent years, leakage assessment based on statistical hypothesis testing has been widely adopted due to its superior efficiency and effectiveness [27]. Various works focusing on leakage assessment using statistical hypothesis testing have been proposed [23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38].
In this paper, we study the works of side-channel leakage assessment and provide succinct accounts of motivations and detection processes behind these assessment methods. We also compare the efficiency of different detection methods and discuss the future development of leakage assessments.
The main contributions of this paper are as follows:
(1)
We analyze the works on side-channel leakage assessment and classify the leakage detection-style assessment works into two categories: the TVLA technology and optimizations of TVLA. Additionally, we identify the shortcomings of TVLA. Because TVLA's flaws lie in its statistical tool, detection process, and decision strategy, we divide TVLA's optimization schemes into three corresponding groups: optimizations of the statistical tool, of the detection process, and of the decision strategy. Furthermore, we provide a brief description of the motivation and detection process for each optimization and compare their detection efficiency.
(2)
Due to the lack of a unified and comprehensive leakage detection assessment method that can address all the TVLA’s shortcomings, as well as the variation in optimization methods based on detection purposes and conditions, we present a summary on how to select a suitable leakage detection assessment method depending on specific detection purposes and conditions. Moreover, considering the current state of leakage detection assessment, we discuss potential future trends in this field.
The structure of this paper is as follows. Section 2 provides a brief description of the process, methods, metrics, and shortcomings of the attacking-style assessment. Section 3 focuses on the leakage detection-style assessment and describes the development process, metrics, and detection objectives. Section 4 mainly focuses on the leakage detection assessment based on statistical hypotheses and describes TVLA and its optimization methods. Section 5 highlights the relationship between the attacking-style assessment and leakage detection-style assessment. In Section 6, the current status and future development trends of leakage detection-style assessment are discussed. Finally, Section 7 presents the conclusion of this review.

2. The Attacking-Style Assessment

2.1. The Assessment Process of Attacking-Style Assessment

The attacking-style assessment primarily utilizes the device’s side-channel information to recover the key and assess the product’s security against SCAs. The CC [20] standard serves as a reference criterion for the attacking-style assessment. The evaluators can choose any SCA method from the list of threats to conduct SCAs and assess the product’s security level. Figure 1 illustrates the process of the attacking-style assessment.
In the attacking-style assessment, the evaluator assumes the role of an attacker with prior knowledge of the implementation of cryptographic algorithms.

2.2. The Methods of Attacking-Style Assessment

Because the attacking-style assessment heavily relies on SCA technologies, the evaluators must have a proficient understanding of them. SCA techniques can be categorized into two groups: profiled attacks (PAs) and non-profiled attacks (NPAs). Attackers using PAs must first construct a device model before conducting SCAs. Current PA methods include the Template Attack (TA) [39,40,41] and deep learning-based profiled attacks [14,42,43]. On the other hand, an NPA does not require a model and relies solely on side-channel measurements for conducting SCAs. Common NPA methods include DPA [6], CPA [8], MIA [9], etc.

2.2.1. The Profiled Attack

(1)
The Template Attack
Time series are the typical representation of side-channel measurements. The attackers utilize these measurements along with their model to conduct side-channel power analysis. TAs are one of the earliest approaches to PAs [39,40,41]. In TAs, it is assumed that the attacker has access to the same device as the target device, enabling them to encrypt any plaintext and collect corresponding side-channel power traces. TAs include two stages: the stage of constructing the template and the stage of matching the template.
In the stage of constructing the template, the main objective is to extract trace features and build the template. Assuming we have $n$ traces designated as $L = \{l_1, l_2, \ldots, l_n\}$, we divide $L$ into $K$ groups noted as $L_1, L_2, \ldots, L_K$, where $L_i$ is the $i$-th group, corresponding to key $k_i$. The template of $k_i$ is noted as $(\mu_i, C_i)$, where $\mu_i$ represents the mean vector and $C_i$ is the covariance matrix:

$$\mu_i = \frac{1}{n_i}\sum_{l_j \in L_i} l_j, \qquad C_i = \frac{1}{n_i - 1}\sum_{l_j \in L_i} (l_j - \mu_i)(l_j - \mu_i)^T,$$

where $n_i$ is the cardinality of $L_i$. During the stage of matching the template, the attacker utilizes the traces to calculate the probability of the observed $trace$ under each template $(\mu_i, C_i)$:

$$P(trace; \mu_i, C_i) = \frac{\exp\left(-\frac{1}{2}(trace - \mu_i)^T C_i^{-1}(trace - \mu_i)\right)}{\sqrt{(2\pi)^{n_l}\det(C_i)}}.$$

If $P(trace; \mu_j, C_j) > P(trace; \mu_i, C_i)$ for all $i \neq j$, then $k_{guess} = k_j$.
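As an illustration, the following minimal numpy sketch (hypothetical function and variable names; point-of-interest selection and other practical steps are omitted) builds one Gaussian template per key class and matches a trace by maximizing the template log-likelihood, mirroring the two stages above.

```python
import numpy as np

def build_templates(traces, labels, n_classes):
    """Build one (mean, covariance) template per key class.
    traces: (n_traces, n_points) array; labels: key class of each trace."""
    templates = []
    for k in range(n_classes):
        group = traces[labels == k]
        mu = group.mean(axis=0)              # mean vector mu_i
        C = np.cov(group, rowvar=False)      # covariance matrix C_i
        templates.append((mu, C))
    return templates

def match_template(trace, templates):
    """Return the key class whose Gaussian template maximizes the
    log-likelihood of the observed trace (constants dropped)."""
    scores = []
    for mu, C in templates:
        d = trace - mu
        _, logdet = np.linalg.slogdet(C)
        scores.append(-0.5 * (d @ np.linalg.solve(C, d) + logdet))
    return int(np.argmax(scores))
```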
(2)
The profiled attack based on deep learning
In recent years, deep learning technology has emerged as a popular alternative method in PAs. Specifically, the PA based on deep learning [14,42,43] utilizes the multi-layer perceptron and convolutional neural networks to construct the templates and conduct SCAs.
The PA based on deep learning requires two independent trace sets [42]. One set is the training set, which is used to construct the template. Attackers need to have the keys, plaintexts, and traces for the training set. During the training process, the attackers use a minimum loss function to train the model, allowing the template to achieve better results. The other one is the validation set, which is used to carry out attacks.
Compared to traditional approaches like TAs, the PA based on deep learning can overcome the noise assumption and offers a more efficient and simplified process without extensive preprocessing. However, there are two shortcomings. Firstly, the metrics of deep learning are challenging to apply to SCA scenarios and may provide misleading results [42,44]. Secondly, the effectiveness of PAs based on deep learning will decrease significantly when facing imbalanced data.

2.2.2. The Non-Profiled Attack

The fundamental assumption of NPAs is that the attacker cannot obtain the same device as the target device but has access to an unlimited number of side-channel traces.
(1)
Differential Power Analysis
The key information is extracted by a distinguisher in an NPA. Let $X = \{X_1, \ldots, X_D\}$ be the input, where $X_i$ represents the $i$-th group. The attacker collects the traces from the process of encrypting the $D$ groups’ data. The trace of $X_i$ is denoted as $l_i = (l_{i,1}, \ldots, l_{i,n_l})$, where $n_l$ represents the length of the trace. The guessing key space is $K = \{k_1, \ldots, k_{|K|}\}$, where $|K|$ is the number of all possible keys. For every key, the input $X$ and the intermediate value $V = f(X, k)$ are considered. For input $X$ and all guessing keys $K$, the trace matrix $L$ and intermediate value matrix $V$ are

$$L = \begin{pmatrix} l_{1,1} & l_{1,2} & \cdots & l_{1,n_l} \\ l_{2,1} & l_{2,2} & \cdots & l_{2,n_l} \\ \vdots & \vdots & & \vdots \\ l_{D,1} & l_{D,2} & \cdots & l_{D,n_l} \end{pmatrix}, \qquad V = \begin{pmatrix} v_{1,1} & v_{1,2} & \cdots & v_{1,|K|} \\ v_{2,1} & v_{2,2} & \cdots & v_{2,|K|} \\ \vdots & \vdots & & \vdots \\ v_{D,1} & v_{D,2} & \cdots & v_{D,|K|} \end{pmatrix}.$$

The intermediate value matrix $V$ is mapped to a hypothetical leakage matrix $H$, and the correlation coefficient between column $h_i$ of $H$ and column $l_j$ of $L$ is calculated. The correlation coefficients $r_{i,j}$ are stored in the matrix $R$:

$$H = \begin{pmatrix} h_{1,1} & \cdots & h_{1,|K|} \\ \vdots & & \vdots \\ h_{D,1} & \cdots & h_{D,|K|} \end{pmatrix}, \qquad R = \begin{pmatrix} r_{1,1} & \cdots & r_{1,n_l} \\ \vdots & & \vdots \\ r_{|K|,1} & \cdots & r_{|K|,n_l} \end{pmatrix},$$

$$r_{i,j} = \frac{\sum_{d=1}^{D}(h_{i,d} - \bar{h}_i)(l_{d,j} - \bar{l}_j)}{\sqrt{\sum_{d=1}^{D}(h_{i,d} - \bar{h}_i)^2 \cdot \sum_{d=1}^{D}(l_{d,j} - \bar{l}_j)^2}}.$$

In DPA, we use the correlation coefficients to assess the linear correlation between $l_j$ and $h_i$, where $i = 1, \ldots, |K|$ and $j = 1, \ldots, n_l$. As $r_{i,j}$ increases, the level of correspondence between $l_j$ and $h_i$ also increases. By identifying the highest value of $r_{i,j}$, the attacker can successfully retrieve the key.
(2)
Correlation Power Analysis
Correlation Power Analysis (CPA) is a variant of DPA that exploits the correlation between the power traces $L$ and a leakage model $F$ to conduct the attack. Assuming the leakage model is denoted by $F$, we map the intermediate value matrix $V = f(X, K)$ to the leakage values $G_K = F(V)$. Here, $X$ represents the input, $K$ denotes the key space, and $f$ stands for the cryptographic function. Typically, $F$ is a leakage model based on either the Hamming weight or the Hamming distance, so $G_K$ corresponds to the Hamming weights or Hamming distances of $V$.

The correlation coefficient between the power traces $L$ and $G_K$ is as follows:

$$\rho(L, G_K) = \frac{E(L \cdot G_K) - E(L) \cdot E(G_K)}{\sigma_L \cdot \sigma_{G_K}},$$

where $E(L)$, $E(G_K)$, $E(L \cdot G_K)$ are the expectations of $L$, $G_K$, $L \cdot G_K$, and $\sigma_L$, $\sigma_{G_K}$ are the standard deviations of $L$ and $G_K$, respectively.

The attackers iterate through the space of possible keys, calculating the correlation coefficient between $G_K$ and $L$ to determine whether a key guess is correct. The correlation coefficient of the correct key guess is higher than that of incorrect guesses, so the attackers identify the guess with the maximum correlation coefficient as the correct key.
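For concreteness, a minimal CPA sketch is given below. It assumes numpy arrays, a single target key byte, and a Hamming-weight leakage model applied to an S-box output; all names (`sbox`, `plaintexts`, etc.) are illustrative rather than taken from any specific tool.

```python
import numpy as np

HW = np.array([bin(v).count("1") for v in range(256)])  # Hamming weights

def cpa_attack(traces, plaintexts, sbox):
    """Correlate Hamming-weight predictions with measured traces for
    every key guess; the guess with the largest |rho| is retained.
    traces: (D, n_points) float array; plaintexts: (D,) uint8 array."""
    lc = traces - traces.mean(axis=0)            # centered traces
    best_guess, best_corr = None, -1.0
    for k in range(256):
        v = sbox[plaintexts ^ k]                 # intermediate values
        g = HW[v].astype(float)                  # leakage model G_K
        gc = g - g.mean()
        num = gc @ lc                            # correlation numerator
        den = np.sqrt((gc @ gc) * (lc * lc).sum(axis=0))
        rho = np.abs(num / den)                  # |rho| at every point
        if rho.max() > best_corr:
            best_corr, best_guess = rho.max(), k
    return best_guess, best_corr
```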
(3)
Mutual Information Analysis
In MIA [9], the attacker uses mutual information or entropy to assess the dependency between the side-channel measurements and the intermediate values. The fundamental concepts of entropy and mutual information are as follows:

Let $X = \{X_1, X_2, \ldots, X_n\}$ be a set of discrete random variables; then the entropy of $X$ is

$$H(X) = -\sum_{x \in X} p(x)\log p(x),$$

where $p(x)$ is the probability distribution of the variables.

Let $X = \{X_1, \ldots, X_n\}$ and $Y = \{Y_1, \ldots, Y_n\}$ be sets of discrete random variables; then the conditional entropy $H(X|Y)$ of $X$ given $Y$ is

$$H(X|Y) = -\sum_{y \in Y} p(y)\sum_{x \in X} p(x|y)\log p(x|y).$$

The mutual information between $X$ and $Y$ is $I(X;Y) = H(X) - H(X|Y)$.

Let $K$, $X$, and $V$ be the key, plaintext, and intermediate value, respectively. The correct key is denoted as $k_c$, and the leakage function $L$ applied to $V = f(X, K)$ is continuous. Let $l_i = L(f(X_i, k_c))$, where $1 \leq i \leq N$. For a given key guess $k$, the predicted intermediate values are obtained as $M(X_i, k) = f(X_i, k)$. The attack is successful when $\operatorname{argmax}_{k \in K} D(M(X, k), L) = k_c$, where $D$ is the distinguisher.

In MIA, the distinguisher was proposed as

$$D = I(L; M(X, k)) = H(L) - H(L \mid M(X, k)).$$

Different keys result in different values of mutual information, and the mutual information is governed by the conditional entropy $H(L \mid M(X, k))$. A stronger dependency between the measured $L$ and $M(X, k)$ indicates that the guessed key $k_{guess}$ is the correct key.
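A simple way to realize this distinguisher is a histogram-based mutual information estimate. The sketch below (illustrative names, coarse binning, a single point of interest) scores each key guess by $I(L; M(X,k))$; more refined estimators exist, so this is only a minimal example under those assumptions.

```python
import numpy as np

def mutual_information(x, y, bins=9):
    """Histogram-based estimate of I(X;Y) in bits."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    nz = pxy > 0                       # avoid log(0)
    return (pxy[nz] * np.log2(pxy[nz] / np.outer(px, py)[nz])).sum()

def mia_distinguisher(traces_at_poi, plaintexts, sbox):
    """Score every key guess by the mutual information between its
    predicted intermediates (Hamming weight) and the measurements."""
    HW = np.array([bin(v).count("1") for v in range(256)])
    scores = [mutual_information(HW[sbox[plaintexts ^ k]].astype(float),
                                 traces_at_poi) for k in range(256)]
    return int(np.argmax(scores))
```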
The development of the attacking-style assessment relies entirely on SCA methods. Essentially, the attacking-style assessment comprises attacks on the targeted devices, so the assessment itself is considered an attack.

2.3. The Metrics of Attacking-Style Assessment

Because the attacking-style assessment is itself an attack, attack metrics serve as its assessment metrics. Distinguisher scores are commonly employed to rank the candidate keys $k_{guess}$ in an SCA. The position of the correct key $k_c$ in the sorted results is called the key ranking, noted as $\mathrm{rank}(k_c \mid L, X)$. The metrics based on the key ranking are defined as follows.

(1) The number of samples: the minimum positive integer $N$ such that, when the sample size $|L| \geq N$,

$$\mathrm{rank}(k_c \mid L, X) \leq \hat{K}_c,$$

where $\hat{K}_c$ is the target ranking of $k_c$. In particular, when $\hat{K}_c = 1$, $\mathrm{rank}(k_c \mid L, X) = 1$, i.e., the correct key is ranked first.
(2) Success rate: The success rate of the side-channel attack is the probability of successfully recovering the correct key.
(3) Guessing entropy: The guessing entropy is the mathematical expectation of the key ranking:

$$GE(L, X) = E[\mathrm{rank}(k_c \mid L, X)].$$

2.4. The Advantages and Shortcomings of Attacking-Style Assessment

With the advancement of side-channel technology, various side-channel attack methods have been proposed. Evaluators can select an appropriate side-channel attack method to evaluate cryptographic algorithms, considering distinct encryption implementations and attack conditions. By evaluating the actual results of side-channel attacks, one can determine the security level of the encryption implementation. The attacking-style assessment is characterized by its high attack intensity and thorough evaluation, allowing evaluators to directly acquire information about keys and design weaknesses. It enables a thorough comprehension of system vulnerabilities and weaknesses. By simulating real-world attack scenarios, it offers valuable insights into the efficacy of security measures, aids in identifying potential areas for improvement, and facilitates the discovery of previously unknown vulnerabilities. The capability to analyze attacks in a controlled environment enables the development and implementation of efficient countermeasures. Thus, the attacking-style assessment is better suited for conducting comprehensive analysis and evaluation of cryptographic algorithms, subsequently facilitating vulnerability analysis and the establishment of protective measures.
The attacking-style assessment is widely utilized in the security assessment of cryptographic products. However, there are several limitations to its applicability.
First, because new SCA methods are frequently proposed, the list of attack methods must be updated periodically. Moreover, CC evaluations generally target high-security products like bank smart cards and passport ICs, and the time and computational complexity involved in such security evaluations make it difficult to keep pace with the innovation cycle of new security products [26,45,46].
Second, the evaluators of attacking-style assessments must possess exceptional expertise in SCA methods and measurement techniques [23].
Third, due to the different principles of SCAs, evaluators need to perform multiple SCAs to calculate security metrics. The increasing number of attacks inevitably leads to an increase in computational and time complexity [31,39]. Therefore, the attacking-style assessment is not suitable for the primary security evaluation of cryptographic algorithms. Instead, the leakage detection-style assessment is proposed as a preliminary method to evaluate the security of cryptographic products.

3. The Leakage Detection-Style Assessment

3.1. The Goals of Leakage Detection-Style Assessment

The leakage detection-style assessment is conducted by the laboratory to provide the security certification, or conducted by the designers during the design period to highlight and address the potential issues before the product is launched into the market. Consequently, different stages of assessment have different goals. There are four different intentions [47]: certifying vulnerability, certifying security, demonstrating an attack, and highlighting vulnerabilities.
(1) Certifying vulnerability: This involves identifying at least one leakage point in the traces. It is crucial to minimize the false positive rate.
(2) Certifying security: The goal here is to find no leakages after thorough testing. In this case, the false negatives become a concern. It is important to design the tests with “statistical power” to ensure a large enough sample size for detecting effects with reasonable probability. Moreover, all possible intermediates and relevant higher-order combinations of points should be targeted.
(3) Demonstrating an attack: The objective is to map a leaking point (or tuple) to its associated intermediate state(s) and perform an attack. The reporting of the outcomes or projections derived from these attacks is of interest. The false positives are undesirable as they represent wasted efforts of the attacker.
(4) Highlighting vulnerabilities: The purpose is to map all exploitable leakage points to intermediate states in order to guide designers in securing the device. This has similarities to certifying security, as both require exhaustive analysis. The false negatives are of greater concern than false positives, as false negatives indicate unfixed side-channel vulnerabilities.

3.2. The Process of Leakage Detection-Style Assessment

In leakage detection, the evaluators have the ability to control both input and output variables and obtain the side-channel measurements. The process of leakage detection-style assessment primarily encompasses four stages: collecting power traces, categorizing power traces, calculating the statistical moment, and determining leakage. The process of leakage detection-style assessment is shown in Figure 2.
Compared with the attacking-style assessment, the leakage detection-style assessment uses the statistical hypothesis or information theory to detect whether there is leakage. The evaluator does not consider specific attack methods, leakage models, or the implementation of encryption algorithms and does not require accurate extraction of leakage characteristics or the key recovery. Consequently, the leakage detection-style assessment becomes an ideal method for primary security assessments due to the lowered technical threshold and reduced evaluation time.

3.3. The Development of Leakage Detection-Style Assessment

In this paper, we categorize the development of leakage detection-style assessment into three stages: the phase of proposing the leakage assessment concept, the phase of forming leakage assessment methods, and the phase of optimizing the leakage assessment methods. Figure 3 provides a visual representation of these development stages.
In the phase of proposing the leakage assessment concept, Coron et al. (2001) proposed the idea of security assessment [48]. A failed result indicates the presence of side-channel leakage, whereas a pass result does not imply that there is no leakage but rather that no leakage has been identified at the specified significance level $\alpha$. In [49], a framework for side-channel security assessment was presented, categorizing security assessments into attacking-style and leakage detection-style assessments. This paper primarily concentrates on the leakage detection-style assessment.
In the phase of forming leakage assessment methods, two groups of leakage detection-style assessment methods emerged based on different theories: information theory and statistical hypothesis testing. In 2010 and 2011, Chothia proposed two leakage assessment methods: one based on discrete mutual information (DMI) [50] and the other based on continuous mutual information (CMI) [51]. Both the DMI and CMI methods utilize information entropy as a testing tool to assess the possibility of side-channel leakage. Additionally, Gilbert et al. utilized the statistical hypothesis as a testing tool in their work [52]. They divided the traces into two groups and performed leakage assessment on the Advanced Encryption Standard (AES) using a t-test [52]. In 2013, building on the research by Gilbert et al. [52], Becker et al. proposed the Test Vector Leakage Assessment (TVLA) technique [53]. They divided the traces into two groups, fixed plaintext traces versus random plaintext traces, and performed Welch's t-tests on these two groups to detect any mean difference between the trace sets. In 2013, Mather et al. compared the detection efficiency of DMI and CMI with that of TVLA [27]. The results demonstrated that TVLA outperformed DMI and CMI in terms of detection efficiency. Consequently, leakage assessment based on statistical hypothesis testing has developed rapidly in recent years. In 2015, Moradi et al. summarized the TVLA techniques proposed by Gilbert and Becker in [23] and studied leakage detection in various scenarios. The non-specific TVLA technology has been widely applied in leakage assessment and is commonly regarded as a preliminary assessment technique in both industry and academia due to its simplicity, efficiency, and versatility.
However, in order to address the shortcomings of TVLA [22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,47] (described in detail in Section 4), improve detection efficiency and the reliability of results, and quantify side-channel vulnerability, researchers have proposed a variety of leakage assessment schemes that enhance TVLA.
In the phase of optimizing leakage assessment methods, there are three main aspects to consider. Firstly, the optimization of statistical tools aims to replace the t-test of TVLA with alternative statistical tools (such as the paired t-test [26], $\chi^2$-tests [28], Hotelling $T^2$-tests [29], KS tests [30], ANOVA [31], deep learning [36,37], etc.). Secondly, the optimization of the detection process involves improving the current TVLA detection process or proposing a new one [32,33,34]. Finally, the optimization of the decision strategy focuses on introducing a new decision strategy, such as the HC strategy [35], to decide the presence of leakage.

4. The Leakage Assessment Based on Statistical Hypothesis

This section provides an overview of two perspectives on leakage assessment technologies: the TVLA technology and the TVLA’s optimization schemes.

4.1. The Test Vector Leakage Assessment

In 2013, Cryptography Research, Inc. (CRI) introduced TVLA as a standardized approach for detecting side-channel leakages. This section describes the detection process and the metrics used for detection and discusses the limitations of TVLA.

4.1.1. The TVLA Technology

(1)
The detection process of TVLA
The detection process of TVLA is as follows.
In the stage of collecting power traces, $N$ different inputs $X$ are used to collect traces $L$ from the execution process. Let $l = (l_1, \ldots, l_i, \ldots, l_{n_l})$ be a trace, where $l_i$ represents the measurement at the $i$-th point, and $n_l$ is the length of the traces.

In the stage of categorizing power traces, the trace set $L$ is divided into two groups: the fixed plaintext trace set $L_A$ and the random plaintext trace set $L_B$. It is assumed that $L_A$ and $L_B$ obey the normal distributions $\mathcal{N}(\mu_A, \sigma_A^2)$ and $\mathcal{N}(\mu_B, \sigma_B^2)$, respectively. The cardinality, sample mean, and sample variance of $L_A$ are denoted as $n_A$, $\bar{x}_A$, $s_A^2$, while those of $L_B$ are denoted as $n_B$, $\bar{x}_B$, $s_B^2$.

In the stage of calculating the statistical moment, the null hypothesis $H_0$ states that there is no side-channel leakage, while the alternative hypothesis $H_1$ suggests the presence of leakage. Welch's t-test is employed to determine the mean difference between $L_A$ and $L_B$. Under $H_0$, we calculate the statistic $T_i$ at each point and the probability $P_i$ of accepting $H_0$:

$$P_i = P(|T_i| > T_{th}).$$

In the stage of determining leakage, if $|T_i| > T_{th}$, then the null hypothesis $H_0$ is rejected, and it can be concluded that there is side-channel leakage.
(2)
The statistical tool of TVLA
The statistical tool of TVLA, Welch's t-test, is used to test the mean difference between $L_A$ and $L_B$. The null hypothesis $H_0$ and alternative hypothesis $H_1$ in Welch's t-test [33] are as follows:

$$H_0: \mu_A = \mu_B, \qquad H_1: \mu_A \neq \mu_B.$$

The statistic $T_\mu$ and degrees of freedom $v$ of the t-test are calculated as follows:

$$T_\mu = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}}, \qquad v = \frac{\left(\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}\right)^2}{\frac{(s_A^2/n_A)^2}{n_A - 1} + \frac{(s_B^2/n_B)^2}{n_B - 1}}.$$

The probability $P$ of accepting hypothesis $H_0$ is calculated from the probability density function (PDF) of the t distribution:

$$f(t, v) = \frac{\Gamma\left(\frac{v+1}{2}\right)}{\sqrt{\pi v}\,\Gamma\left(\frac{v}{2}\right)}\left(1 + \frac{t^2}{v}\right)^{-\frac{v+1}{2}}, \qquad P = 2\int_{|T_\mu|}^{\infty} f(t, v)\,dt,$$

where $\Gamma(x) = \int_0^{\infty} t^{x-1}e^{-t}\,dt$ is the gamma function.

The threshold 4.5 [54,55] is employed to decide the acceptance of $H_0$: if $|T_\mu| > 4.5$, the hypothesis $H_0$ is rejected. When $v > 1000$, $P = 2\int_{4.5}^{\infty} f(t, v)\,dt < 10^{-5}$ [33]; this implies that the probability of accepting hypothesis $H_0$ is less than 0.00001, while the probability of rejecting it is greater than 0.99999, indicating the presence of side-channel leakage in the device.
(3)
The decision strategy of TVLA
Because TVLA is a univariate leakage assessment method, when the evaluator obtains traces of length $n_l$, TVLA is applied to each sample point of the traces. The assessor thus obtains $n_l$ detection results, and the assessment decision is obtained by combining these results. The minP strategy is the common decision strategy used in TVLA. Let $L = \{l_1, \ldots, l_N\}$ be the set of traces, where $l_i = (l_{i,1}, \ldots, l_{i,n_l})$, and let $T_\mu^{(i)}$ denote the t-statistic at sample point $i$.

When TVLA is applied to long traces, the assessor actually conducts multiple ($n_l$) tests. If any of the tests rejects $H_0$, this indicates the presence of side-channel leakage. In other words, leakage is reported when $\max_{1 \leq i \leq n_l} |T_\mu^{(i)}| > T_{th}$, or equivalently when the minimum $p$-value is less than the threshold $\alpha_{th}$. This means that the minP strategy of TVLA uses only one test result (the minimum $p$-value) to make a decision for long traces.
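Putting the pieces together, a minimal TVLA sketch might look as follows. It uses scipy's Welch t-test; the fixed/random trace matrices and the 4.5 threshold follow the description above, while everything else (names, shapes) is illustrative.

```python
import numpy as np
from scipy import stats

def tvla(traces_fixed, traces_random, threshold=4.5):
    """Pointwise Welch's t-test between the fixed and random trace sets.
    traces_*: (n_traces, n_points) arrays. minP-style decision: leakage
    is reported if any point exceeds |t| = 4.5."""
    t, _ = stats.ttest_ind(traces_fixed, traces_random,
                           axis=0, equal_var=False)   # Welch's t-test
    leaky = np.flatnonzero(np.abs(t) > threshold)
    return t, leaky   # the device fails the test if leaky.size > 0
```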

4.1.2. The Assessment Metrics of TVLA

Due to the fact that TVLA involves the statistical analysis of random variables, it is possible for the detection results to contain errors. Therefore, it becomes imperative to evaluate the detection results. Commonly used metrics are employed to assess the effectiveness of detection methods.
(1) The number of samples: The minimum sample size required for the statistic to exceed the threshold is an assessment metric used in TVLA. This metric is frequently utilized to compare the detection effectiveness of various assessment methods. Under identical conditions, a smaller sample size indicates higher assessment effectiveness [24,25,26,27,28,29,30,31].
(2) The false positive and false negative: The two types of errors commonly encountered in hypothesis testing are false positives [24] and false negatives [47]. A false positive occurs when the null hypothesis is true, but it is rejected by a t-test, leading to an incorrect conclusion. In the context of leakage detection, a false positive refers to a situation where the device does not have any leaks, yet the TVLA results indicate otherwise. Conversely, a false negative denotes a Type II error, which occurs when the null hypothesis is false, but the t-test fails to reject it. The rate of Type II errors is denoted as β . During leakage assessment, the assessor aims to control the false positive rate at the specified significance level α .
(3) The effect size: The effect size ζ is an indicator [22,47] employed to assess the effectiveness of leakage detection.
Cohen’s d [56,57] is a commonly used effect size for comparing differences among groups, mainly applied to t-tests to compare the standardized difference between two means. Cohen’s d is computed as follows:

$$d = \frac{\bar{x}_A - \bar{x}_B}{\sqrt{\frac{(n_A - 1)s_A^2 + (n_B - 1)s_B^2}{n_A + n_B - 2}}},$$

where $\bar{x}_A$, $\bar{x}_B$ are the sample means, $s_A^2$, $s_B^2$ are the sample variances, and $n_A$, $n_B$ are the cardinalities. Cohen established thresholds as criteria for judging the effect size [58]: when $d \leq 0.2$, the effect size is considered “small”, and when $d \geq 0.8$, the effect size is considered “large”.
(4) The power: The power is an essential metric that indicates the ability of the assessor to detect a difference at the significance level $\alpha$, and it is noted as $1 - \beta$. The power should not be less than 75% and is typically required to reach 80% or 90%. The relationship among the variances $\sigma_1^2$, $\sigma_2^2$, the power $1 - \beta$, the significance level $\alpha$, the effect size $\zeta$, and the number of samples $N$ is as follows:

$$N = \frac{2\left(T_{\alpha/2} + T_\beta\right)^2\left(\sigma_1^2 + \sigma_2^2\right)}{\zeta^2},$$

where $\zeta = \mu_1 - \mu_2$, and $T_{\alpha/2}$ and $T_\beta$ are the corresponding statistical quantiles. Equation (14) allows us to obtain any one of the significance level, effect size, or power given the others [47].
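As a small illustration of Equation (14), the sketch below solves for the required sample size. Approximating the $T$ quantiles by normal quantiles for large $N$ is an assumption made here for simplicity, not part of the original formulation.

```python
from scipy import stats

def required_samples(sigma1, sigma2, effect, alpha=1e-5, power=0.8):
    """Equation (14): samples needed to detect a mean difference
    `effect` at significance alpha with the requested power."""
    z_a = stats.norm.ppf(1 - alpha / 2)   # quantile T_{alpha/2}
    z_b = stats.norm.ppf(power)           # quantile T_{beta}
    return 2 * (z_a + z_b) ** 2 * (sigma1**2 + sigma2**2) / effect**2
```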
(5) The observed power: Because power is a theoretical parameter that cannot be directly obtained from samples, the observed power (OP) was proposed in [59] to evaluate the reliability of assessment results for given $\alpha$ and $N$. The OP can be considered an approximation of the power. If the probability of accepting the null hypothesis is $P > \alpha$, the assessor can determine the reliability of the assessment results by checking whether the OP is greater than 0.8. Additionally, the OP can serve as a means to compare the effectiveness of different assessment methods while keeping the sample size $N$ constant.

4.1.3. The Drawbacks of TVLA

Generally, TVLA is based on hypothesis testing to determine whether the side-channel measurements expose any secret information, which can help simplify the security assessment process [25,60,61,62]. However, this complexity reduction is accompanied by an increase in false positives or false negatives [23,36]. The drawbacks of TVLA are as follows:
(1)
Difficulty interpreting negative outcomes
Formally, a statistical hypothesis test can either reject the null hypothesis or fail to reject it. However, it cannot prove or accept the null hypothesis. If the statistical hypothesis test fails to reject the null hypothesis, the security assessment agency must demonstrate that their assessment method is fair. Unfortunately, due to the sample limitations or time constraints, varying levels of expertise, and poor equipment quality, the fairness of leakage assessment may be undermined. As a result, when the negative outcomes occur, it becomes challenging to explain and provide evidence for the fairness of TVLA.
(2)
Unreliability of positive outcomes
TVLA is commonly utilized for univariate analysis. However, in actual detection scenarios, multiple tests are necessary. To illustrate this, let us assume that the probability of a false positive in a single test is denoted as $\alpha$, the length of the traces is $n_l$, and there are $n_l$ independent tests. The probability of rejecting the null hypothesis in at least one of these tests is then $\alpha_{all} = 1 - (1 - \alpha)^{n_l}$. Ding has emphasized that the threshold of 4.5 for TVLA corresponds to a false positive rate of approximately $\alpha \approx 0.00001$ per test. For instance, if there are 1000 independent tests, then $\alpha_{all} = 0.0068$; if there are 1,000,000 independent tests, then $\alpha_{all} = 0.9987$ [63]. Therefore, products that generate long power traces are more likely to be deemed vulnerable than those with shorter traces, and the positive outcomes may consequently be unreliable.
(3)
Impossibility of achieving exhaustive coverage
Ideally, an evaluator would prefer to eliminate any possible sensitive dependency across all distributional forms, for all points and tuples of points as a whole, considering all target functions and intermediate states before determining the security of a target device. However, even when the best efforts are made, there are still limitations. Moreover, in order to enhance the scope of detection, extensive tests are required, which lead to an increase in the type I error rate, computational complexity, and sample complexity. Consequently, achieving exhaustive coverage through TVLA is not possible.
(4)
The multivariate problems of TVLA
In TVLA, it is assumed that the sample points in a trace are independent of each other; additionally, TVLA is a univariate test. In reality, however, numerous examples contradict this assumption. This is especially evident in protected algorithm implementations, where the leakage is mostly observed to be multivariate or horizontal [29,37,47]. Therefore, it is crucial to consider the correlation between multiple variables in leakage assessment.
(5)
The limited trace groups and dependence on the statistical moment

The simplicity and efficiency of TVLA depend on a reduced number of groups (the fixed-versus-random trace sets) and on the mean statistical moment [64]. However, in cases where the leakage does not manifest in the mean statistical moment [32], or where mean differences exist among multiple groups, there is a risk of both false positives and false negatives [29,30,31,32,33].
(6)
The drawbacks of distribution assumption
TVLA assumes that the power traces obey a normal distribution, while in reality many examples contradict this assumption. In particular, for protected algorithm implementations, combining functions are generally needed to preprocess the side-channel traces in TVLA, and the distribution of the samples no longer obeys the normal distribution after preprocessing [59].
(7)
The shortcomings of certifying vulnerability
The assessor of TVLA can only answer whether there is a side-channel leakage but cannot provide information regarding the specific location of the leakage or how to exploit it for key recovery. The results obtained from TVLA are insufficient for certifying vulnerability or deducing the relationship between the detected leakage and attack [22].

4.2. The Optimizations of TVLA

In order to address the shortcomings of TVLA mentioned above, researchers have proposed various optimization assessment methods. This section provides a summary of optimization methods in three aspects, the optimization of statistical tools, the optimization of assessment processes, and the optimization of decision strategies.

4.2.1. The Optimization of the Statistical Tool

The statistical tools play a crucial role in leakage assessments as they significantly impact the detection results. The researchers have attempted to enhance the detection efficiency and reliability of results in TVLA by utilizing alternative statistical tools instead of Welch’s t-test when calculating statistical moments. This section summarizes the optimization methods for statistical tools.
(1)
The paired t-test
The motivation: In [26], Adam Ding found that the environmental noise can adversely affect the results of t-tests in actual assessments. In the worst-case scenario, a device with leaks could pass the test solely due to the environmental noise being strong enough to mask them. In order to mitigate the impact of environmental noise, Adam Ding proposed a side-channel leakage detection based on paired t-tests in [26], where Welch’s t-test was replaced with the paired t-tests to eliminate the influence of environmental noise on the results.
The method: In the stage of collecting power traces, a fixed input sequence is used to effectively minimize environmental noise. This sequence consists of repetitions of ABBA, such as ABBAABBA… ABBAABBA, so that the members of each pair are acquired close together in time and share nearly the same environmental conditions.
In the stage of categorizing power traces, the power traces are divided into two sets noted as $L_A = \{l_{A,1}, \ldots, l_{A,n_A}\}$ and $L_B = \{l_{B,1}, \ldots, l_{B,n_B}\}$, where $l_{A,i}$ represents the $i$-th trace of $L_A$.

In the stage of calculating the statistical moment, assuming $n_A = n_B = n$, there are $n$ pairs of traces $(l_{A,i}, l_{B,i})$. Let $D = L_A - L_B = \{D_1, \ldots, D_n\}$, where $\bar{D}$ and $s_D^2$ represent the sample mean and sample variance of $D$, respectively. Considering the null hypothesis $H_0: \mu_D = 0$, the statistic $T_p$ of the paired t-test is as follows:

$$T_p = \frac{\bar{D}}{\sqrt{s_D^2 / n}}.$$
In the stage of determining leakage, use the same threshold and decision strategy as TVLA.
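A minimal sketch of the pointwise paired t-statistic is shown below (illustrative names; it assumes the traces have already been paired according to the ABBA acquisition order).

```python
import numpy as np

def paired_ttest(traces_a, traces_b):
    """Pointwise paired t-statistic on n interleaved (ABBA) trace pairs;
    differencing each pair cancels slowly varying environmental noise.
    traces_a, traces_b: (n, n_points) arrays, row i forming one pair."""
    d = traces_a - traces_b                  # D = L_A - L_B, pairwise
    n = d.shape[0]
    return d.mean(axis=0) / np.sqrt(d.var(axis=0, ddof=1) / n)
    # compare |T_p| against the same 4.5 threshold as TVLA
```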
(2)
$\chi^2$-test
The motivation: In TVLA, comparing only two groups and using the simple mean statistical moment can increase the risk of false negatives [28,49] or fail to detect leakages [40] when the leakage does not occur at the mean statistical moment. In 2018, Moradi proposed side-channel leakage detection based on the $\chi^2$-test [28], in which Welch's t-test is replaced with the $\chi^2$-test to detect whether multiple trace groups originate from the same population. Furthermore, in the $\chi^2$-test, the frequencies of the side-channel measurements are stored in histograms, and the histograms are analyzed to identify any distribution differences among the trace groups.
The method: In the stages of collecting power traces and determining leakage, the same methods as TVLA are adopted. During the stage of categorizing power traces, the traces are divided into $r$ groups, and the measurements of each group are binned into $c$ sets, forming an $r \times c$ contingency table in which $F_{i,j}$ represents the frequency of the $j$-th set in the $i$-th group. The number of samples is denoted as $N = \sum_{i=0}^{r-1}\sum_{j=0}^{c-1} F_{i,j}$, and the expected frequency is $E_{i,j} = \frac{\left(\sum_{j=0}^{c-1} F_{i,j}\right)\left(\sum_{i=0}^{r-1} F_{i,j}\right)}{N}$. In the stage of calculating the statistical moment, the null hypothesis $H_0$ of the $\chi^2$-test states that all power traces come from the same population. The statistic $T_{\chi^2}$ and the degrees of freedom $v$ are obtained by (16):

$$T_{\chi^2} = \sum_{i=0}^{r-1}\sum_{j=0}^{c-1}\frac{(F_{i,j} - E_{i,j})^2}{E_{i,j}}, \qquad v = (r - 1)(c - 1).$$

The probability $P$ of accepting $H_0$ is obtained by (17):

$$f(t, v) = \frac{t^{\frac{v}{2}-1}e^{-\frac{t}{2}}}{2^{\frac{v}{2}}\,\Gamma\left(\frac{v}{2}\right)}, \qquad P = \int_{T_{\chi^2}}^{\infty} f(t, v)\,dt,$$

where $\Gamma(\cdot)$ is the gamma function.
In the stage of determining leakage, use the same threshold and decision strategy as TVLA.
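The same test can be reproduced with standard statistics libraries. The sketch below (the binning choices are illustrative assumptions) builds the contingency table at one sample point and applies scipy's $\chi^2$ contingency test.

```python
import numpy as np
from scipy import stats

def chi2_leakage_test(groups, bins=9):
    """Chi-squared contingency test at one sample point: bin the
    measurements of each trace group into a shared histogram and test
    whether all groups draw from the same population."""
    lo = min(g.min() for g in groups)
    hi = max(g.max() for g in groups)
    edges = np.linspace(lo, hi, bins + 1)
    table = np.array([np.histogram(g, bins=edges)[0] for g in groups])
    table = table[:, table.sum(axis=0) > 0]   # drop empty columns
    t_chi2, p, dof, _ = stats.chi2_contingency(table)
    return t_chi2, p   # small p (e.g., < 1e-5) indicates leakage
```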
(3)
KS test
The motivation: When the leakage does not occur at the mean statistical moment, TVLA is not the optimal choice. Consequently, Zhou X proposed side-channel leakage detection based on the Kolmogorov–Smirnov (KS) test in [30]. The KS test is a non-parametric statistical test used to determine whether two groups of traces originate from the same population by quantifying the distance between the cumulative distribution functions of the two groups.
The method: In the stages of collecting power traces, categorizing power traces, and determining leakage, the same methods as TVLA are adopted. In the stage of calculating the statistical moment, the null hypothesis $H_0$ of the KS test assumes that $L_A$ and $L_B$ come from the same population, while the alternative hypothesis $H_1$ states that they come from different populations. The probability $P$ of accepting $H_0$ is as follows:

$$P = 2\sum_{j=1}^{\infty}(-1)^{j-1}e^{-2j^2 Z^2},$$

where $Z = D_{n_A,n_B}\left(\sqrt{J} + 0.12 + \frac{0.11}{\sqrt{J}}\right)$ and $J = \frac{n_A n_B}{n_A + n_B}$. Here, $D_{n_A,n_B}$ is the maximum distance between the empirical cumulative distribution functions of $L_A$ and $L_B$: $D_{n_A,n_B} = \sup_x |L_{A,n_A}(x) - L_{B,n_B}(x)|$.
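In practice the two-sample KS test is available off the shelf; a minimal sketch at one sample point follows (the decision threshold is an assumption carried over from the TVLA-style $p$-value criterion, not fixed by [30]).

```python
from scipy import stats

def ks_leakage_test(samples_a, samples_b):
    """Two-sample Kolmogorov-Smirnov test at one sample point:
    D is the maximum distance between the two empirical CDFs."""
    d, p = stats.ks_2samp(samples_a, samples_b)
    return d, p   # small p (e.g., p < 1e-5) suggests leakage
```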
(4)
Hotelling $T^2$-test
The motivation: TVLA rests on assumptions about the samples, and its detection efficiency strongly depends on parameters such as the signal-to-noise ratio (SNR), the degree of dependency, and the density. Correctly interpreting leakage detection results requires prior knowledge of these parameters, which evaluators often do not have, posing a non-trivial challenge. To address this issue, Bronchain et al. proposed using the Hotelling $T^2$-test instead of Welch's t-test in [36]. Additionally, they explored multivariate detection, which exploits differences between multiple informative points in the trace more effectively than concurrent univariate t-tests.
The method: In the stages of collecting power traces and categorizing power traces, the same methods as TVLA are adopted. In the stage of calculating the statistical moment, the hypotheses of the Hotelling $T^2$-test are

$$H_0: \mu_A = \mu_B, \qquad H_1: \mu_A \neq \mu_B;$$

then, the statistic $T^2$ is

$$T^2 = \frac{n_A n_B}{n_A + n_B}(\bar{x}_A - \bar{x}_B)^T C^{-1}(\bar{x}_A - \bar{x}_B),$$

where $n_A$, $n_B$ are the cardinalities of the trace sets $L_A$ and $L_B$, the length of the traces is $n_l$, and the pooled covariance matrix $C$ is

$$C = \frac{\sum_{i=1}^{n_A}(x_i - \bar{x}_A)(x_i - \bar{x}_A)^T + \sum_{j=1}^{n_B}(x_j - \bar{x}_B)(x_j - \bar{x}_B)^T}{n_A + n_B - 2}.$$

Under $H_0$, the scaled statistic follows the Fisher distribution with degrees of freedom $(n_l, n_A + n_B - 1 - n_l)$:

$$\lambda T_{H_0}^2 \sim F(n_l, n_A + n_B - 1 - n_l), \qquad \lambda = \frac{n_A + n_B - 1 - n_l}{(n_A + n_B - 2)\,n_l};$$

then, the probability of accepting hypothesis $H_0$ is

$$P = 1 - F_F(\lambda T^2),$$

where $F_F(x; v_1, v_2) = I_{\frac{v_1 x}{v_1 x + v_2}}\left(\frac{v_1}{2}, \frac{v_2}{2}\right)$, $I$ is the regularized incomplete beta function, and $v_1$ and $v_2$ are the degrees of freedom.
In the stage of determining leakage, use the same threshold and decision strategy as TVLA.
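A minimal numpy sketch of the two-sample Hotelling $T^2$-test described above is given below (illustrative names; it assumes $n_A + n_B - 2 > n_l$ so that the pooled covariance matrix is invertible).

```python
import numpy as np
from scipy import stats

def hotelling_t2_test(X_a, X_b):
    """Two-sample Hotelling T^2 test over n_l jointly considered points.
    X_a, X_b: (n_A, n_l) and (n_B, n_l) trace matrices."""
    n_a, p = X_a.shape
    n_b = X_b.shape[0]
    diff = X_a.mean(axis=0) - X_b.mean(axis=0)
    # pooled covariance matrix C
    C = ((n_a - 1) * np.cov(X_a, rowvar=False)
         + (n_b - 1) * np.cov(X_b, rowvar=False)) / (n_a + n_b - 2)
    t2 = (n_a * n_b) / (n_a + n_b) * diff @ np.linalg.solve(C, diff)
    lam = (n_a + n_b - 1 - p) / ((n_a + n_b - 2) * p)
    p_value = stats.f.sf(lam * t2, p, n_a + n_b - 1 - p)
    return t2, p_value
```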
(5)
ANOVA
The motivation: TVLA and the $\chi^2$-test require a large number of traces to distinguish leakage points from non-leakage points, and the paired t-test requires careful selection of inputs and low-noise measurements. Wei Yang proposed a leakage detection method using analysis of variance (ANOVA) in [31]. In ANOVA, the traces are categorized into multiple groups, and variance analysis is employed to identify the differences among these groups.
The method: The methods of collecting power traces and determining leakage are adopted from TVLA. In the stage of categorizing power traces, the power trace set $L$ is divided into $r$ groups. Let $N$ and $n_i$ be the cardinalities of $L$ and $L_i$, while $\bar{x}$ represents the sample mean of $L$ and $\bar{x}_i$ the sample mean of $L_i$. In the stage of calculating the statistical moment, the null hypothesis $H_0$ of the ANOVA test assumes that all trace groups come from the same population; the statistic $T_F$ of ANOVA is calculated as follows:

$$T_F = \frac{(N - r)\,SS_b}{(r - 1)\,SS_w}, \qquad SS_b = \sum_{i=1}^{r} n_i(\bar{x}_i - \bar{x})^2, \qquad SS_w = \sum_{i=1}^{r}\sum_{j=1}^{n_i}(x_{i,j} - \bar{x}_i)^2,$$

$$th = F_{1-\alpha}(r - 1, N - r).$$

The probability $P$ of accepting hypothesis $H_0$ is given in Equation (27):

$$P = \int_{T_F}^{\infty} f(x; v_1, v_2)\,dx, \qquad f(x; v_1, v_2) = \frac{\Gamma\left(\frac{v_1 + v_2}{2}\right)\left(\frac{v_1}{v_2}\right)^{\frac{v_1}{2}} x^{\frac{v_1}{2}-1}}{\Gamma\left(\frac{v_1}{2}\right)\Gamma\left(\frac{v_2}{2}\right)\left(1 + \frac{v_1 x}{v_2}\right)^{\frac{v_1 + v_2}{2}}},$$

where $v_1 = r - 1$ and $v_2 = N - r$.
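Since one-way ANOVA is a standard test, a sketch at one sample point is short; the wrapper below around scipy's implementation is illustrative, and the decision threshold on $p$ is an assumption carried over from the TVLA-style criterion.

```python
from scipy import stats

def anova_leakage_test(*groups):
    """One-way ANOVA at one sample point: H0 says all r trace groups
    at this point share a common population mean."""
    t_f, p = stats.f_oneway(*groups)
    return t_f, p   # small p (e.g., < 1e-5) indicates leakage
```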
(6)
The deep learning leakage assessment
The motivation: TVLA is conducted under the assumptions that each sample point is independent and that the leakage occurs at the mean statistical moment. In reality, many examples contradict these assumptions, and unaligned traces can also impact the detection results. TVLA is therefore inadequate for addressing horizontal and multivariate leakage, as well as unaligned traces. To solve this issue, Moos et al. proposed the method of Deep Learning Leakage Assessment (DL-LA) [37].
The method: DL-LA maintains the basic idea of TVLA, which involves discriminating between two groups of traces, and enhances the side-channel leakage assessment by training a neural network as a distinguisher for the two trace groups. In the stage of collecting power traces, the same trace collection method as TVLA is implemented. In the stage of categorizing power traces, the trace set $L$ is divided into a training set and a validation set, which do not intersect. The mean $\mu$ and standard deviation $\delta$ of the training set are calculated, and $X_i^j \leftarrow \frac{X_i^j - \mu_i}{\delta_i}$ is used to standardize both the training set and the validation set, where $j$ indexes the trace and $i$ the sample point. In the stage of calculating the statistical moment, the assessor begins by training the neural network distinguisher on the training set and then validating its accuracy on the validation set. If the accuracy of the neural network distinguisher significantly exceeds that of a random-guess distinguisher, the two groups of traces can be discriminated. The construction of the neural network distinguisher is as follows.
The network is built using Python’s Keras library, with TensorFlow serving as the backend. It comprises four fully connected layers with 120, 90, 50, and 2 output neurons, respectively. The ReLU function is utilized as the activation function for the input layer and each inner layer, while softmax is the activation function for the final layer. The dense layers are separated by Batch Normalization layers. Once the neural network distinguisher is constructed, the assessor employs it to conduct the leakage assessment. The null hypothesis $H_0$ states that the network cannot do better than randomly assigning the traces to the two groups, so that the total number of correct classifications among $M$ validation traces follows the binomial distribution $X \sim \mathrm{Binom}(M, 0.5)$. The probability that a purely random distinguisher classifies at least $S_X$ traces correctly is $P(X \geq S_X)$, calculated as follows:

$$P(X \geq S_X) = \sum_{q=S_X}^{M}\binom{M}{q}0.5^q\,0.5^{M-q} = 0.5^M\sum_{q=S_X}^{M}\binom{M}{q}.$$
In the stage of determining leakage, the threshold $P_{th} = 10^{-5}$ is set; if $P(X \geq S_X) < P_{th}$, the hypothesis $H_0$ is rejected, indicating the presence of side-channel leakage.
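The following sketch reproduces the described architecture and decision rule (Keras/TensorFlow assumed available; hyperparameters such as the optimizer and loss are illustrative assumptions, since the text above does not fix them).

```python
from scipy import stats
from tensorflow import keras

def build_dlla_net(trace_len):
    """Four-layer MLP distinguisher (120/90/50/2 neurons, ReLU + softmax,
    Batch Normalization between the dense layers)."""
    model = keras.Sequential([
        keras.Input(shape=(trace_len,)),
        keras.layers.Dense(120, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(90, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(50, activation="relu"),
        keras.layers.BatchNormalization(),
        keras.layers.Dense(2, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def dlla_p_value(n_correct, n_total):
    """P(X >= S_X) under H0: correct classifications ~ Binom(M, 0.5);
    leakage is reported if this probability drops below 1e-5."""
    return stats.binom.sf(n_correct - 1, n_total, 0.5)
```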
In summary, the various optimization methods (Table 1) mentioned above are all aimed at optimizing the statistical tool of TVLA. However, each optimization method only addresses certain shortcomings of TVLA, and there is currently no single statistical tool that can effectively resolve all the shortcomings associated with the t-test used in TVLA. Therefore, when conducting leakage detection, it is essential to select a statistical tool appropriate for the specific environment. Against environmental noise, the paired t-test is recommended in place of the t-test. For horizontal and multivariate leakage, the Hotelling $T^2$-test or DL-LA is suggested. Furthermore, for multi-group traces or non-mean statistical moment leakage, the $\chi^2$-test, ANOVA, and KS test are recommended alternatives to the t-test. Hence, it is highly recommended to select the appropriate statistical tool based on the characteristics of the detection environment and the nature of the leakage. This ensures accurate and reliable results.

4.2.2. The Optimization of the Leakage Assessment Process

In TVLA, the efficiency and accuracy of detection results depend on the leakage assessment process. With this in mind, the researchers have proposed a series of suggestions to accelerate the detection process [32,33] or proposed a novel leakage assessment process [34]. This section primarily scrutinizes the optimization methods of the assessment process.
(1)
The optimization of TVLA’s detection process
Melissa Azouaoui studied the literature on leakage assessment and considered the leakage assessment process of TVLA as the combination or iteration of three steps: measurement and preprocessing, leakage detection and mapping, and leakage exploitation; [33] then investigates whether optimality guarantees exist for these steps.
For measurement and preprocessing, setting up the measurement devices depends on expertise [65]. Preprocessing is similar; the main current methods include filtering the noise [64,66] and aligning the power traces [67,68]. The methods for setting up measurement devices and for preprocessing should be as open and repeatable as possible. Although FIPS 140 and ISO provide some methods for setting up measurement devices, there is currently no guaranteed optimal approach for measurement and preprocessing.
For leakage detection and mapping, the statistical hypothesis is commonly employed for comparing the distribution or statistical moment. Despite the existence of numerous methods for leakage detection, consensus on their fairness and optimality has yet to be reached. Moreover, the “budget” of traces plays a significant role in leakage detection, yet there is presently no established threshold for the optimal number of “budget” traces [29].
For leakage exploitation, it is typically divided into three stages: modeling, information extraction, and information processing. However, during the modeling stage, the evaluator utilizes the traces to estimate the optimal model based on the implementation. Nevertheless, as the number of shares increases in the mask scheme, the cost increases and the independence of samples affects the modeling phase. Currently, obtaining the optimal model remains an unresolved issue, and there is no optimal method.
During the actual leakage assessment, there are risks associated with all the aforementioned steps, and currently, there is no guarantee for an optimal leakage assessment process.
(2)
A novel framework for explainable leakage assessment
Because TVLA's detection results cannot be directly exploited, the current approach is to use a specific attack to verify the detection outcomes. Based on this, Gao Si and Elisabeth Oswald introduced a novel leakage assessment process in [34], referred to as “the process of Gao Si” in this paper. The leakage assessment process of [34] is outlined below.
Step 1: Non-specific detection via key-dependent models.
Consider two nested key-dependent models: the full model $L_{cf}$ fits a model as a function of the key $K$ to the observed data, while the null model $L_0$ contains only a constant term, representing the case where there is no dependency on $K$:

$$L_{cf}(K_c) = \sum_j \beta_j \mu_j(K_c), \quad j \in [0, 2^{16}),$$

$$L_0(K) = \beta_0,$$

where the coefficients $\beta_j$ are estimated from the traces via least squares estimation.
The F-test is used to test $H_0$ (both the $L_{cf}$ and $L_0$ models explain the observed data equally well) versus $H_1$ ($L_{cf}$ explains the data better than $L_0$). If the F-test finds enough evidence to reject $H_0$, we conclude that this point's leakage relies on $K_c$, and because $K_c$ is a part of $K$, the measurement depends on $K$.
Step 2: Degree analysis
By further restricting the degree of the key-dependent model, we determine how large a key guess is required to exploit an identified leakage. We obtain the model

$$L_{cr}(K_c) = \sum_j \beta_j \mu_j(K_c), \quad j \in [0, 2^{16}),\ \deg(\mu_j(K_c)) \leq g.$$
The F-test is used again to test $H_0$ ($L_{cr}$ and $L_{cf}$ explain the data equally well) versus $H_1$ ($L_{cf}$ explains the data better than $L_{cr}$).

More generally, the F-test compares a full model $L_f(X) = \sum_j \beta_j \mu_j(X)$, $j \in T_f$, with a reduced model $L_r(X) = \sum_j \beta_j \mu_j(X)$, $j \in J_r \subset T_f$. The statistic of the F-test is as follows:

$$F = \frac{(RSS_r - RSS_f)/(n_f - n_r)}{RSS_f/(N - n_f)},$$

where $RSS = \sum_{i=1}^{N}(y_i - \tilde{L}(x_i))^2$, $y_i$ represents the measurement of $x_i$, and $\tilde{L}(x_i)$ is the prediction of the full or reduced model at $x_i$. $n_f$ and $n_r$ are the numbers of parameters of $L_f$ and $L_r$, and $N$ is the sample size of $L$. The statistic $F$ obeys the F distribution with degrees of freedom $(n_f - n_r, N - n_f)$. The threshold of the F-test is $F_{th} = Q_F(df_1, df_2, 1 - \alpha)$, where $df_1 = n_f - n_r$, $df_2 = N - n_f$, and $Q_F$ is the quantile function of the central F distribution.
If $F \geq F_{th}$, the null hypothesis $H_0$ is rejected, $L_f(X)$ explains the data better than $L_r(X)$, and the leakage contains key information. If there is enough evidence to reject $H_0$, we conclude that a model with only $g$ or fewer key bytes suffices to explain the measurements. By successively reducing $g$, we can therefore determine the maximum key guess that is required to explain the side-channel measurements.
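A minimal sketch of such a nested-model F-test is given below (illustrative names; the design matrices encoding the basis functions $\mu_j$ are assumed to be built elsewhere, and the fitting uses plain least squares as described above).

```python
import numpy as np
from scipy import stats

def nested_f_test(y, X_full, X_reduced):
    """F-test comparing two nested linear leakage models fitted by
    least squares; rejects H0 when the full model explains the
    measurements significantly better than the reduced one.
    y: (N,) measurements; X_*: (N, n_f) and (N, n_r) design matrices."""
    def rss_and_params(X):
        beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
        return ((y - X @ beta) ** 2).sum(), X.shape[1]
    rss_f, n_f = rss_and_params(X_full)
    rss_r, n_r = rss_and_params(X_reduced)
    N = len(y)
    F = ((rss_r - rss_f) / (n_f - n_r)) / (rss_f / (N - n_f))
    p = stats.f.sf(F, n_f - n_r, N - n_f)
    return F, p
```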
Step 3: Subkey identification
By using the technique of further restricting the reduced model, we can narrow down precisely which specific key bytes are required to explain the identified leakage.
Step 4: Converting to specific attacks
If an evaluation regime requires an evaluator to demonstrate an actual attack targeting an identified leakage point, a relatively straightforward connection to a concrete profiled attack can be established.
In summary, although optimization methods exist for the measurement process and preprocessing process, there is currently no optimal leakage detection process. In actual leakage detection, the detection process of TVLA is still used to detect leakages. The process of Gao Si is a new detection process to demonstrate that the discovered leakages are key-related and can be exploited by attacks. The approach is a small step towards establishing precise attack vectors for confirmatory attacks.

4.2.3. The Optimization of TVLA’s Decision Strategy

(1)
The decision strategy of HC
For long traces, the detection result in TVLA is obtained with the min-P strategy, which relies solely on the minimum p-value to decide about leakage, disregarding all other p-values. Ding, A.A. et al. proposed the higher criticism (HC) strategy in [35], which takes the information of all p-values into account.
The null hypothesis $H_0$: there is no leakage point in the trace. The alternative hypothesis $H_1$: there is at least one leakage point in the trace. Let the trace length be $n_l$; there are then $n_l$ p-values, denoted $p_1, \dots, p_{n_l}$, and the HC strategy proceeds as follows.
Step 1: Sort the p-values in ascending order: $p_{(1)} \le p_{(2)} \le \dots \le p_{(n_l)}$.
Step 2: Calculate the normalized distance $\widehat{HC}_{n_l,i}$ of each p-value as in Formula (33):
$$\widehat{HC}_{n_l,i} = \sqrt{n_l}\,\frac{i/n_l - p_{(i)}}{\sqrt{p_{(i)}\left(1 - p_{(i)}\right)}}.$$
Step 3: Calculate the test statistic of the HC strategy with (34):
$$\widehat{HC}_{n_l,\max} = \max_{1 \le i \le n_l/2} \widehat{HC}_{n_l,i}.$$
Step 4: Compare the test statistic $\widehat{HC}_{n_l,\max}$ with the threshold $th_{n_l,\alpha}^{HC}$ at significance level $\alpha$. If $\widehat{HC}_{n_l,\max} > th_{n_l,\alpha}^{HC}$, the null hypothesis is rejected, indicating the presence of side-channel leakage. The threshold $th_{n_l,\alpha}^{HC}$ is the $1-\alpha$ quantile of $\widehat{HC}_{n_l,\max}$ under the null hypothesis; for large $n_l$, it can be approximated via the connection to a Brownian bridge, e.g., with the formula of Li and Siegmund [68].
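The HC computation itself is only a few lines. Below is a minimal Python sketch of Steps 1–4; the simulated p-values and the Monte Carlo threshold are illustrative assumptions (in practice, the analytic Brownian-bridge approximation of Li and Siegmund [68] would replace the simulation).

```python
# A minimal sketch of the HC decision strategy; data and threshold are illustrative.
import numpy as np

def hc_statistic(p_values):
    """Higher-criticism statistic HC_max over the sorted p-values."""
    p = np.sort(p_values)
    n = len(p)
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    return np.max(hc[: n // 2])        # maximize over the first half only

rng = np.random.default_rng(1)
n_l = 1000

# Null threshold via Monte Carlo: the 1 - alpha quantile of HC_max under H0
# (uniform p-values). alpha = 0.01 here purely for illustration.
null_stats = [hc_statistic(rng.uniform(size=n_l)) for _ in range(2000)]
threshold = np.quantile(null_stats, 0.99)

# A trace with a handful of weak leakage points, i.e., a few small p-values.
p_obs = rng.uniform(size=n_l)
p_obs[:20] = rng.uniform(0, 1e-3, size=20)
print(hc_statistic(p_obs) > threshold)             # True => leakage detected
```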
In summary, the HC strategy can combine multiple leakage points to enhance the efficiency of leakage detection. Thus, it can serve as a viable alternative to TVLA’s min-P strategy.

4.3. The Summary of TVLA’s Optimization Schemes

Researchers have proposed various optimization schemes for TVLA, which primarily aim to address its inherent limitations. Currently, there is no comprehensive, universally applicable statistical tool or detection process that can effectively address all the identified limitations. Suitable detection methods for different detection purposes and conditions are summarized in Figure 4.
Therefore, assessors should perform the following process in actual leakage detection.
Firstly, select the leakage detection process based on the purpose of detection. If the aim is to only discover a side-channel leakage, the recommended process is TVLA. If the aim is to detect and utilize the leakages, it is recommended to choose the process of Gao Si.
Secondly, if the TVLA process is chosen, it is recommended to select an appropriate statistical tool based on the evaluator's prior knowledge of the device. If the evaluator has no prior knowledge of the device, a t-test is used first to detect whether there is a univariate, first-order leakage. If a leakage is detected, the leakage detection process stops. Otherwise, the χ²-test, KS test, and ANOVA test are used to detect univariate, higher-order leakages, while the Hotelling T²-test and DA-LA are used to test for multivariate or horizontal leakage. If the evaluator has prior knowledge of the device, the leakage type (univariate or multivariate) is determined from this knowledge, followed by the detection environment (high noise or low noise) and the alignment of the power traces. For univariate, first-order leakage (mean statistical moment) in low-noise environments, the t-test is more efficient; in high-noise environments, the paired t-test has better detection efficiency. For univariate leakages that do not occur at the mean statistical moment, the χ²-test, KS test, and ANOVA test have better detection performance than the t-test. For multivariate and horizontal leakages, the Hotelling T²-test and DA-LA are recommended, with DA-LA being more effective when the traces are unaligned.
Finally, in the determination stage, the HC strategy is recommended.
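The selection strategy above can be encoded as a simple lookup. The following Python sketch paraphrases the decision logic of Figure 4; the string labels and the function name are assumptions of this illustration rather than a normative interface.

```python
# A sketch encoding the tool-selection strategy; labels are illustrative.
def pick_detection_tool(leakage_type: str, noise: str = "low",
                        aligned: bool = True, moment: str = "mean") -> str:
    if leakage_type in ("multivariate", "horizontal"):
        return "Hotelling T2-test" if aligned else "DA-LA"
    if moment != "mean":                       # leakage not in the mean moment
        return "chi2 / KS / ANOVA test"
    return "paired t-test" if noise == "high" else "t-test"

print(pick_detection_tool("univariate", noise="high"))      # paired t-test
print(pick_detection_tool("multivariate", aligned=False))   # DA-LA
```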
Although many leakage detection methods have been proposed, TVLA remains the mainstream detection method. Therefore, the t-test is generally regarded as the mainstream detection tool, with the other tools considered supplementary. However, current detection methods cannot definitively conclude that there is no leakage; they can only provide evidence that no leakage has been detected.

5. Quantification of Side Channel Vulnerability

The inability of TVLA's detection results to quantify the vulnerability of side channels raises the question of how to relate attacking-style assessment to leakage detection-style assessment. Debapriya Basu Roy et al. attempted to derive this relationship between TVLA and SCA: given the intermediate variables and leakage model, the success rate (SR) can be calculated directly from TVLA, effectively connecting CC and FIPS 140-3 [22].
The derivation of the relationship between TVLA and SR proceeds as follows. Let $L = f(X, k)$ be the normalized leakage model, where $E[L] = 0$ and $Var[L] = E[L^2] = 1$. $Y$ represents the measurements and is defined as $Y = \epsilon L + N$, where $\epsilon$ is the scale factor and $N \sim \mathcal{N}(0, \sigma^2)$ represents the noise. Taking an S-box as an example, the $n$-bit Hamming weight model can be expressed as $f(X, k) = \frac{2}{\sqrt{n}}\left(HW(Sbox(X \oplus k)) - \frac{n}{2}\right)$.
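This measurement model is straightforward to simulate, which is useful for sanity-checking the identities below. A minimal Python sketch follows; the XOR-only intermediate (no real S-box) and the parameter values are assumptions of this illustration.

```python
# A minimal simulation of Y = eps * L + N under the normalized 8-bit HW model.
import numpy as np

rng = np.random.default_rng(2)
n_bits, eps, sigma = 8, 1.0, 4.0
hw = np.array([bin(v).count("1") for v in range(256)])  # Hamming weight table

def simulate_traces(num, key=0x2B):
    x = rng.integers(0, 256, size=num)                  # known plaintext bytes
    s = x ^ key                                         # toy intermediate value
    L = 2 / np.sqrt(n_bits) * (hw[s] - n_bits / 2)      # E[L] = 0, Var[L] = 1
    return x, eps * L + rng.normal(0, sigma, size=num)  # SNR = eps^2 / sigma^2

x, y = simulate_traces(10_000)
print(y.var())   # close to eps^2 + sigma^2 = 17, since Var[L] = 1
```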
Firstly, link TVLA and NICV. We obtain $NICV_2 = \left(\frac{n}{TVLA^2} \cdot \frac{\sigma_1^2/n_2 + \sigma_2^2/n_1}{\sigma_1^2/n_1 + \sigma_2^2/n_2} + 1\right)^{-1}$. If $n_1 = n_2 = \frac{n}{2}$, this reduces to $NICV_2 = \frac{1}{\frac{n}{TVLA^2} + 1}$. If the side-channel traces are divided into $q$ groups, then
$$NICV_q = \frac{q-1}{q} \sum_{i=1}^{q} NICV_2(i),$$
where $NICV_2(i) = \dfrac{\frac{n_i}{\bar{n}_i}\,(\mu_i - \mu)^2}{\frac{1}{n}\sum_{j=1}^{n}(Y_j - \mu)^2}$ and $\bar{n}_i = n - n_i$.
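The balanced two-class identity $NICV_2 = 1/(n/TVLA^2 + 1)$ is easy to verify numerically. A minimal Python sketch, in which the Gaussian fixed-vs-random measurements are an illustrative assumption:

```python
# A sketch checking NICV_2 = 1/(n/TVLA^2 + 1) on simulated two-class data.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
n = 20_000
fixed = rng.normal(0.3, 1.0, size=n // 2)     # fixed-input class, small mean shift
random = rng.normal(0.0, 1.0, size=n // 2)    # random-input class

tvla, _ = ttest_ind(fixed, random, equal_var=False)   # Welch t statistic

y = np.concatenate([fixed, random])
labels = np.repeat([0, 1], n // 2)
class_means = np.array([y[labels == c].mean() for c in (0, 1)])
nicv = np.var(class_means[labels]) / np.var(y)        # Var(E[Y|X]) / Var(Y)

print(nicv, 1.0 / (n / tvla**2 + 1))                  # the two values should be close
```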
Secondly, link SNR and NICV. Under the leakage model, $SNR = \frac{Var(E[Y|X])}{E[Var(Y|X)]} = \frac{\epsilon^2}{\sigma^2}$ and $NICV = \frac{Var(E[Y|X])}{Var(Y)} = \frac{1}{1 + \sigma^2/\epsilon^2}$. Then
$$NICV = \frac{1}{\frac{1}{SNR} + 1} = \frac{\epsilon^2}{\epsilon^2 + \sigma^2}.$$
Finally, express SR in terms of SNR. The relationship between SNR and SR is
$$SR = \Phi_{K + \frac{1}{4\,SNR}K^{**} - \kappa\kappa^T}\left(\sqrt{Q}\,\frac{\sqrt{SNR}}{2}\,\kappa\right),$$
where $\kappa = \left[\kappa(k_c, k_{g_1}), \dots, \kappa(k_c, k_{g_{2^n-1}})\right]^T$ with $\kappa(k_c, k_{g_i}) = E\left[\left(l(X, k_c) - l(X, k_{g_i})\right)^2\right]$,
$$K = \begin{pmatrix} \kappa(k_c, k_{g_1}, k_{g_1}) & \cdots & \kappa(k_c, k_{g_1}, k_{g_{2^n-1}}) \\ \vdots & \ddots & \vdots \\ \kappa(k_c, k_{g_{2^n-1}}, k_{g_1}) & \cdots & \kappa(k_c, k_{g_{2^n-1}}, k_{g_{2^n-1}}) \end{pmatrix}, \quad \kappa(k_c, k_{g_i}, k_{g_j}) = E\left[\left(l(X, k_c) - l(X, k_{g_i})\right)\left(l(X, k_c) - l(X, k_{g_j})\right)\right],$$
and $K^{**}$ is defined analogously with entries
$$\kappa^{**}(k_c, k_{g_i}, k_{g_j}) = 4\,E\left[\left(l(X, k_c) - E[l(X, k_c)]\right)^2 \left(l(X, k_c) - l(X, k_{g_i})\right)\left(l(X, k_c) - l(X, k_{g_j})\right)\right].$$
Here, $\Phi_{\Sigma}(\mu)$ is the cumulative distribution function of the multivariate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$, $Q$ is the number of measurements, $k_c$ is the correct key, and the $k_{g_i}$, $1 \le i \le 2^n - 1$, are the incorrect key guesses.
Figure 5 presents the process of hybrid side-channel testing, which consists of non-specific TVLA → specific TVLA → SR → evaluation result. The evaluator first performs a non-specific TVLA on the target device and then calculates SR from a specific TVLA, so the side-channel vulnerability of the target device is assessed without mounting actual attacks. If SR is below the security limit, the device is considered safe; otherwise, it is considered vulnerable to SCA.
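The SNR → SR connection can also be checked empirically without the closed-form expression, by mounting a simple CPA on traces simulated at a chosen SNR and counting key recoveries. In the following Python sketch, the random permutation standing in for a real S-box, the trace counts, and all other parameters are assumptions of this illustration:

```python
# An empirical SR estimate for a toy CPA at a chosen SNR; setup is illustrative.
import numpy as np

rng = np.random.default_rng(4)
hw = np.array([bin(v).count("1") for v in range(256)])
sbox = np.random.default_rng(0).permutation(256)   # stand-in for a real S-box
true_key, eps, sigma, q, trials = 0x2B, 1.0, 8.0, 1000, 100   # SNR = eps^2/sigma^2

def cpa_recovers_key():
    x = rng.integers(0, 256, size=q)
    leak = (hw[sbox[x ^ true_key]] - 4) / np.sqrt(2)          # normalized HW leakage
    y = eps * leak + rng.normal(0, sigma, size=q)
    # Correlation of the measurements with each key guess's HW prediction.
    preds = hw[sbox[x[:, None] ^ np.arange(256)[None, :]]]    # shape (q, 256)
    corr = np.corrcoef(np.column_stack([y[:, None], preds]).T)[0, 1:]
    return np.argmax(np.abs(corr)) == true_key

sr = np.mean([cpa_recovers_key() for _ in range(trials)])
print(f"empirical SR at SNR={eps**2/sigma**2:.4f} with q={q} traces: {sr:.2f}")
```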
Although this method establishes a relationship between TVLA and SR, it provides only a limited bridge between CC and FIPS. Moreover, the method relies on assumptions about the intermediate variables and the leakage model, which makes practical detection difficult for evaluators who lack professional knowledge of these models and values.

6. Discussion

Based on the research on leakage assessment works, we discuss leakage assessment from two perspectives: the current status and future development trend.
Regarding the current status of leakage assessment: firstly, although many leakage assessment methods have been proposed to address specific shortcomings of TVLA (multivariate leakage, higher-order leakage, noise issues, or independence of traces), there is currently no unified, standardized method that guides assessors through the assessment step by step. Secondly, in actual leakage detection, evaluators or agencies often spend a significant amount of time collecting traces, regardless of the detection method used. The goal of an evaluation is to assess the security of a product, identify potential vulnerabilities, and guide further design towards the required security level. However, current leakage detection focuses solely on discovering leakages; it neither uncovers vulnerabilities nor guides the design process. From an investment-cost perspective, the return on leakage detection is therefore low.
In terms of the future development trend of leakage assessment, researchers are currently attempting to link non-specific leakage detection results with key guessing in order to address the issue of unusable detection results. This approach would allow the detection results to reveal information about the key or the causes of leakage, which can aid designers in constructing attacks or defenses. We therefore anticipate the formation of a unified and fair method for security assessment. Additionally, in recent years, deep learning has been applied to leakage detection. Compared with traditional leakage assessment techniques, deep learning leakage assessment offers simplicity and a high level of statistical confidence, but it cannot yet provide superior security metrics. Further exploration is therefore needed on how to apply deep learning effectively to side-channel leakage assessment.

7. Conclusions

In this paper, we conducted a comprehensive study of leakage detection-style assessment. These methodologies can be classified into two categories: TVLA and its optimizations. We identified the drawbacks of TVLA and categorized the optimization schemes aimed at addressing these limitations into three groups: statistical tool optimizations, detection process optimizations, and decision strategy optimizations. We gave succinct descriptions of their motivations and detection processes and compared the efficiency of the different optimization schemes. Based on our classification and summary, we concluded that there is no single optimal scheme that can effectively address all the shortcomings of TVLA; different optimization schemes are proposed for specific purposes and detection conditions. We summarized the purposes and conditions of all TVLA optimizations and proposed a selection strategy for leakage detection-style assessment schemes. According to this strategy, the leakage detection process should be chosen based on the specific detection purpose: for discovering side-channel leakages, TVLA is recommended, while the process of Gao Si is suitable for discovering and utilizing leakages. The appropriate statistical tool should then be selected. The t-test and the paired t-test are suited to detecting univariate, first-order leakage; in low-noise environments the t-test is more suitable, whereas in high-noise environments the paired t-test demonstrates better detection efficiency. If the leakage does not occur at the mean statistical moment, the χ²-test, KS test, and ANOVA test outperform the t-test for detecting univariate, higher-order leakages. For testing multivariate or horizontal leakages, the Hotelling T²-test and DA-LA are recommended; if the traces are unaligned, DA-LA is more effective. Lastly, the HC strategy is recommended for the determination stage. Based on the current status of leakage detection-style assessment, we also discussed its development trends. Researchers are increasingly interested in linking leakage detection with key guessing to address the issue of unusable detection results, with the aim of establishing a unified and fair method for security assessment.

Author Contributions

Y.W. and M.T. made substantial contributions to the conceptualization and methodology of the review. Y.W. also contributed to the writing, review, and editing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Key R&D Program of China under Grant No. 2022YFB3103800 and the National Natural Science Foundation of China under Grant No. 61972295.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Li, Y.; Shen, C.; Tian, N. Guiding the Security Protection of Key Information Infrastructure with a Scientific Network Security Concept. J. Internet Things 2019, 3, 1–4. [Google Scholar]
  2. Cao, S.; Fan, L. NSA’s top backdoor has been exposed by Chinese researchers. Glob. Times 2022. [Google Scholar] [CrossRef]
  3. Biham, E.; Shamir, A. Differential cryptanalysis of DES-like cryptosystems. J. Cryptol. 1991, 4, 3–72. [Google Scholar]
  4. Matsui, M. Linear Cryptanalysis Method for DES Cipher. In Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques, Lofthus, Norway, 23–27 May 1993; Springer: Berlin/Heidelberg, Germany, 1993; pp. 386–397. [Google Scholar]
  5. Knudsen, L.R. Cryptanalysis of LOKI 91, Advances in Cryptology-Auscrypt 92, LNCS 718. In Proceedings of the Workshop on the Theory and Application of Cryptographic Techniques, Gold Coast, Queensland, Australia, 13–16 December 1992; Springer-Verlag: Berlin/Heidelberg, Germany, 1998; pp. 196–208. [Google Scholar]
  6. Kocher, P.; Jaffe, J.; Jun, B. Differential Power Analysis. In Proceedings of the 19th Annual International Cryptology Conference, Santa Barbara, CA, USA, 15–19 August 1999; Springer: Berlin/Heidelberg, Germany, 1999; pp. 388–397. [Google Scholar]
  7. Mangard, S. A Simple Power Analysis (SPA) Attack on Implementations of the AES Key Expansion. In Proceedings of the International Conference on Information Security and Cryptology, Seoul, Republic of Korea, 28–29 November 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 343–358. [Google Scholar]
  8. Brier, E.; Clavier, C.; Olivier, F. Correlation Power Analysis with a Leakage Model. In Proceedings of the 6th International Workshop on Cryptographic Hardware and Embedded Systems, Cambridge, MA, USA, 11–13 August 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 16–29. [Google Scholar]
  9. Gierlichs, B.; Batina, L.; Tuyls, P.; Preneel, B. Mutual Information Analysis: A Generic Side-Channel Distinguisher. In Proceedings of the International Workshop on Cryptographic Hardware and Embedded Systems, Washington, DC, USA, 10–13 August 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 426–442. [Google Scholar]
  10. Maghrebi, H.; Portigliatti, T.; Prouff, E. Breaking Cryptographic Implementations Using Deep Learning Techniques. In Proceedings of the International Conference on Security, Privacy and Applied Cryptography Engineering, Hyderabad, India, 14–18 December 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 3–26. [Google Scholar]
  11. Cagli, E.; Dumas, C.; Prouff, E. Convolutional Neural Networks with Data Augmentation against Jitter-Based Countermeasure. In Proceedings of the International Conference on Cryptographic Hardware and Embedded Systems, Taipei, Taiwan, 25–28 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 45–68. [Google Scholar]
  12. Benadjila, R.; Prouff, E.; Strullu, R.; Cagli, E.; Dumas, C. Deep learning for side-channel analysis and introduction to ASCAD database. J. Cryptogr. Eng. 2020, 10, 163–188. [Google Scholar] [CrossRef]
  13. Picek, S.; Samiotis, I.P.; Heuser, A.; Kim, J.; Bhasin, S.; Legay, A. On the Performance of Deep Learning for Side-Channel Analysis. In Proceedings of the IACR Transactions on Cryptographic Hardware and Embedded Systems, Amsterdam, The Netherland, 9–12 September 2018; pp. 281–301. [Google Scholar]
  14. Himanshu, T.; Hanmandlu, M.; Kumar, K.; Medicherla, P.; Pandey, R. Improving CEMA Using Correlation Optimization. In Proceedings of the 2020 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), Greater Noida, India, 18–19 December 2020; pp. 211–216. [Google Scholar]
  15. Agrawal, D.; Archambeault, B.; Rao, J.R.; Rohatgi, P. The EM Side Channel. In Proceedings of the 4th International Workshop on cryptographic Hardware and Embedded Systems, Redwood Shores, CA, USA, 13–15 August 2002; Springer: Berlin/Heidelberg, Germany, 2002; pp. 29–45. [Google Scholar]
  16. Kocher, P.C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Proceedings of the 16th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 1996; Springer: Berlin/Heidelberg, Germany, 1996; pp. 104–113. [Google Scholar]
  17. Boneh, D.; DeMillo, R.A.; Lipton, R.J. On the Importance of Checking Cryptographic Protocols for Faults. In Proceedings of the Advances in Cryptology—EUROCRYPT’97, LNCS 1233, International Conference on the Theory and Application of Cryptographic Techniques, Konstanz, Germany, 11–15 May 1997; Springer: Berlin/Heidelberg, Germany, 1997; pp. 37–51. [Google Scholar]
  18. Bernstein, D.J. Cache-Timing Attacks on AES. 2004. Available online: https://mimoza.marmara.edu.tr/~msakalli/cse466_09/cache%20timing-20050414.pdf (accessed on 14 August 2023).
  19. ISO/IEC JTC 1/SC 27: ISO/IEC 17825; Information Technology—Security Techniques—Testing Methods for the Mitigation of Non-Invasive Attack Classes against Cryptographic Modules. International Organization for Standardization: Geneva, Switzerland, 2016.
  20. FIPS 140–3; Security Requirements for Cryptographic Modules. NIST: Gaithersburg, MD, USA, 2019.
  21. Roy, D.B.; Bhasin, S.; Guilley, S.; Heuser, A.; Patranabis, S.; Mukhopadhyay, D. CC meets FIPS: A Hybrid Test Methodology for First Order Side Channel Analysis. IEEE Trans. Comput. 2019, 68, 347–362. [Google Scholar] [CrossRef] [Green Version]
  22. Schneider, T.; Moradi, A. Leakage Assessment Methodology. In Proceedings of the Cryptographic Hardware and Embedded Systems CHES 2015, Saint-Malo, France, 13–16 September 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 495–513. [Google Scholar]
  23. Standaert, F.X. How (Not) to Use Welch’s t-test in Side Channel Security Evaluations; Report 2016/046; Cryptology ePrint Archive; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  24. Durvaux, F.; Standaert, F.-X. From Improved Leakage Detection to the Detection of Points of Interests in Leakage Traces. In Proceedings of the 35th Annual International Conference on the Theory and Applications of Cryptographic Techniques, Vienna, Austria, 8–12 May 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 240–262. [Google Scholar]
  25. Ding, A.A.; Chen, C.; Eisenbarth, T. Simpler, Faster, and More Robust T-Test Based Leakage Detection. In Proceedings of the Constructive Side-Channel Analysis and Secure Design—COSADE 2016, Graz, Austria, 14–15 April 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 108–125. [Google Scholar]
  26. Mather, L.; Oswald, E.; Bandenburg, J.; Wójcik, M. Does My Device Leak Information? A Priori Statistical Power Analysis of Leakage Detection Tests. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Bengaluru, India, 1–5 December 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 486–505. [Google Scholar]
  27. Moradi, A.; Richter, B.; Schneider, T.; Standaert, F.X. Leakage Detection with the χ2-Test. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018, 2018, 209–237. [Google Scholar] [CrossRef]
  28. Bronchain, O.; Schneider, T.; Standaert, F.X. Multi-tuple leakage detection and the dependent signal issue. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2019, 318–345. [Google Scholar] [CrossRef]
  29. Zhou, X.; Qiao, K.; Ou, C. Leakage Detection with Kolmogorov-Smirnov Test. Cryptology ePrint Archive, Paper 2019/1478. Available online: https://eprint.iacr.org/2019/1478 (accessed on 14 August 2023).
  30. Yang, W.; Jia, A. Side-channel leakage detection with one-way analysis of variance. Secur. Commun. Netw. 2021, 2021, 6614702. [Google Scholar] [CrossRef]
  31. Azouaoui, M.; Bellizia, D.; Buhan, I.; Debande, N.; Duval, S.; Giraud, C.; Jaulmes, É.; Koeune, F.; Oswald, E.; Standaert, F.X.; et al. A Systematic Appraisal of Side Channel Evaluation Strategies? In Proceedings of the Security Standardisation Research: 2020 International Conference on Research in Security Standardisation, SSR 2020, London, UK, 30 November–1 December 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 46–66. [Google Scholar]
  32. Bronchain, O. Worst-Case Side-Channel Security: From Evaluation of Countermeasures to New Designs. Ph.D. Thesis, Catholic University of Louvain, Louvain-la-Neuve, Belgium, 2022. [Google Scholar]
  33. Gao, S.; Oswald, E. A Novel Completeness Test and its Application to Side Channel Attacks and Simulators. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, EUROCRYPT 2022: Advances in Cryptology—EUROCRYPT 2022, Trondheim, Norway, 30 May–3 June 2022; pp. 254–283. [Google Scholar]
  34. Ding, A.A.; Zhang, L.; Durvaux, F.; Standaert, F.X.; Fei, Y. Towards Sound and Optimal Leakage Detection Procedure. In Proceedings of the Smart Card Research and Advanced Applications—16th International Conference, CARDIS 2017, Lugano, Switzerland, 13–15 November 2017; Revised Selected Papers, Volume 10728 of Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2017; pp. 105–122. [Google Scholar]
  35. Zhang, L.; Mu, D.; Hu, W.; Tai, Y. Machine-learning-based side-channel leakage detection in electronic system-level synthesis. IEEE Netw. 2020, 34, 44–49. [Google Scholar] [CrossRef]
  36. Moos, T.; Wegener, F.; Moradi, A. DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 552–598. [Google Scholar] [CrossRef]
  37. Whitnall, C.; Oswald, E. A Critical Analysis of ISO 17825 Testing Methods for the Mitigation of Non-Invasive Attack Classes against Cryptographic Modules. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security, Kobe, Japan, 8–12 December 2019; Springer: Cham, Switzerland, 2019; pp. 256–284. [Google Scholar]
  38. Chari, S.; Raoj, R.; Rohatgi, P. Template Attacks. In Proceedings of the Lecture Notes in Computer Science: Volume 2523 Cryptographic Hardware and Embedded Systems-CHES 2002, 4th International Workshop, Redwood Shores, CA, USA, 13–15 August 2002; Revised Papers. Springer: Berlin/Heidelberg, Germany, 2002; pp. 13–28. [Google Scholar]
  39. Rechberger, C.; Oswald, E. Practical Template Attacks. In Proceedings of the 5th International Workshop, WISA 2004, Jeju Island, Republic of Korea, 23–25 August 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 440–456. [Google Scholar]
  40. Choudary, O.; Kuhn, M.G. Efficient Template Attacks. In Proceedings of the Lecture Notes in Computer Science: Volume 8419, Smart Card Research and Advanced Applications, 12th International Conference, CARDIS 2013, Berlin, Germany, 27–29 November 2013; Springer: Berlin/Heidelberg, Germany, 2013; pp. 253–270. [Google Scholar]
  41. Cagli, E.; Dumas, C.; Prouff, E. Convolutional Neural Networks with Data Augmentation against Jitter-Based Countermeasures—Profiling Attacks without Pre-processing. In Proceedings of the Lecture Notes in Computer Science: Volume 10529, Cryptographic Hardware and Embedded Systems—CHES 2017, 19th International Conference, Taipei, Taiwan, 25–28 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 45–68. [Google Scholar]
  42. Kim, J.; Picek, S.; Heuser, A.; Bhasin, S.; Hanjalic, A. Make some noise. unleashing the power of convolutional neural networks for profiled side-channel analysis. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2019, 148–179. [Google Scholar] [CrossRef]
  43. Picek, S.; Heuser, A.; Jovic, A.; Bhasin, S.; Regazzoni, F. The curse of class imbalance and conflicting metrics with machine learning for side-channel evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2019, 209–237. [Google Scholar] [CrossRef]
  44. Danger, J.L.; Duc, G.; Guilley, S.; Sauvage, L. Education and Open Benchmarking on Side-Channel Analysis with the DPA Contests. In Non-Invasive Attack Testing Workshop; NIST: Gaithersburg, MD, USA, 2011. [Google Scholar]
  45. Standaert, F.X.; Gierlichs, B.; Verbauwhede, I. Partition vs. Comparison Side Channel Distinguishers: An Empirical Evaluation of Statistical Tests for Univariate Side-Channel Attacks against Two Unprotected CMOS Devices. In Proceedings of the International Conference on Information Security and Cryptology, ICISC 2008, Seoul, Republic of Korea, 3–5 December 2008; Springer: Berlin/Heidelberg, Germany, 2009; pp. 253–267. [Google Scholar]
  46. Whitnall, C.; Oswald, E. A Cautionary Note Regarding the Usage of Leakage Detection Tests in Security Evaluation. In Proceedings of the International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology—ASIACRYPT 2013, Bengaluru, India, 1–5 December 2013; pp. 486–505. [Google Scholar]
  47. Coron, J.S.; Kocher, E.; Naccache, D. Statistics and Secret Leakage. In Proceedings of the Financial Cryptography: 4th International Conference, FC 2000, Anguilla, British West Indies, 20–24 February 2000; Springer: Berlin/Heidelberg, Germany, 2001; pp. 157–173. [Google Scholar]
  48. Standaert, F.X.; Malkin, T.G.; Yung, M. A Unified Framework for the Analysis of Side-Channel Key Recovery Attacks. In Proceedings of the Annual International Conference on the Theory and Applications of Cryptographic Techniques, Cologne, Germany, 26–30 April 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 443–461. [Google Scholar]
  49. Chatzikokolakis, K.; Chothia, T.; Guha, A. Statistical Measurement of Information Leakage. In Proceedings of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems, ETAPS 2010, Paphos, Cyprus, 20–29 March 2010; Springer: Berlin/Heidelberg, Germany, 2010; pp. 390–404. [Google Scholar]
  50. Chothia, T.; Guha, A. A Statistical Test for Information Leaks Using Continuous Mutual Information. In Proceedings of the 2011 IEEE 24th Computer Security Foundations Symposium, Cernay-la-Ville, France, 27–29 June 2011; pp. 177–190. [Google Scholar]
  51. Gilbert Goodwill, B.J.; Jaffe, J.; Rohatgi, P. A Testing Methodology for Side-Channel Resistance Validation. In NIST Non-Invasive Attack Testing Workshop; NIST: Gaithersburg, MD, USA, 2011; pp. 115–136. [Google Scholar]
  52. Becker, G.T.; Cooper, J.; DeMulder, E.K.; Goodwill, G.; Jaffe, J.; Kenworthy, G.; Kouzminov, T.; Leiserson, A.J.; Marson, M.E.; Rohatgi, P.; et al. Test Vector Leakage Assessment (TVLA) Methodology in Practice. In Proceedings of the International Cryptographic Module Conference, Gaithersburg, MD, USA, 24–26 September 2013. [Google Scholar]
  53. Bilgin, B.; Gierlichs, B.; Nikova, S.; Nikov, V.; Rijmen, V. Higher-order threshold implementations. In Proceedings of the Advances in Cryptology—ASIACRYPT 2014, Kaohsiung, Taiwan, 7–11 December 2014; Springer: Berlin/Heidelberg, Germany, 2014; pp. 326–343. [Google Scholar]
  54. De Cnudde, T.; Bilgin, B.; Reparaz, O.; Nikov, V.; Nikova, S. Higher-order threshold implementation of the AES S-box. In Proceedings of the Smart Card Research and Advanced Applications: 14th International Conference, CARDIS 2015, Bochum, Germany, 4–6 November 2015; Springer: Berlin/Heidelberg, Germany, 2016; pp. 259–272. [Google Scholar]
  55. Cohen, J. Statistical Power Analysis for the Behavioral Sciences; Routledge: Oxfordshire, UK, 1988. [Google Scholar]
  56. Sawilowsky, S.S. New effect size rules of thumb. J. Mod. Appl. Stat. Methods 2009, 8, 597–599. [Google Scholar] [CrossRef]
  57. Backes, M.; Dürmuth, M.; Gerling, S.; Pinkal, M.; Sporleder, C. Acoustic Side-Channel Attacks on Printers. In Proceedings of the 19th USENIX Security Symposium, Washington, DC, USA, 11–13 August 2010. [Google Scholar]
  58. Wang, Y.; Tang, M.; Wang, P.; Liu, B.; Tian, R. The Levene test based-leakage assessment. Integration 2022, 87, 182–193. [Google Scholar] [CrossRef]
  59. Wagner, M. 700+ Attacks Published on Smart Cards: The Need for a Systematic Counter Strategy. In Proceedings of the Constructive Side-Channel Analysis and Secure Design—Third International Workshop, COSADE 2012, Darmstadt, Germany, 3–4 May 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 33–38. [Google Scholar]
  60. Bache, F.; Plump, C.; Güneysu, T. Confident Leakage Assessment—A Side-Channel Evaluation Framework Based on Confidence Intervals. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, Dresden, Germany, 19–23 March 2018; pp. 1117–1122. [Google Scholar]
  61. Schneider, T.; Moradi, A. Leakage assessment methodology: Extended version. Cryptogr. Eng. 2016, 6, 85–99. [Google Scholar] [CrossRef]
  62. Yaru, W.; Ming, T. Side channel leakage assessment with the Bartlett and multi-classes F-test. J. Commun. 2022, 42, 35–43. [Google Scholar]
  63. Mangard, S. Hardware Countermeasures against DPA—A Statistical Analysis of Their Effectiveness. In Proceedings of the Topics in Cryptology–CT-RSA 2004: The Cryptographers’ Track at the RSA Conference 2004, San Francisco, CA, USA, 23–27 February 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 222–235. [Google Scholar]
  64. Skorobogatov, S. Synchronization method for SCA and fault attacks. J. Cryptogr. Eng. 2011, 1, 71–77. [Google Scholar] [CrossRef]
  65. Oswald, D.; Paar, C. Improving Side-Channel Analysis with Optimal Linear Transforms. In Proceedings of the Smart Card Research and Advanced Applications: 11th International Conference, CARDIS 2012, Graz, Austria, 28–30 November 2012; pp. 219–233. [Google Scholar]
  66. Merino Del Pozo, S.; Standaert, F.X. Blind source separation from single measurements using singular spectrum analysis. In Proceedings of the Cryptographic Hardware and Embedded Systems--CHES 2015: 17th International Workshop, Saint-Malo, France, 13–16 September 2015; pp. 42–43. [Google Scholar]
  67. van Woudenberg, J.G.; Witteman, M.F.; Bakker, B. Improving Differential Power Analysis by Elastic Alignment. In Proceedings of the Topics in Cryptology–CT-RSA 2011: The Cryptographers’ Track at the RSA Conference 2011, San Francisco, CA, USA, 14–18 February 2011; Springer: Berlin/Heidelberg, Germany, 2011; pp. 104–119. [Google Scholar]
  68. Li, J.; Siegmund, D. Higher criticism: P-values and criticism. Ann. Stat. 2015, 43, 1323–1350. [Google Scholar] [CrossRef]
Figure 1. The process of attacking-style assessment.
Figure 2. The process of leakage detection-style assessment.
Figure 3. The development of leakage detection-style assessment.
Figure 4. The optimization process of leakage assessment.
Figure 5. The process of hybrid side-channel testing.
Table 1. The optimizations of TVLA's statistical tool.

| Tool | TVLA Shortcoming Addressed | Comparison with the t-Test |
|---|---|---|
| Paired t-test [34] | Environmental noise negatively affects the results of TVLA. | The paired t-test performs better than the t-test in a noisy environment. |
| χ²-test [35] | TVLA has only two classes; the detection results rely on the mean statistical moment. | When the leakage does not occur on the mean statistical moment, the χ²-test is better than the t-test. |
| KS test [37] | The detection results rely on the mean statistical moment. | When the leakage does not occur on the mean statistical moment or the statistical parameters are transformed, the KS test is more robust than the t-test. |
| Hotelling T²-test [36] | TVLA cannot be used for multivariate leakage; TVLA is based on an independence assumption. | For multivariate leakage, the Hotelling T²-test improves the detection efficiency compared with the t-test. |
| ANOVA test [23] | TVLA has only two groups. | When the traces are divided into more groups, the detection efficiency of the ANOVA test is better than that of the t-test. |
| DL-LA [14] | TVLA is not suitable for multivariate leakage, horizontal leakage, or unaligned power traces. | For multivariate leakage, horizontal leakage, or unaligned power traces, DL-LA is better than the t-test. |