Side-Channel Power Analysis Based on SA-SVM

Zhang, Ying; He, Pengfei; Gan, Han; Zhang, Hongxin; Fan, Pengfei

doi:10.3390/app13095671


by Ying Zhang ¹, Pengfei He ¹,*, Han Gan ², Hongxin Zhang ³ and Pengfei Fan ¹

¹ School of Physics and Electronic Information, Yantai University, Yantai 264005, China

² School of Information and Electronic Engineering, Shandong Business and Technology University, Yantai 264005, China

³ School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100088, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(9), 5671; https://doi.org/10.3390/app13095671

Submission received: 24 March 2023 / Revised: 28 April 2023 / Accepted: 2 May 2023 / Published: 4 May 2023

(This article belongs to the Special Issue New Advance in Electronic Information Security)


Abstract:

Support vector machines (SVMs) have been widely used in side-channel power analysis. The performance of an SVM depends heavily on the choice of the penalty factor and the kernel parameter, so setting reasonable SVM hyperparameters is a key issue in side-channel power analysis. To address this issue, this paper proposes SA-SVM, a novel side-channel power analysis method that combines simulated annealing (SA) with SVMs to analyze power traces and recover the key. Integrating SA with SVMs allows the search space to be explored more effectively and produces better results. We conducted experiments on the SA-SVM and SVM models from three aspects: the selection of kernel functions, the number of parameters, and the number of eigenvalues. To compare our results with previous research, we performed experimental evaluations on open datasets. The results indicate that, compared with the SVM model, the SA-SVM model improved the accuracy by 0.25% to 3.25% and reduced the required time by 39.96% to 98.02% when 53 points of interest were used, recovering the key using only three power traces. The SA-SVM model outperforms existing methods in terms of accuracy and computation time.

Keywords:

side-channel power analysis; simulated annealing; support vector machine; kernel function

1. Introduction

When cryptographic algorithms are performed on electrical devices, the hardware circuit will leak out relevant physical information such as time [1], electromagnetic radiation [2], power consumption, optics [3], acoustics [4], etc. Unlike traditional cryptanalysis, side-channel analysis (SCA) avoids analyzing the cryptographic algorithm itself. SCA usually splits the master key into several subkeys, and the leaked physical information of these subkeys can be captured by the attacker. The attacker combines the relevant knowledge to recover the subkeys and eventually the master key.

Kocher et al. [5] first proposed power analysis attacks in the late 1990s. Power analysis is a branch of side-channel attacks that targets devices by measuring their power consumption. Kocher et al. showed that classic differential power analysis (DPA) could successfully recover a DES key. They found that the power consumption of a device is correlated with the data it processes during encryption, and that this correlation carries information about the key. By analyzing the power consumption of a device during encryption or decryption, it is therefore possible to deduce the key used. To carry out this type of attack, a computer sends a set of known plaintexts to the encryption device. As the device performs encryption, an oscilloscope measures its power consumption, yielding power traces. Following this approach, more power analysis attack methods were developed, which can be broadly classified as profiling attacks and nonprofiling attacks. Nonprofiling attacks include mutual information analysis (MIA) [6] and collision attacks (CAs) [7], in addition to the commonly used DPA and correlation power analysis (CPA) [8]. Profiling attacks include template attacks (TAs) [9] and machine-learning-based SCA (MLSCA), such as multilayer perceptron (MLP) [10], random forests (RFs) [11,12], k-nearest neighbors (KNNs) [13], convolutional neural networks (CNNs) [14,15], and support vector machines (SVMs) [16,17,18,19,20]. Nonprofiling attacks are simple but vulnerable to environmental interference. In contrast, profiling attacks are more resilient to environmental noise, because the attacker has full control of a device identical to the target device and uses it to build a side-channel leakage model from a large number of samples, which makes key recovery on the target device easier.

Hospodar et al. [16] applied the LS-SVM model to power analysis attacks for the first time. Their findings demonstrated that the parameter selection of machine learning techniques has a substantial effect on classification performance. Heuser and Zohner [17] extended the bit model to a Hamming weight (HW) model of the assumed intermediate values to classify power consumption profiles and decreased spatial complexity, demonstrating that SVM-based attacks beat conventional template attacks in high-noise situations. Hou et al. [18] used an SVM based on wavelet kernels to recover the offset values and keys of a masked AES algorithm, showing that SVMs with wavelet kernels beat SVMs with Gaussian kernels. Picek and Heuser et al. [19] used the SMOTE algorithm to address the problem of unbalanced data during SVM training caused by the HW model. For feature extraction and selection of power traces, Tran et al. [20] combined variational mode decomposition with the Gram–Schmidt method to select eigenvalues. This method extracts features that retain the most important information from the power trace while reducing noise, and these features are then used for SVM classification. The above contributions show that SVMs outperform other MLSCA methods. However, these methods leave the problem of parameter selection in SVMs unsolved.

Heuristic algorithms are widely used in optimization problems. Several researchers have used heuristic algorithms for feature extraction. Wang et al. [21] proposed the GA-CPA framework, which combines genetic algorithms (GAs) and CPA: a genetic algorithm extracts the characteristic values, and a CPA attack follows. Wang et al. [22] described an intelligent neural network algorithm that leverages particle swarm optimization (PSO) to detect hardware trojans. Their experimental results indicate that the detection accuracy of the PSO neural network method surpasses that of conventional BP neural network methods. Heuristic algorithms have also been used for parameter optimization; unlike traditional exact methods, they prioritize searching the space of approximate solutions. Several heuristic algorithms have been studied for optimizing SVM parameters, such as GAs [23], PSO [24], ant colony optimization (ACO) [25], and simulated annealing (SA) [26,27]. These studies have shown improved classification accuracy compared with other methods such as grid search. GAs come close to the best solution, but it is difficult to encode the problem and then decode the solution. PSO has few variables to tune and a straightforward principle, but it has poor local search capability and insufficient search precision. ACO converges slowly and tends to fall into local optima. Moreover, ACO cannot handle continuous optimization problems and is only suitable for discrete problems. SA accepts suboptimal solutions with a certain probability while searching for a maximum or minimum, which makes it easier to escape local optima.

In this study, SA and SVM methods were combined to establish an SA-SVM model, which was applied to side-channel power analysis. The experiments were performed on the DPA Contest v4.1 public dataset. First, the Pearson correlation coefficient was used to select the eigenvalues of the power trace dataset. Then, the HW model was used to label the data for the SA-SVM model. The SA-SVM model accepts negative increments with a certain probability, which lets it jump out of local optima and find optimal parameters more easily. We evaluated the performance of the parameters in terms of accuracy and guessing entropy. The experimental results show that, compared with the grid search method, simulated annealing improves global optimization by jumping out of local optima.

2. Materials and Methods

2.1. Data Preprocessing

The DPA Contest is a globally recognized standard competition in the field of cryptographic security, and its latest version is DPA Contest v4 [28]. Since complete encryption was not used in this experiment, DPA Contest v4.1 (DPAv4.1) was selected as the dataset. We chose 1000 power traces from DPAv4.1, each with 435,000 feature values. Because the mask recovery phase was not considered in this study, we converted the dataset to an unprotected scheme with known masks. We targeted only $H (k^{*})$, the HW of the output of the first masked S-box in the first round:

$H (k^{*}) = HW (Sbox (P_{1} \oplus k^{*}) \oplus M_{knownmask})$ (1)

where $H \in \{c_{1}, c_{2}, \dots, c_{C}\}$, $C$ is the number of classes, $P_{1}$ is the first plaintext byte, and $k^{*}$ is the real subkey.
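The labeling rule in Equation (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB code: it assumes the standard AES S-box (generated here rather than hard-coded), and the names `aes_sbox`, `SBOX`, and `hw_label` are hypothetical. The mask byte is taken as a known input, as in the unprotected-with-known-masks setting described above.

```python
def _rotl8(x, s):
    """Rotate an 8-bit value left by s bits."""
    return ((x << s) | (x >> (8 - s))) & 0xFF


def aes_sbox():
    """Generate the standard AES S-box (inverse in GF(2^8) plus affine map)."""
    sbox = [0] * 256
    p = q = 1
    while True:
        # Multiply p by 3 in GF(2^8).
        p = (p ^ (p << 1) ^ (0x1B if p & 0x80 else 0)) & 0xFF
        # Divide q by 3, so that q stays the multiplicative inverse of p.
        q ^= (q << 1) & 0xFF
        q ^= (q << 2) & 0xFF
        q ^= (q << 4) & 0xFF
        if q & 0x80:
            q ^= 0x09
        # Affine transformation of the inverse.
        x = q ^ _rotl8(q, 1) ^ _rotl8(q, 2) ^ _rotl8(q, 3) ^ _rotl8(q, 4)
        sbox[p] = x ^ 0x63
        if p == 1:
            break
    sbox[0] = 0x63  # zero has no inverse and is handled separately
    return sbox


SBOX = aes_sbox()


def hw_label(plaintext_byte, key_byte, mask_byte):
    """Equation (1): HW(Sbox(P1 xor k) xor M), a class label in 0..8."""
    value = SBOX[plaintext_byte ^ key_byte] ^ mask_byte
    return bin(value).count("1")
```

Because the Hamming weight of a byte takes nine possible values, this labeling yields nine classes, which matches the classes $c_{0}, \dots, c_{8}$ used in the attack phase.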

Not all sampling points of a power trace are correlated with the encrypted data. Usually, the Pearson correlation coefficient is used to select the features that are related to the encrypted data:

$Pearson (x, y) = \frac{\sum_{i = 1}^{N} ((x_{i} - \bar{x}) (y_{i} - \bar{y}))}{\sqrt{\sum_{i = 1}^{N} {(x_{i} - \bar{x})}^{2}} \sqrt{\sum_{i = 1}^{N} {(y_{i} - \bar{y})}^{2}}}$ (2)

We computed the absolute Pearson correlation between the sampling points of the power traces and $H (k^{*})$. The absolute value of the correlation coefficient shows the degree of correlation, and the results are shown in Figure 1.
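The selection step based on Equation (2) can be illustrated with a short, standard-library-only sketch. The function names `pearson` and `select_pois` and the default threshold are illustrative assumptions; a real run over 435,000 sampling points would use a vectorized implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Equation (2))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # a constant column carries no information
    return cov / (norm_x * norm_y)


def select_pois(traces, labels, threshold=0.5):
    """Return (index, |r|) for sampling points whose |r| exceeds the threshold,
    sorted from the most to the least correlated point."""
    pois = []
    for j in range(len(traces[0])):
        column = [trace[j] for trace in traces]
        r = abs(pearson(column, labels))
        if r > threshold:
            pois.append((j, r))
    pois.sort(key=lambda item: item[1], reverse=True)
    return pois
```

Here `traces` is a list of power traces (one list of samples per trace) and `labels` holds the corresponding values of $H (k^{*})$.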

Figure 1 shows notable peaks between approximately sampling points 100,000 and 200,000, and further peaks at approximately 250,000. The points corresponding to these spikes are highly correlated with the intermediate value $H (k^{*})$. The power traces of DPAv4.1 have extremely high correlation coefficients. The peak points shown in Figure 1 are listed in Table 1. There are 192 points of interest (POIs) whose absolute correlation coefficient is larger than 0.5, and 4 whose absolute correlation coefficient is greater than 0.9. Fewer POIs are not necessarily better. Although reducing the number of POIs speeds up computation and decreases information redundancy, it also discards useful information. Choosing an appropriate number of POIs helps to recover the key more effectively.

2.2. Research Method

2.2.1. SVM Classifier

Vapnik et al. (1995) [29] proposed the SVM based on the VC dimension theory of statistical learning and the structural risk minimization (SRM) principle. The SVM is a supervised machine learning method for classification and regression problems. The dataset is defined as follows:

$D_{M} = \{(X_{i}, y_{i}) \mid X_{i} \in R^{n}, y_{i} \in \{- 1, + 1\}, i = 1, 2, \dots, M\}$ (3)

where $X_{i}$ denotes the input sample, $n$ is the dimensionality of the input sample, and $y_{i}$ denotes the corresponding label of the sample. The basic SVM model is a linear classifier for binary classification problems. The idea is to map the input samples into a higher-dimensional space and to construct a maximum-margin hyperplane there, which maximizes the distance between the two classes of samples. The larger the margin, the more effective the classification. The hyperplane is defined as follows:

$f (X) = ω^{T} Φ (X) + b$ (4)

where $Φ (X)$ is the nonlinear mapping function, $ω$ denotes the weight vector, and $b$ denotes the offset term.

The hyperplane parameters are obtained by solving the following optimization problem:

$\min_{ω, b, ξ} (\frac{1}{2} ∥ ω ∥^{2} + C \sum_{i = 1}^{M} ξ_{i}), \quad s.t. \; y_{i} (ω^{T} Φ (X_{i}) + b) \geq 1 - ξ_{i}, \; ξ_{i} \geq 0, \; i = 1, 2, \dots, M$ (5)

where $C > 0$ is a regularization parameter, also known as the penalty factor, which controls how strongly misclassified samples are penalized. If $C$ is too high, the classifier overfits and its generalization ability suffers. $ξ_{i}$ denotes the slack variable. Lagrange multipliers are introduced to transform the constrained problem of Equation (5) into its dual problem, which is easier to solve:

$\min_{λ} \frac{1}{2} \sum_{i = 1}^{M} \sum_{j = 1}^{M} λ_{i} λ_{j} y_{i} y_{j} K (X_{i}, X_{j}) - \sum_{i = 1}^{M} λ_{i}, \quad s.t. \; \sum_{i = 1}^{M} λ_{i} y_{i} = 0, \; 0 \leq λ_{i} \leq C, \; i = 1, 2, \dots, M$ (6)

where $λ_{i}$ is the Lagrange multiplier and $K (X_{i}, X_{j}) = Φ {(X_{i})}^{T} Φ (X_{j})$ is the kernel function. From the solution of Equation (6), $ω$ and $b$ are calculated as follows:

$ω^{*} = \sum_{i = 1}^{M} λ_{i} y_{i} Φ (X_{i})$ (7)

$b^{*} = y_{j} - \sum_{i = 1}^{M} λ_{i} y_{i} K (X_{i}, X_{j})$ (8)

where $X_{j}$ is any support vector with $0 < λ_{j} < C$. According to Equations (7) and (8), the final SVM classification decision function is obtained as follows:

$f (X) = sign (\sum_{i = 1}^{M} λ_{i} y_{i} K (X_{i}, X) + b^{*})$ (9)

In training SVM classification models, the purpose of the kernel function is to map nonlinear input data into a high-dimensional kernel space. In essence, the kernel function is evaluated on the input data in the original low-dimensional space, while its classification effect is realized in the high-dimensional space. By introducing a suitable kernel function, SVMs avoid processing the data explicitly in the high-dimensional space.

Five kernel functions were chosen for this article, among which the Morlet–RBF kernel was applied to side-channel power analysis for the first time. The Morlet–RBF kernel creates a mixed kernel function by combining the Morlet and RBF methods. It has a good performance in the field of image processing [30]. The five proposed kernels are as follows:

The linear kernel function (linear kernel):

$K (x, x^{'}) = ⟨ x, x^{'} ⟩$ (10)

The radial basis function (RBF kernel) [31]:

$K (x, x^{'}) = \exp (- \frac{{∥ x - x^{'} ∥}^{2}}{2 σ^{2}})$ (11)

The Morlet wavelet kernel function [32]:

$K (x, x^{'}) = \prod_{i = 1}^{N} h (\frac{x_{i} - x_{i}^{'}}{σ}) = \prod_{i = 1}^{N} (\cos (ω_{0} \frac{x_{i} - x_{i}^{'}}{σ}) \exp (- \frac{{(x_{i} - x_{i}^{'})}^{2}}{2 σ^{2}}))$ (12)

The Mexican hat wavelet kernel function [33]:

$K (x, x^{'}) = \prod_{i = 1}^{N} (\frac{2}{\sqrt{3}} π^{\frac{1}{4}} (1 - \frac{{(x_{i} - x_{i}^{'})}^{2}}{σ^{2}}) \exp (- \frac{{(x_{i} - x_{i}^{'})}^{2}}{2 σ^{2}}))$ (13)

The mixed kernel function of Morlet and RBF (Morlet–RBF kernel):

$K (x, x^{'}) = \exp (- γ (2 - 2 K_{Morlet} (x, x^{'})))$ (14)
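The five kernels of Equations (10)–(14) can be written down directly. The sketch below is illustrative only: the function names are hypothetical, the Mexican hat coefficient follows Equation (13) as printed, and the Morlet frequency `omega0=1.75` is an assumed default because the text does not state the value of $ω_{0}$.

```python
import math


def linear_kernel(x, y):
    """Equation (10): inner product of the two feature vectors."""
    return sum(a * b for a, b in zip(x, y))


def rbf_kernel(x, y, sigma):
    """Equation (11): Gaussian kernel with width sigma."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def morlet_kernel(x, y, sigma, omega0=1.75):
    """Equation (12); omega0 is an assumed value, not taken from the paper."""
    k = 1.0
    for a, b in zip(x, y):
        u = (a - b) / sigma
        k *= math.cos(omega0 * u) * math.exp(-u * u / 2.0)
    return k


def mexican_hat_kernel(x, y, sigma):
    """Equation (13), with the per-dimension coefficient as printed there."""
    coeff = 2.0 / math.sqrt(3.0) * math.pi ** 0.25
    k = 1.0
    for a, b in zip(x, y):
        u = (a - b) / sigma
        k *= coeff * (1.0 - u * u) * math.exp(-u * u / 2.0)
    return k


def morlet_rbf_kernel(x, y, sigma, gamma, omega0=1.75):
    """Equation (14): an RBF-style kernel applied on top of the Morlet kernel."""
    return math.exp(-gamma * (2.0 - 2.0 * morlet_kernel(x, y, sigma, omega0)))
```

Since $K_{Morlet} (x, x) = 1$, the term $2 - 2 K_{Morlet} (x, x^{'})$ is the squared distance between the two samples in the Morlet feature space, so Equation (14) behaves like an RBF kernel evaluated in that space and adds $γ$ as a third hyperparameter next to the penalty factor and $σ$.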

2.2.2. Simulated Annealing Algorithm

SA is a heuristic algorithm for solving optimization problems. It simulates the annealing process of a solid, in which the solid is first heated to a sufficiently high temperature and then cooled slowly so that the system reaches equilibrium at each temperature. This process is controlled by the parameters of the cooling schedule. The search process of simulated annealing introduces a random factor: at each temperature, a new solution is drawn at random from the neighborhood of the current solution, and a solution worse than the current one is accepted with a certain probability based on the Metropolis criterion. These probabilistic moves allow the search to jump out of local optima and eventually converge toward the global optimum, thus effectively avoiding solutions trapped in local minima.

SA consists of two loops: an inner loop and an outer loop. The inner loop simulates multiple state transitions at the same temperature. For a temperature $T_{i}$, $E$ is the internal energy at that temperature, $x_{old}$ is the current state, and $x_{new}$ is a new state sampled from the neighborhood of $x_{old}$. After several state transitions, the steady state at this temperature is reached. The condition for accepting new states, the Metropolis criterion, is stated as follows:

$P (x_{old} \to x_{new}) = \begin{cases} 1, & E (x_{old}) - E (x_{new}) > 0 \\ e^{- \frac{E (x_{new}) - E (x_{old})}{T_{i}}}, & E (x_{old}) - E (x_{new}) \leq 0 \end{cases}$ (15)

The outer loop implements the cooling schedule and the termination criterion. The temperature $T_{i}$ is reduced according to the annealing schedule, yielding $T_{i + 1} < T_{i}$. The above process is repeated until a stopping criterion is reached. The flowchart of SA is shown in Figure 2.
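The acceptance rule of Equation (15) can be sketched as a small function. The name `metropolis_accept` and the optional `rng` argument are illustrative assumptions, not part of the original method description.

```python
import math
import random


def metropolis_accept(e_old, e_new, temperature, rng=random):
    """Metropolis criterion (Equation (15)).

    An improving move (lower energy) is always accepted; otherwise the move
    is accepted with probability exp(-(e_new - e_old) / temperature).
    """
    if e_new < e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / temperature)
```

At high temperatures the exponential is close to 1, so most worsening moves are accepted and the search explores broadly; as the temperature falls, the probability shrinks and the search becomes increasingly greedy.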

2.3. Model Evaluation

The accuracy and guessing entropy [34] of the model were used in this study to evaluate its performance.

Accuracy (Acc) is the proportion of correctly classified samples among all samples. The SVM outputs a probability vector over the classes. For $Q$ power traces, we denote the probability vector of the $i$-th trace as $prob_{i} = [p_{i, c_{1}}, p_{i, c_{2}}, \dots, p_{i, c_{C}}], i = 1, 2, \dots, Q$. The predicted class $\hat{H} (k^{*})$ is given by

$\hat{H} (k^{*}) = \underset{\{c_{1}, \dots, c_{C}\}}{\arg \max} \; prob_{i}$ (16)

Accuracy is defined as follows:

$Acc = \frac{1}{Q} \sum_{i = 1}^{Q} \begin{cases} 0, & \hat{H} (k^{*}) \neq H (k^{*}) \\ 1, & \hat{H} (k^{*}) = H (k^{*}) \end{cases}$ (17)

Guessing entropy (GE) is used to determine how many power traces are required to recover the key. For each key guess $k$, we define the score as follows:

$g_{k} = \log \prod_{i = 1}^{N_{k}} \underbrace{prob_{i}}_{\{c_{1}, \dots, c_{C}\}} = \sum_{i = 1}^{N_{k}} \log \underbrace{prob_{i}}_{\{c_{1}, \dots, c_{C}\}}$ (18)

where $k = 1, 2, \dots, K$, $K$ denotes the key space size, and $N_{k}$ is the number of power traces corresponding to the guessing key. The underbrace indicates that, for each trace, the entry of $prob_{i}$ is taken at the class from $\{c_{1}, \dots, c_{C}\}$ associated with the key guess $k$. The values $g_{k}$ are sorted in descending order to obtain $G = [g [1], g [2], \dots, g [K]]$, where $g [1]$ is the candidate key with the greatest probability and $g [K]$ is the candidate key with the lowest probability. When the candidate key of $g [1]$ is the real key, the guessing entropy is considered to be zero. The fewer power traces required to recover a single key byte, the better the model’s performance.
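The key ranking behind Equation (18) can be sketched as follows. This is a minimal illustration under stated assumptions: `rank_keys` is a hypothetical name, `label_fn(plaintext, key, mask)` is a caller-supplied function that returns the class index implied by a key guess (for example, the HW label of Equation (1)), and a small floor on the probabilities is added to avoid taking the logarithm of zero.

```python
import math


def rank_keys(probs, plaintexts, masks, label_fn, key_space=256):
    """Rank key guesses by summed log-likelihood (Equation (18)).

    probs[i][c] is the classifier's probability that trace i belongs to
    class c. Returns the key guesses sorted from most to least likely.
    """
    scores = []
    for k in range(key_space):
        g_k = 0.0
        for prob_i, p, m in zip(probs, plaintexts, masks):
            c = label_fn(p, k, m)                   # class implied by guess k
            g_k += math.log(max(prob_i[c], 1e-300))  # floor avoids log(0)
        scores.append(g_k)
    return sorted(range(key_space), key=lambda k: scores[k], reverse=True)
```

Calling `rank_keys` on the first $n$ attack traces for increasing $n$, and recording the smallest $n$ at which the real key is ranked first, gives the number of traces needed for key recovery.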

3. SVM Classifier Based on SA

Figure 3 depicts the side-channel power attack method based on SA-SVM. In the profiling phase, the original dataset is preprocessed, including labeling, feature extraction, and feature selection; data preprocessing is explained in Section 2.1. The following kernel functions were chosen for training the SVM classifier: the linear kernel, the RBF kernel, the Morlet wavelet kernel, the Mexican hat wavelet kernel, and the Morlet–RBF kernel. The SA-SVM algorithm optimizes the penalty factor and the kernel parameter. At the end of the profiling phase, the SA-SVM model is trained on the preprocessed training data to model the power consumption of the attacker-owned device. In the attack phase, the trained SVM model determines, for each trace in the test set, the probabilities of the classes $c_{0}, \dots, c_{8}$. Finally, for each hypothetical key value, the log-likelihoods are summed, and the key is predicted.

For all kernels except the Morlet–RBF kernel, SA-SVM defines a set of states in which each state consists of the penalty factor $c \in [c1, c2]$ and the kernel parameter $sigma \in [sigma1, sigma2]$. For the Morlet–RBF kernel, each state consists of the penalty factor $c \in [c1, c2]$ and the kernel parameters $sigma \in [sigma1, sigma2]$ and $gamma \in [gamma1, gamma2]$. The state-generating function of SA-SVM generates a set of neighbors within the search range. The discrepancy between the initial parameter value and the parameter value of its adjacent state is large at first, but it decreases as the algorithm iterates. At a given temperature, each iteration chooses a neighboring state at random. If the selected neighboring state outperforms the current state, it is adopted, and the current parameter values ($c$ and $sigma$) are replaced with the new ones. Otherwise, the neighboring state is adopted only with a certain probability (lines 12–13 of Algorithm 1).

The user needs to set the parameters of the cooling schedule, including the length of the Markov chain $L$, the attenuation coefficient $ɑ$, the initial temperature $T_{0}$, and the termination temperature $T_{end}$. The relevant settings are as follows:

(1) Set the initial temperature $T_{0}$: $T_{0}$ influences the global search capability. A higher $T_{0}$ results in a more powerful, albeit more time-consuming, global search;

(2) Set the length of the Markov chain $L$ (inner loop): the more iterations there are at $T_{i}$, the more time-consuming the process becomes;

(3) Set the temperature attenuation coefficient $ɑ$ (outer loop):

Exponential attenuation method: $T_{i + 1} = ɑ T_{i}$ ;
Classical annealing method: $T_{i + 1} = T_{i} / \ln (1 + T_{i})$ ;
Fast annealing method: $T_{i + 1} = T_{i} / (1 + ɑ T_{i})$ .

(4) Set the search range for the set of states: the initial solution has no effect on the final result, and it can be chosen at random in the solution set. The search range should be reasonable in relation to the actual problem;

(5) Set the termination condition: the inner loop terminates when the number of iterations reaches $L$. The outer loop stops if the optimal solution remains unchanged over several consecutive cooling steps, or if $T_{i}$ drops to $T_{end}$. The specific process of the proposed SA-SVM is presented in Algorithm 1.
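The three cooling rules from item (3) can be compared with a short sketch. The function names, the `max_steps` guard, and any value of $ɑ$ passed in are illustrative assumptions, since the text does not report the coefficient that was used.

```python
import math


def exponential_cooling(t, alpha):
    """Exponential attenuation: T_{i+1} = alpha * T_i, with 0 < alpha < 1."""
    return alpha * t


def classical_cooling(t):
    """Classical annealing as printed above: T_{i+1} = T_i / ln(1 + T_i)."""
    return t / math.log(1.0 + t)


def fast_cooling(t, alpha):
    """Fast annealing: T_{i+1} = T_i / (1 + alpha * T_i)."""
    return t / (1.0 + alpha * t)


def steps_to_reach(t0, t_end, step, max_steps=10_000):
    """Count cooling steps until T <= t_end; None if not reached in max_steps."""
    t, n = t0, 0
    while t > t_end:
        if n >= max_steps:
            return None
        t = step(t)
        n += 1
    return n
```

One practical observation: with the classical rule exactly as printed, $T = e - 1 \approx 1.718$ is a fixed point, so a schedule started above that value never falls below it. The fast annealing rule, which Algorithm 1 uses, has no such limit because $1 / T_{i + 1} = 1 / T_{i} + ɑ$ grows by a constant amount at every step.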

Algorithm 1: Pseudocode for SA-SVM

Input: parameters of the cooling schedule $L$, $ɑ$, $T_{0}$, $T_{end}$
Input: $c = [c1, c2]$, $sigma = [sigma1, sigma2]$, $dataset$, $label$
Input: $range1 = c2 - c1$, $range2 = sigma2 - sigma1$
Output: $c_{best}$, $sigma_{best}$, $Acc_{best}$

1: $c \leftarrow c_{0} = rand (size (1, 1)) .* range1 + c1$
2: $sigma \leftarrow sigma_{0} = rand (size (1, 1)) .* range2 + sigma1$
3: $Acc = SASVM (dataset, label, c_{0}, sigma_{0})$
4: $Acc_{best} \leftarrow Acc$, $T_{i} \leftarrow T_{0}$
5: while $T_{i} > T_{end}$ do
6:     for $iteration = 1$ to $L$ do
7:         //generate randomly a neighboring solution
8:         $c_{new} = c + 0.1 .* range1 .* (rand (size (1, 1)) - 0.5)$
9:         $sigma_{new} = sigma + 0.1 .* range2 .* (rand (size (1, 1)) - 0.5)$
10:         $AccNew = SASVM (dataset, label, c_{new}, sigma_{new})$
11:         $Δ f \leftarrow (1 - AccNew) - (1 - Acc)$ //Error as objective function
12:         if $Δ f \leq 0$ or $random (0, 1) < \exp (- \frac{Δ f}{K T_{i}})$ then
13:             $c \leftarrow c_{new}, sigma \leftarrow sigma_{new}, Acc \leftarrow AccNew$
14:             $c_{best} \leftarrow c$, $sigma_{best} \leftarrow sigma$, $Acc_{best} \leftarrow Acc$
15:             $iteration \leftarrow iteration + 1$
16:         end if
17:     end for
18:     $T_{i + 1} \leftarrow T_{i} / (1 + ɑ T_{i})$
19: end while
20: return $c_{best}$, $sigma_{best}$, $Acc_{best}$
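Algorithm 1 can be turned into a compact runnable sketch. The version below is an illustration under several assumptions rather than the authors' MATLAB implementation: `evaluate(c, sigma)` is a caller-supplied function standing in for the SVM training and validation step (`SASVM` in the pseudocode), candidates are clipped to the search range, the constant `k` in the acceptance test defaults to 1, the default `alpha` is arbitrary, and the best state is tracked separately from the current state so that a later accepted but worse move cannot overwrite it.

```python
import math
import random


def sa_search(evaluate, c_range, sigma_range, t0=100.0, t_end=1.0,
              alpha=0.1, chain_length=100, k=1.0, seed=0):
    """Simulated annealing over (c, sigma), maximizing evaluate(c, sigma)."""
    rng = random.Random(seed)
    (c1, c2), (s1, s2) = c_range, sigma_range

    def clip(value, low, high):
        return min(max(value, low), high)

    # Random initial state inside the search range (lines 1-4 of Algorithm 1).
    c = c1 + rng.random() * (c2 - c1)
    sigma = s1 + rng.random() * (s2 - s1)
    acc = evaluate(c, sigma)
    best = (c, sigma, acc)

    t = t0
    while t > t_end:                        # outer loop: cooling
        for _ in range(chain_length):       # inner loop: Markov chain
            # Neighbor within +/-5% of each range (lines 8-9), clipped.
            c_new = clip(c + 0.1 * (c2 - c1) * (rng.random() - 0.5), c1, c2)
            s_new = clip(sigma + 0.1 * (s2 - s1) * (rng.random() - 0.5), s1, s2)
            acc_new = evaluate(c_new, s_new)
            delta = (1.0 - acc_new) - (1.0 - acc)   # change in error
            if delta <= 0 or rng.random() < math.exp(-delta / (k * t)):
                c, sigma, acc = c_new, s_new, acc_new
                if acc > best[2]:
                    best = (c, sigma, acc)
        t = t / (1.0 + alpha * t)           # fast annealing (line 18)
    return best
```

In an actual experiment, `evaluate` would train an SVM with the given parameters and return its cross-validated accuracy. Because the error lies in [0, 1], the choice of `k` relative to the temperature range determines how often worsening moves are accepted, so it is worth tuning together with $T_{0}$ and $T_{end}$.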

4. Results and Analysis

In this work, our attack framework was built on LIBSVM, a library for SVMs [35]. We ran the experiments on a laptop with an Intel(R) Core(TM) i5-11320H CPU @ 3.20 GHz and 16 GB of memory (Windows 11 x64). The experiments were implemented in MATLAB R2019b.

In this section, we compare the performance of the different kernel functions under the SVM model and the SA-SVM model. The SVM model uses a grid search method to find the optimal parameters. We chose 1000 power traces from DPAv4.1 as our experimental dataset. Two-thirds of them were used as the learning set, and the remaining one-third was used as the testing set. The learning set was partitioned into three equal subsets for three-fold cross-validation. We trained the model on two subsets at a time and used the remaining subset for validation, repeating this process until each of the three subsets had been used for validation. This process generated three models, and we chose the parameters of the model with the highest accuracy and evaluated its effectiveness on the testing set. The subkey at this point was 108. In addition, 4 to 117 points of interest related to leakage were used in our experiments. The parameter values of the proposed SA-SVM approach were as follows: $L$ = 100; $T_{0}$ = 100, $T_{end}$ = 1; $c$ = [0.01, 128], $sigma$ = [0.001, 5], $gamma$ = [0.001, 5]. Because simulated annealing is a stochastic algorithm, the solution varies from run to run. The SA-SVM model was therefore run five times on the dataset, and the best-performing run was chosen as the final result. To explore the impact of kernel functions and eigenvalues on model performance, we used accuracy, guessing entropy, and the time required as evaluation criteria.

The data in Table 2 and Table 3 cover 4 to 117 POIs for the various kernel functions. First, we can observe that all kernel functions performed well in terms of accuracy. Table 2 lists the results of the SVM model, which indicate that the greater the number of POIs, the higher the accuracy of SVM. However, beyond 53 POIs, the increase in accuracy slowed down. With 53 POIs, the Morlet–RBF kernel performed best with 92% accuracy, followed by the Mexican hat wavelet kernel with 91.75%. There was little difference in accuracy between kernel functions. Table 3 lists the results of the SA-SVM model, which indicate that accuracy peaked at 53 POIs and began to decline after that. The Morlet wavelet kernel had the highest peak accuracy, 94.5%, which was 3.5% higher than when using the SVM model. The accuracy of the Morlet–RBF kernel was 1.5–2% higher than that of the other Morlet kernels when the number of POIs was less than 29. The comparison of Table 2 and Table 3 shows that, under the same kernel, the SA-SVM model classified more accurately than the SVM model, with accuracy gains of 0.25–3.25%. Furthermore, the wavelet kernels outperformed the ordinary RBF kernel and the linear kernel in terms of classification accuracy.

Figure 4 depicts the time taken to conduct the experiments mentioned above. The results show that as the number of POIs increased, the total time for each algorithm increased. When the number of POIs increased from 4 to 117, the time of SVM increased by 159–309%, whereas the time of SA-SVM increased by 170–862%. When the number of POIs was fixed, for example, at 53 POIs, the fastest kernel for SA-SVM was the linear kernel, which took 17.44 s, and the slowest was the RBF kernel, which took 81.77 s. For SVM, the fastest was the linear kernel with 78.69 s, while the slowest was the Morlet–RBF kernel, which took 2836.91 s. The running time of SA-SVM was 39.96–98.02% less than that of the SVM model. Additionally, the time spent by the different kernels in SA-SVM was not significantly different, whereas the time spent by the different kernels in SVM increased with the complexity of the kernel. This means that, while the number of POIs affects SA-SVM, the parameter search time is significantly reduced when compared to SVM.

The total time includes the time the kernel takes to process the data, the time to train the model parameters, and the time to test the trained model. A comparison of the Morlet–RBF kernel and the RBF kernel in SA-SVM revealed that the Morlet–RBF kernel spent more time processing data than the RBF kernel, but its total time was lower. This indicates that parameter optimization is faster with the Morlet–RBF kernel than with the RBF kernel.

Figure 5a–f illustrate the guessing entropy results for 4, 53, and 117 POIs when using 1000 traces in total. With four POIs, all kernels except the linear kernel required seven traces to recover the key under SA-SVM and only six under SVM. With 53 POIs, the wavelet kernels in the SA-SVM model required only 3 traces to recover the key, whereas the Morlet–RBF and Mexican hat wavelet kernels in the SVM model required 4 traces. The number of traces needed to recover the key increased with 117 POIs: under the SA-SVM model, the kernels required at least five traces, whereas they needed more than six traces under the SVM model. These results show that SA-SVM slightly improved the key recovery ability against the unmasked AES algorithm. In addition, the wavelet kernels performed better than the RBF kernel, and the linear kernel performed worst.

5. Discussion

It can be observed from Table 2 and Table 3 that the accuracy decreased when the number of POIs surpassed a certain threshold. The reason is that when the number of POIs is too large, redundant features increase data error and produce an ill-conditioned matrix, and equations with an ill-conditioned matrix are very difficult to solve. For many machine learning models, the complexity increases sharply with the number of features, which increases time and cost. When applying these models, selecting appropriate feature points can effectively reduce the amount of model training while increasing the classifier’s generalizability.

Hou et al. [18] compared the performance of an SVM model using an RBF kernel, a Fourier kernel, and a wavelet kernel under the same penalty factor. Although they were the first to use an SVM-based technique with a wavelet kernel in SCA, they did not compare the performance of the SVMs based on these kernels with a model using optimal parameters. This paper provides additional evidence that the wavelet kernel indeed outperforms other kernels when SVMs are used in SCA. A question that arises is why the Morlet–RBF kernel works well with a small number of POIs. We believe that although mapping is performed twice to readily separate data in high dimensions, its effect is limited.

Table 4 outlines the results of a comparison of the accuracy of machine learning algorithms used in the DPAv4.1 dataset. Picek et al. [15] presented the results for the DPAv4.1 dataset when considering 50 of the most important features. It can be seen that Naive Bayes, MLP, XGBoost, RF, and CNN algorithms have poor accuracy in small datasets. Duan X et al. [12] used the SMOTE algorithm to solve data imbalance issues, improving the accuracy of the RF algorithm. Table 4 shows that when the number of datasets and POIs is comparatively small, the SA-SVM model outperforms other machine learning algorithms.

As shown in Figure 4, the Morlet–RBF wavelet kernel, which required the optimization of three parameters, took significantly longer in the SVM model than in the SA-SVM model. This is because grid search is essentially a brute-force search: it iterates through the parameter combinations to find the best set according to the chosen scoring mechanism. As the number of hyperparameters increases, its computational complexity increases exponentially. This also indicates that SA-SVM performs better when more than two parameters are optimized.

However, our method also has a shortcoming: in this study, we only carried out experiments on a single side-channel power analysis dataset, and the experiments were not extended to other datasets.

6. Conclusions

This paper presents an SA-SVM model for side-channel power analysis. The approach seeks continuous decision variables and optimizes SVM parameter values to achieve superior classification results. Experiments were conducted on DPAv4.1, testing the model under various POIs and kernel functions. Compared with the SVM grid search method, SA-SVM improved accuracy by approximately 0.25–3.25% while reducing the running time by up to 98.02%. We compared the linear, RBF, and three kinds of wavelet kernels, and it was revealed that the wavelet kernels had higher accuracy than RBF while requiring only three traces to recover the key. The combination of SA and SVM provides a new method for addressing the parameter selection challenge in side-channel power analysis.

Author Contributions

Data curation, P.F.; methodology, Y.Z.; supervision, H.Z.; validation, P.H.; writing—original draft preparation, Y.Z.; writing—review and editing, H.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation Program of China (62201491, 62071057) and the Yantai City 2021 School-Land Integration Development Project Fund (1521001-WL21JY01).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Kocher, P.C. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems. In Proceedings of the 16th Annual International Cryptology Conference (CRYPTO 96), Santa Barbara, CA, USA, 18–22 August 1996; pp. 104–113.
2. Wang, R.; Wang, H.; Dubrova, E. Far Field EM Side-Channel Attack on AES Using Deep Learning. In Proceedings of the 4th ACM Workshop on Attacks and Solutions in Hardware Security, online, 13 November 2020; pp. 35–44.
3. Ferrigno, J.; Hlaváč, M. When AES Blinks: Introducing Optical Side Channel. IET Inf. Secur. 2008, 2, 94.
4. Genkin, D.; Shamir, A.; Tromer, E. Acoustic Cryptanalysis. J. Cryptol. 2017, 30, 392–443.
5. Goos, G.; Hartmanis, J.; van Leeuwen, J.; Kocher, P.; Jaffe, J.; Jun, B. Differential Power Analysis. In Proceedings of the 19th Annual International Cryptology Conference (CRYPTO 99), Santa Barbara, CA, USA, 15–19 August 1999; pp. 388–397.
6. Gierlichs, B.; Batina, L.; Tuyls, P.; Preneel, B. Mutual Information Analysis. In Cryptographic Hardware and Embedded Systems—CHES 2008; Oswald, E., Rohatgi, P., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2008; Volume 5154, pp. 426–442. ISBN 978-3-540-85052-6.
7. Niu, Y.; Zhang, J.; Wang, A.; Chen, C. An Efficient Collision Power Attack on AES Encryption in Edge Computing. IEEE Access 2019, 7, 18734–18748.
8. Han, J.; Kim, Y.-J.; Kim, S.-J.; Sim, B.-Y.; Han, D.-G. Improved Correlation Power Analysis on Bitslice Block Ciphers. IEEE Access 2022, 10, 39387–39396.
9. Choudary, M.O.; Kuhn, M.G. Efficient, Portable Template Attacks. IEEE Trans. Inf. Forensic Secur. 2018, 13, 490–501.
10. Golder, A.; Das, D.; Danial, J.; Ghosh, S.; Sen, S.; Raychowdhury, A. Practical Approaches Toward Deep-Learning-Based Cross-Device Power Side-Channel Attack. IEEE Trans. VLSI Syst. 2019, 27, 2720–2733.
11. Picek, S.; Heuser, A.; Jovic, A.; Legay, A. Climbing Down the Hierarchy: Hierarchical Classification for Machine Learning Side-Channel Attacks. In Proceedings of the 9th International Conference on Cryptology in Africa (AFRICACRYPT 2017), Dakar, Senegal, 24–26 May 2017; pp. 61–78.
12. Duan, X.; Chen, D.; Fan, X.; Li, X.; Ding, D.; Li, Y. Research and Implementation on Power Analysis Attacks for Unbalanced Data. Secur. Commun. Netw. 2020, 2020, 1–10.
13. Liu, J.; Zhang, S.; Luo, Y.; Cao, L. Machine Learning-Based Similarity Attacks for Chaos-Based Cryptosystems. IEEE Trans. Emerg. Top. Comput. 2021, 10, 824–837.
14. Martinasek, Z.; Hajny, J.; Malina, L. Optimization of Power Analysis Using Neural Network. In Proceedings of the 10th IFIP WG 8.8/11.2 International Conference (CARDIS 2011), Leuven, Belgium, 14–16 September 2011; pp. 94–107.
15. Kubota, T.; Yoshida, K.; Shiozaki, M.; Fujino, T. Deep Learning Side-Channel Attack against Hardware Implementations of AES. Microprocess. Microsyst. 2021, 87, 103383.
16. Hospodar, G.; Gierlichs, B.; De Mulder, E.; Verbauwhede, I.; Vandewalle, J. Machine Learning in Side-Channel Analysis: A First Study. J. Cryptogr. Eng. 2011, 1, 293–302.
17. Heuser, A.; Zohner, M. Intelligent Machine Homicide. In Proceedings of the 10th International Workshop, COSADE 2019, Darmstadt, Germany, 3–5 April 2019; pp. 249–264.
18. Hou, S.; Zhou, Y.; Liu, H.; Zhu, N. Wavelet Support Vector Machine Algorithm in Power Analysis Attacks. Radioengineering 2017, 26, 890–902.
19. Picek, S.; Heuser, A.; Jovic, A.; Bhasin, S.; Regazzoni, F. The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-Channel Evaluations. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2018, 2019, 209–237.
20. Tran, N.Q.; Hur, J.; Nguyen, H.M. Effective Feature Extraction Method for SVM-Based Profiled Attacks. Comput. Inf. 2021, 40, 1108–1135.
21. Wang, A.; Li, Y.; Ding, Y.; Zhu, L.; Wang, Y. Efficient Framework for Genetic Algorithm-Based Correlation Power Analysis. IEEE Trans. Inf. Forensics Secur. 2021, 16, 4882–4894.
22. Wang, C.X.; Zhao, S.Y.; Wang, X.S.; Luo, M.; Yang, M. A Neural Network Trojan Detection Method Based on Particle Swarm Optimization. In Proceedings of the 14th International Conference on Solid-State and Integrated Circuit Technology (ICSICT), Qingdao, China, 31 October–3 November 2018; pp. 1–3.
23. Huang, C.-L.; Wang, C.-J. A GA-Based Feature Selection and Parameters Optimization for Support Vector Machines. Expert Syst. Appl. 2006, 31, 231–240.
24. Lin, S.-W.; Ying, K.-C.; Chen, S.-C.; Lee, Z.-J. Particle Swarm Optimization for Parameter Determination and Feature Selection of Support Vector Machines. Expert Syst. Appl. 2008, 35, 1817–1824.
25. Zhang, X.; Chen, X.; He, Z. An ACO-Based Algorithm for Parameter Optimization of Support Vector Machines. Expert Syst. Appl. 2010, 37, 6618–6628.
26. Sartakhti, J.S.; Afrabandpey, H.; Saraee, M. Simulated Annealing Least Squares Twin Support Vector Machine (SA-LSTSVM) for Pattern Classification. Soft Comput. 2017, 21, 4361–4373.
27. Yin, Z.; Zheng, J.; Huang, L.; Gao, Y.; Peng, H.; Yin, L. SA-SVM-Based Locomotion Pattern Recognition for Exoskeleton Robot. Appl. Sci. 2021, 11, 5573.
28. DPA Contest V4. Available online: https://www.dpacontest.org/v4/rsm_doc.php (accessed on 20 March 2023).
29. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
30. Jiang, H.; Liu, X.; Zhou, L.; Fujita, H.; Zhou, X. Morlet-RBF SVM model for medical images classification. In Proceedings of the 8th International Symposium on Neural Networks (ISNN 2011), Guilin, China, 29 May–1 June 2011; pp. 121–129.
31. Scholkopf, B.; Sung, K.K.; Burges, C.J.C.; Girosi, F.; Niyogi, P.; Poggio, T.; Vapnik, V. Comparing Support Vector Machines with Gaussian Kernels to Radial Basis Function Classifiers. IEEE Trans. Signal Process. 1997, 45, 2758–2765.
32. Zhang, L.; Zhou, W.; Jiao, L. Wavelet Support Vector Machine. IEEE Trans. Syst. Man Cybern. B 2004, 34, 34–39.
33. Tolambiya, A.; Venkatraman, S.; Kalra, P.K. Content-Based Image Classification with Wavelet Relevance Vector Machines. Soft Comput. 2010, 14, 129–136.
34. Standaert, F.-X.; Malkin, T.G.; Yung, M. A Unified Framework for the Analysis of Side-Channel Key Recovery Attacks. In Proceedings of the 28th Annual International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT 2009), Cologne, Germany, 26–30 April 2009; pp. 443–461.
35. Chang, C.-C.; Lin, C.-J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.

Figure 1. Raw traces and Pearson correlation coefficients.

Figure 2. Flowchart of SA.

Figure 3. The side-channel power attack process based on SA-SVM.

Figure 4. Computational time analysis (in seconds) of SVM and SA-SVM.

Figure 5. Guessing entropy for SA-SVM and SVM: (a) guessing entropy, SA-SVM, 4 POIs; (b) guessing entropy, SVM, 4 POIs; (c) guessing entropy, SA-SVM, 53 POIs; (d) guessing entropy, SVM, 53 POIs; (e) guessing entropy, SA-SVM, 117 POIs; (f) guessing entropy, SVM, 117 POIs.

Table 1. The absolute values of correlation and their number of POIs.

Correlation (Absolute Value)	R ≥ 0.90	R ≥ 0.85	R ≥ 0.80	R ≥ 0.75	R ≥ 0.70	R ≥ 0.65	R ≥ 0.60
POIs	4	14	20	29	53	84	117
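The threshold-based selection summarized in Table 1 can be sketched as below. This is a hedged illustration rather than the authors' code: the function names, the toy traces, and the toy leakage values are hypothetical, and the intermediate value that the correlation is computed against is assumed to be supplied by the caller.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear information
    return cov / math.sqrt(var_x * var_y)


def select_pois(traces, leakage, threshold):
    """Indices of sample points whose |r| with the leakage is >= threshold."""
    n_samples = len(traces[0])
    pois = []
    for j in range(n_samples):
        column = [trace[j] for trace in traces]
        if abs(pearson(column, leakage)) >= threshold:
            pois.append(j)
    return pois


if __name__ == "__main__":
    # Toy data (assumed): 5 traces with 4 sample points each.
    leakage = [0, 1, 2, 3, 4]
    traces = [
        [1, 7, 0, 1],
        [3, 7, -1, -1],
        [5, 7, -2, 0],
        [7, 7, -3, 1],
        [9, 7, -4, -1],
    ]
    print(select_pois(traces, leakage, 0.90))
    print(select_pois(traces, leakage, 0.30))
```

Lowering the threshold admits more sample points, which mirrors how the POI count in Table 1 rises from 4 to 117 as the bound on R is relaxed from 0.90 to 0.60.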

Table 2. Testing results (accuracy) of SVM.

Kernel	Accuracy (%) of Different POIs
Kernel	4	14	20	29	53	84	117
Linear	77.00	83.75	87.00	84.75	91.25	90.00	91.00
RBF	76.50	84.00	87.25	88.50	91.00	91.75	92.00
Morlet wavelet	76.75	84.75	88.50	89.25	91.25	91.50	92.25
Mexican hat wavelet	75.75	83.75	87.75	87.75	91.75	90.25	91.25
Morlet–RBF wavelet	77.00	85.25	87.00	88.50	92.00	91.50	91.25

Table 3. Testing results (accuracy) of SA-SVM.

Kernel	Accuracy * (%) of Different POIs
Kernel	4	14	20	29	53	84	117
Linear	77.75	84.00	87.75	87.25	91.00	91.00	91.75
RBF	78.50	86.75	88.50	90.05	93.75	92.00	92.75
Morlet wavelet	78.00	86.75	88.50	91.00	94.50	91.75	92.25
Mexican hat wavelet	78.00	85.25	88.25	89.25	92.75	91.25	91.25
Morlet–RBF wavelet	79.50	87.25	89.00	91.00	93.25	91.25	92.25

* The highest classification accuracy rate on SA-SVM. The total time spent includes the time for kernel processing of the data, training the model parameters, and evaluating the test set. Comparing the Morlet–RBF kernel and the RBF kernel in SA-SVM, the former spends more time processing data than the latter, but its total time is less, indicating that parameter optimization is quicker with the Morlet–RBF kernel than with the RBF kernel.

Table 4. Machine learning, DPAv4.1.

Ref.	MLSCA	No. of Traces	No. of POIs	Acc (%)
[14]	NB	1000	50	37.6
[14]	MLP	1000	50	44.8
[14]	XGBoost	1000	50	52.0
[14]	RF	1000	50	49.2
[14]	CNN	1000	50	60.4
[12]	RF_SMOTE	2000	-	93.0
This study	SA-SVM	1000	53	94.5

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
