A Quantitative Analysis of Non-Profiled Side-Channel Attacks Based on Attention Mechanism
Abstract
1. Introduction
1.1. Our Contributions
- 1.
- Introducing a novel neural network architecture integrated with the attention mechanism. In this paper, a novel neural network architecture for non-profiled SCA is presented, named the non-profiled attention-based side-channel attack (NASCA). The NASCA architecture comprises three main components. The first is a feed-forward neural network, implemented with CNN and LSTM networks in this study. The second is an attention network, which effectively captures features. The last consists of fully connected (FC) layers. With this architecture, the information embedded in long input traces is effectively captured, leading to successful side-channel attacks.
- 2.
- Introducing a novel attention metric for non-profiled DLSCA. A novel metric based on the attention score vector is proposed, introducing quantitative analysis to non-profiled DLSCA and offering more accurate results than traditional metrics. During the training process for each key guess, the attention mechanism calculates the attention score vector from the feature vectors generated by the input network. For the correct key, the elements of the attention score vector exhibit a relatively large statistical dispersion, which can be quantified by the standard deviation. Therefore, the mean standard deviation of the attention score vectors is adopted as the new metric. In experiments on various datasets, this metric consistently distinguishes the correct key, even when traditional metrics fail.
- 3.
- Demonstrating the robustness of the proposed architecture and metric. To evaluate robustness, experiments were conducted on datasets with different SCA countermeasures: a masked dataset (ASCAD), a desynchronized dataset (AES_RD), and a dataset with both masking and desynchronization (ASCAD_desync50). In addition, the proposed architecture was applied to power traces with additional Gaussian noise. Despite these countermeasures and noise, the proposed architecture and metric successfully completed the attacks, demonstrating robustness across different SCA scenarios.
1.2. Paper Organization
2. Preliminaries
2.1. AES-128 Encryption
2.2. Deep Learning-Based Non-Profiled Side-Channel Attack
Algorithm 1: Differential Deep Learning Analysis (DDLA).
Input: N power traces with corresponding plaintexts, a neural network Net, number of epochs Ne, substitution box Sbox.
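The labelling step at the heart of DDLA can be sketched in a few lines of Python: for each key hypothesis k, every trace is labelled with a bit of Sbox(p XOR k), a network is trained per hypothesis, and the guess whose training metric stands out is taken as the key. A toy 4-bit S-box is used here purely for illustration (an assumption, not the paper's setup; a real attack uses the 256-entry AES S-box), and `lsb_labels` is our own helper name.

```python
# Toy 4-bit S-box used only for illustration; a real DDLA run
# would substitute the 256-entry AES S-box.
TOY_SBOX = [0x6, 0x4, 0xC, 0x5, 0x0, 0x7, 0x2, 0xE,
            0x1, 0xF, 0x3, 0xD, 0x8, 0xA, 0x9, 0xB]

def lsb_labels(plaintexts, key_guess, sbox):
    """DDLA labels: least significant bit of sbox[p ^ key_guess], one per trace."""
    return [sbox[p ^ key_guess] & 1 for p in plaintexts]

# DDLA outline (one training run per key hypothesis):
# for k in range(len(TOY_SBOX)):
#     labels = lsb_labels(plaintexts, k, TOY_SBOX)
#     ... train Net on (traces, labels) for Ne epochs ...
#     ... record the best loss/accuracy reached for guess k ...
# The correct key is the guess whose metric is most distinguishable.

labels = lsb_labels([0x0, 0x1, 0x2], key_guess=0x3, sbox=TOY_SBOX)
```

Only the correct hypothesis produces labels that correlate with the leakage in the traces, which is why its training run behaves differently from the others.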
2.3. CNN
2.4. LSTM
3. Proposed Attention-Based Architecture and Metrics for Non-Profiled Attacks
3.1. Proposed Non-Profiled Attack Process with Attention Metrics
Algorithm 2: Proposed non-profiled attention-based side-channel attack (NASCA).
Input: N power traces with corresponding plaintexts, an attention-based network ANet, number of epochs Ne, substitution box Sbox.
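The proposed metric in Algorithm 2 can be sketched as follows: for each key guess, the attention score vectors produced during training are collected, the standard deviation of each vector is computed, and the mean of these standard deviations is the metric for that guess; the guess with the largest value is ranked first. A minimal NumPy sketch (the function name is ours, not from the paper):

```python
import numpy as np

def attention_std_metric(scores: np.ndarray) -> float:
    """Mean standard deviation of attention score vectors.

    `scores` has shape (n_traces, seq_len): one attention score
    vector (summing to 1) per trace. A dispersed vector, i.e.,
    attention concentrated on a few positions, yields a large
    standard deviation, which the correct key is expected to produce.
    """
    return float(np.std(scores, axis=1).mean())

# Nearly uniform attention (wrong key) vs. focused attention (correct key):
uniform = np.full((4, 10), 0.1)            # every position weighted equally
focused = np.tile(np.eye(10)[0], (4, 1))   # all weight on one position
m_wrong = attention_std_metric(uniform)
m_right = attention_std_metric(focused)
```

Ranking the key hypotheses by this value replaces the loss/accuracy ranking used in DDLA.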
3.2. Proposed NASCA Based on CNN Architecture
3.2.1. Architecture of NASCA-CNN
- 1.
- The first component of the architecture comprises stacked convolutional and pooling layers. Specifically, three convolutional layers extract features from the input traces, each followed by an average pooling layer. The first convolutional layer consists of 4 filters with a kernel size of 32 and a stride of 1, followed by a pooling layer of size 2. The second convolutional layer consists of 8 filters with a kernel size of 16 and a stride of 1, followed by a pooling layer of size 4. The final convolutional layer employs 16 filters with a kernel size of 8 and a stride of 1, followed by a pooling layer of size 8.
- 2.
- The second part is the attention network, which evaluates the weight of each feature vector generated by the CNN. The CNN has an input size of 700, resulting in an output of size (16, 9), i.e., nine 16-dimensional feature vectors. The attention network multiplies each feature vector by its attention weight and sums the results, yielding a weighted-sum output of size 16. Let $\mathbf{h}_i$ ($i = 1, \ldots, 9$) denote the feature vectors produced by the CNN, $\mathbf{w}$ the trainable parameter vector, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_9)$ the attention score vector, and $\mathbf{c}$ the output of the weighted sum operation:

$$\alpha_i = \frac{\exp(\mathbf{w}^{\top}\mathbf{h}_i)}{\sum_{j=1}^{9}\exp(\mathbf{w}^{\top}\mathbf{h}_j)}, \qquad \mathbf{c} = \sum_{i=1}^{9}\alpha_i\,\mathbf{h}_i.$$
- 3.
- The third part consists of two FC layers. As a result of the attention network, the dimension of the data input to these FC layers is reduced to 16. The first FC layer is composed of 16 input neurons and 8 output neurons, with rectified linear unit (ReLU) activation. The ReLU activation function introduces non-linearity and helps in capturing complex relationships within the data. The second FC layer is composed of 8 input neurons and 2 output neurons, utilizing SoftMax activation. The SoftMax activation function is commonly used in multi-class classification tasks, as it produces a probability distribution over the classes, allowing for the selection of the most probable class based on the network’s outputs.
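The layer sizes quoted above can be checked with the usual 1-D convolution and pooling length formulas: a convolution with kernel k and stride 1 maps length L to L − k + 1, and a pooling layer of size p divides the length by p. Starting from the 700-sample input, this reproduces the (16, 9) feature map fed to the attention network:

```python
def conv_out(length: int, kernel: int, stride: int = 1) -> int:
    """Output length of a 1-D convolution with no padding."""
    return (length - kernel) // stride + 1

def pool_out(length: int, size: int) -> int:
    """Output length of a non-overlapping pooling layer."""
    return length // size

L = 700                            # input trace length (ASCAD)
L = pool_out(conv_out(L, 32), 2)   # conv1: 4 filters, kernel 32 -> pool 2
L = pool_out(conv_out(L, 16), 4)   # conv2: 8 filters, kernel 16 -> pool 4
L = pool_out(conv_out(L, 8), 8)    # conv3: 16 filters, kernel 8 -> pool 8
feature_map = (16, L)              # 16 channels, L positions
```

The attention network then scores the nine positions and collapses them into a single 16-dimensional vector for the FC layers.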
3.2.2. The Working Mechanism of NASCA-CNN
3.3. Proposed NASCA Based on LSTM Architecture
3.3.1. Architecture of NASCA-LSTM
- 1.
- The first part is the non-bidirectional LSTM layer with one hidden layer, whose input size is 70, hidden size is 64 and sequence length is 10. For non-bidirectional LSTM, the hidden size equals the output size.
- 2.
- The second component is the attention network, which evaluates the importance of each feature vector generated by the LSTM. The attention network takes ten inputs of size 64 (an input sequence length of 10) and generates an output of size 64 by computing a weighted sum of the feature vectors. Let $\mathbf{h}_i$ ($i = 1, \ldots, 10$) denote the feature vectors produced by the LSTM, $\mathbf{w}$ the trainable parameter vector, $\boldsymbol{\alpha} = (\alpha_1, \ldots, \alpha_{10})$ the attention score vector, and $\mathbf{c}$ the output of the weighted sum operation:

$$\alpha_i = \frac{\exp(\mathbf{w}^{\top}\mathbf{h}_i)}{\sum_{j=1}^{10}\exp(\mathbf{w}^{\top}\mathbf{h}_j)}, \qquad \mathbf{c} = \sum_{i=1}^{10}\alpha_i\,\mathbf{h}_i.$$
- 3.
- The third component consists of two FC layers. Thanks to the attention network, the dimension of the data input to these FC layers is reduced. The first layer comprises 64 input neurons and 32 output neurons, employing ReLU activation. Subsequently, the second layer consists of 32 input neurons and two output neurons, utilizing SoftMax activation.
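The three components above can be sketched as a small PyTorch module, assuming the 700-sample trace is reshaped into a sequence of 10 chunks of 70 samples; the class and attribute names are ours, not from the paper, and the attention follows the dot-product form described above.

```python
import torch
import torch.nn as nn

class NASCALSTM(nn.Module):
    """Sketch of NASCA-LSTM: LSTM encoder + attention + two FC layers."""

    def __init__(self, chunk: int = 70, hidden: int = 64, steps: int = 10):
        super().__init__()
        self.steps, self.chunk = steps, chunk
        self.lstm = nn.LSTM(input_size=chunk, hidden_size=hidden,
                            batch_first=True)          # non-bidirectional
        self.w = nn.Parameter(torch.randn(hidden))     # trainable attention vector
        self.fc1 = nn.Linear(hidden, 32)               # ReLU FC layer
        self.fc2 = nn.Linear(32, 2)                    # SoftMax FC layer

    def forward(self, x):
        # x: (batch, 700) -> sequence of 10 chunks of 70 samples
        h, _ = self.lstm(x.view(-1, self.steps, self.chunk))  # (batch, 10, 64)
        alpha = torch.softmax(h @ self.w, dim=1)              # attention scores
        c = (alpha.unsqueeze(-1) * h).sum(dim=1)              # weighted sum, (batch, 64)
        probs = torch.softmax(self.fc2(torch.relu(self.fc1(c))), dim=1)
        return probs, alpha

model = NASCALSTM()
probs, alpha = model(torch.zeros(4, 700))
```

The returned `alpha` is the attention score vector whose standard deviation feeds the proposed metric.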
3.3.2. The Working Mechanism of NASCA-LSTM
4. Experimental Results
4.1. Datasets
- ASCAD: The power traces in the ASCAD dataset were collected from a first-order protected software AES encryption running on an 8-bit ATMega8515 board. The ASCAD dataset consists of a profiling set of 50,000 traces and an attack set of 10,000 traces. Each trace contains 700 samples associated with the SubBytes operation of the third byte during the first round of AES-128 encryption. The 16-byte key is the same for all traces, while the plaintexts are randomly generated; therefore, both the profiling set and the attack set can be used for non-profiled attacks. In this paper, the profiling set is used for NASCA.
- ASCAD_desync50: This dataset was obtained by introducing a misalignment of the samples in the ASCAD dataset, with a maximum window of 50. It also contains 700 samples per trace and the same number of power traces as the ASCAD dataset.
- AES_RD: In this dataset, the AES encryption algorithm is implemented on an 8-bit Atmel AVR microcontroller with the encryption key 0x2b7e151628aed2a6abf7158809cf4f3c. The random delay countermeasure proposed by Coron and Kizhvatov [13] is implemented as the protection mechanism. The dataset consists of 50,000 power traces, each containing 3500 samples, obtained by selecting one sample (peak) per CPU clock cycle.
4.2. Performance of NASCA on ASCAD
4.3. Performance of NASCA on ASCAD_desync50
4.4. Performance of NASCA on ASCAD with Additional Noise
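The additional-noise experiments can be set up with a few lines of NumPy, taking the noise level (e.g., 0.2 or 0.5) as the standard deviation of zero-mean Gaussian noise added to each trace. This is a minimal sketch under that assumption; the function and variable names are ours.

```python
import numpy as np

def add_gaussian_noise(traces: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Return a copy of `traces` with zero-mean Gaussian noise of std `sigma` added.

    `traces` has shape (n_traces, n_samples); each row is one power trace.
    """
    rng = np.random.default_rng(seed)
    return traces + rng.normal(0.0, sigma, size=traces.shape)

# Example: 1,000 toy traces of 700 samples, noise level 0.5
clean = np.zeros((1000, 700))
noisy = add_gaussian_noise(clean, sigma=0.5)
```

Raising `sigma` lowers the signal-to-noise ratio of the leakage, which is what stresses the loss- and accuracy-based metrics in these experiments.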
4.5. Performance of NASCA on AES_RD
4.6. Performance Evaluation
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Kocher, P.C. Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems. In Proceedings of the Advances in Cryptology—CRYPTO’96: 16th Annual International Cryptology Conference, Santa Barbara, CA, USA, 18–22 August 1996; Springer: Berlin/Heidelberg, Germany, 1996; pp. 104–113.
- Chari, S.; Rao, J.R.; Rohatgi, P. Template attacks. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES 2002: 4th International Workshop, Redwood Shores, CA, USA, 13–15 August 2002; Revised Papers; Springer: Berlin/Heidelberg, Germany, 2003; pp. 13–28.
- El Aabid, M.A.; Guilley, S.; Hoogvorst, P. Template Attacks with a Power Model. IACR Cryptology ePrint Archive. 2007. Available online: https://eprint.iacr.org/2007/443 (accessed on 1 June 2023).
- Schindler, W.; Lemke, K.; Paar, C. A stochastic model for differential side channel cryptanalysis. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES 2005: 7th International Workshop, Edinburgh, UK, 29 August–1 September 2005; Springer: Berlin/Heidelberg, Germany, 2005; pp. 30–46.
- Mangard, S. A simple power-analysis (SPA) attack on implementations of the AES key expansion. In Proceedings of the Information Security and Cryptology—ICISC 2002: 5th International Conference, Seoul, Republic of Korea, 28–29 November 2002; Springer: Berlin/Heidelberg, Germany, 2003; pp. 343–358.
- Kocher, P.; Jaffe, J.; Jun, B. Differential power analysis. In Proceedings of the Advances in Cryptology—CRYPTO’99: 19th Annual International Cryptology Conference, Santa Barbara, CA, USA, 15–19 August 1999; Springer: Berlin/Heidelberg, Germany, 1999; pp. 388–397.
- Brier, E.; Clavier, C.; Olivier, F. Correlation power analysis with a leakage model. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES 2004: 6th International Workshop, Cambridge, MA, USA, 11–13 August 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 16–29.
- Maghrebi, H.; Portigliatti, T.; Prouff, E. Breaking cryptographic implementations using deep learning techniques. In Proceedings of the Security, Privacy, and Applied Cryptography Engineering: 6th International Conference, SPACE 2016, Hyderabad, India, 14–18 December 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 3–26.
- Maghrebi, H. Deep Learning Based Side Channel Attacks in Practice. IACR Cryptology ePrint Archive. 2019. Available online: https://eprint.iacr.org/2019/578 (accessed on 1 June 2023).
- Lerman, L.; Markowitch, O. Efficient profiled attacks on masking schemes. IEEE Trans. Inf. Forensics Secur. 2018, 14, 1445–1454.
- Nassar, M.; Souissi, Y.; Guilley, S.; Danger, J.L. RSM: A small and fast countermeasure for AES, secure against 1st and 2nd-order zero-offset SCAs. In Proceedings of the 2012 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 12–16 March 2012; pp. 1173–1178.
- Veyrat-Charvillon, N.; Medwed, M.; Kerckhof, S.; Standaert, F.X. Shuffling against side-channel attacks: A comprehensive study with cautionary note. In Proceedings of the Advances in Cryptology—ASIACRYPT 2012: 18th International Conference on the Theory and Application of Cryptology and Information Security, Beijing, China, 2–6 December 2012; Springer: Berlin/Heidelberg, Germany, 2012; pp. 740–757.
- Coron, J.S.; Kizhvatov, I. An efficient method for random delay generation in embedded software. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES 2009: 11th International Workshop, Lausanne, Switzerland, 6–9 September 2009; Springer: Berlin/Heidelberg, Germany, 2009; pp. 156–170.
- Dao, B.A.; Hoang, T.T.; Le, A.T.; Tsukamoto, A.; Suzaki, K.; Pham, C.K. Correlation Power Analysis Attack Resisted Cryptographic RISC-V SoC With Random Dynamic Frequency Scaling Countermeasure. IEEE Access 2021, 9, 151993–152014.
- Jin, S.; Kim, S.; Kim, H.; Hong, S. Recent advances in deep learning-based side-channel analysis. ETRI J. 2020, 42, 292–304.
- Timon, B. Non-profiled deep learning-based side-channel attacks with sensitivity analysis. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2019, 2019, 107–131.
- Xiangliang, M.; Bing, L.; Hong, W.; Di, W.; Lizhen, Z.; Kezhen, H.; Xiaoyi, D. Non-profiled Deep-Learning-Based Power Analysis of the SM4 and DES Algorithms. Chin. J. Electron. 2021, 30, 500–507.
- Lu, X.; Zhang, C.; Gu, D. Attention-Based Non-Profiled Side-Channel Attack. In Proceedings of the 2021 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), Shanghai, China, 16–18 December 2021; pp. 1–6.
- Do, N.T.; Hoang, V.P.; Doan, V.S. Performance analysis of non-profiled side channel attacks based on convolutional neural networks. In Proceedings of the 2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Ha Long, Vietnam, 8–10 December 2020; pp. 66–69.
- Raffel, C.; Ellis, D.P. Feed-forward networks with attention can solve some long-term memory problems. arXiv 2015, arXiv:1512.08756.
- Peng, H.; Pappas, N.; Yogatama, D.; Schwartz, R.; Smith, N.A.; Kong, L. Random feature attention. arXiv 2021, arXiv:2103.02143.
- Kuroda, K.; Fukuda, Y.; Yoshida, K.; Fujino, T. Practical aspects on non-profiled deep-learning side-channel attacks against AES software implementation with two types of masking countermeasures including RSM. In Proceedings of the 5th Workshop on Attacks and Solutions in Hardware Security, Virtual Event, Republic of Korea, 19 November 2021; pp. 29–40.
- Do, N.T.; Hoang, V.P.; Doan, V.S. A novel non-profiled side channel attack based on multi-output regression neural network. J. Cryptogr. Eng. 2023, 1–13.
- Do, N.T.; Le, P.C.; Hoang, V.P.; Doan, V.S.; Nguyen, H.G.; Pham, C.K. MO-DLSCA: Deep learning based non-profiled side channel analysis using multi-output neural networks. In Proceedings of the 2022 International Conference on Advanced Technologies for Communications (ATC), Hanoi, Vietnam, 20–22 October 2022; pp. 245–250.
- Won, Y.S.; Han, D.G.; Jap, D.; Bhasin, S.; Park, J.Y. Non-profiled side-channel attack based on deep learning using picture trace. IEEE Access 2021, 9, 22480–22492.
- Daemen, J.; Rijmen, V. Rijndael: The advanced encryption standard. Dr. Dobb's J. Softw. Tools Prof. Program. 2001, 26, 137–139.
- Rioja, U.; Batina, L.; Flores, J.L.; Armendariz, I. Auto-tune POIs: Estimation of distribution algorithms for efficient side-channel analysis. Comput. Netw. 2021, 198, 108405.
- Rolnick, D.; Veit, A.; Belongie, S.; Shavit, N. Deep learning is robust to massive label noise. arXiv 2017, arXiv:1705.10694.
- Khan, S.; Rahmani, H.; Shah, S.A.A.; Bennamoun, M. A guide to convolutional neural networks for computer vision. Synth. Lect. Comput. Vis. 2018, 8, 1–207.
- Kayhan, O.S.; Gemert, J.C.v. On translation invariance in CNNs: Convolutional layers can exploit absolute spatial location. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 14274–14285.
- Cagli, E.; Dumas, C.; Prouff, E. Convolutional neural networks with data augmentation against jitter-based countermeasures: Profiling attacks without pre-processing. In Proceedings of the Cryptographic Hardware and Embedded Systems—CHES 2017: 19th International Conference, Taipei, Taiwan, 25–28 September 2017; Springer: Berlin/Heidelberg, Germany, 2017; pp. 45–68.
- Hua, W.; Zhang, Z.; Suh, G.E. Reverse engineering convolutional neural networks through side-channel information leaks. In Proceedings of the 55th Annual Design Automation Conference, San Francisco, CA, USA, 24–29 June 2018; pp. 1–6.
- Li, Y.; Zeng, J.; Shan, S.; Chen, X. Occlusion aware facial expression recognition using CNN with attention mechanism. IEEE Trans. Image Process. 2018, 28, 2439–2450.
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
- Elman, J.L. Finding structure in time. Cogn. Sci. 1990, 14, 179–211.
- Whitnall, C.; Oswald, E.; Standaert, F.X. The Myth of Generic DPA… and the Magic of Learning. In Proceedings of the Topics in Cryptology—CT-RSA 2014: The Cryptographer’s Track at the RSA Conference 2014, San Francisco, CA, USA, 25–28 February 2014; Springer: Berlin/Heidelberg, Germany, 2014; Volume 8366, pp. 183–205.
- Lu, X.; Zhang, C.; Cao, P.; Gu, D.; Lu, H. Pay attention to raw traces: A deep learning architecture for end-to-end profiling attacks. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2021, 2021, 235–274.
- Benadjila, R.; Prouff, E.; Strullu, R.; Cagli, E.; Dumas, C. Deep learning for side-channel analysis and introduction to ASCAD database. J. Cryptogr. Eng. 2020, 10, 163–188.
| Dataset | ASCAD | ASCAD_desync50 | AES_RD |
|---|---|---|---|
| Traces | 60,000 | 60,000 | 50,000 |
| Samples | 700 | 700 | 3500 |
| Countermeasures | Boolean mask | Boolean mask and desynchronization | Random delay |
| Target Byte | Byte 3 | Byte 3 | All bytes |
| Dataset | Loss | Accuracy | Proposed Metric |
|---|---|---|---|
| ASCAD | Succeed | Succeed | Succeed |
| ASCAD_desync50 | Fail | Fail | Succeed |
| ASCAD with 0.2 noise | Succeed | Succeed | Succeed |
| ASCAD with 0.5 noise | Fail in LSTM | Fail in LSTM | Succeed |
| AES_RD | Fail in LSTM | Fail in LSTM | Succeed |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pu, K.; Dang, H.; Kong, F.; Zhang, J.; Wang, W. A Quantitative Analysis of Non-Profiled Side-Channel Attacks Based on Attention Mechanism. Electronics 2023, 12, 3279. https://doi.org/10.3390/electronics12153279