Article

Efficient Implementation of a Crypto Library Using Web Assembly

1
Department of Information Security, Cryptology, and Mathematics, Kookmin University, Seoul 02707, Korea
2
Department of Financial Information Security, Kookmin University, Seoul 02707, Korea
*
Author to whom correspondence should be addressed.
Electronics 2020, 9(11), 1839; https://doi.org/10.3390/electronics9111839
Submission received: 16 September 2020 / Revised: 23 October 2020 / Accepted: 26 October 2020 / Published: 3 November 2020
(This article belongs to the Special Issue Recent Advances in Cryptography and Network Security)

Abstract

We implement a cryptographic library using Web Assembly, which is expected to show better performance than JavaScript. The proposed library provides a comprehensive algorithm set, including the revised CHAM, Hash Message Authentication Code (HMAC), and ECDH using the NIST P-256 curve, to provide confidentiality, data authentication, and key agreement functions. To optimize the performance of the revised CHAM in the proposed library, we apply an existing four-round combining method and additionally propose a precomputation method for CHAM-64/128. The revised CHAM showed performance improvements of approximately 2.06 times (CHAM-64/128), 2.13 times (CHAM-128/128), and 2.63 times (CHAM-128/256) in Web Assembly compared to JavaScript. In addition, CHAM-64/128 with the precomputation method was approximately 1.2 times faster than the existing CHAM-64/128. For ECDH using the P-256 curve, a naive implementation is vulnerable to side-channel attacks (SCA), e.g., simple power analysis (SPA) and timing analysis (TA). Thus, we apply an SPA- and TA-resistant method to scalar multiplication, which is the core operation in ECDH. We present atomic block-based scalar multiplication by revising previous work. Existing atomic blocks show performance overheads of 55%, 23%, and 37%, whereas the proposed atomic blocks, which use only $P = (X, Y, Z)$, show performance overheads of 18%, 6%, and 11%. The proposed Web Assembly-based crypto library provides enhanced performance and resistance against SCA; thus, it can be used in various web-based applications.

1. Introduction

Recently, various Internet services, e.g., personal and business services, have been provided to users via web-based applications because of the accessibility of the web. Typically, web-based applications comprise servers and clients, and private information, e.g., private user data and passwords, is exchanged between them. Data transmitted in plaintext are vulnerable to attackers; thus, cryptographic operations are needed to protect private data and build secure web-based services. In other words, data confidentiality, data authentication, and key establishment functions must be provided to develop secure web-based services [1].
JavaScript is a cross-platform scripting language used in various fields, e.g., server-side network programming, databases, and the Internet of Things (IoT) [2]. JavaScript is used in web browsers to display web sites and can be accessed from another application’s built-in objects. However, JavaScript is an interpreted language and is slower than native languages such as C. In addition, it does not support the mathematical operations required for cryptographic operations, which incurs heavy overhead when executing them. Therefore, web browser vendors have developed Web Assembly, a low-level language for web environments that provides performance similar to that of native languages (Web Assembly is being continuously extended) [3,4]. To date, several studies have investigated implementing cryptographic algorithms in JavaScript. Because they rely on the low-performance JavaScript language, they do not provide sufficient performance. Furthermore, previous work implemented a limited number of algorithms rather than forming a complete crypto library. To build secure communication between servers and clients in web applications, a crypto library that provides confidentiality, data authentication, and key establishment functions is required.
Thus, we propose an efficient Web Assembly-based crypto library for secure communication in various web applications. The proposed crypto library comprises a block cipher, a message authentication code, and a key exchange algorithm. We selected the revised CHAM [5], Hash Message Authentication Code (HMAC) [6], and Elliptic-curve Diffie-Hellman (ECDH) using the P-256 curve [8] recommended by the National Institute of Standards and Technology (NIST) [7] as the block cipher, message authentication code, and key agreement method, respectively. The proposed Web Assembly-based crypto library provides much improved performance compared to JavaScript-based implementations, and we apply several optimization techniques to further improve the performance of its cryptographic operations.
We apply various methods to implement a safe and fast CHAM algorithm. The original CHAM family is vulnerable to differential attacks, so the revised CHAM algorithm [5] is used, which resists differential attacks by increasing the number of rounds from 80 to 88, 80 to 112, and 96 to 120 for CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively. In the revised CHAM algorithm, the words that constitute the state are moved to new positions in every round. We apply the existing 4-round combining method [9] to improve the performance of the revised CHAM. The 4-round combining method is faster because it eliminates this word repositioning by addressing the appropriate word directly in each round. We additionally propose a pre-computation method for faster operation in CHAM-64/128. The pre-computation method is applied to the internal functions ROL1, ROL8, and Keyschedule of CHAM-64/128 [10]. ROL1 and ROL8 rotate one input word left by 1 bit and 8 bits, respectively, and Keyschedule creates a round key. The three functions take 16-bit input values, so we pre-compute ROL1, ROL8, and Keyschedule for all inputs from 0x0 to 0xffff.
For a secure and efficient implementation of the ECDH key agreement method, we implement scalar multiplication, the core operation in ECDH, with an SPA-resistant method and the wNAF method [11]. A naive implementation of scalar multiplication is vulnerable to side-channel attacks (SCA), e.g., simple power analysis (SPA) and timing analysis (TA) [12,13]. In scalar multiplication, if a bit of the scalar is 1, both ECDBL and ECADD are performed, whereas if it is 0, only ECDBL is performed, so the processing differs. An attacker can therefore distinguish the individual bits during analysis and eventually recover the scalar, which is secret information. Since scalar multiplication is computationally intensive, a windowing method is used to compute it. Although wNAF is a representative windowing method for computing scalar multiplication efficiently, it is vulnerable to SPA and TA. As an efficient countermeasure against SPA and TA, the concept of the atomic block was presented previously [14]. An atomic block groups the operations *, +, −, + into one block. Fake (unnecessary) operations are added to the ECDBL and ECADD operations of scalar multiplication so that both are computed as a sequence of *, +, −, + blocks. This is safe against SPA and TA because each bit of the scalar is processed in the order *, +, −, + regardless of whether it is 1 or 0. We improve the atomic block under the assumption that scalar multiplication is performed using only the base point $P = (X, Y, Z)$, changing the existing atomic block *, +, −, + to *, +, −. As a result, 10 and 16 addition operations are removed from ECDBL and ECADD, respectively, compared to the existing atomic block. We apply wNAF and the improved atomic block to the ECDH algorithm of the proposed crypto library.
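To make the bit-dependent behavior described above concrete, the following minimal sketch records which point operations a plain left-to-right binary scalar multiplication performs. It is an illustration under simple assumptions, not the authors' implementation: the names `op_trace` and `recover_scalar` and the labels `D` (for ECDBL) and `A` (for ECADD) are hypothetical, and no curve arithmetic is carried out.

```python
def op_trace(k: int) -> str:
    """Return the operation sequence of left-to-right double-and-add.

    'D' stands for ECDBL and 'A' for ECADD. The most significant bit only
    initializes the accumulator, so it contributes no operation.
    """
    if k <= 0:
        raise ValueError("scalar must be positive")
    trace = []
    for bit in bin(k)[3:]:          # skip the '0b' prefix and the leading 1 bit
        trace.append("D")           # every bit costs one doubling
        if bit == "1":
            trace.append("A")       # a 1 bit costs one extra addition
    return "".join(trace)


def recover_scalar(trace: str) -> int:
    """Rebuild the scalar from the observed operation sequence alone."""
    k = 1                           # implicit leading 1 bit
    i = 0
    while i < len(trace):
        k <<= 1                     # trace[i] is always a doubling
        if i + 1 < len(trace) and trace[i + 1] == "A":
            k |= 1                  # doubling followed by addition: bit was 1
            i += 2
        else:
            i += 1
    return k
```

An observer who can tell doublings from additions in a trace therefore learns every bit of the scalar, which is the asymmetry that the atomic block construction is meant to hide.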
The algorithms were implemented in both Web Assembly and JavaScript and executed in the Chrome, Firefox, and Microsoft Edge web browsers to measure performance. The following improvements are listed in the order Chrome, Firefox, and Microsoft Edge. For the block cipher, Web Assembly was 2.1, 2.1, and 2 times faster for CHAM-64/128; 3, 1.6, and 1.8 times faster for CHAM-128/128; and 3, 2.1, and 2.8 times faster for CHAM-128/256. CHAM-64/128 with the pre-computation method was 1.2 times faster than without it in all three web browsers. For the key exchange algorithm, wNAF was applied to P-256, together with the atomic block method as a countermeasure against SPA and TA. When applying the existing and the proposed atomic blocks to wNAF, we measure how much performance overhead each incurs relative to the original wNAF because of the increased number of operations, and how much the proposed atomic block improves on the existing one. For this purpose, each algorithm implemented in Web Assembly and JavaScript was measured in Chrome, Firefox, and Microsoft Edge. Web Assembly outperformed JavaScript by 11, 12, and 11 times for the original wNAF, by 10, 10, and 14 times for wNAF with the existing atomic block, and by 11, 12, and 14 times for wNAF with the proposed atomic block. wNAF with the existing atomic block shows performance overheads of 55%, 23%, and 37% compared to the original wNAF, whereas wNAF with the proposed atomic block, which uses only $P = (X, Y, Z)$, shows performance overheads of 18%, 6%, and 11%. The message authentication code is HMAC, which uses SHA-256 to create a MAC. In the measurements, Web Assembly outperformed JavaScript by 7.5, 10.8, and 11 times for SHA-256 and by 7.5, 24.8, and 7.1 times for HMAC.
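As background for the wNAF method mentioned above, the sketch below shows the standard textbook width-$w$ NAF recoding of a scalar. It is not taken from the paper: the function name `wnaf` is hypothetical, and the window width the authors used is not stated in this section, so `w` is left as a parameter.

```python
def wnaf(k: int, w: int) -> list:
    """Return the width-w NAF digits of k, least significant digit first.

    Every nonzero digit is odd and lies in (-2^(w-1), 2^(w-1)), and any w
    consecutive digits contain at most one nonzero digit.
    """
    if k < 0 or w < 2:
        raise ValueError("expects k >= 0 and w >= 2")
    digits = []
    while k > 0:
        if k & 1:
            d = k & ((1 << w) - 1)      # k mod 2^w
            if d >= 1 << (w - 1):
                d -= 1 << w             # map to the signed digit range
            k -= d                      # k is now divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For example, `wnaf(7, 2)` yields the digits `[-1, 0, 0, 1]`, i.e., $7 = 8 - 1$, which replaces three additions in the binary method with one addition and one subtraction.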

Contribution

This section summarizes the contributions of this paper.
  • First implementation of a crypto library using Web Assembly
    Recently, web-based applications with various functions have been written in JavaScript, a cross-platform language. Web-based applications require confidentiality, integrity, and key exchange algorithms to send and receive data, and cryptographic algorithms have been written in JavaScript for this purpose. However, JavaScript is a heavy language, and the nature of its operations is a disadvantage when implementing cryptographic algorithms that require many mathematical operations. Web Assembly was created to meet the need for performance close to that of low-level languages in the web environment. In this paper, we build a crypto library from cryptographic algorithms implemented in Web Assembly to provide data security and faster cryptographic operations in web-based applications. The proposed crypto library includes the CHAM family of block ciphers, the message authentication code HMAC, and the key exchange algorithm ECDH. For each cryptographic algorithm, the Web Assembly implementation performs better than the JavaScript implementation. Our implementations are measured in the currently popular web browsers Chrome, Firefox, and Microsoft Edge. On average, the CHAM family was about 2.2 times faster, HMAC about 7.1 times faster, and ECDH scalar multiplication 12.3 times faster.
  • Optimized implementations of a crypto library on Web Assembly
    Since web-based applications exchange data with various environments, encryption is essential for sending data confidentially. With the advancement of technology and the variety of environments and communication channels, the amount of data exchanged has also increased. Since the data to be communicated are encrypted sequentially, the algorithm must be optimized for the environment in which it is used so that encryption is fast. The block cipher in the proposed crypto library is taken from the CHAM family. However, the original CHAM algorithm is vulnerable to differential attacks. Therefore, we use the revised CHAM algorithm, which increases the number of rounds from 80 to 88, 80 to 112, and 96 to 120 for CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively. In the revised CHAM algorithm, the words that constitute the state change position in each round. For faster encryption, we apply the existing 4-round combining method, which eliminates this repositioning of words. Additionally, we propose a pre-computation method for faster operation in CHAM-64/128. The proposed method applies to the internal functions ROL1, ROL8, and Keyschedule of CHAM-64/128. ROL1 and ROL8 rotate the input value left by 1 bit and 8 bits, respectively, and Keyschedule is the round key generation function. The inputs of the three functions are 16-bit values, so the results for all inputs from 0x0 to 0xffff are computed in advance and stored. During encryption, the previously calculated values are simply looked up and used. As a result, performance improved by about 1.2 times in Chrome, Firefox, and Microsoft Edge compared to when the pre-computation method was not applied.
  • Providing an improved method that resists side-channel attacks
    Until now, there have been few studies of side-channel analysis in the web environment. In particular, a secure key exchange protocol is needed to provide secure communication in a web environment. ECDH is used as the key exchange algorithm, and its main operation is scalar multiplication. However, scalar multiplication performs the ECDBL and ECADD operations when a bit of the scalar is 1 and only the ECDBL operation when it is 0, so an attacker can distinguish each bit and recover the scalar value. We propose a secure key exchange protocol that improves the previously studied atomic block to counter TA and SPA, to which implementations in the web environment are vulnerable. Existing atomic blocks consist of *, +, −, and + in one block. Fake operations are added to the main operations of scalar multiplication, ECDBL and ECADD, so that both operate in the order *, +, −, +. It therefore becomes difficult to distinguish the bits, because each bit is processed in the order *, +, −, + regardless of whether it is 1 or 0. We change the existing atomic block to a single block of *, +, −. This removes 10 and 16 addition operations from ECDBL and ECADD, respectively. The proposed method is used only with $P = (X, Y, Z)$. In addition, we apply wNAF and the proposed atomic block to P-256 for efficient scalar multiplication. The implemented algorithms were measured in the web browsers Chrome, Firefox, and Microsoft Edge. Compared to the original wNAF, wNAF with the existing atomic block shows a performance overhead of about 33%, and wNAF with the proposed atomic block shows a performance overhead of about 11%. Thus, the proposed atomic block reduces the performance overhead to about $\frac{1}{3}$ of that of the existing method.
The remainder of this paper is organized as follows. Section 2 provides a basic overview of the web environment, Web Assembly’s description and conversion process, and the need for a crypto library. Section 3 describes the architecture of the proposed crypto library and target cryptographic algorithm. Section 4 describes related work. Section 5 describes the construction of a crypto library using the proposed cryptographic algorithm. Section 6 describes the performance measurement results. Finally, Section 7 concludes the paper.

2. Background

2.1. Overview of Web Environment

Users frequently use web applications and access web services for long periods. Various web browsers, e.g., Chrome, Firefox, and Microsoft Edge, are available to access the web. Web pages are created using HTML, CSS, and JavaScript. A web browser uses a rendering engine, which processes the content and data of a web page, and a JavaScript engine, which executes JavaScript code. Each web browser uses a different rendering engine and JavaScript engine. For example, Chrome uses Blink as the rendering engine and V8 as the JavaScript engine, Microsoft Edge uses EdgeHTML and Chakra, and Firefox uses Gecko and SpiderMonkey. Web-based applications present the same content on all devices, e.g., PCs and smartphones. Unlike native applications, web-based applications do not communicate directly with the operating system but run within the browser. Web-based applications can always be kept up to date without downloading or upgrading, and because they are written in standard web languages, they do not require a separate version for each operating system. Thus, users can easily access the web sites of their choice using mobile devices, e.g., smartphones. The code in one web page does not affect the code in other pages: whatever function the JavaScript code on one web page executes, its result is irrelevant to other web pages. With the development of the web environment and the demand for various functions, JavaScript libraries are continuously being created to enable these functions in the web environment. In addition, web developers can use these JavaScript libraries easily, and the libraries can be further modified. This is why JavaScript libraries and the web are constantly evolving.
When a web page is loaded, the HTML, CSS, and JavaScript code that makes up the page is processed as shown in Figure 1. The rendering engine reads and parses the code and then creates a Document Object Model (DOM) tree and a CSS Object Model (CSSOM) tree. These trees are combined into a render tree, which is used to render the web page in the web browser. The JavaScript engine handles the program logic. The rendering engine pauses when it encounters JavaScript code; the JavaScript engine then reads the JavaScript code and parses it into a tree. After all of the JavaScript code has been processed, the rendering engine resumes from the point where it stopped.

2.1.1. Overview of Web Assembly

JavaScript is primarily used in web-based applications; however, JavaScript runs significantly slower than native languages. Web-based applications cannot use native languages, e.g., C/C++, directly. With the variety of content available on the web, computation has become complicated and heavy, and implementing such operations in JavaScript is a disadvantage from a performance perspective. The web therefore requires a language that operates at a level of performance similar to that of a native language. Initially, Mozilla announced asm.js; however, it did not receive much attention because of its performance inefficiency. In addition, asm.js is difficult to use. The need for native-level performance in web environments persisted, and Web Assembly was created based on asm.js. Web Assembly is in constant development, and web browser companies, e.g., Google, Microsoft, and Mozilla, are involved in its development. Web Assembly is not intended to replace JavaScript but is designed to run web-based applications efficiently alongside JavaScript. Code is written in a language with explicit variable types, e.g., C/C++, Rust, TypeScript, AssemblyScript, or Go, and is then converted to Web Assembly using Emscripten.
Figure 2 shows the process of converting C/C++ code to Web Assembly code. After an algorithm is written in C/C++, Emscripten passes the C/C++ code to Clang + LLVM and uses the compilation result to generate a Web Assembly binary (i.e., a WASM file). The WASM file cannot access the DOM directly; thus, Emscripten also provides JavaScript glue code through which the results of executing the WASM file can be written to HTML documents.

2.1.2. Necessity of Crypto Library for Secure Web Application

Web-based applications can easily be accessed through various devices, e.g., PCs and smartphones; therefore, various users, e.g., companies, institutions, and individuals, use them. Many users rely on web-based applications for providing, collecting, and searching for information or for personal work. Web-based applications must show the same data on different platforms; thus, they are created using JavaScript, a cross-platform language, and users obtain the same information on every platform. JavaScript is also used in server-side network programming, databases, and the IoT. Because of their convenience and variety of features, web-based applications communicate with many other environments and platforms, sending and receiving various data and storing them on servers. To ensure the continued development of web-based applications and the security of their data, a crypto library comprising cryptographic and authentication algorithms is required. For security, data are encrypted when they are stored on a server and decrypted when they are used. In addition, authentication is required to determine whether data transmitted and received during communication are intact. Therefore, ensuring confidentiality and integrity is essential for web-based applications to communicate securely with other environments.

3. Secure Crypto Library Design

3.1. Design Motivation and Library Architecture

Crypto libraries created using JavaScript make it easy for users of other web environments to obtain and use cipher algorithms, e.g., block ciphers, key agreement, key exchange algorithms, and message authentication. Web-based application developers who use JavaScript can rely on a crypto library to protect user information, encrypt and safely store data created by the web-based application, and verify data integrity through message authentication, so that users can use applications safely. Even a 1-bit error prevents users from obtaining the correct data; thus, each operation of an encryption algorithm must be implemented carefully.
In JavaScript, unlike C/C++, data types are not divided into char, short, and int according to bit size, and there is no distinction between signed and unsigned numbers. In C/C++, the bit size of the value that can be stored is determined by the data type; thus, bits that exceed this size are discarded automatically in integer arithmetic, which is useful in operations that require a reduction after computation, e.g., modular addition, in cryptographic algorithms. C/C++ can also express numbers as signed or unsigned, which is useful in the finite field operations of cryptographic algorithms. JavaScript, however, has neither fixed-size integer types nor a signed/unsigned distinction, whereas each cryptographic algorithm has its own word size, so additional operations must be used to obtain the desired result. JavaScript is a heavy language, and it is slower because it requires these additional operations when performing the same computations as C/C++. Thus, JavaScript is less efficient for implementing cryptographic algorithms.
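The "additional operations" mentioned above can be illustrated in any language without fixed-width integer types. The sketch below uses Python, whose integers have arbitrary precision, to emulate the 16-bit wraparound that an unsigned 16-bit C type provides automatically; the helper names `add16` and `sub16` are hypothetical and are not part of the proposed library.

```python
MASK16 = 0xFFFF  # keeps only the low 16 bits of a result


def add16(a: int, b: int) -> int:
    """Addition modulo 2^16: the explicit mask replaces automatic truncation."""
    return (a + b) & MASK16


def sub16(a: int, b: int) -> int:
    """Subtraction modulo 2^16: a negative difference wraps around."""
    return (a - b) & MASK16
```

Each arithmetic step of a 16-bit cipher such as CHAM-64/128 needs a mask of this kind when the language does not truncate on its own, which is one source of the overhead discussed in this section.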
Existing programming languages, such as C/C++ and Rust, are converted to Web Assembly via Emscripten so that they can run in a web environment. Data types of each size can be used, and signed and unsigned numbers can be distinguished, so that, unlike in JavaScript, the desired value is obtained without additional operations. For cryptographic algorithms with many mathematical operations, Web Assembly implementations can run faster and more efficiently in a web environment. If a web-based application uses a Web Assembly-based crypto library when communicating with the web and other environments, it can perform computations and encryption faster than with a JavaScript-based crypto library.

3.2. Target Block Ciphers

Revised CHAM

At ICISC 2017, Koo et al. of the National Security Research Institute proposed the lightweight CHAM block cipher family [10], which is divided into CHAM-64/128, CHAM-128/128, and CHAM-128/256 depending on the parameters. Table 1 shows the CHAM parameters. CHAM features a stateless on-the-fly key schedule, which reduces key storage space, and an ARX structure, which provides lightweight cryptography suitable for constrained environments. The key scheduling process in CHAM is shown in Figure 3. The ROL1, ROL11, ROL8, and XOR operations generate $n/k \times 2$ round keys, and all rounds are then encrypted with these $n/k \times 2$ round keys. The encryption process comprises odd and even rounds, and each round function comprises ROL1, ROL8, and XOR operations, as well as modular addition. After each round, a cyclic left shift is performed in word units. The odd and even round encryption process of CHAM is shown in Figure 4.
At ICISC 2019 [5], it was suggested that the original CHAM was vulnerable to differential attacks, based on differential characteristics discovered for reduced rounds. Differential characteristics were found for 56, 72, and 78 rounds of CHAM-64/128, CHAM-128/128, and CHAM-128/256, respectively. Thus, the revised CHAM increases the number of rounds to defend against differential attacks: from 80 to 88 for CHAM-64/128, from 80 to 112 for CHAM-128/128, and from 96 to 120 for CHAM-128/256. Despite the increased number of rounds, the revised CHAM showed efficient performance in both software and hardware and was faster and safer against differential attacks than the lightweight ciphers SIMON and SPECK.
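The 16-bit rotations ROL1 and ROL8 used by CHAM-64/128, together with the table-lookup pre-computation idea described in the Introduction, can be sketched as follows. This is a minimal illustration rather than the library's code: the names `rol`, `ROL1_TABLE`, and `ROL8_TABLE` are hypothetical, and the Keyschedule table is omitted because its exact formula is given in Figure 3, which is not reproduced here.

```python
WORD_BITS = 16                      # word size of CHAM-64/128
MASK = (1 << WORD_BITS) - 1


def rol(x: int, r: int) -> int:
    """Rotate a 16-bit word left by r bits (computed directly)."""
    r %= WORD_BITS
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK


# Computed once: one entry for every possible 16-bit input, 0x0 to 0xffff.
ROL1_TABLE = [rol(x, 1) for x in range(1 << WORD_BITS)]
ROL8_TABLE = [rol(x, 8) for x in range(1 << WORD_BITS)]


def rol1(x: int) -> int:
    """ROL1 as a table lookup instead of shifts and a mask."""
    return ROL1_TABLE[x]


def rol8(x: int) -> int:
    """ROL8 as a table lookup instead of shifts and a mask."""
    return ROL8_TABLE[x]
```

The trade-off is memory for time: each table holds 65,536 entries, which is feasible only because the word size of CHAM-64/128 is 16 bits.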

3.3. Target Message Authentication Code (MAC) Algorithm

3.3.1. Overview of HMAC

Web-based applications send and receive a lot of data in real time. To establish a secure communication environment, it is necessary to verify whether a message has been tampered with by an intermediate attacker and whether the data were transmitted by the correct user. A MAC is used for this purpose: it provides message integrity and authentication by generating a tag from the message and a key shared between the message sender and receiver. Various MACs, e.g., GCM, CCM, and HMAC, have been proposed to provide message integrity and authentication. HMAC is classified into HMAC-SHA-224, HMAC-SHA-256, HMAC-SHA-384, and HMAC-SHA-512 according to the SHA-2 family member used in the message compression process [6]. We use HMAC-SHA-256 as the target message authentication code because SHA-256 is the most frequently used. The overall process of HMAC-SHA-256 is shown in Figure 5. The MAC value is generated through two SHA-256 computations. IPAD and OPAD are formed by repeating the byte 0x36 (IPAD) and the byte 0x5c (OPAD) up to the block length of the hash function. First, if the key is longer than 512 bits, the key is hashed, and the remaining space is padded with zeros to extend it to 512 bits. If the key is shorter than 512 bits, the remaining space is likewise padded with zeros to extend it to 512 bits. Then, the padded key is XORed with the 512-bit IPAD and with the 512-bit OPAD. The message to be authenticated is appended to the result of XORing the padded key with IPAD to form a single message, and a 256-bit hash value is generated from it with SHA-256. Next, this hash value is appended to the result of XORing the padded key with OPAD to form a single message, which becomes the input to SHA-256. Finally, the resulting hash value is the MAC value for message authentication.
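The HMAC-SHA-256 steps described above can be sketched directly with Python's standard `hashlib` module. This is an illustrative sketch, not the library's Web Assembly code; the function names `hmac_sha256` and `verify` are hypothetical, and the standard `hmac` module is used only for a constant-time comparison.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 block length in bytes (512 bits)


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA-256 step by step, following the description above."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()        # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")          # zero-pad to 512 bits
    ipad_key = bytes(b ^ 0x36 for b in key)       # padded key XOR IPAD
    opad_key = bytes(b ^ 0x5C for b in key)       # padded key XOR OPAD
    inner = hashlib.sha256(ipad_key + message).digest()   # first SHA-256 pass
    return hashlib.sha256(opad_key + inner).digest()      # second SHA-256 pass


def verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a received tag using a constant-time comparison."""
    return hmac.compare_digest(hmac_sha256(key, message), tag)
```

A receiver would call `verify(key, message, tag)` with the shared key; the result is `True` only if neither the message nor the tag was modified in transit.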

3.3.2. Overview of SHA-256

SHA-256 Internal functions: SHA-256 uses six logical functions, each of which operates on 32-bit words, represented as x, y, and z. The result of each function is a new 32-bit word. The six logical functions are expressed as follows [15].
Definition 1.
SHA-256 logical function:
$$Ch(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z)$$
$$Maj(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)$$
$$\Sigma_0^{256}(x) = ROTR^{2}(x) \oplus ROTR^{13}(x) \oplus ROTR^{22}(x)$$
$$\Sigma_1^{256}(x) = ROTR^{6}(x) \oplus ROTR^{11}(x) \oplus ROTR^{25}(x)$$
$$\sigma_0^{256}(x) = ROTR^{7}(x) \oplus ROTR^{18}(x) \oplus (x \gg 3)$$
$$\sigma_1^{256}(x) = ROTR^{17}(x) \oplus ROTR^{19}(x) \oplus (x \gg 10)$$
SHA-256 Padding the Message: The SHA-256 block size is 512 bits, and the block operation is performed in 32-bit units. SHA-256 stores the length of the input data in the last 64 bits of the final block. Therefore, the SHA-2 family requires a padding process that makes room for the message length. The padding process is summarized as follows.
Padding process
Step 0: Let $l$ be the length of the message in bits;
Step 1: Append the bit “1” to the end of the message;
Step 2: Append $k$ zero bits, where $k$ is the smallest non-negative solution to the equation $l + 1 + k \equiv 448 \pmod{512}$;
Step 3: Append the 64-bit block that is equal to the message length $l$ expressed in binary.
Padding can be inserted before hash computation begins on a message or any other time during the hash computation prior to processing the block(s) that will contain the padding [15].
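The padding steps above can be sketched as follows for the common case of byte-aligned messages. The function name `sha256_pad` is hypothetical, and the restriction to whole bytes is an assumption of this sketch (the steps themselves are defined at bit granularity).

```python
def sha256_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message as described in Steps 0 to 3."""
    bit_len = len(message) * 8                    # Step 0: l, in bits
    padded = message + b"\x80"                    # Step 1: bit "1", then seven zero bits
    zero_bytes = (56 - len(padded) % 64) % 64     # Step 2: reach 448 bits mod 512
    padded += b"\x00" * zero_bytes
    padded += bit_len.to_bytes(8, "big")          # Step 3: 64-bit big-endian length
    return padded
```

For the 3-byte message `b"abc"`, the result is a single 64-byte block: the message, the byte `0x80`, 52 zero bytes, and the length 24 encoded in 8 bytes. A 56-byte message no longer leaves room for the length field and is padded to two blocks.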
SHA-256 Message Compression: The block operation in SHA-256 repeats the same process for 64 rounds, and each round uses one word derived from the padded message data. Because a 512-bit block contains only 16 words, SHA-256 must expand the block to 64 words; this is the message expansion process. Algorithm 1 shows the pseudocode of the SHA-256 message expansion process.
Algorithm 1 SHA-256 Message expansion process
Require: 32-bit word message $M = (M[0], M[1], \ldots, M[15])$
Ensure: Expansion data $W = (W[0], W[1], \ldots, W[63])$
for $i = 0$ to 15 do
    $W[i] = M[i]$
end for
for $i = 16$ to 63 do
    $W[i] = \sigma_1^{256}(W[i-2]) \boxplus W[i-7] \boxplus \sigma_0^{256}(W[i-15]) \boxplus W[i-16]$
end for
return $W$
In Algorithm 1, a 512-bit block is labeled $M$. The block $M$ is divided into 16 32-bit words, each labeled $M[i]$ ($0 \le i < 16$), and the output of the message expansion process is labeled $W$. Message compression updates the digest using the expanded $W$ and eight initialized 32-bit working variables, $a$, $b$, $c$, $d$, $e$, $f$, $g$, and $h$. Algorithm 2 shows the pseudocode for the SHA-256 message compression process, where $K_t^{256}$ is the round constant defined in the literature [15]. In SHA-256, the digest comprises eight 32-bit words. When the SHA-256 algorithm is called, the digest is initialized to a defined value [15]. After Algorithm 2 is executed, each 32-bit digest word is updated by adding the corresponding working variable modulo $2^{32}$ ($\boxplus$). When the last padded block has been compressed and the digest has been updated with the working variables, SHA-256 returns the 256-bit digest.
Algorithm 2 SHA-256 Message Compression
Require: Expansion data W = (W[0], …, W[63])
Require: Working variables (a, b, c, d, e, f, g, h) in hash state
Ensure: Updated working variables (a, b, c, d, e, f, g, h) in hash state
  1: for t = 0 to 63 do
  2:   T1 = h ⊞ σ_1^{256}(e) ⊞ Ch(e, f, g) ⊞ K_t^{256} ⊞ W[t]
  3:   T2 = σ_0^{256}(a) ⊞ Maj(a, b, c)
  4:   h = g, g = f, f = e, e = d ⊞ T1, d = c, c = b, b = a, a = T1 ⊞ T2
  5: end for
  6: return Hash value (a, b, c, d, e, f, g, h)
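Algorithms 1 and 2 can be combined into a complete SHA-256 sketch that is checked against Python's `hashlib`. Two assumptions are made here. First, the round functions written σ0 and σ1 in Algorithm 2 are taken to be the functions Σ0 and Σ1 of FIPS 180-4 [15]. Second, the constants are derived from the square and cube roots of the first primes, as FIPS 180-4 defines them, instead of being typed in. All function names are hypothetical.

```python
import hashlib
import math
import struct

MASK32 = 0xFFFFFFFF


def _first_primes(n):
    primes, c = [], 2
    while len(primes) < n:
        if all(c % q for q in primes):
            primes.append(c)
        c += 1
    return primes


def _icbrt(n):
    """Integer cube root by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


# Fractional parts of cube roots (K) and square roots (H0) of the first primes.
K = [_icbrt(p << 96) & MASK32 for p in _first_primes(64)]
H0 = [math.isqrt(p << 64) & MASK32 for p in _first_primes(8)]


def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK32


def compress(state, block):
    """Algorithm 1 followed by Algorithm 2, then the digest update."""
    w = list(struct.unpack(">16I", block))
    for i in range(16, 64):
        s0 = rotr(w[i - 15], 7) ^ rotr(w[i - 15], 18) ^ (w[i - 15] >> 3)
        s1 = rotr(w[i - 2], 17) ^ rotr(w[i - 2], 19) ^ (w[i - 2] >> 10)
        w.append((s1 + w[i - 7] + s0 + w[i - 16]) & MASK32)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK32
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK32
        h, g, f, e, d, c, b, a = (g, f, e, (d + t1) & MASK32,
                                  c, b, a, (t1 + t2) & MASK32)
    # Digest update: add the working variables to the previous digest words.
    return [(s + v) & MASK32 for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(msg):
    """Pad the message, then compress block by block."""
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack(">Q", 8 * len(msg)))
    state = list(H0)
    for off in range(0, len(padded), 64):
        state = compress(state, padded[off:off + 64])
    return struct.pack(">8I", *state)
```

For example, `sha256(b"abc").hex()` can be compared with `hashlib.sha256(b"abc").hexdigest()` as a quick check.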

3.4. Target Key Agreement Algorithm

ECDH with P-256 Curve

P-256 is a NIST curve amongst the 15 elliptic curves recommended by NIST [8]. It is an elliptic curve defined over a 256-bit prime field that offers approximately 128-bit security. This elliptic curve is defined by the following equation:
y^2 = x^3 − 3x + b
where b is a constant in the finite field F_p. The prime p is a 256-bit prime selected for easy modular reduction. The points of this elliptic curve form an Abelian group whose identity element O is called the point at infinity. Scalar multiplication calculates Q = kP = (X3, Y3) from a 256-bit integer scalar k and a base point P = (X1, Y1). The algorithms used for scalar multiplication are ECADD and ECDBL. The input points for ECDBL and ECADD are given in affine coordinates, P = (X1, Y1) and Q = (X2, Y2). ECDBL calculates P + Q = 2P when P = Q, and ECADD calculates P + Q when P ≠ Q. The security of ECC is based on the difficulty of the elliptic curve discrete logarithm problem (ECDLP), i.e., it is very difficult to find the scalar k when P and Q = kP are given.
The general equation of a prime curve is y^2 = x^3 + ax + b. The NIST prime curves include P-256, P-384, and P-521, which differ in their parameters. In the basic form, scalar multiplication is performed in the affine coordinate system. ECADD is performed whenever the current bit of the scalar k, i.e., the input of scalar multiplication, is 1. In affine coordinates, ECADD includes a finite field inversion. Among the finite field operations (addition, subtraction, multiplication, and inversion), inversion is the most expensive. Therefore, rather than performing an inversion in every ECADD, the computation is moved to a projective coordinate system so that a single inversion is performed after the scalar multiplication is complete. Here, scalar multiplication is implemented in the Jacobian coordinate system with the curve parameter fixed at a = −3, which is more efficient than standard projective coordinates. After converting the affine coordinate system to the Jacobian coordinate system, the ECDBL and ECADD operations are performed as shown in Table 2. After the scalar multiplication operation is completed, the value of kP can be obtained by converting the Jacobian coordinate system to the affine coordinate system [8].
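The Jacobian formulas of Table 2 can be written out as a short Python sketch over P-256. This is an illustration only: it is neither constant-time nor the library's code, the function names are hypothetical, the point at infinity is encoded as Z = 0 (an assumption of this sketch), and `pow(x, -1, P)` requires Python 3.8 or later.

```python
# P-256 domain parameters (FIPS 186-4 [8]).
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
A = P - 3  # a = -3 mod p
B = 0x5AC635D8AA3A93E7B3EBBD55769886BC651D06B0CC53B0F63BCE3C3E27D2604B
GX = 0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296
GY = 0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5
INFINITY = (1, 1, 0)  # Z = 0 encodes the point at infinity


def on_curve(pt):
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def to_jacobian(pt):
    return (pt[0], pt[1], 1)


def to_affine(pt):
    """(X, Y, Z) -> (X / Z^2, Y / Z^3); the only field inversion needed."""
    X, Y, Z = pt
    if Z == 0:
        return None
    zinv = pow(Z, -1, P)
    return (X * zinv * zinv % P, Y * zinv ** 3 % P)


def ecdbl(pt):
    X1, Y1, Z1 = pt
    if Z1 == 0 or Y1 == 0:
        return INFINITY
    M = (3 * X1 * X1 + A * pow(Z1, 4, P)) % P
    S = 4 * X1 * Y1 * Y1 % P
    T = 8 * pow(Y1, 4, P) % P
    X3 = (M * M - 2 * S) % P
    Y3 = (M * (S - X3) - T) % P
    Z3 = 2 * Y1 * Z1 % P
    return (X3, Y3, Z3)


def ecadd(p1, p2):
    X1, Y1, Z1 = p1
    X2, Y2, Z2 = p2
    if Z1 == 0:
        return p2
    if Z2 == 0:
        return p1
    U1, U2 = X1 * Z2 * Z2 % P, X2 * Z1 * Z1 % P
    S1, S2 = Y1 * Z2 ** 3 % P, Y2 * Z1 ** 3 % P
    H, R = (U2 - U1) % P, (S2 - S1) % P
    if H == 0:  # same x-coordinate: either P = Q or P = -Q
        return ecdbl(p1) if R == 0 else INFINITY
    X3 = (R * R - H ** 3 - 2 * U1 * H * H) % P
    Y3 = (R * (U1 * H * H - X3) - S1 * H ** 3) % P
    Z3 = H * Z1 * Z2 % P
    return (X3, Y3, Z3)
```

Only `to_affine` inverts a field element, which is the saving described above: the inversion is paid once at the end instead of once per point operation.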
ECDH is a Diffie–Hellman key exchange protocol that uses elliptic curve-based operations [7]. Elliptic curve cryptography is a public key method whose security rests on the elliptic curve discrete logarithm problem. As an alternative to RSA, it provides comparable security with a much shorter key length. The elliptic curve-based operations are ECADD, which adds two points, and ECDBL, which doubles a point. For a scalar d and a base point P on the elliptic curve, the point dP is calculated by scalar multiplication using these two operations.
The security of the Diffie–Hellman (DH) key exchange rests on the difficulty of the discrete logarithm problem. Here, Alice and Bob hold the private keys a and b, respectively, and calculate g^a mod p and g^b mod p in the cyclic group <g>. Alice sends g^a mod p to Bob, and Bob sends g^b mod p to Alice. Each party then raises the received value to the power of its own private key, so that both obtain the shared secret g^{ab} mod p without revealing key information to an attacker. A disadvantage of DH is that the keys must be long; ECDH, which combines elliptic curve cryptography and DH, provides security efficiently with a short key length. The entire process of ECDH is shown in Figure 6. Alice and Bob generate private keys a and b, respectively, use the base point G on the elliptic curve to calculate the public keys aG and bG, and send these public keys to each other. Finally, Alice and Bob each calculate the point abG on the elliptic curve by scalar multiplication of the received public key by their own private key.
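The ECDH exchange described above can be sketched as follows. The sketch uses plain affine arithmetic and an unprotected double-and-add loop for brevity, so it is not side-channel resistant and is not the implementation proposed in this paper. The names `point_add`, `scalar_mul`, and `keygen` are hypothetical, and `None` stands for the point at infinity.

```python
import secrets

# P-256 domain parameters (FIPS 186-4 [8]); a = -3.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)


def point_add(p1, p2):
    """Affine addition; covers doubling and the point at infinity (None)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 - 3) * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mul(k, pt):
    """Left-to-right binary method (NOT constant-time)."""
    result = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, pt)
    return result


def keygen():
    """Return (private scalar d, public point dG) with 1 <= d < N."""
    d = secrets.randbelow(N - 1) + 1
    return d, scalar_mul(d, G)


# Alice and Bob exchange public keys and derive the same point abG.
a, a_pub = keygen()
b, b_pub = keygen()
shared_alice = scalar_mul(a, b_pub)
shared_bob = scalar_mul(b, a_pub)
```

In practice the x-coordinate of the shared point is usually passed through a key derivation function before it is used as a symmetric key.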

3.5. Providing Side Channel Resistance

Atomic Block-Based ECDH Implementation

The scalar multiplication operation of elliptic curve cryptography is vulnerable to simple power analysis (SPA). This is because scalar multiplication performs both ECDBL and ECADD when the current bit of the scalar is 1, but only ECDBL when the bit is 0, resulting in different power consumption. In addition, because ECADD is performed only when the scalar bit is 1, the branch statement that selects it makes the execution time depend on the scalar, which is vulnerable to timing attacks (TA). Countermeasures for side-channel analysis against scalar multiplication of elliptic curve cryptography have been proposed [12,13,14].
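The leak can be made concrete with a toy model in which the attacker sees only the sequence of point operations, written D for ECDBL and A for ECADD. The sketch below is a simulation, not a power measurement. The regular variant shown for contrast is the simple "double-and-add-always" pattern, which is not the atomic block method of [14]; it is used here only because it is short. All function names are hypothetical.

```python
def double_and_add_trace(k):
    """Operation sequence of left-to-right binary scalar multiplication.

    The leading 1 bit only initialises Q = P, so it produces no operation.
    """
    ops = []
    for bit in bin(k)[3:]:  # skip the "0b1" prefix
        ops.append("D")
        if bit == "1":
            ops.append("A")
    return "".join(ops)


def recover_scalar(trace):
    """Rebuild k from the D/A sequence alone, as an SPA attacker would."""
    bits = "1"
    i = 0
    while i < len(trace):
        # Every iteration starts with D; a following A reveals a 1 bit.
        if i + 1 < len(trace) and trace[i + 1] == "A":
            bits += "1"
            i += 2
        else:
            bits += "0"
            i += 1
    return int(bits, 2)


def regular_trace(k):
    """Double-and-add-always: one D and one A per bit, whatever its value."""
    return "DA" * (k.bit_length() - 1)
```

With the unprotected loop, `recover_scalar(double_and_add_trace(k))` returns `k`. With a regular pattern, two scalars of the same bit length produce identical traces.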
In [14], the atomic block was proposed as a countermeasure against SPA for RSA and elliptic curve cryptography. In scalar multiplication, the atomic block is applied to ECDBL and ECADD: both are expressed as repetitions of one block that performs multiplication, addition, subtraction, and addition in this fixed order. ECADD and ECDBL with the atomic block applied are shown in Table 3. To express ECDBL and ECADD as atomic blocks, fake operations must be added. For the existing atomic block, 17 fake operations are added for ECDBL and 32 for ECADD. If ECDBL and ECADD are configured as shown in Table 3, the same power waveform is repeated when an attacker measures the power consumption of scalar multiplication, so the implementation resists SPA. It also resists TA because no branch statements are required when implementing atomic blocks. When keys are exchanged between a web environment and another environment, a branch statement in the scalar multiplication inside ECDH is vulnerable to TA. Therefore, we apply the atomic block, a countermeasure against TA and SPA, to scalar multiplication so that users of the crypto library obtain a secure key exchange protocol.

4. Previous Crypto Implementations in Web Environment

4.1. CHAM Algorithm in JavaScript and Web Assembly

The original CHAM algorithm (Figure 4) alternates between odd and even rounds and swaps the positions of the words at the end of each round. It uses 2 × k/w round keys repeatedly. The words of the original CHAM algorithm return to their original positions every four rounds. Thus, as shown in Figure 7, it is possible to maintain the position of each word by calculating the necessary values for each round without performing a swap. This method is faster because the swap process of the original CHAM algorithm is not needed [9]. CHAM and AES have also been implemented in Web Assembly, where they run faster than the JavaScript implementations [16].
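The four-round combining idea can be sketched by comparing a round loop that rotates the word positions with one that updates the words in place. The round function below follows the CHAM-64/128 description in [10] as recalled for this illustration (XOR with the round counter, addition modulo 2^16, rotations by 1 and 8 that alternate with the round parity); this should be treated as an assumption and checked against the specification. The round keys in the usage are arbitrary values, not the output of the real key schedule, and the function names are hypothetical.

```python
W = 16          # word size of CHAM-64/128 (Table 1)
MASK = 0xFFFF
ROUNDS = 80     # round number listed in Table 1 for CHAM-64/128


def rol(x, n):
    """Rotate a 16-bit word left by n bits."""
    return ((x << n) | (x >> (W - n))) & MASK


def encrypt_with_swap(block, rks):
    """One new word per round, followed by a rotation of word positions."""
    x = list(block)
    for i in range(ROUNDS):
        rk = rks[i % len(rks)]
        if i % 2 == 0:
            new = rol(((x[0] ^ i) + (rol(x[1], 1) ^ rk)) & MASK, 8)
        else:
            new = rol(((x[0] ^ i) + (rol(x[1], 8) ^ rk)) & MASK, 1)
        x = [x[1], x[2], x[3], new]  # the per-round word swap
    return tuple(x)


def encrypt_four_round_combined(block, rks):
    """Four rounds per iteration; every word stays in its own variable."""
    x0, x1, x2, x3 = block
    n = len(rks)
    for i in range(0, ROUNDS, 4):
        x0 = rol(((x0 ^ i) + (rol(x1, 1) ^ rks[i % n])) & MASK, 8)
        x1 = rol(((x1 ^ (i + 1)) + (rol(x2, 8) ^ rks[(i + 1) % n])) & MASK, 1)
        x2 = rol(((x2 ^ (i + 2)) + (rol(x3, 1) ^ rks[(i + 2) % n])) & MASK, 8)
        x3 = rol(((x3 ^ (i + 3)) + (rol(x0, 8) ^ rks[(i + 3) % n])) & MASK, 1)
    return (x0, x1, x2, x3)


# Arbitrary demonstration inputs (not an official test vector).
demo_round_keys = [(i * 0x1234 + 0x0F0F) & MASK for i in range(16)]
demo_block = (0x0011, 0x2233, 0x4455, 0x6677)
```

Both functions compute the same output; the second simply avoids moving four words at the end of every round, which is the saving attributed to [9].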

4.2. Crypto Implementations on Web Assembly Environment

In [17], HACL* [18], libsodium [19], and the proposed WHACL* [17] libraries are converted to Web Assembly to compare performance. HACL* is a verified library of cryptographic primitives that is implemented in Low* and compiled to C via KreMLin [20]. Libsodium is a modern, easy-to-use software library for encryption, decryption, signatures, password hashing, and more. WHACL* is the library proposed in [17]. In Table 4, (A) is the HACL* library compiled to C using KreMLin and then compiled to Web Assembly through Emscripten, (B) is libsodium compiled to Web Assembly through Emscripten, and (C) is WHACL* compiled with KreMLin. According to Table 4, HACL* is slower than libsodium for Curve25519 and Ed25519. HACL* depends on the 128-bit arithmetic provided by C compilers such as gcc and clang, whereas libsodium switches to a 32-bit implementation. Web Assembly, in turn, encodes 128-bit integers as pairs of 64-bit integers. These characteristics explain the difference in performance when the HACL* and libsodium libraries are converted to Web Assembly. Consequently, when code in a web-based application is converted to Web Assembly, implementing the cryptographic algorithm with these characteristics of Web Assembly in mind helps to improve performance.
In [21], the official implementation of Picnic [22], a second-round candidate in NIST's post-quantum cryptography standardization, was converted into Web Assembly, and its performance was measured in Chrome, Firefox, and Microsoft Edge. Comparing Table 5 and Table 6, the Web Assembly implementation is about 2 to 3 times slower than the C implementation.

5. Proposed Web Assembly-Based Crypto Library Implementation

5.1. Proposed Implementation of Revised CHAM

The revised CHAM algorithm is an ARX-based lightweight cipher that is more resistant to differential attacks than the original CHAM. It achieves this by increasing the number of rounds of the original CHAM algorithms, CHAM-64/128, CHAM-128/128, and CHAM-128/256. We implement the revised CHAM algorithm in Web Assembly so that the library offers this resistance to differential attacks. The plaintext input of the original CHAM algorithm consists of four words, and the original algorithm swaps the positions of these four words at the end of each round.
Rather than swapping four words in each round [9], the implementation exploits the fact that the words return to their original positions every four rounds, as shown in Figure 7. Each round is calculated using the necessary values while every word keeps its position, so the word swapping process of the existing algorithm is removed and the round operation becomes faster. In CHAM-64/128, one word of the plaintext and of the key is 16 bits. In Figure 3 and Figure 4, a 16-bit word is the input to ROL8, ROL1, and Keyschedule. Algorithms 3 and 4 present a method to pre-compute these functions: because the inputs and results are 16-bit values, the results are calculated in advance for every 16-bit input from 0x0000 to 0xffff. Whenever the ROL1, ROL8, or Keyschedule function is required, the result is read from the pre-built table using the input as the index instead of being computed.
Algorithm 3 Generation of Rotation Left Shift Table
 Output: ROL1-Table[0xffff], ROL8-Table[0xffff]
1: for i = 0x0 to 0xffff do
2:   ROL1-Table[i] ← ROL1(i)
3:   ROL8-Table[i] ← ROL8(i)
4: end for
Algorithm 4 Generation of Keyschedule Table
 Output: Key1-Table[0xffff], Key2-Table[0xffff]
1: for i = 0x0 to 0xffff do
2:   Key1-Table[i] ← i ⊕ ROL1(i) ⊕ ROL8(i)
3:   Key2-Table[(i + k/w) ⊕ 1] ← i ⊕ ROL8(i) ⊕ ROL11(i)
4: end for
First, Algorithm 3 is used to create the precomputation table for R O L 1 and R O L 8 . Then, the Keyschedule table is created using Algorithm 4. For the R O L 1 and R O L 8 operations in Algorithm 4, the Keyschedule table can be created faster by using the table created in Algorithm 3.
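A minimal sketch of the table generation follows. It builds the two rotation tables of Algorithm 3 and, from them, only the first key table of Algorithm 4 (Key1-Table). The table and function names are hypothetical, and the tables are ordinary Python lists rather than the linear-memory arrays a Web Assembly build would use.

```python
MASK16 = 0xFFFF
TABLE_SIZE = 0x10000  # one entry for every 16-bit input 0x0000..0xffff


def rol16(x, n):
    """Rotate a 16-bit word left by n bits."""
    return ((x << n) | (x >> (16 - n))) & MASK16


# Algorithm 3: rotation tables.
ROL1_TABLE = [rol16(i, 1) for i in range(TABLE_SIZE)]
ROL8_TABLE = [rol16(i, 8) for i in range(TABLE_SIZE)]

# Algorithm 4, step 2: reuse the rotation tables instead of rotating again.
KEY1_TABLE = [i ^ ROL1_TABLE[i] ^ ROL8_TABLE[i] for i in range(TABLE_SIZE)]


def rol1_lookup(x):
    """Replace a rotate instruction by a table read."""
    return ROL1_TABLE[x]
```

Three tables of 65,536 16-bit entries occupy about 384 KiB when stored compactly, which is one reason this approach is limited to the 16-bit words of CHAM-64/128.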

5.2. Proposed Implementation of ECDH with Side Channel Resistance

In the literature [11], the NAF algorithm uses a signed representation to reduce the number of nonzero digits of the scalar k. Because the number of ECADD operations decreases with the number of nonzero digits, scalar multiplication becomes faster. The wNAF algorithm processes a window of w bits with a single ECADD; thus, it realizes a faster scalar multiplication than the binary left-to-right scalar multiplication algorithm. To process w bits at once, pre-computation is required for the odd values in the range [−2^{w−1}, 2^{w−1} − 1]. Because the cost of this pre-computation is relatively low, the method can also be used with variable points.
To use wNAF, the scalar k must be converted to NAF_w(k), as in Algorithm 5. NAF_w(k) can be up to 1 bit longer than k, and its average density of nonzero digits is approximately 1/(w + 1). Multiplication by the scalar k is then performed as in Algorithm 6. The pre-computation requires one ECDBL and 2^{w−2} ECADD operations, and the scalar multiplication itself requires l ECDBL and approximately l/(w + 1) ECADD operations, where l is the length of NAF_w(k). The wNAF algorithm is regarded here as safe against SPA because it uses only odd digits in the range [−2^{w−1}, 2^{w−1} − 1]. However, the number of ECADD operations varies with the bits of the scalar k, so the algorithm is vulnerable to TA; it is therefore implemented with atomic blocks to be safe against TA.
Algorithm 5 Computing the width-w NAF of a positive integer
 Input: Window width w, positive integer k.
 Output: NAF_w(k)
1: i ← 0
2: while k ≥ 1 do
3:   if k is odd then
4:     k_i ← k mods 2^w, k ← k − k_i
5:   else
6:     k_i ← 0
7:   end if
8:   k ← k/2, i ← i + 1
9: end while
10: return (k_{i−1}, k_{i−2}, ⋯, k_1, k_0)
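Algorithm 5 translates almost line by line into Python. In this sketch, `mods` is the signed residue used in [11], that is, the representative of k modulo 2^w in the range [−2^{w−1}, 2^{w−1} − 1]. The function names are hypothetical, and the digits are returned least significant first, the reverse of the order in step 10.

```python
def mods(k, w):
    """Signed residue of k modulo 2^w, in [-2^(w-1), 2^(w-1) - 1]."""
    r = k % (1 << w)
    if r >= 1 << (w - 1):
        r -= 1 << w
    return r


def wnaf(k, w):
    """Width-w NAF digits of a positive integer k, least significant first."""
    digits = []
    while k >= 1:
        if k & 1:
            d = mods(k, w)
            k -= d  # k becomes divisible by 2^w
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits
```

For example, `wnaf(7, 2)` gives `[-1, 0, 0, 1]`, i.e., 7 = 8 − 1, which has two nonzero digits instead of the three 1 bits of the binary form 111.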
Algorithm 6 Window NAF method for point multiplication
 Input: Window width w, positive integer k, P ∈ E(F_q)
 Output: kP
1: Use Algorithm 5 to compute NAF_w(k) = Σ_{i=0}^{l−1} k_i 2^i
2: Compute P_i = iP for i ∈ {1, 3, 5, …, 2^{w−1} − 1}
3: Q ← ∞
4: for i from l − 1 downto 0 do
5:   Q ← 2Q
6:   if k_i ≠ 0 then
7:     if k_i > 0 then
8:       Q ← Q + P_{k_i}
9:     else
10:       Q ← Q − P_{−k_i}
11:     end if
12:   end if
13: end for
14: return (Q)
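Algorithm 6 can be sketched on P-256 and checked against the plain binary method. To keep the block short it uses affine arithmetic and data-dependent branches, so it illustrates only the wNAF structure, not the atomic-block protection; the function names are hypothetical and `None` stands for the point at infinity.

```python
# P-256 prime and base point (FIPS 186-4 [8]); a = -3.
P = 2**256 - 2**224 + 2**192 + 2**96 - 1
G = (0x6B17D1F2E12C4247F8BCE6E563A440F277037D812DEB33A0F4A13945D898C296,
     0x4FE342E2FE1A7F9B8EE7EB4A7C0F9E162BCE33576B315ECECBB6406837BF51F5)


def point_add(p1, p2):
    """Affine addition; covers doubling and the point at infinity (None)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 - 3) * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def point_neg(pt):
    return None if pt is None else (pt[0], (-pt[1]) % P)


def wnaf(k, w):
    """Algorithm 5; digits are returned least significant first."""
    digits = []
    while k >= 1:
        d = 0
        if k & 1:
            d = k % (1 << w)
            if d >= 1 << (w - 1):
                d -= 1 << w
            k -= d
        digits.append(d)
        k >>= 1
    return digits


def wnaf_mul(k, pt, w=4):
    """Algorithm 6: window NAF point multiplication."""
    # Step 2: P_i = iP for odd i up to 2^(w-1) - 1.
    two_p = point_add(pt, pt)
    table = {1: pt}
    for i in range(3, 1 << (w - 1), 2):
        table[i] = point_add(table[i - 2], two_p)
    q = None  # Step 3: Q <- infinity
    for d in reversed(wnaf(k, w)):
        q = point_add(q, q)
        if d > 0:
            q = point_add(q, table[d])
        elif d < 0:
            q = point_add(q, point_neg(table[-d]))
    return q


def binary_mul(k, pt):
    """Reference: left-to-right double-and-add."""
    q = None
    for bit in bin(k)[2:]:
        q = point_add(q, q)
        if bit == "1":
            q = point_add(q, pt)
    return q
```

Subtracting a point costs no more than adding one, since negation only flips the sign of the y-coordinate; this is what makes the signed digits of wNAF attractive on elliptic curves.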
The atomic block is safe against SCA because it repeats the same process regardless of whether the scalar bit is 0 or 1. It makes ECDBL and ECADD difficult to distinguish by adding fake operations so that the calculations follow a regular order. In this paper, we present the operation process of a new atomic block that reduces the fake operations of the previous atomic block [14].
As seen in Table 3, the existing atomic blocks in the literature [14] consist of *, +, −, and +. The method we propose is an improvement that assumes that only P = (X, Y, Z) is used. We reduce the number of fake operations by changing the configuration of the existing atomic block to *, +, −. By removing one addition from each block, the proposed atomic block composes ECDBL and ECADD from 10 and 16 blocks, respectively.
Therefore, ECDBL and ECADD require 10 and 16 fewer addition operations, respectively, than with the existing atomic block. In the existing atomic block, ECDBL has nine fake additions and eight fake subtractions, and ECADD has 22 fake additions and 10 fake subtractions. In the proposed atomic block, ECDBL has six fake subtractions, and ECADD has nine fake additions and nine fake subtractions. Thus, compared to the existing atomic block, the proposed atomic block removes nine fake additions and two fake subtractions from ECDBL, and 13 fake additions and one fake subtraction from ECADD. The proposed atomic block operation process is shown in Table 7.
Table 8 lists the number of additions, subtractions, and multiplications of the original w N A F , the existing atomic block, and the proposed atomic block.

5.3. Proposed Implementation of HMAC

When a web-based application communicates with other environments, it encrypts the data using various cryptographic algorithms and then sends the encrypted data. The receiver must be able to determine whether the encrypted data arrived unmodified. This can be confirmed using HMAC, a MAC constructed here from SHA-256. Implementing HMAC in Web Assembly allows web-based applications to authenticate data faster than with JavaScript [6,15].
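The HMAC construction of FIPS 198-1 [6] over SHA-256 can be sketched with Python's standard library and compared with the built-in `hmac` module. The function name `hmac_sha256` is hypothetical; the block size of 64 bytes and the pad bytes 0x36 and 0x5c come from the standard.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 512-bit blocks


def hmac_sha256(key, msg):
    """HMAC(K, m) = H((K0 xor opad) || H((K0 xor ipad) || m))."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()  # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")    # K0: key padded to one block
    ipad = bytes(b ^ 0x36 for b in key)
    opad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.sha256(ipad + msg).digest()
    return hashlib.sha256(opad + inner).digest()


def verify(key, msg, tag):
    """Compare tags in constant time to avoid a timing side channel."""
    return hmac.compare_digest(hmac_sha256(key, msg), tag)
```

A receiver recomputes the tag over the received ciphertext and accepts the message only if `verify` returns `True`.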

6. Performance Analysis

In the environment of Table 9, the proposed crypto library was implemented in Web Assembly and JavaScript and compared in the web browsers Chrome, Firefox, and Microsoft Edge to evaluate its performance. Table 10, Table 11, Table 12, Table 13, Table 14, Table 15, Table 16, Table 17 and Table 18 show the results for the existing algorithms and the proposed methods, i.e., the revised CHAM algorithm, wNAF, SHA-256, and HMAC.
Table 10, Table 11 and Table 12 show the results of measuring the implemented CHAM family algorithms in Chrome, Firefox, and Microsoft Edge. The revised CHAM family algorithms use the 4-round combining method; in addition, CHAM-64/128 is implemented in JavaScript and Web Assembly with the pre-computation method applied. The revised CHAM algorithm with the 4-round combining method improved in performance, in Chrome, Firefox, and Microsoft Edge, by 2.1, 2.1, and 2 times for CHAM-64/128, by 3, 1.6, and 1.8 times for CHAM-128/128, and by 3, 2.1, and 2.8 times for CHAM-128/256. With pre-computation applied, CHAM-64/128 is 1.2 times faster than the existing revised CHAM-64/128 in all three web browsers.
Table 13, Table 14 and Table 15 show the results measured in Chrome, Firefox, and Microsoft Edge after implementing the original wNAF, the existing atomic block wNAF, and the proposed atomic block wNAF in JavaScript and Web Assembly. In Chrome, Firefox, and Microsoft Edge, Web Assembly improved over JavaScript by 11, 12, and 11 times for the original wNAF, by 10, 10, and 14 times for the existing atomic block wNAF, and by 11, 12, and 14 times for the proposed wNAF. As shown in Table 8, the atomic block increases the number of operations compared to the plain ECDBL and ECADD, resulting in performance overhead. For the existing atomic block wNAF, this overhead is 55%, 23%, and 37%. For the proposed atomic block wNAF, which uses only P = (X, Y, Z), the number of operations is reduced, so the overhead is 18%, 6%, and 11%, and scalar multiplication is faster than with the conventional atomic block.
Table 16, Table 17 and Table 18 show the results of measuring SHA-256 and HMAC, a MAC constructed from SHA-256, implemented in JavaScript and Web Assembly in Chrome, Firefox, and Microsoft Edge. Web Assembly outperformed JavaScript by 7.5, 10.8, and 11 times for SHA-256, and by 7.5, 24.8, and 7.1 times for HMAC, in Chrome, Firefox, and Microsoft Edge.

7. Conclusions

In this paper, we proposed a crypto library that implements cryptographic algorithms in Web Assembly to improve their performance in web-based applications. The block cipher, key exchange algorithm, and MAC algorithm were implemented directly in both JavaScript and Web Assembly and compared. As the block cipher, we employed a lightweight cipher, the revised CHAM algorithm, which is secure against differential attacks, and applied the four-round combining method. The algorithms implemented in Web Assembly and JavaScript were measured in Chrome, Firefox, and Microsoft Edge. For the block cipher, performance improved by 2.1, 2.1, and 2 times for CHAM-64/128, by 3, 1.6, and 1.8 times for CHAM-128/128, and by 3, 2.1, and 2.8 times for CHAM-128/256. CHAM-64/128 with the pre-computation method applied was 1.2 times faster in all three web browsers than without it. For the key exchange algorithm, wNAF was applied to P-256, together with the atomic block method, a countermeasure against SPA and TA. We applied the existing atomic block and the proposed atomic block to wNAF and measured how much performance overhead each incurs relative to the original wNAF because of the increased number of operations, and how much the proposed atomic block improves on the existing one. Web Assembly improved over JavaScript by 11, 12, and 11 times for the original wNAF, by 10, 10, and 14 times for the existing atomic block wNAF, and by 11, 12, and 14 times for the proposed wNAF. The existing atomic block wNAF shows a performance overhead of 55%, 23%, and 37% compared to the original wNAF, whereas the proposed atomic block wNAF, which uses only P = (X, Y, Z), shows performance overheads of 18%, 6%, and 11%. The message authentication code was HMAC, which uses SHA-256 to create a MAC. In the measurements, Web Assembly outperformed JavaScript by 7.5, 10.8, and 11 times for SHA-256, and by 7.5, 24.8, and 7.1 times for HMAC.
Web Assembly will continue to evolve through the work of several web browser vendors. It is intended to be used together with JavaScript, not as a replacement for it. Therefore, as Web Assembly develops, the function call time between Web Assembly and JavaScript is expected to decrease gradually, and Web Assembly will become an appropriate language for cryptographic algorithms. From the perspective of a web-based application, it will be more efficient to implement cryptographic algorithms with many mathematical operations in Web Assembly and to combine them with JavaScript libraries that provide various other functions. Web Assembly currently works in a SISD manner; consequently, when processing the same amount of data, it is slower than the SIMD-based cryptographic implementations that are currently being studied. However, Web Assembly is being extended to support SIMD, already supports quite a few intrinsic functions, and is continuously evolving. In addition, an API called WebGPU is being created that gives a web environment access to the functions of a graphics card. WebGPU enables SIMD operations on a graphics card in a web environment and is evolving to support use with Web Assembly. Eventually, once high-performance functions such as Web Assembly SIMD and WebGPU are available in the web environment, large amounts of data can be encrypted and decrypted at high speed. Currently, there are various attack methods against cryptographic algorithms, but our proposed crypto library considers only differential attacks on block ciphers and SPA and TA on key exchange. In future work, we plan to investigate further attack methods against cryptographic algorithms in the web environment and to improve the corresponding countermeasures. We will also study how to optimize cryptographic algorithms in web-based applications using Web Assembly's SIMD support and WebGPU. As Web Assembly develops and gains support for additional functions, research on cryptographic libraries for web-based applications will remain a valuable topic of study.

Author Contributions

Writing—original draft, B.P. and J.S.; Writing—review and editing, S.C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF), grant funded by the Korea government (MSIT) (No. 2019R1F1A1058494).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Telecommunications Technology Association. Security Management Guidelines for Web Environment Establishment & Operation. 2006. Available online: http://www.tta.or.kr/data/ttas_view.jsp?rn=1&by=asc&order=publish_date&totalSu=16253&pk_num=TTAS.KO-10.0090/R1&nowSu=5594 (accessed on 2 November 2020).
  2. Zakas, N.C. Professional Javascript for Web Developers; John Wiley & Sons: Hoboken, NJ, USA, 2009.
  3. Rossberg, A.; Titzer, B.L.; Haas, A.; Schuff, D.L.; Gohman, D.; Wagner, L.; Zakai, A.; Bastien, J.F.; Holman, M. Bringing the web up to speed with WebAssembly. Commun. ACM 2018, 61, 107–115.
  4. Rossberg, A. WebAssembly Specification Release 1.1. 2020. Available online: https://webassembly.github.io/spec/core/ (accessed on 30 October 2020).
  5. Roh, D.; Koo, B.; Jung, Y.; Jeong, I.; Lee, D.; Kwon, D.; Kim, W.H. Revised Version of Block Cipher CHAM. In Proceedings of the Information Security and Cryptology—ICISC 2019—22nd International Conference, Seoul, Korea, 4–6 December 2019; Revised Selected Papers. Springer: Berlin/Heidelberg, Germany, 2019; Volume 11975, pp. 1–19.
  6. Federal Information Processing Standards Publications 198-1(FIPS PUBS). In The Keyed-Hash Message Authentication Code (HMAC); Technical Report; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2008.
  7. Barker, E.; Chen, L.; Roginsky, A.; Vassilev, A.; Davis, R. Recommendation for Pair-Wise Key-Establishment Schemes Using Discrete Logarithm Cryptography; Technical Report; National Institute of Standards and Technology(NIST): Gaithersburg, MD, USA, 2018.
  8. Federal Information Processing Standards Publications 186-4(FIPS PUBS). In Digital Signature Standard (DSS); Technical Report; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2013.
  9. Park, C.; Park, T.; Seo, H.; Kim, H. Optimization of CHAM Encryption Algorithm Based on Javascript. In Proceedings of the Tenth International Conference on Ubiquitous and Future Networks, ICUFN 2018, Prague, Czech Republic, 3–6 July 2018; pp. 774–778.
  10. Koo, B.; Roh, D.; Kim, H.; Jung, Y.; Lee, D.; Kwon, D. CHAM: A Family of Lightweight Block Ciphers for Resource-Constrained Devices. In Proceedings of the Information Security and Cryptology—ICISC 2017—20th International Conference, Seoul, Korea, 29 November–1 December 2017; Revised Selected Papers. Kim, H., Kim, D.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 10779, pp. 3–25.
  11. Hankerson, D.; Menezes, A.J.; Vanstone, S. Guide to Elliptic Curve Cryptography; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2006.
  12. Möller, B. Securing Elliptic Curve Point Multiplication against Side-Channel Attacks. In Proceedings of the Information Security, 4th International Conference, ISC 2001, Malaga, Spain, 1–3 October 2001; Davida, G.I., Frankel, Y., Eds.; Springer: Berlin/Heidelberg, Germany, 2001; Volume 2200, pp. 324–334.
  13. Izu, T.; Möller, B.; Takagi, T. Improved Elliptic Curve Multiplication Methods Resistant against Side Channel Attacks. In Proceedings of the Progress in Cryptology—INDOCRYPT 2002, Third International Conference on Cryptology in India, Hyderabad, India, 16–18 December 2002; Menezes, A., Sarkar, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2002; Volume 2551, pp. 296–313.
  14. Chevallier-Mames, B.; Ciet, M.; Joye, M. Low-Cost Solutions for Preventing Simple Side-Channel Analysis: Side-Channel Atomicity. IEEE Trans. Comput. 2004, 53, 760–768.
  15. Federal Information Processing Standards Publications 180-4(FIPS PUBS). In Secure Hash Standard (SHS); Technical Report; National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2015.
  16. An, K.; Kwon, H.; Kim, H.; Seo, H. Implementation of Ultra-Light Block Cipher CHAM Optimization Using Web Assembly. J. Korea Inst. Inf. Secur. 2019. Available online: https://github.com/solowal/PUBLICATION/blob/master/2019/%EC%9B%B9%20%EC%96%B4%EC%85%88%EB%B8%94%EB%A6%AC%EB%A5%BC%20%ED%99%9C%EC%9A%A9%ED%95%9C%20%EC%B4%88%EA%B2%BD%EB%9F%89%20%EB%B8%94%EB%A1%9D%EC%95%94%ED%98%B8%20CHAM%20%EC%B5%9C%EC%A0%81%ED%99%94%20%EA%B5%AC%ED%98%84_%EB%85%BC%EB%AC%B8.pdf (accessed on 2 November 2020).
  17. Protzenko, J.; Beurdouche, B.; Merigoux, D.; Bhargavan, K. Formally Verified Cryptographic Web Applications in WebAssembly. In Proceedings of the 2019 IEEE Symposium on Security and Privacy, SP 2019, San Francisco, CA, USA, 19–23 May 2019; pp. 1256–1274.
  18. Zinzindohoué, J.K.; Bhargavan, K.; Protzenko, J.; Beurdouche, B. HACL*: A Verified Modern Cryptographic Library. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, CCS 2017, Dallas, TX, USA, 30 October–3 November 2017; pp. 1789–1806.
  19. Bernstein, D.J.; Denis, F. Libsodium-A Modern, Portable, Easy to Use Crypto Library. 2019. Available online: https://github.com/lemonsn/libsodium (accessed on 30 October 2020).
  20. Protzenko, J.; Zinzindohoué, J.K.; Rastogi, A.; Ramananandro, T.; Wang, P.; Béguelin, S.Z.; Delignat-Lavaud, A.; Hritcu, C.; Bhargavan, K.; Fournet, C.; et al. Verified low-level programming embedded in F*. Proc. ACM Program. Lang. 2017, 1, 17:1–17:29.
  21. Rösch, J. Efficient Implementation of Picnic. Available online: https://is.muni.cz/th/pbn05/ (accessed on 30 October 2020).
  22. Chase, M.; Derler, D.; Goldfeder, S.; Katz, J.; Kolesnikov, V.; Orlandi, C.; Ramacher, S.; Rechberger, C.; Slamanig, D.; Wang, X.; et al. The Picnic Signature Scheme Design Document. 2020. Available online: https://github.com/microsoft/Picnic/blob/master/spec/design-v2.2.pdf (accessed on 2 November 2020).
Figure 1. JavaScript working process.
Figure 2. WebAssembly conversion process.
Figure 3. CHAM keyschedule.
Figure 4. CHAM round function.
Figure 5. Hash Message Authentication Code (HMAC) process.
Figure 6. ECDH Process.
Figure 7. CHAM 4-round combining process.
Table 1. Parameters of CHAM family (n: Block size, k: Key size, r: Round number, and w: Word size).
| Cipher | n | k | r | w | k/w |
|---|---|---|---|---|---|
| CHAM-64/128 | 64 | 128 | 80 | 16 | 8 |
| CHAM-128/128 | 128 | 128 | 80 | 32 | 4 |
| CHAM-128/256 | 128 | 256 | 96 | 32 | 8 |
Table 2. Jacobian ECDBL, ECADD.
| | ECDBL | ECADD |
|---|---|---|
| Input | P = (X1, Y1, Z1) | P = (X1, Y1, Z1), Q = (X2, Y2, Z2) |
| Output | P + Q = 2P = (X3, Y3, Z3) | P + Q = (X3, Y3, Z3) |
| Intermediate values | M = 3X1^2 + aZ1^4; S = 4X1Y1^2; T = 8Y1^4 | U1 = X1Z2^2; U2 = X2Z1^2; S1 = Y1Z2^3; S2 = Y2Z1^3; H = U2 − U1; R = S2 − S1 |
| X3 | X3 = M^2 − 2S | X3 = R^2 − H^3 − 2U1H^2 |
| Y3 | Y3 = M(S − X3) − T | Y3 = R(U1H^2 − X3) − S1H^3 |
| Z3 | Z3 = 2Y1Z1 | Z3 = HZ1Z2 |
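Table 3. Existing atomic block method. Each block performs the operations *, +, −, + in this order; ⋆ marks a slot that is filled with a fake operation.

ECDBL (input: T0 ← a, T1 ← X1, T2 ← Y1, T3 ← Z1)
| Block | * | + | − | + |
|---|---|---|---|---|
| 1 | T4 ← T1 · T1 | T5 ← T4 + T4 | ⋆ | T5 ← T4 + T4 |
| 2 | T5 ← T3 · T3 | T1 ← T1 + T1 | ⋆ | ⋆ |
| 3 | T5 ← T5 · T5 | ⋆ | ⋆ | ⋆ |
| 4 | T5 ← T0 · T5 | T4 ← T4 + T5 | ⋆ | T5 ← T2 + T2 |
| 5 | T3 ← T3 · T5 | ⋆ | ⋆ | ⋆ |
| 6 | T2 ← T2 · T2 | T2 ← T2 + T2 | ⋆ | ⋆ |
| 7 | T5 ← T3 · T3 | ⋆ | T5 ← −T5 | ⋆ |
| 8 | T5 ← T5 · T5 | T1 ← T1 + T5 | ⋆ | T1 ← T1 + T5 |
| 9 | T2 ← T2 · T2 | T2 ← T2 + T2 | ⋆ | T5 ← T1 + T5 |
| 10 | T4 ← T4 · T5 | T2 ← T2 + T4 | T2 ← −T2 | ⋆ |

ECADD (input: T1 ← X1, T2 ← Y1, T3 ← Z1, T7 ← X2, T8 ← Y2, T9 ← Z2)
| Block | * | + | − | + |
|---|---|---|---|---|
| 1 | T4 ← T9 · T9 | ⋆ | ⋆ | ⋆ |
| 2 | T1 ← T1 · T4 | ⋆ | ⋆ | ⋆ |
| 3 | T4 ← T4 · T9 | ⋆ | ⋆ | ⋆ |
| 4 | T2 ← T2 · T4 | ⋆ | ⋆ | ⋆ |
| 5 | T4 ← T3 · T3 | ⋆ | ⋆ | ⋆ |
| 6 | T5 ← T4 · T7 | ⋆ | T5 ← −T5 | T5 ← T1 + T5 |
| 7 | T4 ← T4 · T8 | ⋆ | T4 ← −T4 | T4 ← T2 + T4 |
| 8 | T4 ← T4 · T8 | ⋆ | T4 ← −T4 | T4 ← T2 + T4 |
| 9 | T3 ← T3 · T9 | ⋆ | ⋆ | ⋆ |
| 10 | T3 ← T3 · T5 | ⋆ | ⋆ | ⋆ |
| 11 | T6 ← T5 · T5 | ⋆ | ⋆ | ⋆ |
| 12 | T1 ← T1 · T6 | ⋆ | T4 ← −T4 | ⋆ |
| 13 | T5 ← T5 · T6 | T6 ← T1 + T2 | T2 ← −T2 | T6 ← T2 + T6 |
| 14 | T1 ← T4 · T4 | T1 ← T1 + T5 | T6 ← −T6 | T1 ← T1 + T6 |
| 15 | T2 ← T2 · T5 | T1 ← T1 + T6 | ⋆ | T6 ← T1 + T6 |
| 16 | T4 ← T4 · T6 | T2 ← T2 + T4 | ⋆ | ⋆ |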
Table 4. Performance evaluation of HACL*. (A) is HACL*/C, (B) is libsodium, and (C) is WHACL*. (1 k: 1000, B: Byte) [17].

| Algorithm (Blocksize, #Rounds) | (A) | (B) | (C) |
|---|---|---|---|
| Curve25519 (1 k) | 0.83 s | 0.15 s | 4.05 s |
| Chacha20 (4 kB, 100 k) | 1.86 s | 1.74 s | 6.62 s |
| Salsa20 (4 kB, 100 k) | 1.55 s | 2.24 s | 5.52 s |
| Ed25519 sign (16 kB, 1 k) | 3.01 s | 0.27 s | 15.6 s |
| Ed25519 verify (16 kB, 1 k) | 3.07 s | 0.24 s | 15.6 s |
| Poly1305_32 (16 kB, 10 k) | 0.27 s | 0.19 s | - |
| Poly1305_64 (16 kB, 10 k) | 1.93 s | 0.19 s | 11.5 s |
| SHA2_256 (16 kB, 10 k) | 1.64 s | 1.84 s | 3.5 s |
| SHA2_512 (16 kB, 10 k) | 1.16 s | 1.21 s | 3.2 s |
Table 5. Performance of Picnic (C implementation) [22].

| Parameters | Sign | Verify |
|---|---|---|
| Picnic-L1-FS | 2.82 ms | 2.34 ms |
| Picnic-L1-UR | 3.49 ms | 2.87 ms |
| Picnic2-L1-FS | 106.91 ms | 42.64 ms |
| Picnic-L3-FS | 6.74 ms | 5.66 ms |
| Picnic-L3-UR | 8.64 ms | 7.12 ms |
| Picnic2-L3-FS | 328.68 ms | 99.27 ms |
| Picnic-L5-FS | 12.37 ms | 10.59 ms |
| Picnic-L5-UR | 15.02 ms | 12.64 ms |
| Picnic2-L5-FS | 708.82 ms | 178.63 ms |
Table 6. Performance of Picnic (Web Assembly implementation) [21].

| Parameters | Firefox Sign | Firefox Verify | Edge Sign | Edge Verify | Chrome Sign | Chrome Verify |
|---|---|---|---|---|---|---|
| Picnic-L1-FS | 6.67 ms | 4.97 ms | 8.22 ms | 6.56 ms | 6.62 ms | 6.86 ms |
| Picnic-L1-UR | 8.36 ms | 6.36 ms | 9.64 ms | 7.70 ms | 9.61 ms | 7.82 ms |
| Picnic-L3-FS | 15.57 ms | 12.98 ms | 18.54 ms | 15.78 ms | 18.38 ms | 15.56 ms |
| Picnic-L3-UR | 20.11 ms | 16.47 ms | 22.86 ms | 19.08 ms | 22.58 ms | 19.10 ms |
| Picnic-L5-FS | 27.25 ms | 23.01 ms | 32.93 ms | 29.45 ms | 32.62 ms | 28.34 ms |
| Picnic-L5-UR | 33.92 ms | 28.70 ms | 39.91 ms | 34.72 ms | 38.84 ms | 33.12 ms |
| Picnic-L1-full | 5.64 ms | 3.82 ms | 5.05 ms | 3.35 ms | 5.01 ms | 3.26 ms |
| Picnic-L3-full | 10.06 ms | 7.32 ms | 8.94 ms | 6.66 ms | 8.75 ms | 6.38 ms |
| Picnic-L5-full | 16.49 ms | 13.00 ms | 16.12 ms | 12.61 ms | 16.02 ms | 12.26 ms |
| Picnic3-L1 | 21.90 ms | 17.58 ms | 19.63 ms | 16.02 ms | 19.54 ms | 15.46 ms |
| Picnic3-L3 | 48.57 ms | 38.26 ms | 43.74 ms | 35.32 ms | 43.80 ms | 34.58 ms |
| Picnic3-L5 | 80.54 ms | 59.59 ms | 75.38 ms | 55.75 ms | 73.57 ms | 54.30 ms |
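Table 7. Proposed atomic block method.

ECDBL. Initialization: $T_0 \leftarrow a$, $T_1 \leftarrow X_1$, $T_2 \leftarrow Y_1$, $T_3 \leftarrow Z_1$.

| Block | Multiplication | Addition | Negation |
|---|---|---|---|
| 1 | $T_4 \leftarrow T_1 \cdot T_1$ | $T_5 \leftarrow T_4 + T_4$ | |
| 2 | $T_7 \leftarrow T_3 \cdot T_3$ | $T_4 \leftarrow T_5 + T_4$ | |
| 3 | $T_5 \leftarrow T_2 \cdot T_2$ | $T_8 \leftarrow T_5 + T_5$ | |
| 4 | $T_9 \leftarrow T_2 \cdot T_3$ | $T_3 \leftarrow T_9 + T_9$ | |
| 5 | $T_7 \leftarrow T_7 \cdot T_7$ | $T_9 \leftarrow T_8 + T_8$ | |
| 6 | $T_7 \leftarrow T_0 \cdot T_7$ | $T_4 \leftarrow T_4 + T_7$ | |
| 7 | $T_5 \leftarrow T_3 \cdot T_3$ | $T_5 \leftarrow T_7 + T_7$ | $T_5 \leftarrow -T_5$ |
| 8 | $T_6 \leftarrow T_4 \cdot T_4$ | $T_1 \leftarrow T_6 + T_5$ | $T_1 \leftarrow -T_6$ |
| 9 | $T_8 \leftarrow T_8 \cdot T_9$ | $T_7 \leftarrow T_7 + T_6$ | $T_8 \leftarrow -T_8$ |
| 10 | $T_4 \leftarrow T_7 \cdot T_4$ | $T_2 \leftarrow T_4 + T_8$ | |

ECADD. Initialization: $T_1 \leftarrow X_1$, $T_2 \leftarrow Y_1$, $T_3 \leftarrow Z_1$, $T_{10} \leftarrow X_2$, $T_{11} \leftarrow Y_2$, $T_{12} \leftarrow Z_2$.

| Block | Multiplication | Addition | Negation |
|---|---|---|---|
| 1 | $T_4 \leftarrow T_{12} \cdot T_{12}$ | | |
| 2 | $T_1 \leftarrow T_4 \cdot T_1$ | | $T_1 \leftarrow -T_1$ |
| 3 | $T_5 \leftarrow T_3 \cdot T_3$ | | |
| 4 | $T_7 \leftarrow T_5 \cdot T_{10}$ | $T_7 \leftarrow T_7 + T_{10}$ | |
| 5 | $T_4 \leftarrow T_4 \cdot T_{12}$ | | $T_1 \leftarrow -T_1$ |
| 6 | $T_2 \leftarrow T_2 \cdot T_4$ | | |
| 7 | $T_5 \leftarrow T_5 \cdot T_3$ | | $T_2 \leftarrow -T_2$ |
| 8 | $T_8 \leftarrow T_5 \cdot T_{11}$ | $T_8 \leftarrow T_8 + T_2$ | $T_2 \leftarrow -T_2$ |
| 9 | $T_5 \leftarrow T_7 \cdot T_7$ | | |
| 10 | $T_6 \leftarrow T_5 \cdot T_1$ | $T_1 \leftarrow T_6 + T_6$ | $T_1 \leftarrow -T_1$ |
| 11 | $T_4 \leftarrow T_8 \cdot T_8$ | | |
| 12 | $T_5 \leftarrow T_7 \cdot T_5$ | $T_1 \leftarrow T_4 + T_1$ | $T_5 \leftarrow -T_5$ |
| 13 | $T_3 \leftarrow T_3 \cdot T_{12}$ | $T_1 \leftarrow T_1 + T_5$ | $T_1 \leftarrow -T_4$ |
| 14 | $T_2 \leftarrow T_5 \cdot T_2$ | $T_6 \leftarrow T_6 + T_4$ | |
| 15 | $T_8 \leftarrow T_8 \cdot T_6$ | $T_2 \leftarrow T_8 + T_2$ | |
| 16 | $T_3 \leftarrow T_3 \cdot T_7$ | | |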
Table 8. Operation count comparison (M: Field multiplication, A: Field addition, and S: Field subtraction).

| Method | ECDBL M | ECDBL A | ECDBL S | ECADD M | ECADD A | ECADD S |
|---|---|---|---|---|---|---|
| wNAF | 8 | 5 | 4 | 16 | 1 | 6 |
| Existing Atomic Block wNAF | 10 | 20 | 10 | 16 | 32 | 16 |
| Proposed Atomic Block wNAF | 10 | 10 | 10 | 16 | 16 | 16 |
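The atomic-block rows of Table 8 can be reproduced by multiplying a per-block operation pattern by the number of blocks. The sketch below is illustrative only. It assumes that ECDBL uses 10 blocks and ECADD 16 blocks, that each existing block runs one multiplication, one addition, one S-type operation, and a second addition (written `"MASA"`), and that each proposed block drops the second addition (`"MAS"`). The pattern strings and the `op_counts` helper are hypothetical names inferred from the counts, and dummy operations are counted like real ones.

```python
from collections import Counter

# Assumed number of atomic blocks per point operation
BLOCKS = {"ECDBL": 10, "ECADD": 16}

# Assumed per-block operation patterns (M: multiplication, A: addition, S: subtraction)
EXISTING_PATTERN = "MASA"
PROPOSED_PATTERN = "MAS"


def op_counts(pattern: str, blocks: int) -> dict:
    """Total M/A/S operations when every block executes the full pattern."""
    per_block = Counter(pattern)
    return {op: per_block[op] * blocks for op in "MAS"}


for name, pattern in (("existing", EXISTING_PATTERN), ("proposed", PROPOSED_PATTERN)):
    for operation, blocks in BLOCKS.items():
        print(name, operation, op_counts(pattern, blocks))
```

Under these assumptions the only difference between the two methods is the second addition slot, which accounts for the halved addition counts.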
Table 9. Running environment.

| Item | Specification |
|---|---|
| Operating System | Windows 10 Education |
| CPU | Intel i5-8250U 1.60 GHz |
| RAM | 8.00 GB |
| SW | (1) Chrome 85.0.4183.83; (2) Firefox 79.0; (3) Microsoft Edge 84.0.522.63 |
| Languages | (1) JavaScript; (2) Web Assembly |
| wNAF window width w | 4 |
Table 10. Revised CHAM algorithm performance in Chrome (CPB: cycles per byte).

| Algorithm | Language | Optimization Techniques | Average Timing | CPB |
|---|---|---|---|---|
| revised CHAM-64/128 | JavaScript | 4-round combining | 0.0000013 s | 260 |
| revised CHAM-128/128 | JavaScript | 4-round combining | 0.0000018 s | 180 |
| revised CHAM-128/256 | JavaScript | 4-round combining | 0.0000021 s | 210 |
| This work CHAM-64/128 | Web Assembly | 4-round combining | 0.0000006 s | 120 (2.1 times) |
| This work CHAM-128/128 | Web Assembly | 4-round combining | 0.0000006 s | 60 (3 times) |
| This work CHAM-128/256 | Web Assembly | 4-round combining | 0.0000007 s | 70 (3 times) |
| This work CHAM-64/128 | Web Assembly | 4-round combining, precomputation table | 0.0000005 s | 100 (1.2 times) |
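The CPB values in Table 10 appear consistent with converting the average timing into clock cycles at the nominal 1.60 GHz frequency listed in Table 9 and dividing by the block size in bytes. This conversion is an assumption inferred from the numbers rather than a stated procedure, and the helper name `cycles_per_byte` is hypothetical. A minimal Python sketch:

```python
CPU_HZ = 1.60e9  # assumed: nominal clock of the Intel i5-8250U from Table 9


def cycles_per_byte(seconds: float, n_bytes: int, hz: float = CPU_HZ) -> float:
    """Convert an average timing into cycles per processed byte."""
    return seconds * hz / n_bytes


# (label, average timing in seconds, block size in bytes), timings from Table 10
rows = [
    ("revised CHAM-64/128, JavaScript", 0.0000013, 8),
    ("revised CHAM-128/128, JavaScript", 0.0000018, 16),
    ("This work CHAM-64/128, Web Assembly", 0.0000006, 8),
    ("This work CHAM-128/128, Web Assembly", 0.0000006, 16),
]

for label, seconds, n_bytes in rows:
    print(f"{label}: {cycles_per_byte(seconds, n_bytes):.0f} CPB")
```

Because the published timings are rounded to one or two significant digits, speed-up factors recomputed from them can differ slightly from the factors printed in the table.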
Table 11. Revised CHAM algorithm performance in Firefox (CPB: cycles per byte).

| Algorithm | Language | Optimization Techniques | Average Timing | CPB |
|---|---|---|---|---|
| revised CHAM-64/128 | JavaScript | 4-round combining | 0.0000013 s | 260 |
| revised CHAM-128/128 | JavaScript | 4-round combining | 0.0000010 s | 100 |
| revised CHAM-128/256 | JavaScript | 4-round combining | 0.0000015 s | 150 |
| This work CHAM-64/128 | Web Assembly | 4-round combining | 0.0000006 s | 120 (2.1 times) |
| This work CHAM-128/128 | Web Assembly | 4-round combining | 0.0000006 s | 60 (1.6 times) |
| This work CHAM-128/256 | Web Assembly | 4-round combining | 0.0000007 s | 70 (2.1 times) |
| This work CHAM-64/128 | Web Assembly | 4-round combining, precomputation table | 0.0000005 s | 100 (1.2 times) |
Table 12. Revised CHAM algorithm performance in Microsoft Edge (CPB: cycles per byte).

| Algorithm | Language | Optimization Techniques | Average Timing | CPB |
|---|---|---|---|---|
| revised CHAM-64/128 | JavaScript | 4-round combining | 0.0000012 s | 240 |
| revised CHAM-128/128 | JavaScript | 4-round combining | 0.0000013 s | 130 |
| revised CHAM-128/256 | JavaScript | 4-round combining | 0.0000020 s | 200 |
| This work CHAM-64/128 | Web Assembly | 4-round combining | 0.0000006 s | 120 (2 times) |
| This work CHAM-128/128 | Web Assembly | 4-round combining | 0.0000007 s | 70 (1.8 times) |
| This work CHAM-128/256 | Web Assembly | 4-round combining | 0.0000007 s | 70 (2.8 times) |
| This work CHAM-64/128 | Web Assembly | 4-round combining, precomputation table | 0.0000005 s | 100 (1.2 times) |
Table 13. wNAF algorithm performance in Chrome (CPB: cycles per byte; w = 4).

| Algorithm | Language | Average Timing | CPB | Performance Overhead |
|---|---|---|---|---|
| original wNAF | JavaScript | 0.000012 s | 300 | - |
| Existing Atomic Block wNAF | JavaScript | 0.0000179 s | 447 | 49% |
| Proposed Atomic Block wNAF | JavaScript | 0.0000146 s | 365 | 21% |
| original wNAF | Web Assembly | 0.0000011 s | 27 (11 times) | - |
| Existing Atomic Block wNAF | Web Assembly | 0.0000017 s | 42 (10 times) | 55% |
| Proposed Atomic Block wNAF | Web Assembly | 0.0000013 s | 32 (11 times) | 18% |
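The performance overhead column of Table 13 can be read as the relative slowdown of an atomic-block variant against the original wNAF in the same language. The sketch below assumes that definition, which matches the published figures to within about one percentage point. The function name `overhead_percent` is hypothetical, and the inputs are the rounded Web Assembly timings from the table.

```python
def overhead_percent(t_protected: float, t_baseline: float) -> float:
    """Relative slowdown of the protected variant, in percent."""
    return (t_protected / t_baseline - 1.0) * 100.0


BASELINE_WASM = 0.0000011  # original wNAF, Web Assembly (seconds)

existing = overhead_percent(0.0000017, BASELINE_WASM)  # existing atomic block
proposed = overhead_percent(0.0000013, BASELINE_WASM)  # proposed atomic block

print(f"existing atomic block: {existing:.1f}%")
print(f"proposed atomic block: {proposed:.1f}%")
```

With these inputs the results land close to the 55% and 18% reported for Chrome. Small deviations are expected because the table timings are themselves rounded.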
Table 14. wNAF algorithm performance in Firefox (CPB: cycles per byte; w = 4).

| Algorithm | Language | Average Timing | CPB | Performance Overhead |
|---|---|---|---|---|
| original wNAF | JavaScript | 0.0000146 s | 365 | - |
| Existing Atomic Block wNAF | JavaScript | 0.0000162 s | 405 | 10% |
| Proposed Atomic Block wNAF | JavaScript | 0.0000155 s | 387 | 6% |
| original wNAF | Web Assembly | 0.0000012 s | 30 (12 times) | - |
| Existing Atomic Block wNAF | Web Assembly | 0.0000015 s | 37 (10 times) | 23% |
| Proposed Atomic Block wNAF | Web Assembly | 0.0000013 s | 32 (12 times) | 6% |
Table 15. wNAF algorithm performance in Microsoft Edge (CPB: cycles per byte; w = 4).

| Algorithm | Language | Average Timing | CPB | Performance Overhead |
|---|---|---|---|---|
| original wNAF | JavaScript | 0.0000129 s | 322 | - |
| Existing Atomic Block wNAF | JavaScript | 0.0000209 s | 522 | 62% |
| Proposed Atomic Block wNAF | JavaScript | 0.0000175 s | 437 | 35% |
| original wNAF | Web Assembly | 0.0000011 s | 27 (11 times) | - |
| Existing Atomic Block wNAF | Web Assembly | 0.0000015 s | 37 (14 times) | 37% |
| Proposed Atomic Block wNAF | Web Assembly | 0.0000012 s | 30 (14 times) | 11% |
Table 16. HMAC algorithm performance in Chrome (CPB: cycles per byte).

| Algorithm | Language | Average Timing | CPB |
|---|---|---|---|
| SHA-256 | JavaScript | 0.0000163 s | 203 |
| HMAC | JavaScript | 0.0000558 s | 697 |
| This work SHA-256 | Web Assembly | 0.0000022 s | 27 (7.5 times) |
| This work HMAC | Web Assembly | 0.0000074 s | 92 (7.5 times) |
Table 17. HMAC algorithm performance in Firefox (CPB: cycles per byte).

| Algorithm | Language | Average Timing | CPB |
|---|---|---|---|
| SHA-256 | JavaScript | 0.0000173 s | 216 |
| HMAC | JavaScript | 0.0001852 s | 2315 |
| This work SHA-256 | Web Assembly | 0.0000016 s | 20 (10.8 times) |
| This work HMAC | Web Assembly | 0.0000075 s | 93 (24.8 times) |
Table 18. HMAC algorithm performance in Microsoft Edge (CPB: cycles per byte).

| Algorithm | Language | Average Timing | CPB |
|---|---|---|---|
| SHA-256 | JavaScript | 0.0000177 s | 221 |
| HMAC | JavaScript | 0.0000555 s | 693 |
| This work SHA-256 | Web Assembly | 0.0000016 s | 20 (11 times) |
| This work HMAC | Web Assembly | 0.0000078 s | 97 (7.1 times) |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Park, B.; Song, J.; Seo, S.C. Efficient Implementation of a Crypto Library Using Web Assembly. Electronics 2020, 9, 1839. https://doi.org/10.3390/electronics9111839

AMA Style

Park B, Song J, Seo SC. Efficient Implementation of a Crypto Library Using Web Assembly. Electronics. 2020; 9(11):1839. https://doi.org/10.3390/electronics9111839

Chicago/Turabian Style

Park, BoSun, JinGyo Song, and Seog Chung Seo. 2020. "Efficient Implementation of a Crypto Library Using Web Assembly" Electronics 9, no. 11: 1839. https://doi.org/10.3390/electronics9111839
