1. Introduction
In today’s digital age, the exponential growth of data generated by the Internet of Things (IoT) and other digital platforms has led to the emergence of vast repositories of information known as data lakes [1,2]. These expansive, fluid collections of raw data, stored in their native format, offer unprecedented opportunities for analytics, innovation, and personalized services. The concept of data lakes has revolutionized the way businesses and organizations harness the power of big data, enabling the aggregation and analysis of varied data types to glean insights, drive decision making, and foster technological advancements [3,4]. By integrating diverse datasets, data lakes facilitate comprehensive analytics that can predict trends, enhance efficiency, and create more tailored user experiences.
However, the benefits of leveraging extensive datasets come with significant privacy and security challenges. As we navigate through this wealth of information, the line between valuable insights and invasive privacy breaches becomes increasingly blurred. The need to protect sensitive information while exploiting data to their full potential has never been more critical. Traditional cybersecurity measures often fall short of addressing the unique vulnerabilities presented by the interconnected and open nature of data lakes and IoT ecosystems. The dynamic landscape of digital threats necessitates innovative approaches to data security, ones that can adeptly safeguard privacy without sacrificing the utility and benefits of these technological advancements [5,6,7].
The introduction of fully homomorphic encryption (FHE) stands out as an opportunity for addressing the privacy–security challenge [8,9]. FHE is a form of encryption that allows computations to be performed on encrypted data, generating encrypted results that, once decrypted, reveal the outcome of operations as if they had been conducted on the original plaintext. This groundbreaking capability ensures that sensitive data can remain encrypted throughout the analytical process, thereby preserving confidentiality while still enabling valuable insights to be extracted.
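The homomorphic property itself can be illustrated with a minimal, self-contained sketch. The example below uses the classic Paillier cryptosystem, which is only additively homomorphic (not fully homomorphic, as FHE schemes are), and toy-sized primes that offer no real security; it is purely a conceptual illustration of computing on ciphertexts.

```python
import random
from math import gcd

# Toy Paillier cryptosystem: additively homomorphic, NOT fully homomorphic,
# and built on tiny primes purely for illustration -- no real security.
p, q = 1789, 1867                                # small illustrative primes
n = p * q
n2 = n * n
g = n + 1
lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)     # lambda = lcm(p-1, q-1)
mu = pow((pow(g, lam, n2) - 1) // n, -1, n)      # inverse of L(g^lambda mod n^2)

def encrypt(m):
    # Ciphertext c = g^m * r^n mod n^2 for a random r coprime to n.
    r = random.randrange(2, n)
    while gcd(r, n) != 1:
        r = random.randrange(2, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    # m = L(c^lambda mod n^2) * mu mod n, with L(x) = (x - 1) / n.
    return ((pow(c, lam, n2) - 1) // n * mu) % n

a, b = 123, 456
ca, cb = encrypt(a), encrypt(b)
c_sum = (ca * cb) % n2      # multiplying ciphertexts adds the plaintexts
print(decrypt(c_sum))       # 579 == 123 + 456, obtained without decrypting ca or cb
```

Note how the sum was computed entirely in the encrypted domain; FHE schemes such as BFV extend this idea to richer operation sets (both addition and multiplication) on encrypted data.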
The incorporation of FHE into the realms of data lakes and IoT represents a transformative approach to managing the inherent privacy and security risks associated with big data analytics. By enabling data to be processed in their encrypted form, FHE mitigates the risks of data breaches and unauthorized access, ensuring that the integrity and confidentiality of sensitive information are maintained. This is particularly pertinent in the context of data lakes, where the sheer volume and variety of data heighten the potential for privacy infringements. Nevertheless, it is important to note that homomorphic encryption does not provide a means to bypass privacy regulations such as the European General Data Protection Regulation (GDPR) [10]. For instance, it is not feasible for a cloud service to apply homomorphic encryption in order to derive insights from customer data while they remain encrypted. Rather, the outcomes of computations performed on encrypted data stay encrypted and can only be decrypted by the data’s owner, such as a customer of a cloud service. A further limitation follows from this: in situations where several distinct private data owners intend to partake in joint computational efforts, homomorphic encryption is likely not a viable option, because homomorphic encryption frameworks operate with a single secret key, which is held by the owner of the data.
In this paper, we offer a detailed empirical analysis of FHE using the Microsoft SEAL library, with the aim of understanding FHE’s practical performance and scalability. We particularly shed light on the optimization challenges and potential solutions for implementing FHE in edge computing environments, such as those encountered with IoT devices, by comparing its performance on a traditional computing system (PC) and an edge device (NVIDIA Jetson Nano). This comparison not only illustrates the specific computational and memory constraints of edge devices but also emphasizes the need for algorithmic improvements to ensure the viability of FHE in such contexts. In addition, our work can support real-world implementations by bridging the theoretical aspects of cryptographic advancements with their practical application.
The rest of the paper is organized as follows.
Section 2 presents a brief overview of the state of the art in FHE use and the main libraries available. Section 3 describes the research methodology. In Section 4, we showcase and discuss the results. The paper ends by summarizing the most important outcomes.
3. Methodology
The research methodology and tools utilized for conducting experiments in this study are thoroughly described next. Throughout this study, we focus on the practical application of homomorphic encryption using the Microsoft SEAL library to carry out the experiments. The explanation includes system requirements, software, and tools.
3.1. System Requirements
To ensure optimal performance and the correct implementation of the Microsoft SEAL library, specific system requirements must be met. In this work, we use the latest stable version of Microsoft SEAL to date, version 4.1.1. It is advisable to use the most recent version available in order to take advantage of the improvements that the library may provide.
The requirements are fundamental for projects that demand high resource usage or have particular needs, and they might vary to accommodate the specific characteristics of a project. The requirements include sufficient hard disk space for installing all necessary software and tools; compatibility with various operating systems such as Linux, macOS, or Windows (with a preference for Windows 10 for its compatibility with SEAL); at least 4 GB of RAM (although 8 GB is recommended for efficient cryptographic operations); and a 64-bit processor architecture for optimal performance. Additionally, an appropriate development environment, particularly Visual Studio 2022, is essential for C++ project development, ensuring high performance when using the Microsoft SEAL library. The inclusion of specific work packages during the Visual Studio installation, such as the “Desktop development with C++” package and “CMake Tools,” is also necessary for the project’s successful execution.
3.2. Software and Tools
We have employed several software tools and technologies to facilitate the practical implementation of homomorphic encryption techniques:
CMake (version 3.27.4) is a key tool for controlling the compilation process, defining project build files, and ensuring flexibility across different platforms. It reads CMakeLists.txt files, which define how the source code is compiled and built, and generates the corresponding native build files. For this work, version 3.27.4 was used, the latest stable version available at the time of the project’s execution;
Git is employed for version control, allowing for the maintenance and tracking of different versions of applications and projects. It was specifically used to monitor updates to the Microsoft SEAL repository and clone it locally for implementation;
Visual Studio 2022 (v.17) is an Integrated Development Environment (IDE) that provides a comprehensive development platform for writing, debugging, and compiling code as well as deploying applications. It supports a wide range of programming languages and project types, including C++ and .NET applications, mobile app development, and web design. In this work, we utilized Visual Studio 2022 for its advanced debugging, compilation, and development tools, along with its stable features up to the current date.
The applied methodology involves practical application through a simple coding example that demonstrates homomorphic encryption’s functionality. Specifically, the example presents a homomorphic sum, in which two integers are encrypted and then added together in encrypted form, after which both the numbers and the result of the sum are decrypted and displayed. The code was tested on different devices and with different computational resource requirements, as explained in the next section. The setup encompasses configuring the development environment, integrating necessary libraries and dependencies, and detailing the software used. With this example, we showcase operations on encrypted data using the Microsoft SEAL library, emphasizing the encryption and decryption processes, as well as operations such as addition on encrypted integers.
3.3. Experiments Design
Experiments are carried out in two different batches. In the first batch, we use Visual Studio 2022 on a PC with Windows 10, an Intel Core i7-8550U processor, and 8 GB of RAM. In the second batch, we use an NVIDIA Jetson Nano equipped with a quad-core ARM Cortex-A57 processor, a 128-core Maxwell GPU, and 4 GB LPDDR4 RAM, using JetPack 4.6. JetPack is NVIDIA’s all-inclusive Software Development Kit (SDK) that provides the board support package, Linux operating system, NVIDIA CUDA, cuDNN, and TensorRT software libraries for deep learning, computer vision, GPU computing, and multimedia processing. The installation of SEAL was performed manually using CMake.
In both cases, the study analyzes the impact of varying test parameters on program execution time and CPU resource requirements. The parameters included are as follows:
Dataset size, varied to analyze the program’s response to different quantities. The sizes tested were 1, 10, 100, 1000, and 10,000 samples, offering a broad range from minimal to large datasets. This dataset is composed of integers that are randomly generated.
Encryption parameters focus on the BFV encryption scheme, which specializes in integers, with the variable poly_modulus_degree impacting scheme complexity and operational capacity. The values tested were 1024, 2048, 4096, 8192, 16,384, and 32,768, providing a wide spectrum of results from low to high complexity levels. We chose BFV over BGV; both are efficient when working with integers, but BFV is more efficient for lower values of the plain_modulus variable, as explained below [23].
The test program included a detailed setup for SEAL, generating the random numbers for encryption and measuring execution time for various configurations, as shown in Algorithm 1. As for the SEAL setup, when using the BFV scheme, three parameters are key: the plaintext modulus (plain_modulus), the polynomial modulus degree (poly_modulus_degree), and the coefficient modulus (coeff_modulus). The plain_modulus is an integer that defines the range of integers (coefficients of polynomials) that can be represented and operated on within the encrypted space. That is, the plain modulus determines how data are encoded before encryption. Its main role is to enable the correct wrapping of integer values during arithmetic operations in encrypted form. Consequently, this variable sets the bounds for data representation, but it does not affect the computational complexity of encryption; we verified this behavior experimentally. Therefore, the parameter plain_modulus is kept at a fixed value of 1024 because our dataset contains randomly generated integers smaller than 1024.
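The wrapping behavior that motivates this choice can be seen with ordinary modular arithmetic. The sketch below only illustrates the encoding rule (actual BFV plaintexts are polynomials, not single integers): with plain_modulus = 1024, any result exceeding the modulus wraps around and is no longer recoverable, which is why the modulus must exceed every value the computation needs to represent.

```python
t = 1024  # plain_modulus: plaintext values live in Z_t, i.e., integers modulo t

a, b = 900, 300
wrapped_sum = (a + b) % t
print(wrapped_sum)  # 176 -- the true sum 1200 wrapped around modulo 1024,
                    # so plain_modulus must exceed any result we need to recover
```

Since our dataset contains only integers below 1024 and the experiments measure encryption rather than deep arithmetic, the fixed value of 1024 suffices here.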
Next, coeff_modulus is the modulus used for the coefficients of the polynomials that represent encrypted data. Ciphertexts and encryption operations in the BFV scheme are represented as polynomials, and these polynomials’ coefficients are reduced modulo the coeff_modulus. The size of the coeff_modulus in homomorphic encryption schemes directly influences security by determining the noise budget: the larger the noise budget, the larger the amount of noise that can be introduced into the ciphertext before it becomes too corrupted to be decrypted accurately. However, a larger coeff_modulus also increases ciphertext sizes, impacting the encryption system’s performance and necessitating a balance between security enhancements and computational efficiency. For this reason, a recommended value for coeff_modulus is given by the SEAL library once the polynomial modulus degree is set. The polynomial modulus degree (poly_modulus_degree) determines the size of the polynomials used in the encryption scheme and thus influences the performance of encryption. The poly_modulus_degree must be a positive power of 2, representing the degree of a power-of-two cyclotomic polynomial, which defines the polynomial ring over which all arithmetic operations are conducted. Larger values allow for more complex encryption computations, making it more difficult for potential attackers to derive useful information from encrypted data and hence providing better security, but they also result in larger ciphertext sizes and increased computational overhead because ciphertexts contain more coefficients.
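Because the polynomial modulus degree must be a positive power of two, candidate values can be validated with a standard bit trick (a sketch only; SEAL additionally enforces its own supported range for this parameter):

```python
def is_valid_poly_modulus_degree(n):
    # For n = 2^k, subtracting 1 flips all lower bits, so the bitwise AND
    # n & (n - 1) is zero exactly when n is a power of two.
    return n > 0 and n & (n - 1) == 0

degrees = [1024, 2048, 4096, 8192, 16384, 32768]   # the values tested in this work
print(all(is_valid_poly_modulus_degree(d) for d in degrees))  # True
print(is_valid_poly_modulus_degree(4098))  # False -- 4098 is not a power of two
```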
The execution time and CPU usage were measured across the different data sizes and poly_modulus_degree values, aiming for a deep understanding of performance under various configurations. The chrono library was included in the program to measure the execution time; time measurements started just before the encryption began and stopped just after it finished. The confidence interval of the results was 95%, calculated with a normal distribution using 5 samples for each combination of the poly_modulus_degree values {1024, 2048, 4096, 8192, 16,384, and 32,768} and dataset sizes {1, 10, 100, 1000, 10,000}; hence, averages are shown in the results.
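The 95% confidence interval computation can be sketched as follows (the timing values below are hypothetical; the interval uses the normal-distribution z-value of 1.96, matching the normal-distribution assumption stated above):

```python
import statistics
from math import sqrt

def ci95(samples):
    # Mean and half-width of a 95% confidence interval, assuming normally
    # distributed measurements (z = 1.96 for 95% confidence).
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)        # sample standard deviation
    half = 1.96 * sd / sqrt(len(samples))
    return mean, half

# Hypothetical execution times (seconds) of the 5 runs for one configuration:
times = [0.91, 0.91, 1.02, 0.98, 0.95]
m, h = ci95(times)
print(f"{m:.3f} ± {h:.3f} s")   # 0.954 ± 0.041 s
```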
Regarding the CPU measurements, for the first batch of experiments with the PC, we used the Windows resource monitoring tool. For the second batch of experiments with the Jetson, we used htop as the system monitoring tool; it runs in the terminal and visually presents system resource utilization. In both cases, instead of carrying out several consecutive runs as we did to measure the execution time, we kept the poly_modulus_degree at a fixed value and carried out one run for each dataset to observe the CPU behavior. Then, the same process was repeated for all values of the poly_modulus_degree.
Algorithm 1. Pseudocode showing the process of encrypting a set of integers using the Microsoft SEAL library’s BFV homomorphic encryption scheme and measuring the execution time.

Initialize vector plain_numbers to store integers
Generate and store x random integers in plain_numbers
Repeat the following steps 5 times:
    Start timing the encryption process
    Set up encryption parameters:
        Define poly_modulus_degree as y
        Set coefficient modulus based on BFV default for the defined degree
        Set plain modulus as 1024
    Initialize the SEAL encryption context with these parameters
    Generate cryptographic keys:
        Create a key generator using the SEAL context
        Generate a public key and a secret key
    Initialize encryptor and decryptor:
        Create an encryptor with the context and public key
        Create a decryptor with the context and secret key
    Initialize an empty vector cipher_numbers to store encrypted data
    For each number in plain_numbers:
        Convert the number to a plaintext object
        Encrypt the plaintext to ciphertext using the encryptor
        Store the ciphertext in cipher_numbers
    Stop timing the encryption process
    Calculate and print the duration of the encryption process in seconds
Print the total number of samples in plain_numbers
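The structure of Algorithm 1 can be sketched in runnable form as follows. Since SEAL is a C++ library, a placeholder function stands in for the BFV encryption step; the dataset generation, the 5 timed repetitions, and the per-configuration parameters mirror the pseudocode.

```python
import random
import time

def encrypt_stub(m, poly_modulus_degree):
    # Placeholder for SEAL's Encryptor.encrypt(): a real BFV ciphertext is a
    # pair of polynomials with poly_modulus_degree coefficients each, so the
    # stub's work grows with the degree to mimic that cost qualitatively.
    return [(m * i) % 1024 for i in range(poly_modulus_degree // 1024)]

def run_experiment(num_samples, poly_modulus_degree, repeats=5):
    # Generate and store num_samples random integers below plain_modulus = 1024.
    plain_numbers = [random.randrange(1024) for _ in range(num_samples)]
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()              # start timing the encryption
        cipher_numbers = [encrypt_stub(m, poly_modulus_degree)
                          for m in plain_numbers]
        durations.append(time.perf_counter() - start)
    print(f"samples: {len(plain_numbers)}, timed runs: {len(durations)}")
    return durations

run_experiment(100, 4096)
```

Swapping encrypt_stub for real SEAL calls (parameter setup, key generation, and encryption, as in the pseudocode) yields the C++ program actually measured in the experiments.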
4. Results
This section presents the results from the series of experimental tests evaluating homomorphic encryption on datasets, specifically using Microsoft’s SEAL library. Homomorphic encryption, as discussed in the earlier sections, allows for operations to be performed directly on encrypted data, showcasing vast potential in security and privacy protection. SEAL is highlighted as a prime choice for the practical implementation of homomorphic encryption in projects today. The experimental tests focused on two main factors: CPU usage and the execution time necessary for encrypting datasets. The primary goal was to identify optimal parameter values for the best performance across different scenarios, understanding how parameter variations impact performance, particularly concerning the encryption scheme used and the dataset size.
The results for the first test batch, which was conducted using Visual Studio 2022 in a Windows 10 environment with the system powered by an Intel Core i7-8550U processor and 8 GB of RAM, are depicted in Figure 1. It can be observed that the execution time varied significantly with dataset size and poly_modulus_degree value. Smaller datasets and lower degrees showed faster execution times, while larger datasets and higher degrees increased execution time exponentially, indicating a direct correlation between data complexity, encryption scheme complexity, and performance.
The CPU load was closely monitored to assess the efficiency and practicality of homomorphic encryption in real-world applications. In this case, the Windows resource monitoring tool was used to monitor the CPU activity for 60 s, during which time various executions of the program were performed. The process followed involves keeping a fixed value for the degree of the polynomial and varying the size of the dataset to observe the changes it produces in the CPU load; the same process was then repeated for all the polynomial degree values studied in this work. For each configuration, we ran the program twice to correctly obtain the results, since the first run involves compilation and configuration and does not really show the result of the clean encryption. Nevertheless, both runs are shown in the illustrations. From Figure 2, Figure 3 and Figure 4, the tests reveal that higher poly_modulus_degree values and larger datasets demand substantially more CPU resources, confirming the need for optimized parameter selection to balance security, privacy, and performance effectively.
From the results shown in Figure 2, using a poly_modulus_degree of 1024, it can be seen that in the first run the CPU load oscillates around 70% for all the data sizes. In the second run, the larger the dataset, the higher the load, especially for large values: the highest increase is reached at 10,000 encrypted integers, where up to 30% is used, while small and medium values remain between 10% and 20%. For a polynomial degree value of 8192 and 10,000 samples, as in Figure 3, on average an 80% CPU load is reached in the first run, whereas in the second run it varies from 20% for small values and 30% for medium values up to peaks of 60% or 70%. Using a poly_modulus_degree of 16,384 and 1000 samples, as in Figure 4, the results show that the first run reaches an average CPU load of 70%, with demanding phases peaking at almost 100%. For the second run, the load oscillates around 30%. In this scenario, with 10,000 samples, a 100% peak load is reached due to the complexity of the run. Finally, for a degree value of 32,768, the load in the first run averages 75%, except for 10,000 samples, where it peaks at 100%. In the second run, it oscillates between 30% and 40% for sets of up to 100 samples; for 1000 samples, 100% is reached in some parts of the run. For a larger set size, it is not possible to carry out the execution due to the increased complexity and memory demand.
As we have seen, the impact of caching between the first and second runs is notable. This effect is due to the compilation of the code, configuration of the encryption parameters, and initialization of cryptographic contexts and keys. Once these operations are executed, some elements such as compiled code and key structures are cached, enabling faster access and reduced CPU load in later runs. Additionally, operating systems and runtime environments may optimize memory management and execution paths based on previous operations, enhancing performance efficiency. In the first run across various configurations, the CPU load was relatively high as the system compiled and configured the encryption setup. For example, with a poly_modulus_degree of 1024 (Figure 2, white arrow), the CPU load averaged around 70%. However, in the second run, the CPU load markedly decreased. For large datasets (10,000 samples), the load dropped from near maximum capacity in the first run to about 30% in the second run, showing a substantial reduction. This reduction in CPU load in the second run indicates that the system benefits from caching, which could include optimized access to frequently used data or compiled code, in turn resulting in faster execution times and lower resource usage in successive runs.
The second batch of experiments was carried out on a Jetson Nano. The exploration of homomorphic encryption performance on the Jetson Nano provides a distinct perspective on the applicability and scalability of cryptographic practices in edge computing environments. Given the inherent resource constraints of devices like the Jetson Nano, this analysis not only benchmarks the operational efficiency of homomorphic encryption but also delineates the potential for deployment in IoT ecosystems. The Jetson Nano Developer Kit by NVIDIA has a quad-core ARM Cortex-A57 processor, 128-core Maxwell GPU, and 4 GB LPDDR4 RAM. The configuration involved setting up the environment with JetPack, flashing the system image, and installing the Microsoft SEAL library and monitoring tools like htop for system utilization visualization.
The critical parameters under investigation mirrored those of the previous PC-based tests, emphasizing dataset size and the poly_modulus_degree within the BFV encryption scheme. The analysis began with an examination of execution times across varying dataset sizes and poly_modulus_degree values. The execution time on the Jetson Nano exhibited a pronounced increase with larger datasets and higher degrees, similar to observations made on the PC (see Figure 5). However, distinct disparities emerged as the dataset size expanded. For smaller datasets (1, 10 samples), the execution times were comparatively lower on the Jetson Nano than on the PC, suggesting an efficient utilization of the Nano’s processing capabilities for lightweight cryptographic tasks. As the dataset size increased to 100 samples, a tipping point was observed where the execution time on the Jetson Nano began to lag significantly behind the PC’s performance, underscoring the impact of limited computational resources. Finally, for larger datasets (1000 and 10,000 samples), the execution time on the Jetson Nano escalated dramatically, reflecting the device’s constraints in handling intensive cryptographic operations. The device struggled with the largest datasets, especially at higher poly_modulus_degree values, illustrating the limitations of edge devices in executing complex homomorphic encryption tasks without optimization.
The investigation extended to CPU load and system utilization, revealing the Jetson Nano’s operational thresholds. Similar to the previous experiments, initial tests with smaller datasets and lower poly_modulus_degree values demonstrated minimal CPU strain, with system utilization remaining within acceptable ranges. This indicated the Nano’s capability to handle basic homomorphic encryption tasks efficiently. When progressing to medium-sized datasets, an increase in CPU load was observed, indicating heightened resource demand. However, the device managed to complete the tasks, albeit with increased execution times. Then, at the upper values of dataset size and poly_modulus_degree, the Jetson Nano’s CPU load reached its zenith, often hitting 100% utilization. This was particularly evident for the largest datasets, where the device occasionally failed to complete the encryption process due to exhaustive memory consumption and processing power limitations. The results are shown in Table 1 and Table 2.
An interesting point to note is how the occasional failures observed in the experiments on the Jetson Nano under high loads and large data sizes could lead to potential vulnerabilities from a security-application perspective. We have seen that these failures occur because of the limited memory capacity, resulting in the inability to complete the encryption task. Consequently, attackers could deliberately target systems using similar edge computing devices with high-load tasks to induce failure, causing a denial of service through resource exhaustion. Another risk would be the exposure of unencrypted data, or partially encrypted data, when the encryption process is incomplete or fails. Finally, it is important to observe that continuous operation near resource limits could lead to system instability, which might be exploited to disrupt operations or to force systems into a less secure state.
A comparative analysis between the PC-based experiments and those conducted on the Jetson Nano sheds light on the scalability and adaptability of homomorphic encryption technologies across different computing environments. This comparison is crucial for understanding the practical implications of deploying advanced cryptographic solutions within both traditional and edge computing frameworks. In this sense, the Jetson Nano exhibited good performance with smaller datasets, often matching or slightly outperforming the PC setup in terms of execution time. This efficiency underscores the Nano’s potential for handling lightweight encryption tasks effectively, despite its constrained computational resources. However, as the dataset size increased to medium levels, the execution time on the Jetson Nano began to lag significantly compared to the PC, highlighting the impact of its limited processing capabilities on handling more complex cryptographic computations. The disparity in execution time became more pronounced with larger datasets. The PC maintained a relatively stable performance curve, managing to complete encryption tasks with considerable efficiency even as the complexity of operations increased. Conversely, the Jetson Nano’s performance degraded severely under the weight of larger datasets, particularly at higher poly_modulus_degree values, illustrating the challenges faced by edge devices in executing computationally intensive homomorphic encryption tasks without specific optimizations.
The comparative analysis of CPU load revealed that the Jetson Nano operates within a higher utilization range across all dataset sizes and poly_modulus_degree values compared to the PC. For small datasets, the difference in CPU load between the two platforms was minimal, showcasing the Nano’s capability to efficiently manage basic encryption tasks. However, as the computational demands increased, the Nano’s CPU load reached its peak, often hitting 100% utilization, while the PC showed a more gradual increase in CPU load, maintaining better overall performance stability. Similarly, memory utilization trends have highlighted significant constraints on the Jetson Nano, especially with large datasets. The PC’s ample memory capacity allowed it to handle extensive cryptographic operations with relative ease, whereas the Jetson Nano struggled with memory saturation, leading to execution failures in the most demanding scenarios. This limitation points to the critical need for memory-efficient cryptographic algorithms and optimizations tailored for edge computing devices.
The comparison between traditional computing environments and edge devices like the Jetson Nano reveals critical insights for implementing homomorphic encryption in edge computing. There is a pronounced need for optimization to adapt encryption practices to the limitations of edge hardware, requiring algorithmic refinements that minimize computational and memory demands. Furthermore, the performance gap underscores the necessity for both cryptographic algorithm innovations and advancements in edge device capabilities. Tailoring encryption methods to meet the specific needs of edge computing can significantly improve operational efficiency, making secure, privacy-preserving computations feasible within IoT ecosystems. Moreover, the deployment of homomorphic encryption technologies must consider application-specific demands, including data sensitivity, computational requirements, and resource availability. The analysis highlights the importance of edge-specific optimizations and parameter adjustments to meet these needs effectively.