2. Related Works
Security, as is known, is associated with information that, during the operation of searchable encryption schemes, is revealed or leaked to an attacker who has access to the database server. Bösch et al. [
6] believe that information leakage is possible in such schemes, which can be divided into three groups:
- (a)
index information (refers to the information about the keywords contained in the index);
- (b)
search pattern (information that can be obtained by knowing whether two search results refer to the same keyword);
- (c)
an access pattern (refers to information that is implied by the query (search) results, namely which documents contain the requested keyword for each of the queries [
13] or which document identifiers match the query [
14]).
Bösch et al. [
6] note that in many schemes, there is leakage of at least the search pattern and the access pattern. At that, identifying the search pattern may not be a problem in some scenarios, whereas for others it is unacceptable. For example, in a medical database, disclosing a search pattern through statistical analysis (which allows an attacker to get full information about the plaintext keywords) can lead to the leakage of a large amount of information. This information can be used to match it with other (anonymous) public databases.
Fuller et al. [
2] distinguish two types of entities that can pose a security threat to a database: a valid user known as an insider who performs one or more roles and an outsider. The latter can monitor and potentially modify network interactions between valid users, separating attackers into those that persist for the lifetime of the database and those that obtain a snapshot at a single point in time. At that, attackers are divided into those that persist for the lifetime of the database and those that obtain a snapshot at one a single point in time [
15]. In addition, Fuller et al. [
2] differentiate attackers into those who are: semi-honest (or honest-but-curious), i.e., those who follow the prescribed protocols, but may try to get additional information from data that they observe; and malicious, that is, those that actively perform actions aimed at obtaining additional information or influencing the operation of the system. They also note that much of the active research in protected search technology considers semi-honest security against a persistent insider adversary. At that, special attention is paid to such types of objects within a protected search system that are vulnerable to leaks, such as: (a) data items and any indexing data structures; (b) queries; (c) records returned in response to queries or other relationships between data items and queries; (d) access control rules and the results of their application.
The cryptographic community has developed several common primitives:
- −
fully homomorphic encryption [
16,
17,
18,
19],
- −
functional encryption [
20,
21] with its subclasses and earlier representatives:
- −
predicate encryption [
22,
23],
- −
identity-based encryption [
24],
- −
attribute-based encryption [
25]
and some others that completely or partially solve the problem of searching in a secure database. Protected search techniques are often based on these primitives, but rarely rely solely on one of them. Instead, they tend to use specialized protocols, often with some leakage in order to improve performance [
2].
One possible approach to reduce the damage caused by a server compromise is to encrypt sensitive data and run all computation (application logic) on the clients. However, as noted by Popa et al. [
26], some important applications are not suitable for this approach. For example, database-backed websites that process queries to generate data for the user, and applications that compute large amounts of data. Another possible approach is the use of such theoretical solutions as fully homomorphic encryption (FHE) [
16,
17,
18,
19]. Its use allows servers to compute arbitrary functions over encrypted data while only clients see the decrypted data. However, one of the problems of schemes with fully homomorphic encryption is performance, since current schemes require large computational resources and large storage overheads [
6,
26]. For some applications, so-called somewhat homomorphic encryption schemes may be used. These schemes are more efficient than FHE, but only allow a certain number of additions and multiplications [
16,
18]. The main problem when using somewhat or fully homomorphic encryption is that the resulting search schemes require a linear search time in the length of the dataset and this is too slow for practical use in modern applications.
As noted earlier, the problem of searching over encrypted data is of great interest from both theoretical and practical points of view. This is explained by the importance of ensuring the security and privacy of data stored and processed on third-party remote cloud servers of the service provider. However, as noted by some experts in this field [
9,
27], research on this topic is more focused on the scenario of a user who outsources an encrypted set of documents (such as e-mails or medical records) and would like to continue keyword search in this encrypted dataset. However, in practice, many companies, organizations, and institutions store data in databases that use the relational data model. Users are accustomed to using widely accepted SQL, which allows them to store, query, and update their data in a convenient way. Databases that support SQL (this applies in general to both NewSQL and some NoSQL databases that also allow you to work in the SQL query paradigm) provide fast search and retrieval of records, provided that the database can read out the data contents. However, encryption makes it difficult to search encrypted databases. Therefore, the direct application of solutions to search for the required information in the encrypted data of traditional databases is not an easy task.
In order to solve certain issues, Hacigümüş et al. [
28] have developed techniques by which the bulk of the work of executing SQL queries can be performed by the service provider without the need to decrypt the stored data. The paper explores an algebraic structure for query splitting to minimize client-side computations. Using a so-called “coarse index” allows you to partially execute the SQL query on the provider side. The result of this query is sent to the client. The final correct result of the query is found by decrypting the data and executing a compensation query on the client side.
Popa et al. [
26] proposed a system called CryptDB that supports SQL queries over encrypted data. This solution is based on various types of encryption, such as random (RND), deterministic (DET), and order-preserving encryption (OPE), applied to a SQL table column. To request data from an encrypted database, CryptDB converts an unencrypted SQL query into its encrypted equivalent and decrypts the appropriate encryption layers. CryptDB achieves its goals using three ideas: running queries over encrypted data using a new encryption strategy with SQL support, dynamically adjusting the encryption level using encryption onions to minimize the information disclosed to the untrusted DBMS server, and chaining encryption keys to user passwords in a way that only authorized users can access to encrypted data. At that, although CryptDB protects data confidentiality, it does not guarantee the integrity, actuality, or completeness of the results returned to the application. However, the main disadvantage of CryptDB, as noted by Azraoui et al. [
9], is that whenever one layer is removed, the encryption scheme becomes weak. In light of this, the main problem is to provide a practical solution for searching over encrypted databases that does not suffer from the leakage occurring in CryptDB and that provides transparent processing of complex queries over encrypted SQL databases. In their paper [
9], the authors attempt to solve this problem by proposing a practical construct for searching data in an encrypted SQL databases that limits information leakage. Their solution is based on the searchable encryption technique developed by Curtmola et al. [
29] and applied to unstructured documents. This mechanism creates an inverted search index of keywords in the database to enable keyword search queries over encrypted data. The practicality of this solution is achieved through the use of the cuckoo hashing technique, which makes the search in the index efficient. The proposed solution supports Boolean and range queries.
Pilyankevich et al. [
27] propose a system (called Acra) which allows, among other things, to provide a search for encrypted data in SQL databases. The proposed Acra Searchable Encryption (Acra SE) solution is based on a blind indexing approach that develops the original idea of the CipherSweet project [
30]. The main component of the Acra SE scheme is the so-called Acra Server, which works as a reverse proxy (transparent encryption/decryption proxy server). It sits between the application and the database. The application does not know that the data are encrypted before it gets into the database, the database does not know that someone encrypted the data. It is worth noting that the encryption and secure search functions of Acra Server can be configured for each column. This means that every table in the database can be fully encrypted (every column), partially encrypted (some columns are encrypted, some not), or fully unencrypted. All Acra’s searchable encryption security properties are very similar to the security properties of CipherSweet, which poses the risk of partially known plaintext attacks. In this connection, Pilyankevich et al. [
27] provide practical recommendations to ensure security. However, despite certain solutions aimed at ensuring the security of storing and searching for sensitive data, Acra, like CipherSweet, which was taken as a prototype of a searchable encryption scheme, supports the minimum functionality of queries, namely, only for equality.
Various DBMSs are characterized by the so-called technology of “transparent data encryption” (TDI) [
31], which allows you to selectively encrypt sensitive data stored in database files, as well as in files related to data recovery, such as redo logs, archive logs, backup tapes. The essence of transparent encryption is that a combination of two keys is used: a key for each database table, which is unique, a master key that is stored outside the database in the so-called “wallet”. Data stored on disk are encrypted; however, they are automatically decrypted for the legitimate user to process queries. That is, when the user selects encrypted columns, the DBMS quietly extracts the key from the “wallet”, decrypts the columns and shows them to the user. As a result, the server must have access to the decryption keys, and an attacker who has compromised the DBMS software can gain access to all data. Therefore, the main goal of TDE is to protect sensitive data located in the corresponding files of the operating system. TDE is not a full blown encryption system and it should not be used in this capacity.
In addition, attention should be paid to the fact that the ability to perform search operations over encrypted databases leads to the complexity of systems and an increase in the amount of memory required and query execution time. At that, some searchable encryption schemes when performing certain queries do not provide sufficient data confidentiality. That is, with long-term observation, an attacker can obtain a significant part of the information about sensitive data. All this testifies to the openness of the secure search problem and the need for further research in this direction to ensure secure work with remote databases and data storages.
3. Main Security Aspects
In the proposed solution, as in most active research in the field of secure search technology, we consider a curious database administrator (DBA) or other external attacker with full access to the data stored on the DBMS server as the main threat to database security while recognizing that such attackers can be either semi-honest or malicious.
In our solution, we use a searchable encryption scheme that allows the owner of the secret key (more precisely, keys) to read and write data, creating searchable encrypted data, and trapdoors (providing the so-called architecture (S/S)—single writer/single reader [
6]). In doing so, we extend this scheme to the so-called multiwriter/multireader (M/M) architecture layer by distributing a secret key to allow multiple users to perform write and search over an encrypted database. Moreover, we are taking certain measures to simplify the process of user revocation if necessary (not to cause big overhead).
The architecture of the proposed solution is shown in
Figure 1. This is a typical of three-tier client-server architecture. This architectural pattern helps to structure applications that can be decomposed into groups of subtasks in which each group of subtasks is at a specific level of abstraction [
32]. This allows developers to create flexible and reusable applications. In addition, the three-tier architecture allows applications in any of the three tiers to be independently upgraded or replaced in response to changing requirements and technologies, thereby simplifying the implementation and maintaining such a system up to date, taking into account the development of security standards.
In the solution under consideration, users and the application server are trusted, and the DBMS server is untrusted. Each user has some security token, in which container files (stegocontainers)—, are stored in a compressed and encrypted form () using the secret key . These container files contain decryption keys: for the special database table and for special software modules. and modules of special software are stored on the DBMS server. The container files are different for each legitimate user; moreover, where and how the corresponding keys and are contained in them, the users themselves do not know. These stegocontainers usually have different names. The database proxy leads them to names known to special software stored in encrypted form on the DBMS server. This is intentional in order to simplify the process of user revocation. Namely, do not distribute once again a new secret key (in this context, a file with stegocontainers that store the keys and ) among the remaining users.
The transmitted compressed encrypted file () from the corresponding legitimate user arrives at the application server (database proxy). The database proxy (it is essentially a reverse proxy between the application and an untrusted DBMS server), knowing the secret key , decrypts and decompresses the file () when processing the first user request related to the search for the required encrypted data in the active session. The decrypted container files , are subsequently transferred over a secure channel to the DBMS server.
Table
stores in encrypted form the keys for decrypting the encrypted sensitive data of certain attributes of the corresponding database tables. Persistent stored modules (PSMs) of special software are also stored on the DBMS server in encrypted form. They provide secure covert (without leaving a trace in the existing means of documenting completed queries) automatic extraction and decryption (without showing the plaintext) required to find encrypted data using keys stored in
. At the same time, using the method based on the use of the potential of the modern blockchain model described in [
33,
34], the integrity of the key table and persistent stored system and user modules, as well as modules of special software developed within the framework of the proposed approach, is controlled. This increases the security of stored data and special software (increases protection against unauthorized modification, including through malware) with lower overhead costs (the amount of data stored for this and computing resources). Therefore, for example, in order to control the integrity and authenticity of a specific stored module (as is known, some unintentional and deliberate changes to certain software can potentially lead to data breaches) in the usual way, it would be necessary to perform hashing and digital signature procedures with storing the corresponding data for each of them. The use of the hash tree structure allows ensuring the integrity control not only of the specific PSM being checked, but also of all other stored database programs since this one data fragment is included in the general structure and changing at least one bit in it will entail a complete change in the value of the Merkle root. At the same time, it is appropriate to note that the developed software used in the proposed solution is considered secure (at least from the point of view of the basic principles of building secure software) [
35]). As for the potential presence of possible vulnerabilities in modules of developed special software, which can be directly used by an attacker to implement security threats, this is a separate research topic, which is not discussed in this paper.
To encrypt/decrypt data of fields of various types of a tuple row of a certain table
of a database, a scheme is used based on the use of keys
,
,
similar to that described in [
36,
37], where
is a unique 128-bit random value (secret key) generated by a cryptographically strong pseudo-random number generator (PRNG) for each table
, constant for all values that will be encrypted in this table;
is a unique 128-bit random value (secret key) generated by a cryptographically strong PRNG for each attribute
of table
; and
is the value of the integer identifier of the primary key of the
-th row of the table
. Secret (symmetric) keys
,
are encrypted by one of the cryptographically strong algorithms and stored in the special database table
. The values of these keys are never shown. They are not known to either the database administrator nor to any other user. Indirect access (through special software) to the secret keys is available to an authorized user with the appropriate privileges, who will present the corresponding keys (
,
) in the active session. At the same time, the values of the keys (
,
) are also not shown anywhere in the clear.
As mentioned above, the special software itself is stored on the DBMS server in encrypted form. Decryption of this software and its activation is carried out automatically when executing the corresponding query related to the search for the required encrypted data in an active legitimate session. Such decryption can only be performed by a legitimate user for whom the database proxy server has the key to decrypt some module of the program (from the composition of special software), which extracts the key from the stegocontainer , which is necessary to decrypt the remaining encrypted modules of the special software package.
The legitimate user in the proposed solution is granted timely and uninterrupted access to the database. Availability is supported by proven traditional control mechanisms and strategies that provide authorized access and acceptable levels of performance to quickly handle interrupts, to provide for redundancy, to maintain reliable backups, and to prevent data loss or destruction.
After the corresponding query is executed, the decrypted modules of special software are automatically deleted from the database server. At the same time, the text of these modules is not shown in clear text; it will also not be possible to trace it in the query history (through the corresponding logs).
Next, we will consider in more detail the functionality and features of the proposed solution, the implementation of which allows you to search for the required data in an encrypted database and manipulate them.
4. The Proposed Technique for Searching the Required Data
Preparing to execute a search query
In order to be able to generate and execute the corresponding search queries for the required data stored in encrypted form in the database on a third-party remote DBMS server, as well as queries for manipulating such data, you must first perform some operations. The main such operations are:
A. Operations related to the creation of security tokens for legitimate users:
- −
generation by a cryptographically strong PRNG of the secret (symmetric) keys —for encrypting/decrypting the table and —for encrypting/decrypting modules of the special software;
- −
formation of stegocontainers , with their subsequent compression, encryption () and entering this information on security tokens;
- −
issuance to users of the created security tokens with .
B. Operations related to the generation of secret keys, subsequently used in an appropriate way by the proxy server and the database server:
- −
generation by a cryptographically strong PRNG of the secret keys , for each corresponding table () of the remote database and an attribute containing sensitive data that should be encrypted;
- −
generation by a cryptographically strong PRNG of key to encrypt/decrypt the module of the program for extracting key from .
C. Operations related to preparing for the secure operation of a remote database:
- −
creating a table in the database and writing to it the corresponding data encrypted with one of the cryptographically strong algorithms (for example, AES-256; is used as a key): secret keys , along with the names of the corresponding tables and attributes for which they were created;
- −
encryption using the keys and and recording in the database of special software modules.
Next, we will consider the actual algorithm for generating a search query implemented on a trusted application server (database proxy server). The main steps of the proposed algorithm for generating an SQL search query over encrypted databases are given below.
The -th legitimate user during an active session receives:
- (a)
compressed encrypted file with stegocontainers (at the beginning of the session);
- (b)
the original SQL query with plaintext ().
Decrypting the file with stegocontainers (), preparing them for transfer, and actually transferring the stegocontainers to the DBMS server. In the active session, when processing the first user query related to the search for the required encrypted data, the compressed file is decrypted and the corresponding stegocontainers , are extracted from it. The latter are renamed in order to bring their names in line with the names known to the special software and are transmitted to the DBMS server via a secure channel.
Parsing the original SQL query . Extraction from the user’s original SQL query of the keywords , presented in clear text, in accordance with which it is required to search for the required data in the database on a third-party remote DBMS server.
Generating a random number . For this, some PRNG is used: .
Formation of trapdoor , on the basis of which the search will be carried out over the encrypted database. To do this, the corresponding search keyword is encrypted using one of the cryptographically strong algorithms. At that, we use the same scheme that is used to encrypt the data of row for the corresponding column of some table of the database. Namely, an encryption scheme based on the use of some encryption algorithm (), encryption mode () and secret key () selected from the list, as a function of . That is . The proposed solution has no fundamental restrictions on the encryption algorithm that will be used in it. It can be any, either existing or newly developed; the only requirement is the presence of its implementation at the time of using the solution. For example, for definiteness, let these be the DES, Triple DES, AES algorithms with various modifications: ). There are also no fundamental restrictions on encryption modes. Example modes: .
It should be noted that the encrypted trapdoor values for the same keyword may differ (including due to the arbitrary choice of the number ). That is, trapdoors are non-deterministic, which makes it difficult to leak the search pattern.
- 6.
Stages 4 and 5 are repeated for all (). However, instead of generating a random number each time, an increment of its initially generated value is allowed: , , where is the cardinality of the set of keywords .
- 7.
Formation of the final query () to the database that does not disclose sensitive data.
- 8.
Launching some procedure stored on the DBMS server (), which determines the session parameters based on the data extracted from the transferred stegocontainers , , and key . These parameters are valid for a legitimate user only in his active session. They are used in the corresponding modules of special software that support the functionality of the proposed solution.
- 9.
Transferring the modified user query to the DBMS server for execution.
The general scheme of the search query formation algorithm (Algorithm 1) is presented below.
Algorithm 1: Query Formation Algorithm (QFA) |
Input Output and its transfer for execution
4: Read original SQL (query with plaintext)— 5: Parsing the original query
/* implementation dependent */
11: end for . 13: Running a procedure that defines session parameters. 14: Transferring the modified user query to the DBMS server for execution. |
Example 1. Let the following query come from a legitimate user:
select id, at_1, at_2 from TABLE_1
where at_1 in (‘4454102135347018’, ‘5167135104128196’)
order by id;.
The essence of this query is to determine what actions were performed using bank cards ‘4454102135347018′ and ‘5167135104128196′ for a certain time interval (for simplicity. This period is not explicitly indicated; for example, it is assumed that this is the all time report), where at_1 is the TABLE_1 () attribute of the remote database, the values of which are stored in encrypted form (in this case, this is the card number); at_2 is an attribute of TABLE_1 whose values are stored in plaintext (for example, an opcode); id is a unique row identifier (primary key—PK) of the TABLE_1 table.
Then, in accordance with the above algorithm, the following actions will be performed.
After receiving the compressed encrypted file , decrypting it () with extracting the corresponding stegocontainers, as well as receiving the original SQL query with plaintext, the latter is parsed on the DB proxy server.
Keywords are extracted first. In this case, such keywords are bank card numbers = ‘4454102135347018′, = ‘5167135104128196′. After that, using the existing PRNG, a random number is generated. For example, suppose the PRNG generated a random number c = 160634721428732436748471615956417046369.
Then, trapdoors are formed, on the basis of which a search will be carried out over the encrypted database. In the case under consideration, for = ‘4454102135347018′ and the value of can be equal to ‘0512F7D59B92BF08EFAE3E0E488D1B67CC1AE5A57991C2BE00B1F3 5C2AA9D310′.
As noted above, encrypted trapdoor values for the same keyword may differ. For example, when = 24339107466813018440859597836919234881, the value of is ‘1085E6EBA12C50E2352BEA9D021A7C8931299614906DF199BD34D1E87D436673′. Both one and the other value of trapdoor are equivalent when searching. The value for = ‘5167135104128196′ and = ‘335991485542368735771035274697529587778′ is determined in a similar way.
After that, based on the initial query , the final query is formed. Namely, based on the preferences of the algorithms used and encryption modes (for definiteness, let these be and ), the visibility of the query representation, as well as the specific implementation of the database on the Oracle DBMS platform, the query is converted to the following form:
select id, at_1, at_2 from TABLE_1
where get_decr_val(at_1, ‘TABLE_1′, ‘at_1′, ‘AES128′, ‘CBC’, id)) in
(
(select get_decr_val(‘0512F7D59B92BF08EFAE3E0E488D1B67CC1AE5A57991C2BE00B1F35C2AA9D310′, ‘TABLE_1′, ‘at_1′, ‘AES128′, ‘CBC’, 160634721428732436748471615956417046369) from dual),
(select get_decr_val(‘161349D2EA5D997C80F5E60FDDB7F6E17E206A483C19103E9A81680D1A92E71E’, ‘TABLE_1′, ‘at_1′, ‘AES128′, ‘CBC’, 335991485542368735771035274697529587778) from dual)
)
order by id;.
Distinguishing features of the converted query are shown in bold type, where get_decr_val is some function () of special software, where is either the encrypted value of stored in the -th row (id) of the -th column (at_1) of some table in the database (in this example it is TABLE_1), or the corresponding value of . This function is stored on the DBMS server in encrypted form. It implements a secure covert (without leaving traces in the available means of documenting completed queries) decryption of the data required for search (without displaying these data in the clear). is either a random number or a unique row identifier (id) of some table in the database. As can be seen from the above query , sensitive data are not presented anywhere in the clear.
After the query is formed, a command is issued to execute some procedure which determines the session parameters necessary for the legitimate user to work further in the active session. After that, the modified query itself is transferred to the DBMS server for execution.
Features of the query implementation on the DBMS server
As is known [
26], in the fight against the threat caused by the illegal actions of a curious DBA or other external attacker with full access to data, there is a problem that lies in the contradiction between minimizing the amount of confidential information disclosed to the DBMS server and the ability to efficiently execute various queries. At that, existing approaches to working with encrypted data are either too slow or do not provide adequate confidentiality. On the other hand, it is also known that encrypting the data with some efficient cryptosystem would prevent many SQL queries from being executed by the DBMS server. In this case, the only practical solution would be to give the DBMS server access to the decryption key. However, if certain efforts are not made to prevent access to this key (by taking possession of the value of this key) by illegitimate users, and first of all, in this case, this applies to privileged users (such as a DBA or an external attacker with DBA rights), then this would allow the attacker to gain access to all data. Therefore, taking into account all of the above, a compromise solution was adopted in the proposed approach. Namely, to use an effective cryptosystem to encrypt sensitive data, but to provide access to these data only to a legitimate user and only one that will provide the database server (indirectly through the database proxy) during an active session with an encrypted file with steganocontainers
containing the corresponding secret keys
,
of the user. The encrypted file
is decrypted and decompressed by the DB proxy server, and the corresponding container files
,
are transmitted over a secure channel to the DBMS server. On the DBMS server, the corresponding keys are extracted automatically by special software, the main modules of which are also encrypted. These modules are decrypted only if the
and
keys are available, which are associated with a legitimate user. The activity of the
,
keys is limited by the duration of the session. Their values are assigned to the corresponding attribute of the so-called application context (application context—a set of pairs: “attribute name—value”), more precisely, the database session-based application context, which is initialized locally [
38]. As is known [
39], such values are valid within the active session, and they are not visible to another session.
Thus, in our solution, the encryption keys are associated with the user’s keys/passwords, which allows decryption of the corresponding data elements only by legitimate users who present the corresponding keys in an active session. Therefore, a curious DBA and an attacker with DBA rights, not knowing , , and not being able to disclose their values, and, therefore, not being able to extract the corresponding keys (, ) from the table , will not be able to decrypt the main modules of special software and encrypted data, which is what we were striving for in this case.
In general, our solution allows the DBMS server to execute SQL queries over encrypted data in much the same way as if it were executing the same queries over open data. At the same time, existing applications do not need to be modified. The query itself is modified on the proxy server. The database server within the active session of a legitimate user executes a modified query, automatically extracting the required encrypted data. For automatic extraction of encrypted data, encrypted stored modules of special software are used. The latter are automatically decrypted by some open modules stored on the database server using and received from the proxy server.
We will consider the features of query processing on the DBMS server using the example of the implementation of the proposed solution for the Oracle DBMS. In our solution, we use the potential of the RLS (Row Level Security) technology [
40], supplementing it with the capabilities of the application context [
39,
41,
42]. We also use the capabilities of fine-grained audit (FGA) technology [
39,
41,
42], and first of all, the ability to mimic a SELECT trigger (by automatically executing a stored procedure when a user issues SELECT).
The main stages of the proposed solution are:
With the help of this procedure, using the key received from the DB proxy server, one of the modules of special software is decrypted to extract the key from . After decrypting the corresponding module, the latter is compiled and launched for execution. As a result of the operation of the decrypted module, the key is extracted from , the value of which is assigned to one of the attributes of the current user session—the application context attribute—atrc1 (the so-called local initialization of the application context, for user session). This value, as noted above, is valid within this session and is not available from other sessions. After determining the appropriate attribute of the application context, the decrypted module of the special software for extracting the key from is deleted.
Moreover, using the procedure, which uses the set value of the atrc1 application context attribute during its execution, the next module of special software is decrypted, followed by its compilation and execution to extract the key from . The resulting value of the key is assigned to another attribute (atrc2) of the current user session (local initialization of the application context). This value will later (when executing the corresponding queries) be used to enable the legitimate user to automatically extract (using decrypted special software) the appropriate keys from the encrypted table to decrypt sensitive data stored in the corresponding attributes of the remote database tables. This actual application context value is also only valid within this session and is not available from other sessions.
- 2.
To execute the corresponding request in our solution, it is proposed to use the capabilities of technologies: RLS (with the implementation of the application context) and FGA (namely, the ability to mimic a SELECT trigger). For this purpose, within the framework of the above technologies, an additional stored procedure and function were implemented. These persistent stored modules run automatically under the appropriate conditions specified in the appropriate security policies. It is these modules that are involved in decrypting the remaining stored modules of special software, compiling, running for execution, as well as automatically deleting the latter after executing the corresponding data search operator in a cryptographically protected database.
Activated stored modules of special software provide secure hidden (without leaving traces in the available documentation tools of the DBMS server) automatic extraction and decryption (without showing in the clear) of encrypted data using keys stored in . That is, the automatically obtained intermediate results of decryption , necessary to search for the required data among the encrypted ones ( is the encrypted value stored in the -th line of the -th column of some table of the database), are not disclosed to users, including privileged ones.
- 3.
The DBMS server returns the result of the query to the database proxy server in encrypted form.
- 4.
The database proxy server decrypts the received encrypted data and returns to the application the result of the original query () of the user.
The general scheme of the query execution algorithm (Algorithm 2) is presented below.
Algorithm 2: Query Execution Algorithm (QEA) |
Input Output. . ) of one of the special software modules. 1.2: Compiling and running the decrypted module of special software. . . 1.5: Using the value of the atrc1 application context attribute, the next module of special software is decrypted . . . ) over encrypted data. 2.1: Decryption of the remaining stored modules of special software, their compilation and execution. 2.2: The pre-decrypted software provides secure, hidden and automatic extraction and decryption (without displaying it in the clear) of the encrypted data required for searches. . 2.4: Removing decrypted stored modules of special software. to the database proxy server in encrypted form. 4: The database proxy decrypts the received encrypted data and returns to the application the result of the original user query Q. |
Thus, in order for the initial request to be executed only by a legitimate user, the latter in an active session must present the corresponding keys stored on the security tokens he has. The corresponding data are decrypted using the legitimate user’s key chain, including the key stored on the database proxy server.
Examples of data search queries (SELECT statement).
Pattern matching. Search for activities that were carried out using bank cards whose numbers start with ‘516’.
A search query for appropriate activities with such bank cards may look like this:
select id, at_1, at_2 from TABLE_1
where get_decr_val(at_1, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id))
Like
(select get_decr_val(‘82C5218A798679BB9C5FC82B135D66CD’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 272076869831384594639313635595876910277)
from dual)
order by id;
where the value ‘82C5218A798679BB9C5FC82B135D66CD’ in the get_decr_val function is the encrypted value .
Range search. Search for activities that were carried out using bank cards whose numbers range from ‘378282246310005’ to ‘4454102135347018’.
Possible query appearance:
select id, at_1, at_2 from TABLE_1
where get_decr_val(at_1, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id))
BETWEEN
(select get_decr_val(‘785855FCE8CAC88C9876FB14834B1E4F’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 41743711899582577735039044513724643184)
from dual)
AND
(select get_decr_val(‘CE11FA4F985C39A2F1F207FA2D687CEC85B17BE048FA7AD1768CAE5105563520’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 12345999999) from dual)
order by id;
Search by inequality. Search for actions that were carried out using bank cards whose numbers are greater than ‘378282246310005’.
Possible query appearance:
select id, at_1, at_2 from TABLE_1
where get_decr_val(at_1, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id)) >
(select get_decr_val(‘785855FCE8CAC88C9876FB14834B1E4F’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 41743711899582577735039044513724643184)
from dual)
order by id;
Above, examples of the implementation of search queries for the required data in the proposed solution using the SELECT statement were presented. However, as is known, the data to be protected must first be entered into the database in a safe way. In the future, actions aimed at modifying and deleting them must also be safe from the point of view of data confidentiality. Therefore, when writing new sensitive data to the database in the proposed solution, the original user query is also transformed by the database proxy. That is, the sensitive data specified in the original query in clear text are encrypted on the proxy server with one of the cryptographically strong algorithms. Encryption of data is a key component in implementing the principle of multi-level protection. Below, in pseudocode (using the extended Backus–Naur form—EBNF [
43]), the corresponding SQL statement is presented, which is transmitted to the DBMS server:
INSERT INTO [(col_1, col_2, col_3,…)]
VALUES (dataValuePlaintext_1, dataValuePlaintext_2,
dataValueCrypted_3,…);
where dataValuePlaintext_1, dataValuePlaintext_2 are open data written to the database; dataValueCrypted_3—encrypted sensitive data written , ) to the database. Thus, open data are written to the database unchanged, and sensitive data of the original query is encrypted by one of the cryptographically strong algorithms.
With a specific implementation of the database, for example, on the Oracle DBMS platform, this query may look like this:
INSERT INTO TABLE_1 (at_1, at_2, at_3)
VALUES (TABLE_1_at_1_SEQ.NEXTVAL, ‘XXX’, get_encr_val(get_decr_val(‘67984BE52836AAE16C58336213FB13FF’, ‘TABLE_1’, ‘at_3’, ‘AES128’, ‘CBC’, 73020958660698970019165279924045249079),
‘TABLE_1’, ‘at_3’, ‘AES128’, ‘CBC’, TABLE_1_at_1_SEQ.NEXTVAL));
At the same time, the new confidential value (in this example, it is the bank card number ‘373802460082124’), which is written to the database, is not shown to anyone in clear text. Based on it, a trapdoor (‘67984BE52836AAE16C58336213FB13FF’) is preliminarily formed on an arbitrary key (73020958660698970019165279924045249079) using the appropriate algorithm (in this case, AES128), using a special procedure (from the software of the proposed solution).
Queries to update and delete sensitive encrypted data in the proposed solution are implemented as follows.
When you update sensitive encrypted data, the database proxy will convert the user’s query to the form shown below in pseudocode:
UPDATE
SET at_1 =
WHERE = );
where is encrypted sensitive data that are written to the database, ; . Moreover, the values of and are not disclosed to users, including privileged ones. is the open value corresponding to the encrypted value stored in the -th line, -th column of table . is . The trapdoor value used for comparison in this query usually does not exist in the actual table (thus the search pattern is not disclosed).
When implementing a database on a specific platform, this query may look like this:
UPDATE Table_1
SET at_1 = get_encr_val(get_decr_val(‘1B1541B10DE424CA84F544CE8E70365B’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 236509280401175942833044678405599098673), ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id)
WHERE get_decr_val(at_1, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id) =
get_decr_val(‘9EAD4C452EC7A53AD1EA204B2FB0AC75A1ABDD8988BDE22D0E1C58DACC23360F’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 290128337798191690105369894463767116351);
It should be noted that in this implementation:
- −
‘TABLE_1’;
- −
get_decr_val(at_1, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id);
- −
get_decr_val(‘9EAD4C452EC7A53AD1EA204B2FB0AC75A1AB DD8988BDE22D0E1C58DACC23360F’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 290128337798191690105369894463767116351);
- −
= ‘9EAD4C452EC7A53AD1EA204B2FB0AC75A1 ABDD8988BDE22D0E1C58DACC23360F’ is the trapdoor value obtained by pre-encrypting the keyword on an arbitrary key ;
- −
get_encr_val((SELECT get_decr_val (‘1B1541B10DE424CA84F544CE8E70365B’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 236509280401175942833044678405599098673) from dual), ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id), get_encr_val is some function of the special software;
- −
= ‘1B1541B10DE424CA84F544CE8E70365B’ is the trapdoor value obtained by pre-encrypting the keyword on an arbitrary key
- −
;
- −
and used for searching are not stored in the database.
If it is necessary to delete the required sensitive encrypted data, the database proxy server transforms the user’s query to the form shown below in pseudocode:
DELETE FROM
WHERE = );.
When implemented, this query may look like this:
DELETE FROM Table_1
WHERE get_decr_val(at_1, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, id) =
get_decr_val(‘9EAD4C452EC7A53AD1EA204B2FB0AC75A1ABDD8988BDE22D0E1C58DACC23360F’, ‘TABLE_1’, ‘at_1’, ‘AES128’, ‘CBC’, 290128337798191690105369894463767116351);