1. Introduction
The establishment of secret cryptographic keys based on common randomness (CR) and an additional public channel for public discussion has not lost its relevance since the founding works [1, 2, 3, 4, 5, 6]. The ubiquity of internet connections enables public discussion in realistic scenarios, even in wartime conditions (for instance, STARLINK on Ukraine’s battlefield). Another prerequisite for the application of this class of protocols is the existence of CR with sufficient capacity.
In the class of sequential key distillation (SKD) protocols based on the source model [7], biometric signals have been used as a CR source: gait [8, 9], ECG [10], eye and mouse movement [11], and EEG [12, 13]. The achieved secret key generation speed varies from 2 to 26 b/s for the so-called source-type model with no side information for the attacker [1]. This model encompasses gait, ECG, and other sources suitable for generating a secret cryptographic key for secure communication between different devices located on the human body [8]. In these settings, the secret key capacity is given by the mutual information rate of the terminal signals available at the beginning of the protocol. In the case of the so-called source-type model with side information for the attacker [1], discussed in papers [12, 13], generation speeds of 10 b/s were achieved for the so-called EEG metrics signals, and up to 1200 b/s for 14-channel raw EEG signals. For this type of model, the secret key capacity is given by the mutual information rate of the terminal signals, conditioned on the corresponding eavesdropper signal available at the beginning of the protocol.
In [14, 15], the distance between the legitimate nodes in mobile wireless networks acts as the observed CR. Experimental results give key generation speeds in the range of 0.1 to 0.6 b/s, depending on the speed of the terminals and the position of the eavesdropper.
All these facts show that, so far, no source-type model with side information for the attacker has been reported to achieve a high secret key generation speed, except for the system based on raw EEG signals presented in [13].
In the class of SKD protocols based on the channel model [7], the wireless channel through which legitimate participants communicate is used as a source of CR. An excellent overview of these systems is given in [16]. The achieved secret key generation speeds range from 0.037 b/s to 1800 b/s. Note that for the highest speed of 1.8 kb/s, the Key Disagreement Rate (KDR) is 8%. In addition, there is a consensus (see [17, 18]) that this class of systems must assume very strict restrictions on the attacker’s freedom, which calls into question the reliability of the proclaimed security level of the generated cryptographic secret keys.
In this paper, a new concept of a high-speed SKD system based on the CR contained in the speech signal of the protocol participants is presented. The novelties are the following:
To our knowledge, this is the first publicly available paper in which speech is used as CR in the class of SKD protocols based on the source model. As an independent system, our system can also be seen as a speech-controlled device that transforms voice inputs into secret keys. Within more complex protection systems, it can take on the role of an autonomous system of generating and distributing secret keys independently of the used telecommunication channel. Finally, as a part of a complete speech protection system, it provides complete autonomy and full voice control.
Maximizing CR utilization was achieved with a neural predictor designed to predict the attacker’s conditional Renyi entropy of order 2 (ECRE2) (collision entropy). In this way, the speed of the generated secret keys is adapted as a function of the side information available to the eavesdropper. Unlike the system from [13], stylometric features of binary strings were used before the Privacy Amplification (PA) stage, which does not require the exchange of additional information over a public channel. In this way, the final length of the generated keys is unavailable to the eavesdropper, which increases their equivocation.
The impact of random permutation of the signals of legitimate users was examined and experimentally evaluated. It is shown that this transformation significantly reduces the correlation of the eavesdropper signal with the signals of legitimate users, which practically eliminates the need for a compression block before applying PA.
The PA block, based on machine learning, uses the Spoiling information effect related to Renyi entropy [4]. This gives a more accurate estimate of the ECRE2 lower bound, enabling further increases in CR utilization and in the speed of secret key generation.
On an evaluation set of 100 utterances of the word “house”, spoken by different speakers and sampled at 16 kHz, an average secret key generation speed of 28 kb/s was achieved, which makes this system widely applicable in different security scenarios and telecommunication channels with different transmission speeds.
The proposed system can be considered an application of machine learning and deep learning techniques from Natural Language Processing (NLP) technologies. Namely, by transforming the binary string at the input to the PA block using the Base64 transformation, an equivalent new language is obtained, whose dynamics and correlation properties can be modeled more easily. Let us note that the question of chaoticity, hyperchaoticity [19], and exponential stability [20] is significant for connecting the desirable properties of secret cryptographic keys created in the domain of classical cryptography with the latest results in the domain of chaotic stochastic systems. If we impose an additional condition of self-synchronization on these systems, we approach a scenario very close to the concept of CR. On the other hand, we can see this work as a direct application of NLP technologies in the domain of SKD, ranging from the speech signal as a source of CR, through feature engineering based on stylometric features, to deep neural network techniques that have been intensively developed within the framework of NLP.
The paper is structured as follows. In Section 2, we introduce the basic terminology and meta structure of the generic SKD for speech signals as a source of CR. We outline the main building modules, including the lossless compression block (LLC), the feature engineering block (FE), and the prediction interval deep neural network block (PIDNN) for the estimation of the lower bound of ECRE2.
The concept of the Spoiling knowledge lower bound for ECRE2 is presented in Section 3. In Theorem 1, we prove that using this bound in a PA block based on hash functions from the universal class gives secret keys of maximum uncertainty. Then, in Theorem 2, we further show that the appropriate PIDNN can be used to estimate the Spoiling knowledge lower bound for ECRE2. In this way, the practical application of this concept in the design of the SKD protocol is theoretically grounded. In Section 4, the empirical evaluation of the proposed SKD system is presented in detail on an audio dataset of spoken words designed to help train and evaluate keyword spotting. In the concluding Section 5, the practical importance of the proposed SKD system and the role of NLP technologies in its design are pointed out. The paper ends by stating some open design questions of this class of SKD systems.
2. Proposed Sequential Key Distillation Strategy for Speech Signals
According to the SKD for the source model [2], observations of some exogenous CR source are available to protocol participants; see Figure 1. The probabilistic characteristics of the source and the way the sequences are generated cannot be controlled by any protocol participant. In a probabilistic sense, we can describe it as a three-dimensional Discrete Memoryless Source (DMS) with components X, Y, and Z and a given joint probabilistic structure.
Legitimate participants of the protocol, known as Alice and Bob, observe components X and Y, while the system attacker, known as Eve, has access to component Z. The protocol implies the existence of an accessible public authenticated channel through which Alice and Bob exchange messages in both directions. This channel is passively observed by Eve. In [2], it was shown that there is no loss of optimality if this protocol is structured in four sequential stages: Randomness sharing, Advantage distillation (AD), Information reconciliation (IR), and Privacy amplification (PA). Namely, this sequential scheme provides all strong secret key rates below the secret key capacity of the given source of CR. The highest attainable secret key rate is defined as the secret key capacity of the given DMS and is expressed in terms of I(X;Y), the mutual information between X and Y, and I(X;Y|Z), the corresponding conditional mutual information from Eve’s side when she possesses observation Z. If Eve does not have observations Z (no side information), or her observations are independent of X and Y, the maximum value of the secret key capacity, I(X;Y), is obtained.
In Figure 2, we present the proposed DMS, for which we used the speech signals of the participants. We can also consider it as a virtual DMS in which the selected phrase is spoken by three different subjects. The digitized speech signals correspond to the X, Y, and Z components of this virtual DMS. The correlation structure of the components comes from the anatomical determinism of human speech, as well as from the fact that the components originate from the same spoken phrase. The individuality of the articulatory apparatus of Alice, Bob, and Eve introduces the necessary degree of uncertainty into this DMS.
Figure 3 shows the basic architecture of the system. In addition to the usual AD, IR, and PA blocks, three additional blocks appear: the LLC block, in which lossless compression is performed with a selected algorithm of this class; the feature engineering (FE) block, in which features are extracted from the sequence at the input to the PA block; and the Machine Learning (ML) regression block, which takes these features as input.
The outputs of the ML block are a prediction of the Hamming distance between this sequence and Eve’s wiretapped sequence, as well as the lower and upper bounds of this distance. Based on these estimates, the degree of compression of the PA block is determined and, thus, the length of the generated secret key. The concept of adaptive determination of the degree of compression of the PA block based on machine learning was initially introduced in [13]. In this way, the speed of the generated secret keys is adapted to the side information available to the eavesdropper. In contrast to the system from [13], stylometric features of the sequence were used, which do not require the exchange of additional information over a public channel. In this way, the final length of the generated keys is inaccessible to the eavesdropper, which increases their uncertainty. In addition, to estimate the lower bound of ECRE2, we used the Spoiling information effect related to this information measure. This has increased the efficiency of using CR and the speed of generating secret keys.
4. Experimental Evaluation
In order to experimentally evaluate the proposed SKD system, given in Figure 3, it is necessary to specify its individual blocks.
There is an extensive analysis of the efficiency of AD algorithms; see, e.g., [6, 27]. Our experience so far shows that the Bit Parity Advantage Distillation (BPAD) protocol [28, 29] is efficient enough for our needs. An important property of this algorithm is the existence of Eve’s optimal strategy, which is a repetition of the same steps performed by both Alice and Bob. This allows a good estimate of the amount of information leaked to Eve during the AD phase, i.e., the security margin of the distilled secret keys.
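A minimal Python sketch of one bit-parity round may help fix ideas. It assumes the common formulation of bit-parity advantage distillation, in which both parties split their sequences into pairs, announce the parity of each pair over the public channel, and keep the first bit of every pair whose parities agree. The function name `bpad_round` and the toy 20% disagreement model are illustrative assumptions, not details taken from [28, 29].

```python
import numpy as np

def bpad_round(a, b):
    """One illustrative bit-parity AD round; returns the bits kept by Alice and Bob."""
    n = (min(len(a), len(b)) // 2) * 2            # use an even number of bits
    a2 = np.asarray(a[:n], dtype=np.uint8).reshape(-1, 2)
    b2 = np.asarray(b[:n], dtype=np.uint8).reshape(-1, 2)
    parity_a = a2[:, 0] ^ a2[:, 1]                # announced publicly by Alice
    parity_b = b2[:, 0] ^ b2[:, 1]                # announced publicly by Bob
    keep = parity_a == parity_b                   # pairs with matching parity survive
    return a2[keep, 0], b2[keep, 0]               # keep only the first bit of each pair

# Toy data (assumption): Bob's sequence differs from Alice's in about 20% of positions.
rng = np.random.default_rng(0)
alice = rng.integers(0, 2, 20000, dtype=np.uint8)
bob = alice ^ (rng.random(20000) < 0.2).astype(np.uint8)

a1, b1 = bpad_round(alice, bob)
print("disagreement before:", np.mean(alice != bob))
print("disagreement after: ", np.mean(a1 != b1), "kept bits:", len(a1))
```

Each round shortens the sequences while lowering the disagreement between the legitimate parties, which is the trade-off the AD phase exploits.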
The situation is similar in the case of the selection of IR algorithms, whose basic function is to correct all the differences between Alice’s and Bob’s sequences obtained after the AD phase. The approach that ensures minimal communication over a public channel belongs to algorithms from the class of error correcting codes. We chose the Winnow IR protocol based on Hamming error correcting codes [30]. An important feature of this algorithm is also the existence of Eve’s optimal strategy, proposed and analyzed in [27]. In this way, as in the case of the AD algorithm, we can give a reliable estimate of the amount of information leaked to Eve during the IR phase.
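The syndrome-based correction at the heart of Hamming-code reconciliation can be sketched as follows. This is only the single-error correction step for one 7-bit block, assuming a Hamming(7,4) parity-check matrix; it is not the full Winnow protocol, which also includes a block parity comparison and the discarding of bits to compensate for the disclosed information. The helper names `syndrome` and `bob_correct` are hypothetical.

```python
import numpy as np

# Parity-check matrix of the Hamming(7,4) code: column j holds the binary digits of j + 1.
H = np.array([[(j >> k) & 1 for j in range(1, 8)] for k in range(3)], dtype=np.uint8)

def syndrome(block):
    """3-bit syndrome of a 7-bit block; this is what is sent over the public channel."""
    return (H @ np.asarray(block, dtype=np.uint8)) % 2

def bob_correct(bob_block, alice_syndrome):
    """Bob flips the single bit indicated by the difference of the two syndromes."""
    diff = syndrome(bob_block) ^ alice_syndrome
    pos = int(diff[0]) + 2 * int(diff[1]) + 4 * int(diff[2])   # 0 means no difference
    fixed = np.array(bob_block, dtype=np.uint8)
    if pos:
        fixed[pos - 1] ^= 1
    return fixed

alice_block = np.array([1, 0, 1, 1, 0, 0, 1], dtype=np.uint8)
bob_block = alice_block.copy()
bob_block[4] ^= 1                                  # a single disagreement
print(bob_correct(bob_block, syndrome(alice_block)))
```

Only the 3-bit syndrome crosses the public channel, which is why the leakage of this phase can be bounded per corrected block.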
The PA block was implemented using random binary matrices with the Toeplitz structure. It is known that this class of hash functions belongs to the universal family of hash functions; see [31, 32]. The Toeplitz hash functions are particularly suitable for speech signals, bearing in mind that the input sequences are typically very long. Namely, the Toeplitz structure allows the PA block to be implemented with lower computational complexity than hash functions in the form of random binary matrices without this structure. Our implementation is based on modules that are part of the SciPy package [33].
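As an illustration of Toeplitz hashing, the following sketch builds a small random binary Toeplitz matrix and applies it over GF(2). It assumes `scipy.linalg.toeplitz` as the SciPy building block; the source does not state which SciPy modules were actually used, and the names `random_toeplitz` and `toeplitz_hash`, as well as the sizes n = 64 and r = 16, are chosen here only for the example.

```python
import numpy as np
from scipy.linalg import toeplitz

def random_toeplitz(r, n, rng):
    """Draw a random binary r x n Toeplitz matrix (fixed by r + n - 1 random bits)."""
    first_col = rng.integers(0, 2, r)
    first_row = rng.integers(0, 2, n)
    first_row[0] = first_col[0]                   # the corner element is shared
    return toeplitz(first_col, first_row)

def toeplitz_hash(T, bits):
    """Compress an n-bit sequence to r bits with a matrix-vector product modulo 2."""
    return (T @ np.asarray(bits, dtype=np.int64)) % 2

rng = np.random.default_rng(42)
n, r = 64, 16                                     # toy sizes, far below real sequence lengths
T = random_toeplitz(r, n, rng)
x = rng.integers(0, 2, n)
key = toeplitz_hash(T, x)
print(key)
```

The dense product used here is the simplest form and does not exploit the structure; for long inputs, an FFT-based Toeplitz product is the usual way to reduce the cost.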
The degree of compression of the PA block is determined by the dimension r of the Toeplitz hash matrix and is obtained based on the PIDNN output, according to expressions (10) and (13). Theorem 2 and Remark 2 guarantee that the secret keys thus generated are absolutely secret and have maximum uncertainty.
In the subsequent analysis, we consider two variants of the proposed SKD system: without the LLC block (system A) and with the LLC block (system B). In system B, LLC compression is achieved by Huffman’s optimal source coding over a finite source alphabet [34]. More precisely, the systems consist of a chain of blocks:
4.1. Voice Data Set
In the empirical evaluation, an audio dataset of spoken words is used [35, 36]. The dataset consists of 105,829 utterances obtained from 2618 different speakers who spoke 35 different words, with the sample data encoded as linear 16-bit single-channel PCM values at a 16 kHz rate [36]. Each spoken word lasts 1 s. A subset of 100 voice signals, corresponding to the pronunciation of the word “house”, was randomly selected from this set. Each of these signals can belong to Alice or Bob, giving a population of distinct (Alice, Bob) pairs of size 100 × 99/2 = 4950, bearing in mind that swapping Alice and Bob in the SKD protocol does not affect the resulting secret keys. For each of the 4950 pairs, Eve’s signal Z is randomly selected from the set of signals from which Alice’s and Bob’s signals are excluded. In this way, 4950 realizations of the DMS (X, Y, Z) were obtained, where each of the sequences is 16,000 × 1 × 16 = 256,000 bits long.
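The construction of the evaluation population described above can be sketched in a few lines. Speakers are represented here by integer indices, and the random seed is arbitrary; both are assumptions made only for the illustration.

```python
import itertools
import random

speakers = list(range(100))                       # indices of the 100 selected utterances
rng = random.Random(0)                            # arbitrary seed, for reproducibility only

triples = []
for alice, bob in itertools.combinations(speakers, 2):   # unordered (Alice, Bob) pairs
    # Eve is drawn at random from the signals that belong to neither Alice nor Bob.
    eve = rng.choice([s for s in speakers if s not in (alice, bob)])
    triples.append((alice, bob, eve))

# 16,000 samples per second x 1 second x 16 bits per sample
bits_per_sequence = 16000 * 1 * 16

print(len(triples), bits_per_sequence)
```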
During the design of SKD systems based on the voice source model, pre-processing and analog-to-digital conversion significantly determine the correlation properties of the resulting DMS and, hence, the maximal secret key rate. When selecting a suitable value of the number of bits per sample, the dominant criterion should be the speed of the generated secret keys.
Following Proposition 5.6 from [37], it is advisable to choose the simplest possible quantization procedure if no limitation is imposed on the transmission speed on a public channel. Since, in this paper, we do not consider speed limitations of data transmission over a public channel, we choose scalar uniform quantization.
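The Shannon, or block, entropy [38] of order $n$ is defined by
$$ H_n = -\sum_{s \in \{0,1\}^n} p(s) \log_2 p(s), $$
where $p(s)$ is the probability of occurrence of the pattern $s$ of $n$ consecutive bits.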
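An empirical estimate of the block entropy can be sketched as below. The sketch assumes that pattern probabilities are estimated from the relative frequencies of non-overlapping n-bit blocks; the source does not specify the estimator, so this choice and the function name `block_entropy` are assumptions. Dividing the result by n gives a normalized value in bits per bit.

```python
from collections import Counter

import numpy as np

def block_entropy(bits, n):
    """Shannon entropy (in bits) of the empirical distribution of non-overlapping n-bit blocks."""
    bits = np.asarray(bits, dtype=np.uint8)
    m = len(bits) // n                            # number of complete blocks
    blocks = bits[: m * n].reshape(m, n)
    counts = Counter(row.tobytes() for row in blocks)
    p = np.array(list(counts.values()), dtype=float) / m
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(3)
sample = rng.integers(0, 2, 100000, dtype=np.uint8)
print(block_entropy(sample, 2) / 2)               # normalized entropy of a random sequence
```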
Figure 5 shows the dependence of the estimated block entropy rate on the number of bits per sample of the voice signal.
From the graph in Figure 5, it is evident that, as the number of bits per sample increases, the block entropy rate initially increases and then decreases. Many authors have noted (see, e.g., [37]) that the secret key extraction rate may increase with over-quantization. Therefore, we decided to design a system operating with nb = 16, i.e., in the over-quantization regime.
Figure 6 shows the histogram of the normalized Hamming distances of all 4950 distinct (Alice, Bob) pairs. For two binary sequences $x$ and $y$ of the same length $n$, this distance is defined by
$$ d_H(x, y) = \frac{1}{n} \sum_{i=1}^{n} x_i \oplus y_i . $$
Note that $d_H$ has the value 0 if these sequences are identical, and an expected value of 0.5 if they are mutually independent and uniformly distributed.
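The normalized Hamming distance is simply the fraction of positions in which two equal-length binary sequences differ. A short sketch follows; the function name `normalized_hamming` is chosen here for illustration.

```python
import numpy as np

def normalized_hamming(x, y):
    """Fraction of positions in which two equal-length binary sequences differ."""
    x = np.asarray(x, dtype=np.uint8)
    y = np.asarray(y, dtype=np.uint8)
    if x.shape != y.shape:
        raise ValueError("sequences must have the same length")
    return float(np.mean(x != y))

rng = np.random.default_rng(7)
u = rng.integers(0, 2, 100000, dtype=np.uint8)
v = rng.integers(0, 2, 100000, dtype=np.uint8)    # drawn independently of u

print(normalized_hamming(u, u))                   # identical sequences
print(normalized_hamming(u, v))                   # independent sequences, close to 0.5
```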
Normalized Hamming distances of all pairs (Alice, Bob), as well as distances (Alice, Eve), are shown in Figure 7. Let us recall that Eve was chosen from the pool of remaining participants in a random manner. In terms of evaluating the security aspects of this protocol, this choice of Eve corresponds to an insider attack, which we can consider as the worst case from the point of view of legitimate participants. Therefore, the performed analysis and security parameters are reliable indicators of the practical security of the entire system.
4.2. Decorrelation of Eavesdroppers and Legitimate Users
The resulting secret key rate will be much higher when Eve’s sequence is less correlated with Alice’s and Bob’s sequences. Therefore, if Alice and Bob apply the same randomly chosen permutation to their sequences, their normalized Hamming distance will remain unchanged, while the mathematical expectation of the distance of Eve’s sequence to these permuted sequences will be 0.5. In other words, if the legitimate participants of the protocol apply the same random permutation, about which Eve has no information, there will be a complete decorrelation between her sequence and the sequences of Alice and Bob. This situation is shown in Figure 8.
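The effect of a shared secret permutation can be checked numerically on synthetic data. In the sketch below, the disagreement levels (30% between Alice and Bob, 40% between Alice and Eve) and the use of a seeded NumPy generator as a stand-in for a permutation derived from a short shared key are assumptions made for illustration; they are not the paper's data or key-derivation method.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100000
x = rng.integers(0, 2, n, dtype=np.uint8)                  # Alice
y = x ^ (rng.random(n) < 0.30).astype(np.uint8)            # Bob, correlated with Alice
z = x ^ (rng.random(n) < 0.40).astype(np.uint8)            # Eve, correlated with Alice

# Stand-in for a permutation generated from a short secret key shared by Alice and Bob.
shared_secret_seed = 12345
perm = np.random.default_rng(shared_secret_seed).permutation(n)
xp, yp = x[perm], y[perm]                                  # Eve cannot permute z accordingly

d_ab_before, d_ab_after = np.mean(x != y), np.mean(xp != yp)
d_ae_before, d_ae_after = np.mean(x != z), np.mean(xp != z)
print(d_ab_before, d_ab_after)                             # unchanged by the permutation
print(d_ae_before, d_ae_after)                             # moves to about 0.5
```

The Alice-Bob distance is preserved exactly because both sides reorder their bits identically, whereas Eve's unpermuted sequence is compared against reshuffled positions.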
The advantage of decorrelation is observed at all phases of SKD. The evolution of normalized Hamming distances in the first and the second iteration of the applied BPAD is presented in Figure 9 and Figure 10. Namely, while the normalized Hamming distances to Eve remain 0.5, the distances of the sequences between Alice and Bob decrease rapidly. As seen in Figure 10, the expected value of this distance after the second iteration of the BPAD algorithm is about 0.01, which enables a very efficient application of the Winnow protocol in the IR phase [27].
Figure 11 shows that, after applying the Winnow protocol, all pairs of (Alice, Bob) sequences are matched (the normalized Hamming distance is equal to 0), while Eve remained at the distance of 0.5.
To generate the permutation, an initial short secret key previously exchanged between Alice and Bob is necessary. Since the basic SKD protocol is preceded by the authentication of the public channel, that is, the authentication of legitimate users, this initial secret key can be locally generated based on the cryptographic keys used in the authentication phase. Another possibility is the synthesis of an autonomous SKD system that does not depend on the stage of user authentication. In that case, a short secret key is first distilled from the original (non-permuted) signals at a lower key rate, followed by the main distillation phase with permuted signals.
4.3. Feature Engineering Based on Information Theoretic Measures and Stylometry
In [13], it was shown that information-theoretic features perform very well in predicting the ECRE2 lower bound in the case of EEG sources. In this paper, we used all features from [13] except features 10 and 11. To those features, we added new information-theoretic features consisting of normalized block entropy of block sizes 2, 14, and 20 calculated on the agreed sequence before PA. Let us denote all information-theoretic features as IT_1-IT_14. Features IT_4, IT_5, and IT_6 require information exchange over a public channel between Alice and Bob. The question arises whether it is possible to find a set of features that can be calculated locally by Alice and Bob and that do not require communication over a public channel. In order to explore that possibility, the set of stylometric features described in detail in [39] was added, except for the last feature. We will denote them as ST_1-ST_22. These features were obtained by first transforming the sequence using a Base64 encoder into “sentences” of the language over the alphabet of size 64 [39].
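The Base64 step can be sketched with the Python standard library. The sketch assumes that the bit sequence is packed into bytes most significant bit first and truncated to a multiple of 24 bits so that no `=` padding appears; these conventions and the function name `bits_to_base64_text` are assumptions, since the source does not describe the packing.

```python
import base64

import numpy as np

def bits_to_base64_text(bits):
    """Map a binary sequence to a string over the 64-symbol Base64 alphabet."""
    bits = np.asarray(bits, dtype=np.uint8)
    usable = (len(bits) // 24) * 24               # 24 bits = 3 bytes = 4 Base64 symbols
    packed = np.packbits(bits[:usable]).tobytes()
    return base64.b64encode(packed).decode("ascii")

rng = np.random.default_rng(5)
sequence = rng.integers(0, 2, 240, dtype=np.uint8)
text = bits_to_base64_text(sequence)
print(text)                                       # a 40-symbol "sentence"
```

Each Base64 symbol carries 6 bits, so stylometric statistics computed on the resulting text describe the sequence at the level of 6-bit patterns.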
The ranking of features from the combined set was performed by calculating their SHAP (SHapley Additive exPlanations) values [40, 41]. Shapley values have their origins in cooperative game theory. Due to their theoretical foundation and practical usability, Shapley values are becoming a dominant method used in machine learning for the evaluation and ordering of feature importance.
The architecture of the PIDNN used is shown in Figure 12, according to the notation adopted in the Keras API [42]. The first layer is a dense layer of dimension 128. Batch normalization is followed by the second dense layer of dimension 64 and, finally, the third dense layer of dimension 2 × 64, with two outputs on the first block (upper bound, lower bound) and one output on the second block (predicted value).
In the feature engineering phase, the input to the neural network is dimension 36, which is the total dimension of the combined information-theoretic and stylometric features (14 + 22 = 36). After selecting the 11 most informative features, the final PIDNN is of the same architecture, except that the input dimension is 11.
The calculation of SHAP values was performed within 10-fold cross-validation, which means that the displayed values are equal to the average values of all 10 folds.
Figure 13 and Figure 14 show the first 11 ranked features for systems A and B, respectively. We note that the synergy of information-theoretic and stylometric features for both systems resulted in a set of 11 most informative features, which do not require additional information exchange via a public channel. Namely, there are no features IT_4, IT_5, and IT_6 in those sets. This property is important not only in terms of security, but also allows an independent local synthesis of PIDNN blocks on the side of legitimate users.
It is interesting to point out the relative importance of information-theoretic and stylometric features. For this purpose, we can define indicators
They give the relative contribution of information-theoretic and stylometric features in percentage, respectively. For System A, we obtain values , while for system B, . The greater influence of stylometric features in System A can be explained by the fact that these features better describe unmodeled serial correlations of CR sources, which are more present in this system. The influence of stylometric features decreases in system B because these unmodeled serial correlations of CR sources are removed by Huffman coding.
4.4. Performance Measures
In the experimental evaluation, we tested three different PA strategies:
ML strategy based on PIDNN and Spoiling knowledge lower bound for ECRE2
where is the number of bits exactly eavesdropped by Eve in the phases of the protocol before sequence compression, while denotes the security parameter . Since the BPAD and Winnow algorithm for Eve’s optimal strategy does not provide Eve with this type of information, , for the parameter t, we adopted so that different PA strategies can be consistently compared.
To compare different PA strategies, two indicators, Gain and Loss, were introduced in [
13]. If we consider two PA strategies, A and B, Gain is defined as follows
where
and
denote the lengths of secret keys generated using the same input sequences
with strategies A and B. Loss of PA strategy A is given by
where
is secret keys generated by PA strategy (25), for the same input sequences
. Therefore,
can be interpreted as a measure of the unused randomness of a given CR source under PA strategy A.
We also need to define
and
given by the following expression
which represents the mean and the minimum value of ECRE2 of the given M realization of the corresponding DMS.
Since Eve’s eavesdropper channel is BSC, it follows that
where is
the normalized Hamming distance between the sequences
and
.
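For a binary symmetric channel, the collision entropy has a simple closed form that can be evaluated directly. The sketch below assumes uniformly distributed agreed bits and a memoryless BSC whose crossover probability equals the normalized Hamming distance d, in which case each bit contributes -log2(d^2 + (1 - d)^2); it is the standard textbook expression under these assumptions and is not copied from the paper's equation. The function name `bsc_collision_entropy` is hypothetical.

```python
import math

def bsc_collision_entropy(n_bits, d):
    """Renyi entropy of order 2 (in bits) of n_bits seen through a BSC with crossover d."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("d must lie in [0, 1]")
    return -n_bits * math.log2(d * d + (1.0 - d) ** 2)

for d in (0.0, 0.1, 0.3, 0.5):
    print(d, bsc_collision_entropy(1000, d))
```

At d = 0.5 the value equals the sequence length, meaning Eve's observation carries no information about the agreed bits under this model.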
The mean value of the spoiled ECRE2 lower bound
, obtained from PIDNN is denoted by
The quantity
represents the potential gain of the secret key lengths when applying the optimal PA strategy (25) in comparison to the strategy based on the global minimum (26). Similarly,
is the gain of the secret key lengths when applying the two PA strategies, one based on machine learning (27) and the other representing global minimum strategy (26). Corresponding losses
express the degree of unused of the given DMS, conditioned with chosen PA strategy.
The key rate and the key acceptance rate are given by
The leakage rate quantifies the amount of Eve’s information contained in her wiretapped sequence about Alice and Bob’s distilled common key, scaled to one bit
since Eve’s eavesdropper channel is BSC, see
Figure 3.
Table 1 summarizes the flow chart of the experimental evaluation. The obtained results are presented in Table 2.
To check the randomness of the sequences obtained from systems A and B, we performed the tests developed by the US National Institute of Standards and Technology (NIST) [43]. The randomness tests are based on the statistical test suite, and the results are shown in Table 3. The result of each test is given by the p-value. The requirement for passing an individual test is that the obtained p-value is higher than 0.01. The results confirm that the generated secret keys of both systems have passed the tests, that is, they meet the defined randomness criteria.
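To illustrate how a p-value of this kind is obtained, the sketch below implements the frequency (monobit) test, which is one member of the NIST statistical test suite. Whether this particular test is among those reported in Table 3 is not stated in this section, so it serves only as an example of the pass criterion; the function name `monobit_p_value` is chosen here.

```python
import math

import numpy as np

def monobit_p_value(bits):
    """p-value of the frequency (monobit) test for a binary sequence."""
    bits = np.asarray(bits, dtype=np.int64)
    n = bits.size
    s = abs(int((2 * bits - 1).sum()))            # |#ones - #zeros|
    return math.erfc(s / math.sqrt(2.0 * n))

rng = np.random.default_rng(11)
candidate_key = rng.integers(0, 2, 28000)         # synthetic stand-in for a generated key
p = monobit_p_value(candidate_key)
print(p, "pass" if p > 0.01 else "fail")
```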
Based on the results presented in Table 2 and Table 3, we can draw the following conclusions:
In terms of the most important indicators, such as KR, KAR, and LR, systems A and B show very similar performance. It follows that the dominant impact on system performance stems from the input decorrelating permutation, not from Huffman source coding.
The obtained KR of 11% is, to our knowledge, the highest publicly reported KR for systems of this class. Given that the bit rate of a DMS based on the speech signal is 256 kb/s (16 bits per sample at a sampling rate of 16 kHz), the speed of generating absolutely secret keys is about 28 kb/s.
The resulting LR is of the order of bits per one bit of the generated secret key. This means that Eve gets a negligible 0.85 b for every 28 kb of generated secret key. This amount of leaked information is scattered throughout Eve’s wiretapped sequence, which makes it almost impossible to reconstruct any bit of the distilled secret keys.
In the available literature, there are no works that explicitly state the LR of the proposed systems, except for works [
12,
13]. They reported LR values from
to
for DMS related to EEG signals. This is more than one order of magnitude worse result. Since the systems from [
12,
13] do not contain the initial decorrelating permutation, it is reasonable to conclude that its important influence on the performance of an SKD system is expressed not only in the increase of KR but also in the significant decrease of LR.
The value % indicates that the proposed system makes maximal use of the randomness of the given CR source. It is also a confirmation that the concept of adaptive PA based on PIDNN and Spoiling knowledge lower bound of ECRE2 is practically optimal. Let us emphasize that in the optimal case, the value for the loss is .
High values of quantities and (between 65 and 410) indicate the superiority of the PA block design based on the estimate of the lower bound of ECRE2, compared to classical approaches based on the global minimum.
The NIST tests confirm the randomness of the distilled secret keys. With negligible LR, the proposed system can be used to generate and distribute secret cryptographic keys in systems that provide absolute secrecy in the Shannon sense.
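The key generation speed quoted in the conclusions above follows from simple arithmetic on the figures given in the text; the variable names below are illustrative.

```python
bits_per_sample = 16
sampling_rate_hz = 16000
key_rate = 0.11                                   # KR of 11% reported in the conclusions

source_rate_bps = bits_per_sample * sampling_rate_hz      # bit rate of the speech-based DMS
key_speed_bps = key_rate * source_rate_bps                # secret key bits per second

print(source_rate_bps / 1000, "kb/s source rate")
print(key_speed_bps / 1000, "kb/s key generation speed")
```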
Based on all presented results, the proposed SKD system enables the design of fully autonomous systems for voice protection; see Figure 15. Green border lines indicate a classic system, while blue border lines indicate a fully autonomous system. Autonomy is understood here both in terms of independence from any additional system for generating and distributing secret keys and in terms of complete voice control of the system. The hands-free feature of the speech protection system is of exceptional practical importance, both in military and civilian applications.
5. Security Issues and Application
5.1. Limitation of the Proposed Methodology
The presented methodology of SKD system design is applicable when sufficiently representative samples of the selected DMS are available. Although at first glance this seems limiting, it is not. If the dominant criterion is the minimization of security breaches of the entire protection system, of which the SKD system is just one part, it is clear which users are good choices for the legitimate users of the system. Based on this set of legitimate users, a sufficiently representative sample of the selected DMS is available at the beginning of the system design phase. The decorrelation permutation at the system input prevents any initial advantage for the attacker, which gives an additional security margin even in the case of an insufficiently large sample of legitimate users. Finally, Huffman’s source compression coding significantly reduces information leakage toward Eve. It is interesting to note that even in the case of an insider attack, i.e., when one of the legitimate users assumes the role of Eve, the decorrelation effect of the initial permutation has the same impact as if it were an illegitimate user. The variance of the ML regression estimator of the ECRE2 lower bound can always be compensated for by increasing the security parameter t. Our experimental results show that, for the speech signal, this parameter is of the order of several tens of bits, which only slightly reduces the final KR of the generated secret keys. Other classes of security threats include the violation of the authenticity of the public communication channel and the change of Eve’s strategy from a passive observer to an active attacker. It was shown in [44] that the problem of an active attacker can be solved by doubling the security parameter t, which in our system remains in the domain of several tens of bits. Furthermore, using standard cryptographic mechanisms for authentication and the integrity of messages exchanged over a public channel, the threat from this class of attack is reduced to a minimum. Therefore, we can conclude that the performance of the proposed SKD system will remain practically unchanged even in this worst-case scenario.
5.2. Robustness to Different Types of Speech Input
The used speech base is originally intended to help train and evaluate keyword spotting systems. This speech base covers variations in a wide range of manner and speed of pronunciation: accented speech, different emotional states of speakers, gender, positions of start and end of words, level of background noise, etc. We tried to ensure that our sample of 100 speakers validly represented all these variations.
Experimental results show that the proposed system is extremely robust to all pronunciation and input speech quality variations. This can be explained by the fact that the digitization and serialization of speech samples, followed by permutation, give a sufficiently good approximation of the Bernoulli (0.5) distributed binary sequence of legitimate users. In addition, since the permutation does not destroy the cross-correlation properties of the sequences of legitimate users, it proves to be sufficient to achieve high-speed KR.
5.3. Generalization to Different Types of Encryption Protocols and Cryptographic Systems
The proposed SKD system can be:
stand-alone module for generating and distributing cryptographic secret keys,
integrated into some of the existing cryptographic protocols and systems.
In the first case, independence from the main communication channel is a particular advantage since the system uses CR based on the source model. In addition, all system segments can operate in the offline mode, from the formation of binary strings of protocol participants to the final PA phase. The offline mode of operation also enables more effective procedures for additional system protection from compromising electromagnetic emanation and side channel attacks of various types and origins. In this mode of operation, the system can be used for critical information infrastructure protection, as a supplier of absolutely secret cryptographic keys to those elements of critical information infrastructure that use cryptographic protection mechanisms based on cryptographic keys of the highest quality and secrecy level.
In the second case, integration with existing cryptographic protocols and systems is more a question of the level of functionality achieved after integration than of whether this system can be applied. For example, the proposed SKD system can replace the Diffie-Hellman protocol for establishing symmetric keys in all systems and protocols in which it is incorporated (IPSec, SSL, etc.). Whether this integration increases the system functionality is a separate question related to the purpose of the integrated system and the conditions in which it will operate. In this scenario, the system works in the online mode, which entails additional requirements regarding the availability and quality of the public channel. The problem of a public channel with transmission errors can be solved by adding error-correcting codes in both the AD and IR phases of information exchange over the public channel. This addition only increases the complexity of the proposed SKD but does not compromise its most important performance measures, such as KR and LR.
5.4. Computational Complexity
The computational complexity of the AD and IR phases of the proposed SKD is of order , while the complexity of Huffman source compression and Toeplitz hash function is of order where n is the length of input signals. Therefore, overall computational complexity is of order ). This indicates that implementing the proposed algorithm does not require special memory or computational resources. In addition, if the system works only in the offline mode, the required resources are reduced even further.
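To make the cost of the Toeplitz hashing step more concrete, the following minimal sketch builds a binary Toeplitz matrix from a random seed and applies it over GF(2). It is an illustration only, not the authors' implementation: the function name `toeplitz_hash`, the seed layout, and the input and output lengths are assumptions. The direct matrix-vector product used here needs on the order of `n * out_len` bit operations; faster FFT-based formulations exist but are not shown.

```python
import numpy as np

def toeplitz_hash(bits: np.ndarray, seed_bits: np.ndarray, out_len: int) -> np.ndarray:
    """Compress `bits` (length n) to `out_len` bits with a binary Toeplitz matrix.

    The matrix is defined by n + out_len - 1 seed bits and is constant along
    each diagonal: T[i, j] = seed_bits[i - j + n - 1].
    """
    n = bits.size
    if seed_bits.size != n + out_len - 1:
        raise ValueError("seed must contain n + out_len - 1 bits")
    i = np.arange(out_len)[:, None]
    j = np.arange(n)[None, :]
    T = seed_bits[i - j + n - 1]          # (out_len, n) Toeplitz matrix
    # Matrix-vector product over GF(2): integer product reduced modulo 2.
    return (T.astype(np.int64) @ bits.astype(np.int64)) % 2

rng = np.random.default_rng(1)
n, out_len = 512, 64                       # illustrative sizes (assumption)
x = rng.integers(0, 2, size=n)             # reconciled bit string (stand-in)
y = rng.integers(0, 2, size=n)             # second string, used to check linearity
seed = rng.integers(0, 2, size=n + out_len - 1)

key = toeplitz_hash(x, seed, out_len)
print(key.size)
```

Since the hash is a linear map over GF(2), hashing the XOR of two strings gives the XOR of their hashes, which is a convenient sanity check for any implementation of this step.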
5.5. Summary of the Main Challenges
The main challenges of this work can be listed as follows:
Showing that speech can be a source of CR for sufficiently efficient SKD systems.
Maximizing CR, mainly by weakening the attacker's position (decorrelating Eve's signals) and by minimizing the unused portion of the chosen CR. We addressed this challenge with a mechanism that adaptively determines the maximal length of the secret keys for a given level of leakage to Eve. The proposed solution consists of a PIDNN ML block which, in working mode, operates solely on local information available to the protocol participants and does not require any information exchange over a public channel.
Reliable measurement of LR. This requirement entails two additional ones: (a) proof that Eve's strategy is optimal, and (b) proof that Eve's knowledge of the generated secret keys is bounded with a predetermined confidence level. Eve's optimal strategy is to repeat the moves of the legitimate participants in the AD phase, which follows from the fact that the considered DMS is equivalent to Maurer's satellite scenario. Theorems 1 and 2, together with the implemented adaptive PA block, ensure the fulfillment of requirement (b).
6. Conclusions
This paper presents a successful application of NLP technologies (speech processing, machine learning, and stylometry) in the synthesis of high-efficiency SKD protocols. The developed PA strategy, which uses PIDNN to estimate the lower bound of ECRE2, enables the maximum use of the randomness contained in the selected CR source. At the same time, this approach enables the precise quantification of the information leaked to an eavesdropper. By introducing a decorrelating permutation at the very beginning of the information processing chain of the proposed SKD system, a significant increase in the key rate and a decrease in leaked information were obtained (both improvements of about an order of magnitude). All these results represent a solid basis for the practical implementation of a wide class of fully autonomous speech protection systems for both military and civilian applications.
An open question that has been waiting to be answered since Maurer’s pioneering work is the practical characterization of CR sources, which would enable the maximization of the key performances of SKD systems based on these sources. Moreover, our future research will be focused on finding other CR generation mechanisms that would enable the synthesis of efficient secret key distillation systems. In this respect, in our opinion, all stochastic self-synchronizing structures are interesting, such as those presented in the paper [45], as well as some classes of neural networks that enable synchronous distributed training, such as Siamese neural networks [46].