Secret Key Distillation with Speech Input and Deep Neural Network-Controlled Privacy Amplification

Radomirović, Jelica; Milosavljević, Milan; Banjac, Zoran; Jovanović, Miloš

doi:10.3390/math11061524

¹ Vlatacom Institute of High Technology, Milutina Milankovica 5, 11070 Belgrade, Serbia

² School of Electrical Engineering, Belgrade University, Bulevar kralja Aleksandra 73, 11120 Belgrade, Serbia

³ Faculty of Information Technologies, Belgrade Metropolitan University, Tadeuša Košćuška 63, 11000 Belgrade, Serbia

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(6), 1524; https://doi.org/10.3390/math11061524

Submission received: 23 February 2023 / Revised: 16 March 2023 / Accepted: 17 March 2023 / Published: 21 March 2023

Abstract

We propose a new high-speed secret key distillation system via public discussion based on the common randomness contained in the speech signal of the protocol participants. The proposed system consists of subsystems for quantization, advantage distillation, information reconciliation, an estimator for predicting conditional Renyi entropy, and universal hashing. The parameters of the system are optimized in order to achieve the maximum key distillation rate. By introducing a deep neural block for the prediction of conditional Renyi entropy, the lengths of the distilled secret keys are adaptively determined. The optimized system gives a key rate of over 11% and negligible information leakage to the eavesdropper, while NIST tests show the high cryptographic quality of produced secret keys. For a sampling rate of 16 kHz and quantization of input speech signals with 16 bits per sample, the system provides secret keys at a rate of 28 kb/s. This speed opens the possibility of wider application of this technology in the field of contemporary information security.

Keywords:

speech signals; secret cryptographic key establishment; stylometric features; Huffman source coding; collision entropy; deep neural networks

MSC:

68T07

1. Introduction

The establishment of secret cryptographic keys based on common randomness (CR) and an additional public channel for public discussion has not lost its relevance since the founding works [1,2,3,4,5,6]. The ubiquity of internet connections enables public discussion in a realistic scenario, even in wartime conditions (for instance, Starlink on Ukraine’s battlefield). Another prerequisite for the application of this class of protocols is the existence of CR with sufficient capacity.

In the class of sequential key distillation (SKD) protocols based on the source model [7], biometric signals were used as a CR source: gait [8,9], ECG [10], eye and mouse movement [11], and EEG [12,13]. The achieved secret key generation speed varies from 2 to 26 b/s for the so-called source-type model with no side information for the attacker [1]. This model encompasses gait, ECG, and other sources suitable for generating a secret cryptographic key for secure communication between different devices located on the human body [8]. In these settings, the secret key capacity is given by the mutual information rate of the terminal signals available at the beginning of the protocol. In the case of the so-called source-type model with side information for the attacker [1], discussed in [12,13], generation speeds ranging from 10 b/s for the so-called EEG metrics signals up to 1200 b/s for 14-channel raw EEG signals were achieved. For this type of model, the secret key capacity is given by the mutual information rate of the terminal signals, conditioned on the corresponding eavesdropper signal available at the beginning of the protocol.

In [14,15], the distance between the legitimate nodes in mobile wireless networks acts as the observed CR. Experimental results give the speed of generated keys in the range of 0.1 to 0.6 b/s, depending on the speed of the terminals and the position of the eavesdropper.

All these facts show that, for now, no system based on the source-type model with side information for the attacker achieves a high secret key generation speed, except for the system based on raw EEG signals presented in [13].

In the class of SKD protocols based on the channel model [7], the wireless channel, through which legitimate participants communicate, is used as a source of CR. An excellent overview of these systems is given in [16]. The achieved secret key generation speeds range from 0.037 b/s to 1800 b/s. Note that for the highest speed of 1.8 kb/s, the Key Disagreement Rate (KDR) is 8%. In addition, there is a consensus (see [17,18]) that the performance of this class of systems must assume very sharp restrictions on the attacker’s freedom, calling into question the reliability of the proclaimed security level of generated cryptographic secret keys.

In this paper, a new concept of a high-speed SKD system based on the CR contained in the speech signal of the protocol participants is presented. The novelties are the following:

To our knowledge, this is the first publicly available paper in which speech is used as CR in the class of SKD protocols based on the source model. As an independent system, our system can also be seen as a speech-controlled device that transforms voice inputs into secret keys. Within more complex protection systems, it can take on the role of an autonomous system of generating and distributing secret keys independently of the used telecommunication channel. Finally, as a part of a complete speech protection system, it provides complete autonomy and full voice control.
Maximizing CR utilization was achieved with a neural predictor designed to predict the attacker’s conditional Renyi entropy of order 2 (ECRE2) (collision entropy). In this way, the speed of the generated secret keys is adapted as a function of the side information available to the eavesdropper. Unlike the system from [13], stylometric features of binary strings were used before the Privacy Amplification (PA) stage, which does not require the exchange of additional information over a public channel. In this way, the final length of the generated keys is unavailable to the eavesdropper, which increases their equivocation.
The impact of random permutation of the signals of legitimate users was examined and experimentally evaluated. It is shown that this transformation significantly reduces the correlation of the eavesdropper signal with the signals of legitimate users, which practically eliminates the need for a compression block before applying PA.
The PA block, based on machine learning, uses the effect of Spoiling information related to Renyi entropy [4]. This gives a more accurate estimate of the ECRE2 lower bound, enabling further increases in CR utilization and the speed of secret key generation.
On an evaluation set of 100 utterances of the word “house” spoken by different speakers at a sampling rate of 16 kHz, an average secret key generation speed of 28 kb/s was achieved, which makes this system widely applicable in different security scenarios and telecommunication channels with different transmission speeds.

The proposed system can be considered as an application of machine learning and deep learning techniques in Natural Language Processing Technologies (NLP). Namely, by transforming the binary string at the input to the PA block using the base 64 transformation, an equivalent new language is obtained, whose dynamics and correlation properties can be modeled more easily. Let us note that the question of chaoticity, hyper chaoticity [19], and exponential stability [20] is significant for connecting the desirable properties of secret cryptographic keys created in the domain of classical cryptography with the latest results in the domain of chaotic stochastic systems. If we impose an additional condition of self-synchronization on these systems, we approach a scenario very close to the concept of CR. On the other hand, we can see this work as a direct application of NLP technologies in the domain of SKD, ranging from the speech signal as a source of CR, feature engineering based on stylometric features, to deep neural network techniques that have been intensively developed within the framework of NLP.

The paper is structured in sections with the following content. In Section 2, we introduce the basic terminology and meta structure of the generic SKD for speech signals as a source of CR. We outline main building modules, including lossless compression block (LLC), feature engineering block (FE), and prediction interval deep neural network block (PIDNN) for the estimation of the lower bound of ECRE2.

The concept of Spoiling knowledge lower bound for ECRE2 is presented in Section 3. In Theorem 1, we prove that using this bound, in a PA block based on hash functions, from the universal class, gives secret keys of maximum uncertainty. Then, in Theorem 2, we further show that the appropriate PIDNN can be used to estimate the Spoiling knowledge lower bound for ECRE2. In this way, the practical application of this concept in the design of the SKD protocol is theoretically grounded. In Section 4, the empirical evaluation of the proposed SKD system is presented in detail on an audio dataset of spoken words designed to help train and evaluate keyword spotting. In Concluding Section 5, the practical importance of the proposed SKD system and the role of NLP technologies in its design are pointed out. The paper ends by stating some open design questions of this class of SKD systems.

2. Proposed Sequential Key Distillation Strategy for Speech Signals

According to the SKD for the source model [2], observations of some exogenous CR source are available to protocol participants, see Figure 1. The probabilistic characteristics of the source and the way the sequences are generated cannot be controlled by any protocol participants. In a probabilistic sense, we can describe it as a three-dimensional Discrete Memoryless Source (DMS) with a probabilistic structure

(X Y Z, P_{X Y Z}) .

Legitimate participants of the protocol, known as Alice and Bob, observe components $X$ and $Y$, while the system attacker, known as Eve, has access to component $Z$. The protocol implies the existence of an accessible public authenticated channel through which Alice and Bob exchange messages in both directions. This channel is passively observed by Eve. In [2], it was shown that if this protocol is structured in four sequential stages: Randomness sharing, Advantage distillation (AD), Information reconciliation (IR), and Privacy amplification (PA), there is no loss of optimality. Namely, this sequential scheme provides all strong secret key rates below the secret key capacity of the given source of CR. The highest attainable secret key rate is defined as the secret key capacity of the given DMS and is equal to

C_{k} = \min \{I(X; Y), I(X; Y | Z)\},

(1)

where $I(X; Y)$ denotes the mutual information between $X$ and $Y$, while $I(X; Y | Z)$ denotes the corresponding conditional mutual information from Eve’s side when she possesses observation $Z$. If Eve does not have observations $Z$ (no side information) or her observations are independent of $X$ and $Y$, the maximum value of the secret key capacity is obtained

C_{k \max} = I(X; Y) .

(2)
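As an illustration of expressions (1) and (2), the following minimal sketch evaluates the secret key capacity for a small discrete source. The toy source `bsc_source` is an assumption made only for this example (a uniform bit X seen by Bob and Eve through independent binary symmetric channels); it is not the speech source used in the paper, and the function names are hypothetical.

```python
from itertools import product
from math import log2


def mutual_information(pxy):
    """I(X;Y) in bits from a dict {(x, y): probability}."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * log2(p / (px[x] * py[y]))
               for (x, y), p in pxy.items() if p > 0)


def conditional_mutual_information(pxyz):
    """I(X;Y|Z) in bits from a dict {(x, y, z): probability}."""
    pz = {}
    for (_, _, z), p in pxyz.items():
        pz[z] = pz.get(z, 0.0) + p
    total = 0.0
    for z0, pz0 in pz.items():
        if pz0 == 0:
            continue
        # Joint pmf of (X, Y) conditioned on Z = z0.
        cond = {(x, y): p / pz0 for (x, y, z), p in pxyz.items() if z == z0}
        total += pz0 * mutual_information(cond)
    return total


def secret_key_capacity(pxyz):
    """Expression (1): min{I(X;Y), I(X;Y|Z)}."""
    pxy = {}
    for (x, y, _), p in pxyz.items():
        pxy[(x, y)] = pxy.get((x, y), 0.0) + p
    return min(mutual_information(pxy), conditional_mutual_information(pxyz))


def bsc_source(eps_y, eps_z):
    """Hypothetical toy DMS: X uniform, Y and Z are noisy copies of X."""
    pmf = {}
    for x, y, z in product((0, 1), repeat=3):
        p_y = eps_y if y != x else 1 - eps_y
        p_z = eps_z if z != x else 1 - eps_z
        pmf[(x, y, z)] = 0.5 * p_y * p_z
    return pmf


# Eve's observation independent of X (eps_z = 0.5): capacity equals I(X;Y).
c_independent = secret_key_capacity(bsc_source(0.1, 0.5))
# Eve observes X perfectly (eps_z = 0): no secret key can be distilled.
c_full_leak = secret_key_capacity(bsc_source(0.1, 0.0))
```

In this toy model the first case reproduces expression (2), while the second case shows how Eve's side information drives the conditional term, and hence the capacity, to zero.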

In Figure 2, we present the proposed DMS for which we used the speech signals of participants. We can also consider it as a virtual DMS, in which the selected phrase is spoken by three different subjects. The digitized speech signals correspond to the $X$, $Y$, and $Z$ components of this virtual DMS. The correlation structure of the components comes from the anatomical determinism of human speech, as well as from the fact that the components originate from the same spoken phrase. The individuality of the articulatory apparatus of Alice, Bob, and Eve introduces the necessary degree of uncertainty into this DMS.

Figure 3 shows the basic architecture of the system. In addition to the usual AD, IR, and PA blocks, three additional blocks appear: LLC, in which lossless compression is performed with a selected algorithm of this class; FE, the feature engineering block, in which features are extracted from the sequence $S$; and the Machine Learning (ML) regression block, which takes these features as its input.

The outputs of the ML block are the prediction $\hat{u}(F)$ of the Hamming distance between $S$ and Eve’s wiretapped sequence $E = e$, as well as its lower and upper bounds $LB_{u}$ and $UB_{u}$, respectively. Based on these estimates, the degree of compression of the PA block is determined and thus, the length of the generated secret key. The concept of adaptive determination of the degree of compression of the PA block based on machine learning was initially introduced in [13]. In this way, the speed of the generated secret keys is adapted to the side information available to the eavesdropper. In contrast to the system from [13], stylometric features of the sequence $S$ were used, which do not require the exchange of additional information over a public channel. In this way, the final length of the generated keys is inaccessible to the eavesdropper, which increases their uncertainty. In addition, to estimate the lower bound of ECRE2, we used the Spoiling information effect related to this information measure. This has increased the efficiency of using CR and the speed of generating secret keys.

3. Machine Learning Subsystem

3.1. Spoiling Knowledge Lower Bound for ECRE2

Let $S$ be the sequence shared by Alice and Bob immediately before applying PA, see Figure 3. Let $E$ be Eve’s sequence obtained by eavesdropping on the sequence $S$ via a binary symmetric channel (BSC) with crossover probability $ε$. Let Eve, in addition to the specific eavesdropping sequence $E = e$, have additional side information in the form of a random variable $U = D_{H}(S, e)$ that represents the Hamming distance between $S$ and that particular value $e$.

Given $U = u$, all $\binom{n}{u}$ sequences $s$ at distance $u$ from $e$ are equally likely candidates for $S$, that is, ECRE2 is

R_{2}(S | U = u, E = e) = \log_{2} \binom{n}{u} .

(3)

In [21], p. 309, Lemma 7 shows that

\binom{n}{λ n} \geq \frac{2^{n h(λ)}}{\sqrt{2 n}},

(4)

holds for all $λ \in (0, 1)$, where $h(x) = - x \log_{2}(x) - (1 - x) \log_{2}(1 - x)$, $0 < x < 1$, is the so-called binary entropy function. Let $LB_{u}$ be a lower bound for $u$, such that

\mathrm{Prob}\{LB_{u} \leq u\} \geq 1 - δ,

(5)

holds for a given small $δ > 0$. Then, provided that $LB_{u} \leq u \leq n/2$ (the binomial coefficient $\binom{n}{k}$ is nondecreasing in $k$ for $k \leq n/2$),

\binom{n}{u} \geq \binom{n}{LB_{u}} = \binom{n}{n \frac{LB_{u}}{n}}, \text{ with probability at least } 1 - δ .

(6)

Taking into account (4) with $λ = \frac{LB_{u}}{n} \in (0, 1)$, it follows that

\binom{n}{u} \geq \frac{2^{n h(\frac{LB_{u}}{n})}}{\sqrt{2 n}}, \text{ with probability at least } 1 - δ .

(7)

Substituting (7) into (3), we get the lower bound for ECRE2

R_{2}(S | U = u, E = e) \geq n h(\frac{LB_{u}}{n}) - \log_{2} \sqrt{2 n}, \text{ with probability at least } 1 - δ .

(8)
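The relation between the exact value (3) and the lower bound (8) can be checked numerically. The sketch below is only an illustration under assumed toy values of $n$, $u$, and $LB_{u}$; the function names are hypothetical and are not taken from the authors' software.

```python
from math import comb, log2, sqrt


def binary_entropy(x):
    """Binary entropy function h(x) in bits, with h(0) = h(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def r2_exact(n, u):
    """Expression (3): log2 of the number of sequences at distance u."""
    return log2(comb(n, u))


def r2_spoil(n, lb_u):
    """Expression (8): n * h(LB_u / n) - log2(sqrt(2 n))."""
    return n * binary_entropy(lb_u / n) - log2(sqrt(2 * n))


# Assumed toy values: block length 1000, true distance 400, predicted bound 300.
n, u, lb_u = 1000, 400, 300
exact = r2_exact(n, u)
bound = r2_spoil(n, lb_u)
gap = exact - bound  # collision entropy given up by using the bound
```

The gap shrinks as $LB_{u}$ approaches the true distance $u$, which is why a tight prediction interval translates directly into longer keys.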

Remark 1.

Let us note that the lower bound (8) is an explicit expression for the lower bound for ECRE2, which is used when proving Theorem 8 in [4] in the context of the analysis of the so-called Spoiling knowledge phenomenon. Therefore, we will call this lower bound the Spoiling knowledge lower bound for ECRE2 and denote it with $R_{2 Spoil}(LB_{u}, δ)$.

Theorem 1.

Denote by $S \in \{0, 1\}^{n}$ the random variable that represents the sequence shared by Alice and Bob immediately before the PA phase of some SKD protocol. Let $E$ denote a random variable that includes all of Eve’s knowledge about $S$. Let us denote by $e$ one realization of $E$. In addition, as side information, Eve has a random variable $U$, jointly distributed with $S$ and $E$ according to some distribution $P_{U E S}$ for which the marginal distribution of $[E, S]$ coincides with $P_{E S}$. Let Alice and Bob form their shared secret key based on the mapping $K = G(S)$, where $G$ is a hash function chosen uniformly at random from a universal class of hash functions $G : \{0, 1\}^{n} \to \{0, 1\}^{r}$. Then, the uncertainty of the generated secret key, viewed from Eve’s side, with probability at least $1 - δ$, satisfies the inequality:

H(K | G, E) \geq r - \sum_{U, E} P(u, e) \log_{2}(1 + 2^{r - R_{2 Spoil}(LB_{u}, δ)}),

(9)

where

R_{2 Spoil}(LB_{u}, δ) = n h(\frac{LB_{u}}{n}) - \log_{2} \sqrt{2 n} .

(10)

Proof of Theorem 1.

The proof immediately follows from Corollary 7 [4], and the fact that the lower bound $R_{2 Spoil}(LB_{u}, δ)$ holds with probability at least $1 - δ$. □

3.2. Interval Prediction Deep Neural Network for Predicting the Spoiling Knowledge Lower Bound for ECRE2

Knowing selected DMS in the actual operating conditions allows us to form training sets of the following structure:

\{S_{i}, e_{i}, u_{i} = D_{H} (S_{i}, e_{i})\}, i = 1, \dots, M,

(11)

where M represents the length of the training set.

Let us denote by $F_{i}$ the set of features obtained from the sequence $S_{i}$. Then (11) is transformed into a training set

\{F_{i}, D_{H}(S_{i}, e_{i})\}, i = 1, 2, \dots, M,

(12)

which can be used to train a machine learning system for the prediction of $D_{H}(S_{i}, e_{i})$ based on $F_{i}$. For the design of the PA block, it is necessary to determine the output dimension $r$ of a universal family of hash functions $G : \{0, 1\}^{n} \to \{0, 1\}^{r}$. Theorem 1 shows that, for this purpose, we do not directly need the prediction of $D_{H}(S_{i}, e_{i})$, but only of its lower bound, valid with probability at least $1 - δ$.

The ML block presented in Figure 4 gives three different outputs. Besides the predicted value $\hat{u}$ of $u = D_{H}(S, e)$, it gives the lower and upper bounds of an interval that contains the true value with a given high probability of at least $1 - α$. According to Theorem 1, we then use the lower bound $LB_{u}$ of that interval to estimate the Spoiling knowledge lower bound for ECRE2. If $Pr\{u \leq LB_{u}\} = Pr\{u \geq UB_{u}\}$, it follows that $δ = \frac{α}{2}$.

Theorem 2.

Let the quantities S, E, G, K, and U be defined as in Theorem 1, and let $t \geq 0$ be an arbitrary security parameter. If Alice and Bob choose the length k of the generated secret key K = G(S), for each S, i.e., for each feature set F, such that

k(F) = R_{2 Spoil}(LB_{u}(F), δ) - t,

(13)

then the uncertainty of the generated secret key, viewed from Eve’s side, satisfies the inequality

H(K | G, E) \geq k(F) - \log_{2}(1 + 2^{-t}), \text{ with probability at least } 1 - δ .

(14)

Proof of Theorem 2.

According to Theorem 1, after substituting (13) into (9) with $r = k(F)$, we directly have

\begin{aligned} H(K | G, E) &\geq r - \sum_{U, E} P(u, e) \log_{2}(1 + 2^{r - R_{2 Spoil}(LB_{u}, δ)}) \\ &= k(F) - \sum_{U, E} P(u, e) \log_{2}(1 + 2^{-t}) \\ &= k(F) - \log_{2}(1 + 2^{-t}), \text{ with probability at least } 1 - δ . \end{aligned}

□

Remark 2.

Since the Shannon entropy H(K) of an arbitrary binary sequence K is upper bounded with H(K) ≤ |K| = k, from Theorem 2 immediately follows:

k(F) \geq H(K | G, E) \geq k(F) - \log_{2}(1 + 2^{-t}), \text{ with probability at least } 1 - δ .

(15)

Therefore, the distilled secret key K is of maximum uncertainty for Eve.
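Expressions (10), (13), and (15) can be combined into a small helper that turns a predicted lower bound into a key length. This is a minimal sketch: the function names are hypothetical, and rounding the key length down to an integer is our assumption (it can only enlarge the effective security parameter $t$).

```python
from math import floor, log2


def binary_entropy(x):
    """Binary entropy function h(x) in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)


def key_length(n, lb_u, t):
    """Expression (13): k = R2Spoil(LB_u) - t, floored to an integer >= 0."""
    r2_spoil = n * binary_entropy(lb_u / n) - 0.5 * log2(2 * n)
    return max(0, floor(r2_spoil - t))


def entropy_deficit(t):
    """Right-hand slack in (15): log2(1 + 2^(-t)) bits."""
    return log2(1 + 2.0 ** (-t))


# Assumed toy values: n = 1000 bits before PA, predicted LB_u = 300, t = 20.
k = key_length(1000, 300, 20)
slack = entropy_deficit(20)  # on the order of 1e-6 bits
```

With $t = 0$ the slack is exactly one bit, and every additional unit of $t$ roughly halves it, so a modest security parameter already brings Eve's uncertainty very close to the full key length.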

The functional requirements that are imposed on the ML block from Figure 4 are almost identical to the requirements placed on the designers of a class of regression systems in the field of machine learning. The outputs of these systems are not only the predicted values of the target variable but also the interval within which the true value is found with high probability. These systems are known as Prediction Intervals (PI) regression systems [22,23,24]. PI regression systems based on neural networks, especially deep neural networks, stand out in terms of accuracy. The criterion function that is minimized during the training process consists of two separate components: Coverage, and Mean PI width [25].

Coverage is given by

PICP = \frac{1}{n} \sum_{i = 1}^{n} m_{i},

(16)

where n denotes the number of samples and $m_{i} = 1$ if $u_{i} \in (LB_{u}(F_{i}), UB_{u}(F_{i}))$, otherwise $m_{i} = 0$. We can also interpret it as the prediction interval coverage probability (PICP), which is the reason for the acronym of this metric. It follows from the definition of this metric that it tends to the value $1 - α$, that is, $1 - 2δ$.

The Mean PI width (MPIW) is defined by

MPIW = \frac{1}{n} \sum_{i = 1}^{n} (UB_{u}(F_{i}) - LB_{u}(F_{i})) .

(17)

It aims to give the tightest upper and lower bounds of PI in the minimization process.

In order to prevent further reduction of PI during the optimization process, for data points that are outside PI, we introduce $MPIW_{capt}$, the “captured” version of MPIW:

MPIW_{capt} = \frac{1}{c} \sum_{i = 1}^{n} (UB_{u}(F_{i}) - LB_{u}(F_{i})) m_{i},

(18)

where $c = \sum_{i = 1}^{n} m_{i}$.

The training of PIDNN is reduced to the unconditional minimization of the joint criterion

J_{PI} = MPIW_{capt, θ} + λ Ψ(1 - α - PICP_{θ}),

(19)

Ψ(x) = \max(0, x)^{2},

(20)

where $λ$ is the Lagrange multiplier. The penalty function Ψ gives more weight to larger deviations from the preset value 1 − α for the PICP criterion. During the empirical evaluation of the proposed SKD system, we used the software from the GitHub repository [26], which is described in the paper [24].
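The criterion defined by (16) and (18)-(20) can be evaluated on a batch of predictions as in the following minimal sketch. It uses the hard indicator $m_{i}$ exactly as defined above, so it is suitable for evaluation only; gradient-based training needs a differentiable surrogate, which is not reproduced here. The function name and the toy numbers are assumptions and do not come from the software in [26].

```python
def pi_criterion(u, lb, ub, alpha=0.05, lam=1.0):
    """Return (J_PI, PICP, MPIW_capt) for true values u and bounds lb, ub."""
    n = len(u)
    # m_i = 1 when the true value falls strictly inside the interval.
    m = [1 if lo < ui < hi else 0 for ui, lo, hi in zip(u, lb, ub)]
    picp = sum(m) / n                                   # expression (16)
    c = sum(m)
    # Mean width over captured points only, expression (18).
    mpiw_capt = (sum((hi - lo) * mi for lo, hi, mi in zip(lb, ub, m)) / c
                 if c else 0.0)
    penalty = max(0.0, (1 - alpha) - picp) ** 2         # expression (20)
    return mpiw_capt + lam * penalty, picp, mpiw_capt   # expression (19)


# Toy batch (assumed numbers): the third interval misses its target.
u_true = [5, 10, 20, 30]
lower = [4, 8, 25, 28]
upper = [6, 12, 35, 33]
j_pi, picp, mpiw_capt = pi_criterion(u_true, lower, upper, alpha=0.05, lam=1.0)
```

Because the missed interval is excluded from `mpiw_capt`, the optimizer gains nothing by shrinking intervals that already fail to cover their targets; only the coverage penalty reacts to them.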

4. Experimental Evaluation

In order to experimentally evaluate the proposed SKD system, given in Figure 3, it is necessary to specify its individual blocks.

There is an extensive analysis of the efficiency of AD algorithms, see e.g., [6,27]. Our experience so far shows that the Bit Parity Advantage Distillation (BPAD) protocol [28,29] is efficient enough for our needs. An important property of this algorithm is the existence of Eve’s optimal strategy, which is a repetition of the same steps performed by both Alice and Bob. This allows a good estimate of the amount of information leaked to Eve during the AD phase, i.e., the security margin of the distilled secret keys.
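For orientation, one round of a bit-pair parity protocol can be sketched as follows. This is one common textbook formulation, given here as an assumption: the exact BPAD variant of [28,29] may differ in details such as which bit of a pair is retained. Alice and Bob compare the parity of each bit pair over the public channel, keep the first bit when the parities agree, and discard the pair otherwise.

```python
import random


def bpad_round(alice, bob):
    """One parity round over consecutive bit pairs; returns the kept bits."""
    kept_a, kept_b = [], []
    for i in range(0, len(alice) - 1, 2):
        parity_a = alice[i] ^ alice[i + 1]   # announced publicly by Alice
        parity_b = bob[i] ^ bob[i + 1]       # announced publicly by Bob
        if parity_a == parity_b:
            # Keep one bit of the pair; the other is sacrificed because its
            # value is revealed by the public parity.
            kept_a.append(alice[i])
            kept_b.append(bob[i])
    return kept_a, kept_b


def error_rate(a, b):
    """Fraction of positions in which the two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Synthetic data (assumption): Bob sees Alice's bits with 20% bit flips.
rng = random.Random(1)
alice = [rng.getrandbits(1) for _ in range(20000)]
bob = [bit ^ (rng.random() < 0.2) for bit in alice]

alice_1, bob_1 = bpad_round(alice, bob)
rate_before = error_rate(alice, bob)
rate_after = error_rate(alice_1, bob_1)
```

A pair survives only if it contains zero or two errors, so for independent errors with rate $p$ the rate after one round is about $p^{2} / (p^{2} + (1 - p)^{2})$, at the cost of discarding more than half of the bits.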

The situation is similar in the case of the selection of IR algorithms, whose basic function is to correct all the differences between Alice’s and Bob’s sequences, obtained after the AD phase. The approach that ensures minimal communication over a public channel belongs to algorithms from the class of error correcting codes. We chose the Winnow IR protocol based on Hamming error correcting codes [30]. An important feature of this algorithm is also the existence of Eve’s optimal strategy proposed and analyzed in [27]. In this way, as in the case of the AD algorithm, we can give a reliable estimate of the amount of information leaked to Eve during the IR phase.

The PA block was implemented using random binary matrices of dimension $n \times r$, which have the Toeplitz structure. It is known that this class of hash functions belongs to the universal family of hash functions, see [31,32]. The Toeplitz hash functions are particularly suitable for speech signals, bearing in mind that typical values for $n$ are of the order of $10^{4}$. Namely, the PA block can be efficiently implemented with a complexity of $O(n \log n)$, instead of $O(n^{2})$, in the case of hash functions in the form of random binary matrices without the Toeplitz structure. Our implementation is based on modules that are part of the SciPy package [33].
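To make the hashing step concrete, the sketch below applies a binary Toeplitz matrix to a bit sequence over GF(2). It deliberately uses the direct matrix-vector product for readability, not the FFT-based fast variant mentioned above, and it does not reproduce the authors' SciPy implementation. The function name and the indexing convention of the seed are assumptions.

```python
import random


def toeplitz_hash(bits, seed_bits, r):
    """Hash n input bits to r output bits with a binary Toeplitz matrix.

    The matrix entry T[i][j] = seed_bits[i - j + n - 1] depends only on
    i - j, so n + r - 1 random seed bits define the whole r x n matrix.
    """
    n = len(bits)
    if len(seed_bits) != n + r - 1:
        raise ValueError("seed must contain n + r - 1 bits")
    out = []
    for i in range(r):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & bits[j]   # sum modulo 2
        out.append(acc)
    return out


# Toy sizes (assumption); the paper works with n of the order of 10^4.
rng = random.Random(0)
n, r = 64, 16
seed_bits = [rng.getrandbits(1) for _ in range(n + r - 1)]
x = [rng.getrandbits(1) for _ in range(n)]
y = [rng.getrandbits(1) for _ in range(n)]
key_x = toeplitz_hash(x, seed_bits, r)
key_y = toeplitz_hash(y, seed_bits, r)
```

Since each output bit is a correlation of the input with a shifted window of the seed, the whole product is a convolution, which is what allows an FFT-based implementation to reach the lower complexity.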

The degree of compression of the PA block is determined by the dimension r of the Toeplitz hash matrix and is obtained based on the PIDNN output, according to expressions (10) and (13). Theorem 2 and Remark 2 guarantee that the secret keys thus generated are absolutely secret and have maximum uncertainty.

In subsequent analysis, we will consider two variants of the proposed SKD system: without the LLC block (system A) and with the LLC block (system B). In system B, LLC compression is achieved by Huffman’s optimal source coding over the source alphabet of size $2^{10}$ [34]. More precisely, the systems consist of the following chains of blocks:

System A: Speech → Permutation → BPAD → Winnow → Toeplitz Hashing → Secret keys

System B: Speech → Permutation → BPAD → Winnow → Huffman coding → Toeplitz Hashing → Secret keys

4.1. Voice Data Set

In the empirical evaluation, an audio dataset of spoken words is used [35,36]. The dataset consists of 105,829 utterances obtained from 2618 different speakers who spoke 35 different words, with the sample data encoded as linear 16-bit single channel PCM values, at a 16 kHz rate [36]. Each spoken word lasts 1 s. A subset of 100 voice signals, corresponding to the pronunciation of the word “house”, was randomly selected from this set. Each of these signals can belong to Alice or Bob, giving a population of distinct $(X, Y)$ pairs of size 100 × 99/2 = 4950, bearing in mind that swapping Alice and Bob in the SKD protocol does not affect the resulting secret keys. For each of the 4950 pairs, Eve’s signal Z is randomly selected from the set of signals from which Alice’s and Bob’s signals are excluded. In this way, 4950 realizations of DMS $(X Y Z, P_{X Y Z})$ were obtained, where each of the sequences is 16,000 × 1 × 16 = 256,000 bits long.
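The construction of the 4950 realizations can be sketched with signal indices standing in for the actual recordings. The function name and the seed are hypothetical; only the counting logic follows the description above.

```python
import random
from itertools import combinations


def build_triplets(num_signals, seed=0):
    """Return (alice, bob, eve) index triplets for all unordered pairs."""
    rng = random.Random(seed)
    triplets = []
    for alice, bob in combinations(range(num_signals), 2):
        # Eve is drawn from the signals not used by Alice or Bob.
        candidates = [k for k in range(num_signals) if k not in (alice, bob)]
        triplets.append((alice, bob, rng.choice(candidates)))
    return triplets


triplets = build_triplets(100)
# 1 s of speech at 16 kHz, single channel, 16 bits per sample.
bits_per_sequence = 16_000 * 1 * 16
```

Using unordered combinations rather than ordered pairs reflects the remark that swapping Alice and Bob does not change the resulting keys.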

During the design of SKD systems based on the voice source model, pre-processing and analog-to-digital conversion significantly determine the correlation properties of the resulting DMS $(X Y Z, P_{X Y Z})$ and hence, the maximal secret key rate. When selecting a suitable value of the number of bits per sample, the dominant criterion should be the speed of the generated secret keys.

Following Proposition 5.6 in [37], it is advisable to choose the simplest possible quantization procedure if no limitation is imposed on the transmission speed on a public channel. Since, in this paper, we do not consider speed limitations of data transmission over a public channel, we choose scalar uniform quantization.

The Shannon, or block entropy [38], is defined by

H_{n} = - \sum_{a_{1}, a_{2}, \dots, a_{n}} P (a_{1}, a_{2}, \dots, a_{n}) \log_{2} P (a_{1}, a_{2}, \dots, a_{n})

(21)

where $P(a_{1}, a_{2}, \dots, a_{n})$ is the probability of occurrence of the pattern $a_{1}, a_{2}, \dots, a_{n}$. Figure 5 shows the dependence of the estimated block entropy rate on the number of bits per sample of the voice signal.
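Expression (21) can be estimated from data by replacing the pattern probabilities with relative frequencies. The sketch below is a plain plug-in estimator over non-overlapping blocks; both choices are assumptions, since the estimator behind Figure 5 is not specified here, and the plug-in estimate is known to be biased downwards when the number of possible patterns is large compared to the sample size.

```python
from collections import Counter
from math import log2


def block_entropy(symbols, n):
    """Plug-in estimate of H_n in bits from non-overlapping blocks of size n."""
    blocks = [tuple(symbols[i:i + n])
              for i in range(0, len(symbols) - n + 1, n)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum(c / total * log2(c / total) for c in counts.values())


def block_entropy_rate(symbols, n):
    """Normalized block entropy H_n / n, in bits per symbol."""
    return block_entropy(symbols, n) / n


alternating = [0, 1] * 8
h1 = block_entropy(alternating, 1)   # two equally frequent symbols
h2 = block_entropy(alternating, 2)   # every block is the same pattern (0, 1)
```

The alternating sequence illustrates why block sizes larger than one matter: its single-symbol entropy is maximal, while its two-symbol blocks reveal that it is fully predictable.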

From the graph in Figure 5, it is evident that by increasing the number of bits per sample, the block entropy rate initially increases and then decreases. Many authors have noted, see e.g., [37], that the secret key extraction rate may increase with over-quantization. Therefore, we decided to design a system operating with $n_{b} = 16$ bits per sample, i.e., in the over-quantization regime.

Figure 6 shows the histogram of normalized Hamming distances $D_{h}$ of all 4950 distinct $(X, Y)$ pairs. For two binary sequences $X$ and $Y$ of the same length, this distance is defined by

D_{h}(X, Y) = \frac{D_{H}(X, Y)}{\text{number of bits compared}} = \frac{\text{number of non-matching bits}}{\text{number of bits compared}} .

(22)

Note that $D_{h}(X, Y)$ has the value 0 if these sequences are identical, and an expected value of 0.5 if they are mutually independent and uniformly distributed.
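A minimal sketch of expression (22) for sequences stored as raw bytes, as is typical for 16-bit PCM samples, is given below. The function name is hypothetical.

```python
def normalized_hamming(x: bytes, y: bytes) -> float:
    """Expression (22): fraction of differing bits between two byte strings."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    # XOR marks the differing bits; count them byte by byte.
    differing = sum(bin(a ^ b).count("1") for a, b in zip(x, y))
    return differing / (8 * len(x))


d_same = normalized_hamming(b"\x00\xff", b"\x00\xff")
d_half = normalized_hamming(b"\x0f", b"\x00")
d_opposite = normalized_hamming(b"\x00", b"\xff")
```

Values above 0.5 indicate anti-correlated sequences, which are as informative as correlated ones, so 0.5 is the least favourable distance from an eavesdropper's point of view.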

Normalized Hamming distances of all pairs (Alice, Bob), as well as distances (Alice, Eve), are shown in Figure 7. Let us recall that Eve was chosen from the pool of remaining participants in a random manner. In terms of evaluating the security aspects of this protocol, this choice of Eve corresponds to an insider attack, which we can consider as the worst case from the point of view of legitimate participants. Therefore, the performed analysis and security parameters are reliable indicators of the practical security of the entire system.

4.2. Decorrelation of Eavesdroppers and Legitimate Users

The resulting secret key rate will be much higher when Eve’s sequence is less correlated with Alice’s and Bob’s sequences. Therefore, if Alice and Bob apply the same randomly chosen permutation to their sequences, their normalized Hamming distance will remain unchanged, while the mathematical expectation of the distance of Eve’s sequence to these permuted sequences will be 0.5. In other words, if the legitimate participants of the protocol apply the same random permutation, about which Eve has no information, there will be a complete decorrelation between her sequence with the sequences of Alice and Bob. This situation is shown in Figure 8.
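The effect described above can be reproduced on synthetic data. In the sketch below, all sequences and flip probabilities are assumptions chosen for illustration, and Python's `random` module stands in for a keyed pseudorandom generator; it is not cryptographically secure and would not be used in a real system.

```python
import random


def hamming(x, y):
    """Normalized Hamming distance between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, y)) / len(x)


def shared_permutation(n, key):
    """Permutation derived from a secret key shared by Alice and Bob."""
    perm = list(range(n))
    random.Random(key).shuffle(perm)
    return perm


def permute(bits, perm):
    """Reorder bits according to the permutation."""
    return [bits[p] for p in perm]


# Synthetic correlated source (assumption): balanced random bits for Alice,
# 20% bit flips for Bob and 30% bit flips for Eve.
rng = random.Random(7)
n = 20000
alice = [rng.getrandbits(1) for _ in range(n)]
bob = [bit ^ (rng.random() < 0.2) for bit in alice]
eve = [bit ^ (rng.random() < 0.3) for bit in alice]

perm = shared_permutation(n, key=12345)
alice_p, bob_p = permute(alice, perm), permute(bob, perm)

d_ab_before, d_ab_after = hamming(alice, bob), hamming(alice_p, bob_p)
d_ae_before, d_ae_after = hamming(alice, eve), hamming(alice_p, eve)
```

The distance between Alice and Bob is preserved exactly because both apply the same reordering, while Eve, who cannot reorder her sequence accordingly, ends up comparing unrelated positions.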

The advantage of decorrelation is observed at all phases of SKD. The evolution of normalized Hamming distances in the first and the second iteration of the applied BPAD is presented in Figure 9 and Figure 10. Namely, while the normalized Hamming distances to Eve remain 0.5, the distances of the sequences between Alice and Bob decrease rapidly. As seen in Figure 10, the expected value of this distance after the second iteration of the BPAD algorithm is about 0.01, which enables a very efficient application of the Winnow protocol in the IR phase [27]. Figure 11 shows that, after applying the Winnow protocol, all pairs of (Alice, Bob) sequences are matched (the normalized Hamming distance is equal to 0), while Eve remained at the distance of 0.5.

To generate permutation, an initial short secret key previously exchanged between Alice and Bob is necessary. Since the basic SKD protocol is preceded by the authentication of the public channel, that is, the authentication of legitimate users, this initial secret key can be locally generated based on the cryptographic keys used in the authentication phase. Another possibility is the synthesis of an autonomous SKD system that does not depend on the stage of user authentication. First, a short secret key is distilled over the original (non-permuted) signals with a lower key rate, followed by the main distillation phase with permuted signals.

4.3. Feature Engineering Based on Information Theoretic Measures and Stylometry

In [13], it was shown that information-theoretic features give great results for predicting the ECRE2 lower bound in the case of EEG sources. In this paper, we used all features from [13] except features 10 and 11. To those features, we added new information-theoretic features consisting of normalized block entropy of block sizes 2, 14, and 20 calculated on the agreed sequence before PA. Let us denote all information-theoretic features as IT_1-IT_14. Features IT_4, IT_5, and IT_6 require information exchange over a public channel between Alice and Bob. The question arises whether it is possible to find a set of features that can be calculated locally by Alice and Bob, which do not require communication over a public channel. In order to explore that possibility, a set of stylometric features, described in detail in [39], was added except for the last feature. We will denote them as ST_1-ST_22. These features were obtained by previously transforming the sequence $S$ using a Base64 encoder into “sentences” of the language over the alphabet of size 64 [39].
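The Base64 transformation itself is straightforward and can be sketched with the Python standard library. The zero padding to a whole number of bytes and the single example feature `distinct_symbol_ratio` are assumptions made for illustration; the latter is not one of the ST_1-ST_22 features defined in [39].

```python
import base64


def bits_to_base64_text(bits):
    """Pack a bit list into bytes (zero padded) and Base64-encode it."""
    padded = list(bits) + [0] * (-len(bits) % 8)
    data = bytes(int("".join(map(str, padded[i:i + 8])), 2)
                 for i in range(0, len(padded), 8))
    return base64.b64encode(data).decode("ascii")


def distinct_symbol_ratio(text):
    """Hypothetical stylometric-style feature: vocabulary size over length."""
    return len(set(text)) / len(text) if text else 0.0


# The three bytes of b"Man" written out as 24 bits.
example_bits = [int(b) for byte in b"Man" for b in format(byte, "08b")]
sentence = bits_to_base64_text(example_bits)
ratio = distinct_symbol_ratio(sentence)
```

Every 6 input bits map to one of 64 printable symbols, so the resulting text can be fed to standard text-analysis tooling without any exchange over the public channel.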

The ranking of features from the combined set of these features was performed by calculating their SHAP (SHapley Additive exPlanations) values [40,41]. Shapley’s values have their origins in cooperative game theory. Due to their theoretical foundation and practical usability, Shapley values are becoming a dominant method used in machine learning for evaluation and ordering of feature importance.

The architecture of the used PIDNN is shown in Figure 12 according to the notation adopted in the Keras API [42]. The first layer is a dense layer of dimension 128. After Batch normalization, follows the second dense layer of dimension 64 and finally the third dense layer of dimension 2 × 64, with 2 outputs on the first block (upper bound, lower bound) and one output on the second block (predicted value).

In the feature engineering phase, the input to the neural network has dimension 36, the total dimension of the combined information-theoretic and stylometric features (14 + 22 = 36). After selecting the 11 most informative features, the final PIDNN has the same architecture, except that the input dimension is 11.

The calculation of SHAP values was performed within 10-fold cross-validation, which means that the displayed values are equal to the average values of all 10 folds. Figure 13 and Figure 14 show the first 11 ranked features for systems A and B, respectively. We note that the synergy of information-theoretic and stylometric features for both systems resulted in a set of 11 most informative features, which do not require additional information exchange via a public channel. Namely, there are no features IT_4, IT_5, and IT_6 in those sets. This property is important not only in terms of security, but also allows an independent local synthesis of PIDNN blocks on the side of legitimate users.

It is interesting to point out the relative importance of information-theoretic and stylometric features. For this purpose, we can define the indicators

Sig_{IT} = \frac{\sum_{i=1}^{11} Shap(IT_{i})}{\sum_{i=1}^{11} [Shap(IT_{i}) + Shap(ST_{i})]} \times 100\ [\%],

(23)

Sig_{ST} = \frac{\sum_{i=1}^{11} Shap(ST_{i})}{\sum_{i=1}^{11} [Shap(IT_{i}) + Shap(ST_{i})]} \times 100\ [\%].

(24)

They give the relative contributions of information-theoretic and stylometric features, respectively, in percent. For system A, we obtain $Sig_{IT} = 62.4\%$ and $Sig_{ST} = 37.6\%$, while for system B, $Sig_{IT} = 69.4\%$ and $Sig_{ST} = 30.6\%$. The greater influence of stylometric features in system A can be explained by the fact that these features better describe unmodeled serial correlations of CR sources, which are more present in this system. The influence of stylometric features decreases in system B because these unmodeled serial correlations of CR sources are removed by Huffman coding.
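
Indicators (23) and (24) amount to a normalized sum over the selected features, as in the following sketch. The dictionary layout and the SHAP values in it are hypothetical placeholders, not the values behind Figures 13 and 14.

```python
def relative_contributions(shap_by_feature):
    """Return (Sig_IT, Sig_ST) in percent.

    shap_by_feature maps a feature name starting with 'IT_' or 'ST_'
    to its SHAP importance.
    """
    it = sum(v for k, v in shap_by_feature.items() if k.startswith("IT_"))
    st = sum(v for k, v in shap_by_feature.items() if k.startswith("ST_"))
    total = it + st
    return 100 * it / total, 100 * st / total

# Hypothetical SHAP importances for three selected features.
example_shap = {"IT_1": 0.5, "IT_2": 0.25, "ST_3": 0.25}
sig_it, sig_st = relative_contributions(example_shap)
```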

4.4. Performance Measures

In the experimental evaluation, we tested three different PA strategies:

The optimal strategy

$k_{Opt}(e) = R_{2}(S | E = e) - n_{e} - t$

(25)
Global minimum strategy

$\begin{matrix} k_{GLB_R2min}(e) = c - n_{e} - t, \\ R_{2}(S | E = e) \geq c, \forall e \end{matrix}$

(26)
ML strategy based on PIDNN and the Spoiling knowledge lower bound for ECRE2

$k_{ML}(F) = R_{2Spoil}(LB_{u}(F), \delta) - n_{e} - t,$

(27)

where $n_{e}$ is the number of bits exactly eavesdropped by Eve in the phases of the protocol before sequence compression, and $t > 0$ denotes the security parameter. Since the BPAD and Winnow algorithms, under Eve’s optimal strategy, do not provide Eve with this type of information, $n_{e} = 0$. For the parameter $t$, we adopted $t = 0$ so that the different PA strategies can be consistently compared.
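
The key-length rules (25)–(27) share the same form, a Renyi entropy bound reduced by $n_{e}$ and $t$. The sketch below implements that form together with the Spoiling knowledge lower bound as it is written in Table 1. Flooring the result to an integer and clipping it at zero are assumptions of this sketch, as are the function names and the numerical example.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def r2_spoil(lb_u, n):
    """Spoiling knowledge lower bound as written in Table 1:
    n * h(lb_u / n) - log2(sqrt(2 n)),
    where lb_u is a lower bound on the Hamming distance, in bits."""
    return n * binary_entropy(lb_u / n) - math.log2(math.sqrt(2 * n))

def key_length(renyi_bound, n_e=0, t=0):
    """k = bound - n_e - t, floored and clipped at zero (assumed convention)."""
    return max(0, math.floor(renyi_bound - n_e - t))

# Hypothetical sequence of n = 1000 bits with a predicted lower bound of 480 bits.
k_ml = key_length(r2_spoil(480, 1000))
# The same bound with a nonzero security parameter gives a shorter key.
k_ml_secure = key_length(r2_spoil(480, 1000), t=30)
```

With $n_{e} = 0$ and $t = 0$, as adopted in the evaluation, the key length is determined by the entropy bound alone, which is what makes the three strategies directly comparable.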

To compare different PA strategies, two indicators, Gain and Loss, were introduced in [13]. For two PA strategies, A and B, Gain is defined as

G_{B}^{A} = \frac{|K_{A}|}{|K_{B}|},

(28)

where $|K_{A}|$ and $|K_{B}|$ denote the lengths of the secret keys generated from the same input sequences $S$ with strategies A and B. The Loss of PA strategy A is given by

Loss_{A} = \frac{|K_{Opt} - K_{A}|}{K_{Opt}} \times 100\ [\%],

(29)

where $K_{Opt}$ denotes the secret keys generated by PA strategy (25) for the same input sequences $S$. Therefore, $Loss_{A}$ can be interpreted as a measure of the unused randomness of a given CR source under PA strategy A.

We also need to define $R2$ and $R_{2min}$, given by the following expressions

R2 = \frac{1}{M} \sum_{i=1}^{M} R_{2}(S_{i} | E = e_{i}),

(30)

R_{2min} = \min_{i} R_{2}(S_{i} | E = e_{i}),

(31)

which represent the mean and the minimum value of ECRE2 over the given M realizations of the corresponding DMS.

Since Eve’s eavesdropper channel is a BSC, it follows that

R_{2}(S_{i} | E = e_{i}) = -\log_{2}(D_{hi}^{2} + (1 - D_{hi}^{2})^{2}),

(32)

where $D_{hi}$ is the normalized Hamming distance between the sequences $S_{i}$ and $e_{i}$.

The mean value of the spoiled ECRE2 lower bound $LB_{u}$, obtained from the PIDNN, is denoted by

R2_{ML} = \frac{1}{M} \sum_{i=1}^{M} R_{2Spoil}(LB_{u}(F_{i}), \delta).

(33)

The quantity

G_{GLB_R2min}^{Opt} = \frac{R2}{R_{2min}},

(34)

represents the potential gain in secret key length when applying the optimal PA strategy (25) in comparison to the strategy based on the global minimum (26). Similarly,

G_{GLB_R2min}^{ML} = \frac{R2_{ML}}{R_{2min}},

(35)

is the gain in secret key length of the PA strategy based on machine learning (27) over the global minimum strategy (26). The corresponding losses

Loss_{GLB_R2min} = \frac{|R2 - R_{2min}|}{R2} \times 100\ [\%],

(36)

Loss_{ML} = \frac{|R2 - R2_{ML}|}{R2} \times 100\ [\%],

(37)

express the degree of unused randomness of the given DMS under the chosen PA strategy.
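
As a consistency check, expressions (34)–(37) can be evaluated on the mean values reported in Table 2. The sketch below does this in Python; the function and dictionary names are illustrative, and only the numbers are taken from Table 2.

```python
def gain(r_a, r_b):
    """Gain of strategy A over strategy B, as in (34) and (35)."""
    return r_a / r_b

def loss_percent(r2_mean, r_strategy):
    """Loss of a strategy relative to the mean ECRE2, as in (36) and (37)."""
    return abs(r2_mean - r_strategy) / r2_mean * 100

# Mean values of R2, R_2min, and R2_ML from Table 2.
table2 = {
    "A": {"R2": 28944.50, "R2min": 445.08, "R2_ML": 28899.21},
    "B": {"R2": 28238.13, "R2min": 88.07, "R2_ML": 28189.69},
}

results = {}
for system, v in table2.items():
    results[system] = {
        "gain_opt": gain(v["R2"], v["R2min"]),
        "gain_ml": gain(v["R2_ML"], v["R2min"]),
        "loss_glb": loss_percent(v["R2"], v["R2min"]),
        "loss_ml": loss_percent(v["R2"], v["R2_ML"]),
    }
```

After rounding to two decimals, the losses computed this way agree with the Table 2 entries (98.46% and 0.16% for system A, 99.69% and 0.17% for system B). The gains computed from the overall means differ from the tabulated gains, presumably because Table 2 averages the per-fold ratios rather than taking the ratio of the averages; this reading is an assumption.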

The key rate and the key acceptance rate are given by

KR = \frac{\text{total length of established keys}}{\text{total length of input sequences}} \times 100\ [\%],

(38)

KAR = \frac{\text{number of final keys with length} > 0}{\text{total number of keys}} \times 100\ [\%].

(39)
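
Definitions (38) and (39) translate directly into code. In the sketch below, the key and input lengths are hypothetical values chosen only to show the calculation.

```python
def key_rate(key_lengths, input_lengths):
    """KR in percent: total key bits divided by total input bits."""
    return 100 * sum(key_lengths) / sum(input_lengths)

def key_acceptance_rate(key_lengths):
    """KAR in percent: share of protocol runs that produced a non-empty key."""
    accepted = sum(1 for k in key_lengths if k > 0)
    return 100 * accepted / len(key_lengths)

# Hypothetical outcome of four protocol runs on 1000-bit inputs.
example_keys = [110, 0, 120, 100]
example_inputs = [1000, 1000, 1000, 1000]
kr = key_rate(example_keys, example_inputs)
kar = key_acceptance_rate(example_keys)
```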

The leakage rate quantifies the amount of information about Alice and Bob’s distilled common key that is contained in Eve’s wiretapped sequence, scaled to one bit,

LR = I(X; Z) = 1 - h(D_{h}(A, E)),

(40)

since Eve’s eavesdropper channel is a BSC, see Figure 3.
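
Expression (40) can be evaluated with a few lines of Python. The example distance is an assumption of this sketch: it takes the Alice-Eve spread of 0.0031 reported in Figure 11 as an illustrative deviation of $D_{h}$ from 0.5.

```python
import math

def binary_entropy(p):
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def leakage_rate(d_h):
    """LR = 1 - h(D_h): information leaked per key bit over a BSC."""
    return 1.0 - binary_entropy(d_h)

# Illustrative distance: 0.5 minus the spread reported in Figure 11.
lr_example = leakage_rate(0.5 - 0.0031)
```

Near $D_{h} = 0.5$ the leakage grows only quadratically with the deviation from 0.5, which is why small deviations of Eve’s Hamming distance translate into leakage rates of the order of $10^{-5}$.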

Table 1 summarizes the flow chart of the experimental evaluation. The obtained results are presented in Table 2.

To check for the randomness of the obtained sequences from systems A and B, we performed tests developed by the US National Institute of Standards and Technology, NIST [43]. The randomness tests are based on the statistical test suite, and the results are shown in Table 3. The result of each test is given by the p-value. The requirement for passing an individual test is that the obtained p-value is higher than 0.01. The results confirm that the generated secret keys of both systems have passed the test, that is, they meet the defined randomness criteria.
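
For orientation, the simplest test of the suite, the Frequency (monobit) test, can be sketched as follows. This is a minimal stand-alone version of the statistic defined in NIST SP 800-22 [43], not the reference implementation used for Table 3.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test of NIST SP 800-22.

    Maps bits to +1/-1, sums them, normalizes by sqrt(n), and returns
    the p-value erfc(s_obs / sqrt(2)).
    """
    n = len(bits)
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

def passes(bits, threshold=0.01):
    """A sequence passes when its p-value exceeds the threshold."""
    return monobit_p_value(bits) > threshold

balanced = [0, 1] * 50    # equal numbers of zeros and ones
constant = [1] * 100      # maximally unbalanced
p_balanced = monobit_p_value(balanced)
p_constant = monobit_p_value(constant)
```

A perfectly balanced sequence passes this particular test even if it is highly regular, which is why the suite combines it with the other tests listed in Table 3.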

Based on the results presented in Table 2 and Table 3, we can draw the following conclusions:

In terms of the most important indicators, such as KR, KAR, $Loss_{ML}$, and LR, systems A and B show very similar performance. It follows that the decisive influence on system performance comes from the input decorrelating permutation, not from Huffman source coding.
The obtained KR of 11% is, to the best of our knowledge, the highest published KR for systems of this class. Given that the bit rate of the speech-based DMS is 256 kb/s (16 bits per sample at a sampling rate of 16 kHz), absolutely secret keys are generated at about 28 kb/s.
The resulting LR is of the order of $3 \times 10^{-5}$ bits per bit of the generated secret key. This means that Eve gets a negligible 0.85 b for every 28 kb of generated secret key. This amount of leaked information is scattered throughout Eve’s wiretapped sequence, which makes it almost impossible to reconstruct any bit of the distilled secret keys.
In the available literature, there are no works that explicitly state the LR of the proposed systems, except for works [12,13]. They reported LR values from $0.3 \times 10^{-3}$ to $0.6 \times 10^{-3}$ for DMS related to EEG signals, which is more than one order of magnitude worse. Since the systems from [12,13] do not contain the initial decorrelating permutation, it is reasonable to conclude that its influence on the performance of an SKD system is expressed not only in the increase of KR but also in a significant decrease of LR.
The value $Loss_{ML} = 0.16 \pm 0.01\%$ indicates that the proposed system makes nearly maximal use of the randomness of the given CR source. It also confirms that the concept of adaptive PA based on PIDNN and the Spoiling knowledge lower bound of ECRE2 is practically optimal. Let us emphasize that in the optimal case, the loss is $Loss_{ML} = 0$.
High values of the quantities $G_{GLB_R2min}^{Opt}$ and $G_{GLB_R2min}^{ML}$ (between 65 and 410) indicate the superiority of the PA block design based on the estimate of the lower bound of ECRE2 over classical approaches based on the global minimum.
The NIST tests confirm the randomness of the distilled secret keys. With negligible LR, the proposed system can be used to generate and distribute secret cryptographic keys in systems that provide absolute secrecy in the Shannon sense.
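
The throughput and leakage figures quoted in these conclusions follow from simple arithmetic, restated here with the rounded values used in the text (KR of about 11% and LR of about $3 \times 10^{-5}$):

$16\,000 \ \text{samples/s} \times 16 \ \text{b/sample} = 256 \ \text{kb/s},$

$0.11 \times 256 \ \text{kb/s} \approx 28 \ \text{kb/s},$

$3 \times 10^{-5} \times 28\,000 \ \text{b} \approx 0.84 \ \text{b}.$

The last value is close to the 0.85 b quoted above; the small difference depends on how KR and LR are rounded.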

Based on all presented results, the proposed SKD system enables the design of fully autonomous systems for voice protection, see Figure 15. Green border lines indicate a classic system, while blue border lines indicate a fully autonomous system. Autonomy is understood here both in terms of independence from any additional system for generating and distributing secret keys as well as in terms of complete voice control of the system. The hands-free feature of the speech protection system is of exceptional practical importance, both in military and civilian applications.

5. Security Issues and Application

5.1. Limitation of the Proposed Methodology

The presented methodology of SKD system design is applicable when sufficiently representative samples of the selected DMS are available. Although at first glance this seems limiting, it is not. If the dominant criterion is minimizing security breaches of the entire protection system, of which the SKD system is just one part, it is clear which users are good choices as legitimate users of the system. Based on this set of legitimate users, a sufficiently representative sample of the selected DMS is available at the beginning of the system design phase. The decorrelating permutation at the system input prevents any initial advantage for the attacker, which gives an additional security margin even in the case of an insufficiently large sample of legitimate users. Finally, Huffman source compression coding significantly reduces information leakage toward Eve. It is interesting to note that even in the case of an insider attack, i.e., when one of the legitimate users assumes the role of Eve, the decorrelation effect of the initial permutation has the same impact as for an illegitimate user. The variance of the ML regression estimator of the ECRE2 lower bound can always be compensated by increasing the security parameter t. Our experimental results show that, for the speech signal, this parameter is of the order of several tens of bits, which only slightly reduces the final KR of the generated secret keys. Other classes of security threats include the violation of the authenticity of the public communication channel and the change of Eve’s strategy from a passive observer to an active attacker. It was shown in [44] that the problem of an active attacker can be solved by doubling the security parameter t, which in our system remains in the domain of several tens of bits. Furthermore, with standard cryptographic mechanisms for authentication and for the integrity of messages exchanged over a public channel, the threat from this class of attack is reduced to a minimum. Therefore, we can conclude that the performance of the proposed SKD system will remain practically unchanged even in this worst-case scenario.

5.2. Robustness to Different Types of Speech Input

The speech database used was originally intended to help train and evaluate keyword spotting systems. It covers a wide range of variations in the manner and speed of pronunciation: accented speech, different emotional states of speakers, gender, positions of the start and end of words, level of background noise, etc. We tried to ensure that our sample of 100 speakers validly represented all these variations.

Experimental results show that the proposed system is extremely robust to all variations in pronunciation and input speech quality. This can be explained by the fact that the digitization and serialization of speech samples, followed by permutation, give a sufficiently good approximation of a Bernoulli (0.5) distributed binary sequence for legitimate users. In addition, since the permutation does not destroy the cross-correlation properties of the sequences of legitimate users, it proves sufficient to achieve a high KR.

5.3. Generalization to Different Types of Encryption Protocols and Cryptographic Systems

The proposed SKD system can be:

a stand-alone module for generating and distributing cryptographic secret keys,
integrated into existing cryptographic protocols and systems.

In the first case, independence from the main communication channel is a particular advantage since the system uses CR based on the source model. In addition, all system segments can operate in the offline mode, from the formation of the binary strings of the protocol participants to the final PA phase. The offline mode of operation also enables more effective procedures for additional system protection from compromising electromagnetic emanations and side-channel attacks of various types and origins. In this mode of operation, the system can be used for critical information infrastructure protection: it can supply absolutely secret cryptographic keys to those elements of critical information infrastructure that use cryptographic protection mechanisms based on cryptographic keys of the highest quality and secrecy level.

In the second case, integration with existing cryptographic protocols and systems is more a question of the functionality level achieved after integration than of whether this system can be applied. For example, the proposed SKD system can replace the Diffie–Hellman protocol for establishing symmetric keys in all systems and protocols in which it is incorporated (IPSec, SSL, etc.). Whether this integration increases the system functionality is a separate question related to the purpose of the integrated system and the conditions in which it will operate. In this scenario, the system works in the online mode, which entails additional requirements regarding the availability and quality of the public channel. The problem of a public channel with transmission errors can be solved by adding error-correcting codes in both the AD and IR phases of information exchange over the public channel. This addition only increases the complexity of the proposed SKD but does not compromise its most important performance indicators, such as KR and LR.

5.4. Computational Complexity

The computational complexity of the AD and IR phases of the proposed SKD is of order $O(n)$, while the complexity of Huffman source compression and the Toeplitz hash function is of order $O(n \cdot \log n)$, where n is the length of the input signals. Therefore, the overall computational complexity is of order $O(n \cdot \log n)$. This indicates that implementing the proposed algorithm does not require special memory or computational resources. In addition, if the system works only in the offline mode, the required resources are reduced even further.
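
To illustrate where the hashing cost comes from, the following sketch implements Toeplitz hashing over GF(2) in its direct form, which costs on the order of n times the key length. The function name, the seed layout, and the example values are assumptions of this sketch. The $O(n \cdot \log n)$ figure quoted above corresponds to an FFT-based Toeplitz multiplication, such as the SciPy routine cited as [33], which this simple version does not use.

```python
def toeplitz_hash(bits, seed, key_len):
    """Compress n input bits to key_len bits with a binary Toeplitz matrix.

    seed holds n + key_len - 1 random bits; matrix entry T[i][j] is
    seed[i - j + n - 1], so entries are constant along each diagonal.
    All arithmetic is over GF(2) (AND for product, XOR for sum).
    """
    n = len(bits)
    if len(seed) != n + key_len - 1:
        raise ValueError("seed must have n + key_len - 1 bits")
    out = []
    for i in range(key_len):
        acc = 0
        for j in range(n):
            acc ^= seed[i - j + n - 1] & bits[j]
        out.append(acc)
    return out

# Hypothetical 8-bit reconciled sequence hashed down to a 3-bit key.
example_bits = [1, 0, 1, 1, 0, 0, 1, 0]
example_seed = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]   # 8 + 3 - 1 = 10 bits
example_key = toeplitz_hash(example_bits, example_seed, 3)
```

In the PA phase, `key_len` would be set by the chosen strategy, so that a tighter entropy estimate directly yields a longer output key.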

5.5. Summary of the Main Challenges

The main challenges of this work can be listed as follows:

Showing that speech can be a source of CR for sufficiently efficient SKD systems.
Maximizing CR mainly through mechanisms that weaken the attacker’s position (decorrelation of Eve’s signals) and minimizing the measure of unused randomness for the chosen CR. We solved this challenge with a mechanism that adaptively determines the maximal length of the secret keys for a given level of leakage to Eve. The proposed solution consists of a PIDNN ML block which, in working mode, operates only on local information available to the protocol participants and does not require any information exchange over a public channel.
Reliable measurement of LR. This requirement entails two additional ones: (a) proof that Eve’s strategy is optimal, and (b) proof that Eve’s knowledge of the generated secret keys is bounded with a predetermined confidence level. Eve’s optimal strategy is to repeat the moves of the legitimate participants in the AD phase, which follows from the fact that the considered DMS is equivalent to Maurer’s satellite scenario. Theorems 1 and 2 and the implemented adaptive PA block ensure the fulfillment of requirement (b).

6. Conclusions

This paper presents a successful application of NLP technologies (speech processing, machine learning, and stylometry) in the synthesis of high-efficiency SKD protocols. The developed PA strategy, based on PIDNN in order to estimate the lower bound of ECRE2, enables the maximum use of the randomness contained in the selected CR source. At the same time, this approach enables the precise quantification of leaked information to an eavesdropper. By introducing a decorrelating permutation in the proposed SKD system at the very beginning of the information processing chain, a significant increase in the key rate and a decrease in leaked information were obtained (both improvements at the level of an order of magnitude). All these results represent a solid basis for the practical implementation of a wide class of fully autonomous speech protection systems for both military and civilian applications.

An open question that has been waiting to be answered since Maurer’s pioneering work is the practical characterization of CR sources, which would enable the maximization of the key performances of SKD systems based on these sources. Moreover, our future research will be focused on finding other CR generation mechanisms that would enable the synthesis of efficient secret key distillation systems. In this respect, in our opinion, all stochastic self-synchronizing structures are interesting, such as those presented in the paper [45], as well as some classes of neural networks that enable synchronous distributed training, such as Siamese neural networks [46].

Author Contributions

Conceptualization, J.R. and M.M.; methodology, M.M. and Z.B.; software, J.R.; validation, J.R., M.M., Z.B. and M.J.; formal analysis, M.M.; investigation, J.R.; resources, M.J.; data curation, J.R.; writing—original draft preparation, J.R.; writing—review and editing, M.J.; visualization, J.R.; supervision, M.M. and Z.B.; project administration, J.R. and M.J.; funding acquisition, Z.B. All authors have read and agreed to the published version of the manuscript.

Funding

The research is funded by the Vlatacom Institute of High Technologies under project #164 EEG_Keys.

Data Availability Statement

Voice data can be downloaded from [35].

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Ahlswede, R.; Csiszar, I. Common randomness in information theory and cryptography. Part I: Secret sharing. IEEE Trans. Inf. Theory 1993, 39, 1121–1132.
2. Maurer, U.M. Secret key agreement by public discussion from common information. IEEE Trans. Inf. Theory 1993, 39, 733–742.
3. Csiszar, I.; Narayan, P. Secrecy Capacities for Multiple Terminals. IEEE Trans. Inf. Theory 2004, 50, 3047–3061.
4. Bennett, C.H.; Brassard, G.; Crepeau, C.; Maurer, U.M. Generalized privacy amplification. IEEE Trans. Inf. Theory 1995, 41, 1915–1923.
5. Bloch, M.; Günlü, O.; Yener, A.; Oggier, F.; Poor, H.V.; Sankar, L.; Schaefer, R.F. An overview of information-theoretic security and privacy: Metrics, limits and applications. IEEE J. Sel. Areas Inf. Theory 2021, 2, 5–22.
6. Jost, D.; Maurer, U.; Ribeiro, J.L. Information-theoretic secret-key agreement: The asymptotically tight relation between the secret-key rate and the channel quality ratio. In Theory of Cryptography; TCC 2018, LNCS; Beimel, A., Dziembowski, S., Eds.; Springer: Cham, Switzerland, 2018; Volume 11239, pp. 345–369.
7. Bloch, M.; Barros, J. Physical-Layer Security: From Information Theory to Security Engineering; Cambridge University Press: Cambridge, UK, 2011.
8. Xu, W.; Revadigar, G.; Luo, C.; Bergmann, N.; Hu, W. Walkie-talkie: Motion-assisted automatic key generation for secure on-body device communication. In Proceedings of the 15th ACM/IEEE International Conference on Information Processing in Sensor Networks, Vienna, Austria, 11–14 April 2016.
9. Xu, W.; Javali, C.; Revadigar, G.; Luo, C.; Bergmann, N.; Hu, W. Gait-key: A gait-based shared secret key generation protocol for wearable devices. ACM Trans. Sens. Netw. 2017, 13, 1–27.
10. Guglielmi, A.V.; Muraro, A.; Cisotto, G.; Laurenti, N. Information theoretic key agreement protocol based on ECG signals. In Proceedings of the IEEE Global Communications Conference (GLOBECOM), Madrid, Spain, 7–11 December 2021; pp. 1–6.
11. Milosavljević, M.; Adamović, S.; Jevremović, A. Secret keys generation from mouse and eye tracking signals. In Proceedings of the 6th International Conference on Electrical, Electronic and Computing Engineering (IcETRAN 2019), Silver Lake, Serbia, 3–6 June 2019.
12. Galis, M.; Milosavljević, M.; Jevremović, A.; Banjac, Z.; Makarov, A.; Radomirović, J. Secret-key agreement by asynchronous EEG over authenticated public channels. Entropy 2021, 23, 1327.
13. Radomirović, J.; Milosavljević, M.; Kovačević, B.; Jovanović, M. Privacy amplification strategies in sequential secret key distillation protocols based on machine learning. Symmetry 2022, 14, 2028.
14. Gungor, O.; Chen, F.; Koksal, C.E. Secret key generation from mobility. In Proceedings of the 2011 IEEE GLOBECOM Workshop, Houston, TX, USA, 5–9 December 2011.
15. Gungor, O. Information Theory Enabled Secure Wireless Communication, Key Generation and Authentication. Ph.D. Thesis, The Ohio State University, Columbus, OH, USA, 2014.
16. Zhang, J.; Duong, T.Q.; Marshall, A.; Woods, R. Key generation from wireless channels: A review. IEEE Access 2016, 4, 614–626.
17. Pierrot, A.J.; Chou, R.A.; Bloch, M.R. The effect of eavesdropper’s statistics in experimental wireless secret-key generation. arXiv 2013, arXiv:1312.3304.
18. Mitev, M.; Pham, T.M.; Chorti, A.; Barreto, A.N.; Fettweis, G. Physical layer security - from theory to practice. arXiv 2022, arXiv:2210.13261.
19. Li, K.; Li, R.; Cao, L.; Feng, Y.; Onasanya, B.O. Periodically intermittent control of memristor-based hyper-chaotic bao-like system. Mathematics 2023, 11, 1264.
20. Zhao, Y.; Wang, L. Practical exponential stability of impulsive stochastic food chain system with time-varying delays. Mathematics 2023, 11, 147.
21. MacWilliams, F.J.; Sloane, N.J.A. The Theory of Error-Correcting Codes, 1st ed.; North-Holland: New York, NY, USA, 1977.
22. Keren, G.; Cummins, N.; Schuller, B. Calibrated prediction intervals for neural network regressors. IEEE Access 2018, 6, 54033–54041.
23. Kivaranovic, D.; Johnson, K.; Leeb, H. Adaptive, distribution-free prediction intervals for deep networks. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Palermo, Italy, 26–28 August 2020.
24. Simhayev, E.; Katz, G.; Rokach, L. PIVEN: A deep neural network for prediction intervals with specific value prediction. arXiv 2020, arXiv:2006.05139.
25. Khosravi, A.; Nahavandi, S.; Creighton, D.; Atiya, A.F. Lower upper bound estimation method for construction of neural network-based prediction intervals. IEEE Trans. Neural Netw. 2010, 22, 337–346.
26. Elisim/PIVEN-GitHub. Available online: https://github.com/elisim/piven (accessed on 6 August 2022).
27. Wang, Q.; Wang, X.; Lv, Q.; Ye, X.; Luo, Y.; You, L. Analysis of the information theoretically secret key agreement by public discussion. Secur. Commun. Netw. 2015, 8, 2507–2523.
28. Maurer, U.M. Protocols for secret key agreement by public discussion based on common information. In Proceedings of the 12th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO 1992), Santa Barbara, CA, USA, 16–20 August 1992.
29. Gander, M.J.; Maurer, U.M. On the secret key rate of binary random variables. In Proceedings of the 1994 International Symposium on Information Theory and Its Applications, Sydney, Australia, 20–24 November 1994.
30. Buttler, W.T.; Lamoreaux, S.K.; Torgerson, J.R.; Nickel, G.H.; Donahue, C.H.; Peterson, C.G. Fast, efficient error reconciliation for quantum cryptography. Phys. Rev. A 2003, 67, 052303.
31. Tsurumaru, T.; Hayashi, M. Dual universality of hash functions and its applications to quantum cryptography. IEEE Trans. Inf. Theory 2013, 59, 4700–4717.
32. Hayashi, M.; Tsurumaru, T. More efficient privacy amplification with less random seeds via dual universal hash function. IEEE Trans. Inf. Theory 2016, 62, 2213–2232.
33. SciPy. Available online: https://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.matmul_toeplitz.html (accessed on 11 November 2022).
34. Huffman, D.A. A method for the construction of minimum-redundancy codes. Proc. IRE 1952, 40, 1098–1101.
35. Kaggle. Available online: https://www.kaggle.com/datasets/mok0na/speech-commands-v002 (accessed on 6 August 2022).
36. Warden, P. Speech commands: A dataset for limited-vocabulary speech recognition. arXiv 2018, arXiv:1804.03209.
37. Watanabe, S.; Oohama, Y. Secret key agreement from vector Gaussian sources by rate limited public communication. IEEE Trans. Inf. Forensics Secur. 2011, 6, 541–550.
38. Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423.
39. Adamović, S.; Miškovic, V.; Maček, N.; Milosavljević, M.; Šarac, M.; Saračević, M.; Gnjatović, M. An efficient novel approach for iris recognition based on stylometric features and machine learning techniques. Future Gener. Comput. Syst. 2020, 107, 144–157.
40. Lundberg, S.; Lee, S. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2017; Volume 30.
41. Shap. Available online: https://shap.readthedocs.io/en/latest/ (accessed on 1 November 2022).
42. Keras API. Available online: https://keras.io/api/utils/model_plotting_utils/#plot_model-function (accessed on 1 October 2022).
43. NIST. A Statistical Test Suite for Random and Pseudorandom Number Generators for Cryptographic Applications; National Institute of Standards and Technology (NIST), Technology Administration, U.S. Department of Commerce: Gaithersburg, MD, USA, 2010. Available online: https://csrc.nist.gov/publications/detail/sp/800-22/rev-1a/final (accessed on 6 October 2022).
44. Maurer, U.M.; Wolf, S. Secret-key agreement over unauthenticated public channels - Part III: Privacy amplification. IEEE Trans. Inf. Theory 2003, 49, 839–851.
45. Vadivel, R.; Hammachukiattikul, P.; Zhu, Q.; Gunasekaran, N. Event-triggered synchronization for stochastic delayed neural networks: Passivity and passification case. Asian J. Control 2022, 1–18.
46. Chicco, D. Siamese neural networks: An overview. In Artificial Neural Networks, Methods in Molecular Biology; Humana Press: New York, NY, USA, 2020; Volume 2190, pp. 73–94.

Figure 1. Source model of the secret-key agreement via public discussion from common randomness.

Figure 2. Secret-key distillation using public channel based on the speech signals obtained by pronouncing the chosen phrase.

Figure 3. The basic architecture of the proposed system.

Figure 4. Prediction interval ML regression for the estimate $LB_{u}$, the Spoiling knowledge lower bound for ECRE2.


Figure 5. Relationship between block entropy rate and the number of bits $n_{b}$ per sample of the voice signal for the range of block values from 1 to 20. Different block length values are represented by distinct curves.


Figure 6. Histogram of normalized Hamming distances of all 4950 pairs of uttered words. Mean and dispersion of normalized Hamming distances (0.2111 ± 0.0511).

Figure 7. Histogram of normalized Hamming distances between Alice and Bob (0.2111 ± 0.0511), and Alice and Eve (0.2109 ± 0.0503), at the beginning of the algorithm. For each distinct (Alice, Bob) pair, Eve is selected randomly from the remaining participants.

Figure 8. Histogram of normalized Hamming distances between Alice and Bob (0.2111 ± 0.0511), and Alice and Eve (0.4999 ± 0.0010), after applying a random permutation to Alice’s and Bob’s sequence.

Figure 9. Histogram of normalized Hamming distances between Alice and Bob (0.0736 ± 0.0442), and Alice and Eve (0.5000 ± 0.0017), after the first iteration of the BPAD algorithm.

Figure 10. Histogram of normalized Hamming distances between Alice and Bob (0.0096 ± 0.0169), and Alice and Eve (0.5000 ± 0.0026), after the second iteration of the BPAD algorithm.

Figure 11. Histogram of normalized Hamming distances between Alice and Bob (0.0 ± 0.0), and Alice and Eve (0.4999 ± 0.0031), after the IR phase realized by the Winnow algorithm.

Figure 12. PIDNN architecture for SKD obtained from the Keras API. The same architecture is used both in the phase of selecting the most informative features and in the working mode. The only difference is in the number of inputs: 36 in the feature engineering phase and 11 in the working mode.

Figure 13. Feature importance at the input to PIDNN, ranked by SHAP values, for SKD system without LLC block (System A).

Figure 14. Feature importance at the input to PIDNN, ranked by SHAP values, for SKD system with LLC block based on optimal Huffman code over source alphabet of size $2^{10}$ (system B).


Figure 15. Concept of fully autonomous voice protection system. Green border lines indicate a classic system, while blue border lines indicate a fully autonomous system.

Table 1. Flow chart of experimental evaluation.

For a given speech DMS generate a representative population $(X_{i}, Y_{i}, Z_{i})$, $|X_{i}| = |Y_{i}| = |Z_{i}| = n_{i}$, $i = 1, 2, \dots, M$.

Generate a set $\{S_{i}, e_{i}, u_{i} = D_{H}(S_{i}, e_{i})\}$, $i = 1, \dots, M$.

Generate a set $\{F_{i}, D_{H}(S_{i}, e_{i})\}$, $i = 1, 2, \dots, M$.

Start 10-fold cross-validation

Train PIDNN for predicting $D_{H}(S_{i}, e_{i})$ and its lower and upper bound $\{UB_{u}(F_{i}), LB_{u}(F_{i})\}$ on the training part of the current fold

Calculate the Spoiling knowledge lower bound on the test part of the current fold $R_{2Spoil}(LB_{u}(F_{i}), \delta) = n_{i} h\left(\frac{LB_{u}(F_{i})}{n_{i}}\right) - \log_{2} \sqrt{2 n_{i}}$.

Apply the Global minimum (26) and ML (27) PA strategy on all sequences in the test part of the current fold. The ECRE2 minimum is found on the training part of the current fold. ECRE2 values are calculated by the expression $R_{2}(S_{i} | E = e_{i}) = -\log_{2}\left(D_{hi}^{2} + {(1 - D_{hi}^{2})}^{2}\right)$.

Calculation of all indicators (34)–(40) on the current fold.

End of cross-validation

Calculation of mean values and variances of all required indicators (30)–(40)
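The Spoiling knowledge lower bound in Table 1 can be evaluated directly once a lower bound on the Hamming distance is available. The following is a minimal sketch of that single step, not the authors' implementation. It assumes that h is the binary entropy function in bits and that LB_u is an absolute Hamming distance over a sequence of length n. The parameter δ does not appear on the right-hand side of the expression as printed, so it is omitted here. The function names and the example inputs are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits; h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def spoiling_lower_bound(lb_u: float, n: int) -> float:
    """n * h(LB_u / n) - log2(sqrt(2 n)), following the Table 1 expression."""
    return n * binary_entropy(lb_u / n) - math.log2(math.sqrt(2.0 * n))


def positive_bound_percent(lower_bounds, lengths) -> float:
    """Share of sequences (in %) whose bound is strictly positive."""
    bounds = [spoiling_lower_bound(lb, n) for lb, n in zip(lower_bounds, lengths)]
    return 100.0 * sum(b > 0.0 for b in bounds) / len(bounds)


# Hypothetical inputs: predicted lower bounds on the Hamming distance for
# three sequences of 1000 bits each (illustrative, not values from the paper).
example_lb = [500.0, 480.0, 0.0]
example_n = [1000, 1000, 1000]
bound_half = spoiling_lower_bound(500.0, 1000)
share_positive = positive_bound_percent(example_lb, example_n)
```

Under these assumptions, a predicted lower bound of zero yields a negative value, which means no key bits could be extracted from that sequence. A quantity of this kind is what the first row of Table 2 (the percentage of sequences with a positive bound) appears to report.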

Table 2. Experimental results for two basic systems.

	A (Win-Hash)	B (Win-Huff-Hash)
$R_{2Spoil} > 0$ [%]	100 ± 0.00	99.96 ± 0.09
PICP	0.9994 ± 0.0009	0.9996 ± 0.0008
MPIW	0.047 ± 0.009	0.048 ± 0.021
$R2$	28,944.50 ± 351.67	28,238.13 ± 354.03
$R_{2min}$	445.08 ± 34.24	88.07 ± 74.98
$R2_{ML}$	28,899.21 ± 7443.42	28,189.69 ± 7475.91
$G_{GLB_R2min}^{Opt}$	65.3366 ± 4.0328	411.4906 ± 106.8561
$G_{GLB_R2min}^{ML}$	65.2343 ± 4.0265	410.7837 ± 106.6713
$Loss_{GLB_R2min}$ [%]	98.46 ± 0.11	99.69 ± 0.26
$Loss_{ML}$ [%]	0.16 ± 0.01	0.17 ± 0.01
$KR$ [%]	11.29 ± 0.14	11.01 ± 0.14
$KAR$ [%]	100 ± 0.00	99.96 ± 0.09
$LR$ [10⁻³]	0.0286 ± 0.0027	0.0286 ± 0.0032
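The key rates in Table 2 can be related to the throughput quoted in the abstract with a short calculation. The sketch below assumes that KR is the ratio of distilled key bits to raw input bits, and that the raw input is 16 kHz speech quantized with 16 bits per sample, as stated in the abstract. The function name is hypothetical.

```python
SAMPLE_RATE_HZ = 16_000   # sampling rate stated in the abstract
BITS_PER_SAMPLE = 16      # quantization stated in the abstract


def key_throughput_bps(key_rate_percent: float,
                       sample_rate_hz: int = SAMPLE_RATE_HZ,
                       bits_per_sample: int = BITS_PER_SAMPLE) -> float:
    """Key bits per second, assuming KR = key bits / raw input bits."""
    input_bps = sample_rate_hz * bits_per_sample  # 256,000 bit/s of raw speech
    return input_bps * key_rate_percent / 100.0


# Mean KR values taken from Table 2.
throughput_a = key_throughput_bps(11.29)  # system A (Win-Hash)
throughput_b = key_throughput_bps(11.01)  # system B (Win-Huff-Hash)
```

Under this assumption, both systems come out at roughly 28 to 29 kb/s, which is consistent with the 28 kb/s figure given in the abstract.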

Table 3. Results of the randomness tests for the sequences from systems A and B, given as p-values. The column abbreviations denote the tests Frequency (F), Block Frequency (BF), Runs (R), Longest Run (LR), Fast Fourier Transform (FFT), Serial (S), Approximate Entropy (AE), Cumulative Sums forward (CSf), and Cumulative Sums reverse (CSr). The sequences of systems A and B used for the tests have a length of 10 million bits. The p-value threshold is 0.01.

Raw	F	BF	R	LR	FFT	S	AE	CSf	CSr
A	0.0661	0.8932	0.9852	0.2289	0.4825	0.2203	0.4928	0.0104	0.0108
B	0.1268	0.5856	0.8277	0.5529	0.7190	0.2801	0.4120	0.1366	0.0835
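To illustrate how a p-value of the kind listed in Table 3 is obtained and compared with the 0.01 threshold, the sketch below implements the Frequency (monobit) test as defined in NIST SP 800-22. This is a simplified illustration and not the NIST reference suite that the authors presumably used. The function names are hypothetical, and the p-values are copied from Table 3.

```python
import math

P_VALUE_THRESHOLD = 0.01  # threshold stated in the Table 3 caption


def monobit_p_value(bits) -> float:
    """Frequency (monobit) test: p = erfc(|S_n| / sqrt(2 n))."""
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)   # map 1 -> +1, 0 -> -1 and sum
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2.0))


def passes_all(p_values, threshold: float = P_VALUE_THRESHOLD) -> bool:
    """A sequence passes when every test p-value reaches the threshold."""
    return all(p >= threshold for p in p_values)


# p-values from Table 3, in the order F, BF, R, LR, FFT, S, AE, CSf, CSr.
table3 = {
    "A": [0.0661, 0.8932, 0.9852, 0.2289, 0.4825, 0.2203, 0.4928, 0.0104, 0.0108],
    "B": [0.1268, 0.5856, 0.8277, 0.5529, 0.7190, 0.2801, 0.4120, 0.1366, 0.0835],
}
verdicts = {system: passes_all(values) for system, values in table3.items()}

# Short demonstration sequence (six ones, four zeros).
demo_p = monobit_p_value([1, 0, 1, 1, 0, 1, 0, 1, 0, 1])
```

With the values from Table 3, both systems pass all nine tests, although the two Cumulative Sums p-values for system A lie only slightly above the threshold.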

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
