1. Introduction
In this paper, we asked the following question: Is it possible to design an autonomous, perfectly secret speech communication system whose integral part is the subsystem for the generation and distribution of the required amount of secret keys in real time?
As is known [1], the Vernam cipher [2], or One-Time Pad (OTP) [3], satisfies the condition of perfect secrecy, and the price to be paid is that the secret key rate is equal to the message rate. The need for secret keys decreases proportionally with the message rate, justifying the use of low-bit-rate voice coders in practical implementations [4,5,6]. However, the problem of efficient generation and distribution of secret keys in the case of the Vernam cipher is still open.
While physical key distribution remains the simplest option, it is impractical over long distances. More modern approaches, such as quantum key distribution and the use of physical entropy sources, provide promising methods for future systems. Quantum key distribution stands out for its provably secure nature, though it requires significant infrastructure [7].
In the group of methods that use physical entropy sources, a special place is occupied by the methods of generating and distributing secret keys using sequential key distillation (SKD) protocols by public discussion [8,9,10]. Their application requires two prerequisites:
Sources of common randomness with sufficient capacity shared by communication parties (Alice, Bob)
An additional authenticated communication channel of appropriate capacity, which may be public, and which is assumed to be wiretapped by an attacker (Eve).
By far the most solutions in this category take the wireless channel itself as a physical source of common randomness [11,12,13]. However, we exclude this approach due to our requirement for system autonomy, which implies independence from the communication channel used. The small number of remaining papers predominantly rely on various biometric signals as sources of common randomness [14,15,16]. However, there are no solutions based on common randomness sources that are independent of the telecommunication channels used while guaranteeing a secret key generation speed equal to the low-bit-rate vocoder speeds of 1.2 kb/s or 2.4 kb/s.
While most existing works investigate key generation protocols in a particular environment, very few of them focus on the joint design of key generation and OTP. A straightforward approach is to run key generation and OTP in cascade or in parallel, but more complex constructions appear in the literature. For example, in [17], it was proved that using a non-reconciled key for OTP outperforms classical identical-key OTP.
In the first step of the proposed system’s design, we select a standard low-bit-rate vocoder MELPe with a speed of 1.2 kb/s [5,18] and an appropriate authenticated public channel, which must be available during system operation.
In the second step, based on extensive tests of this vocoder in real working conditions within the vPCP-V system [19], we identify a source of common randomness suitable for generating and distributing secret keys at speeds far higher than 1.2 kb/s. This source consists of a locally synthesized speech signal on the transmitting side (Alice) and a corresponding synthesized signal on the receiving side (Bob). The differences between these signals come from different local sources of randomness, which are used to form the complex excitation of the MELPe vocoder synthesizer. This source of shared randomness can be used in the open speech mode, which, as a rule, precedes the phase of secure voice communication. All information about the input voice that flows through the public channel to Eve during the execution of the SKD protocol is of no importance, since communication through the main channel is also open. However, in the phase of protected communication on the main channel, this information would significantly reduce the uncertainty of the input voice. Therefore, we introduce a new synthesizer on both Alice's and Bob's side for this phase, whose synthesis filters are set based on secret, randomly chosen LSP parameters [18,20]. These values are available to both Alice and Bob since they are formed from previously distilled secret keys.
In this way, we obtain all the necessary conditions for the design of a perfectly secret autonomous low-bit-rate voice communication system.
1.1. Related Works
To focus on relevant research, it is essential to recall the fundamental requirements that a highly secure voice protection system must meet [21].
Strong end-to-end encryption—only legitimate parties should be able to encrypt and decrypt the communication.
Key management procedures must be designed in a way that does not compromise the declared security level of the system.
A secure key agreement must be ensured.
The choice of encryption algorithm and secret key length must support the proclaimed security level.
The system must be resistant to distortions introduced by GSM codecs, which are designed for speech transmission over mobile networks rather than encrypted speech, which is inherently a data stream.
Minimization of transmission error rates—this requirement is crucial for low-bit-rate vocoders. The analytical–synthetic approach of vocoders inherently reduces the quality of synthesized speech on the receiving side, meaning that additional transmission errors must be minimized.
According to the classification provided in [22], most research falls into two distinct categories:
Secure voice communication using modem-based cryptographic techniques.
Secure voice communication based on chaotic cryptographic techniques.
Due to its generic modem-based structure, the vPCP-V system belongs to the first category [22,23]. Therefore, our primary focus will be on this class of systems. Table 2 in ref. [22] summarizes the performance of 22 systems published up to 2022. Without exception, all solutions are based on standard cryptographic algorithms with finite secret keys (AES, RC4, TEA) and rely on conventional public key infrastructure (PKI) for key generation and distribution.
As a result, these systems do not ensure autonomy, as they depend on a Trusted Third Party (TTP), nor do they achieve perfect secrecy, since their secret key rates are far below the speech data rate they encrypt.
In [24], a practical implementation of a lightweight AES algorithm (128-bit key) in FPGA technology for peer-to-peer voice encryption was analyzed. This work evidently falls within the same category of non-autonomous and non-perfectly secret systems. Similarly, ref. [25] proposes a new VoIPChain system for authentication in Voice over IP using Ethereum Blockchain technology [26]. While this decentralized system overcomes many security issues associated with traditional single-server PKI infrastructures, it still requires a security infrastructure, including the maintenance of a shared ledger. Consequently, this system also falls under the category of non-autonomous, non-perfectly secret voice transmission systems. Furthermore, ref. [27] presents various security concepts and communication systems proposed by NATO Research Task Group IST-174, titled “Secure Underwater Communications for Heterogeneous Network-enabled Operations”. These efforts contribute to standardization in the field. Recognizing that key allocation and management remain major challenges in symmetric cryptographic systems, the study identifies SKD protocols over appropriate sources of common randomness as one of the most promising technologies, which is fully aligned with our findings. However, it is important to note that this work does not discuss the use of SKD for real-time perfectly secret systems, but only in the context of traditional non-perfect cryptographic systems.
In the category of secure voice communication based on chaotic cryptographic techniques, seven solutions published up to 2019 were analyzed in [22]. Additionally, we incorporated the latest studies [28,29,30] into this review.
Secure voice communication systems based on chaotic cryptographic techniques leverage chaotic algorithms due to their sensitivity to initial conditions and their ability to generate pseudo-randomness, both of which are essential for secure encryption. However, from the perspective of information-theoretic security, any chaotic algorithm can be considered a deterministic dynamic system fully defined by its initial states. Consequently, the equivocation of secret keys in such systems cannot exceed the length of their binary representation. Therefore, similar to classical cryptographic algorithms with finite secret keys, this class of algorithms cannot provide perfect secrecy.
Notably, no prior work has proposed a perfectly secret autonomous system where secret key generation is derived from a synthesized speech signal, independent of the telecommunication channel and without external key distribution infrastructure.
1.2. Innovation and Engineering Value
Our novel approach for autonomous, perfectly secret low-bit-rate voice communication is based on the following key innovations:
Artificially Synthesized Speech as a Common Randomness Source
Unlike natural speech, synthesized speech allows precise control over its randomness properties.
The proposed system uses LSP parameters of a MELPe-like synthesizer to generate shared randomness between Alice and Bob.
This approach eliminates the need for external entropy sources such as radio channels or biometrics.
Independent True Randomness for Enhanced Security
The LSP parameters are randomly selected, ensuring unpredictability.
An independent source of true randomness (e.g., from a cryptographic random number generator) ensures entropy is sufficient for perfect secrecy.
SKD Over a Public Authenticated Channel
A real-time SKD protocol is executed over an authenticated but public channel, allowing Alice and Bob to extract a mutually secret key from their synthesized speech signals.
The key rate achieved significantly exceeds the 1.2 kb/s or 2.4 kb/s needed for MELPe encryption, ensuring continuous perfect secrecy.
No Prior Key Distribution or TTP
The system operates autonomously without requiring prior key exchange.
Unlike Quantum Key Distribution (QKD) or traditional key management systems, no external infrastructure is needed.
Real-Time Suitability for Low-Bit-Rate Voice Encryption
The key generation rate is synchronized with the encryption rate of MELPe vocoders, allowing seamless one-time-pad encryption.
vPCP-V was used for empirical verification, demonstrating an achievable secret key rate of up to 8.8 kb/s and a BER of order for various communications channels, including GSM 3G and GSM VoLTE networks.
Considering all of this, the proposed system meets all the fundamental security and functional requirements that a protected system must meet.
1.3. Contributions of This Work
A novel autonomous key generation method based on synthesized speech.
A low-bit-rate perfectly secret communication system combining MELPe, Vernam cipher, and real-time SKD.
Experimental validation of the proposed system, demonstrating its feasibility and security.
A discussion on the implementation challenges and scalability of the system in real-world applications.
1.4. Paper Organization
The paper is organized as follows.
Section 2 presents the architecture of the proposed system.
Section 3 presents an analysis of identified sources of common randomness.
Section 4 describes the LSP-based linear prediction (LP) synthesizer.
Section 5 provides an information-theoretic analysis of the source of common randomness based on randomly selected LSP parameters.
Section 6 presents the new privacy amplification strategy based on the so-called Huffman–Renyi difference, which is suitable for application in APS-VCS.
Section 7 provides the experimental evaluation of the proposed system executed on the Vlatacom Personal Crypto Platform for Voice encryption.
Section 8 provides the conclusion.
2. System Architecture
A generic secret low-bit-rate speech communication system is shown in Figure 1. According to the classification provided in [22], it belongs to the category of secure voice communication using modem-based cryptographic techniques. A key aspect of the information-theoretic approach to verifying the security level of a system is the amount of information an eavesdropper can obtain about the messages based on ciphertext observations. It is well known [1] that if the key stream $K_S$ is generated using any cryptographic algorithm with a finite secret key $K$, the equivocation of the key from the attacker’s perspective, $H(K \mid C)$, where $C$ denotes the observed ciphertext, rapidly converges to zero after a sufficiently long ciphertext observation. Practically, this means that such a system is not information-theoretically secure, and its security depends on the computational power of the adversary. Once the amount of observed ciphertext satisfies the condition $H(K \mid C) = 0$, the key $K$ has a unique solution, meaning that, in cryptanalytic terms, the system is broken. However, if $K_S$ is a purely random sequence independent of the messages $M$, it can be shown that the mutual information $I(M; C) = 0$. This implies that the attacker cannot retrieve the messages regardless of their computational resources, making the system perfectly secret.
We selected a standard low-bit-rate vocoder MELPe with a speed of 1.2 kb/s [5,18]. Note that the MELPe vocoder can be replaced by any other standard vocoder with similar performance. The input speech signal $S$ is sampled at 8 kHz, quantized with 16 bits per sample, and divided into frames lasting 67.5 ms (540 samples). In the MELPe analyzer, 81 bits are generated for each input frame, of which 80 bits code 10 LSP parameters of the LP speech production model, while the 81st bit is the synchronization bit. This bit stream $M$ is encrypted by modulo-2 addition with the binary pseudorandom sequence $K_S$ generated by the key stream generator KSG(K) with secret key K, which must be shared between legitimate parties before the start of communication. The ciphertext $C = M \oplus K_S$ is transmitted over the main channel after appropriate modulation. On the receiving side, after demodulation, decryption is performed by modulo-2 addition with a synchronously generated binary sequence $K_S$. As a result, identical LSP parameters are obtained, which produce the reconstructed speech signal $\hat{S}_B$ in the MELPe synthesizer block. With $CR_1 = (S, \hat{S}_B)$, we denote our first candidate for the source of common randomness, formed from the input speech signal $S$ on the transmitter side and the speech signal $\hat{S}_B$ synthesized on the receiver side; see Figure 1.
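The frame arithmetic and the modulo-2 encryption step described above can be illustrated with a minimal Python sketch. The frame parameters are taken from the text; the helper names and the use of the `secrets` module as a stand-in for the key stream are assumptions made purely for illustration and are not part of the described system.

```python
import secrets

FRAME_BITS = 81        # bits produced per frame, as stated in the text
FRAME_MS = 67.5        # frame duration in milliseconds
SAMPLE_RATE_HZ = 8000  # input sampling rate


def samples_per_frame() -> int:
    """Number of input samples covered by one frame."""
    return round(SAMPLE_RATE_HZ * FRAME_MS / 1000)


def bit_rate_bps() -> float:
    """Vocoder output rate implied by the frame size and duration."""
    return FRAME_BITS / (FRAME_MS / 1000)


def random_bits(n: int) -> list[int]:
    """Hypothetical stand-in for a key stream (or analyzer output) of n bits."""
    return [secrets.randbits(1) for _ in range(n)]


def xor_bits(data: list[int], key: list[int]) -> list[int]:
    """Modulo-2 addition of two equally long bit sequences."""
    if len(data) != len(key):
        raise ValueError("key stream must be as long as the data")
    return [d ^ k for d, k in zip(data, key)]


frame = random_bits(FRAME_BITS)      # placeholder for one analyzer frame
key = random_bits(FRAME_BITS)        # placeholder for the key stream
ciphertext = xor_bits(frame, key)    # encryption on the transmitting side
recovered = xor_bits(ciphertext, key)  # decryption on the receiving side
```

Because modulo-2 addition is its own inverse, applying the same key stream twice returns the original frame, which is why the receiver only needs a synchronously generated copy of the sequence.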
Figure 2 shows an extension of the generic scheme from Figure 1 with local MELPe synthesis on the transmitter side. With $CR_2 = (\hat{S}_A, \hat{S}_B)$, we denote our second candidate for the source of common randomness, formed from the locally synthesized speech signal $\hat{S}_A$ on the transmitter side and the speech signal $\hat{S}_B$ synthesized on the receiver side. If there are no transmission errors, the synthesis filters on the receiving and transmitting sides are equal, and the differences in the synthesized signals come from different local sources of randomness, which are used to form the complex excitation of the MELPe vocoder synthesizer.
As is known [9], during the execution of the SKD protocol, in the advantage distillation (AD) and information reconciliation (IR) phases, the information exchanged between Alice and Bob over the public channel is available to Eve. For example, if the bit parity (BP) protocol [31] is used in the AD phase, the number of parity bits of Alice’s sequence that are available to Eve is provided by the following lemma.
Lemma 1. Let the initial strings $X$ and $Y$, owned by Alice and Bob at the beginning of the SKD protocol, be binary iid random sequences of length $n$ at Hamming distance $d$, $p = d/n$. Then, the expected number of parity bits that Alice exchanges with Bob over the public channel is provided by
$$N_P = \sum_{i=1}^{k} \left\lceil \frac{n_{i-1}}{2} \right\rceil, \qquad n_0 = n, \qquad (1)$$
where $k$ is the number of iterations of the BP algorithm and $n_{i-1}$ is the expected length of the sequences at the beginning of iteration $i$.
Proof. The proof follows directly from the fact that the total number of parity bits exchanged in the BP algorithm is equal to the sum of the exchanges over all iterations. For each iteration, this value is equal to half the length of the sequences at the beginning of the iteration, which is in turn equal to the length of the sequences at the end of the previous iteration. In [32], Theorem 2.2.3, p. 17, an expression for the compression rate of the BP algorithm in each iteration is provided. Based on this formula, we obtain the length of the sequences after $i$ iterations, i.e., $n_i$. Summing these values over all iterations, we obtain statement (1). Note that the smallest integer (ceiling) operator $\lceil \cdot \rceil$ is applied due to the very nature of parity checking of 2-bit blocks. This completes the proof of the Lemma. □
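To make the counting argument in the proof concrete, the following Python sketch simulates a simplified bit parity round and tallies the parity bits sent over the public channel. It assumes the common variant in which the parities of 2-bit blocks are compared and the first bit of each matching block is kept; a trailing unpaired bit is simply dropped here, so the count uses a floor rather than the ceiling of the lemma. All function names are hypothetical.

```python
import random


def noisy_copy(bits, flip_prob, rng):
    """Return a copy of bits in which each bit is flipped with probability flip_prob."""
    return [b ^ int(rng.random() < flip_prob) for b in bits]


def error_rate(x, y):
    """Fraction of positions in which the two sequences differ."""
    if not x:
        return 0.0
    return sum(a != b for a, b in zip(x, y)) / len(x)


def bp_iteration(x, y):
    """One bit parity round over 2-bit blocks.

    Returns the shortened sequences and the number of parity bits
    that Alice discloses on the public channel in this round.
    """
    blocks = len(x) // 2  # an unpaired trailing bit is dropped in this sketch
    kept_x, kept_y = [], []
    for i in range(blocks):
        parity_x = x[2 * i] ^ x[2 * i + 1]
        parity_y = y[2 * i] ^ y[2 * i + 1]
        if parity_x == parity_y:  # block survives, keep its first bit
            kept_x.append(x[2 * i])
            kept_y.append(y[2 * i])
    return kept_x, kept_y, blocks


def run_bp(x, y, iterations):
    """Run several rounds and accumulate the disclosed parity bits."""
    total_parity_bits = 0
    for _ in range(iterations):
        x, y, sent = bp_iteration(x, y)
        total_parity_bits += sent
    return x, y, total_parity_bits
```

In each round, half the current sequence length is disclosed as parity bits, so the total is dominated by the first round, which alone exposes about half as many bits as the initial sequence length.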
The number of bits of significance for an eavesdropper, $N_E$, may be slightly less than $N_P$ since some parity equations may be linearly dependent. However, as the $\lceil n/2 \rceil$ parity bits in the first iteration of the BP algorithm are mutually linearly independent, it always holds
$$N_E \ge \left\lceil \frac{n}{2} \right\rceil. \qquad (2)$$
This means that the uncertainty of the speech signal at the input to the system is at least halved, as viewed from the eavesdropper’s side. In
Figure 3, an example of the dependence of
as a function of
is shown for
and
Remark 1. Since our goal is a perfectly secret system, we can conclude that the first two candidate sources of common randomness cannot be used during protected communication, but only in the open operating mode, which, as a rule, precedes the protected one. Namely, in the typical use of such systems, legitimate parties establish a communication link through usual open communication. After checking the connection quality and mutual consent of the parties, the switch is made to encrypted communication. Practice shows that this part of open communication lasts 2 to 10 s. As the communication is open, the leakage of information about open speech to the eavesdropper does not play any role; that is, previously identified sources of common randomness can be used for the safe distillation of secret keys.
Following the previous logic, a good source of common randomness could be $CR_3 = (\tilde{S}_A, \tilde{S}_B)$. Signals $\tilde{S}_A$ and $\tilde{S}_B$ are obtained using local LP syntheses with the same LSP parameters, about which Eve has no information. Let us imagine two First In First Out (FIFO) buffers of identical secret random content on Alice’s and Bob’s side; see Figure 3. If we interpret this content as a set of randomly selected LSP parameters about which Eve has no information, then, by synchronized reading, both sides can synthesize the required signals $\tilde{S}_A$ and $\tilde{S}_B$. In that case, Alice and Bob can use the SKD protocol over the source $CR_3$ to distill secret keys without Eve receiving a single bit of information from the public channel about the input speech signal S. Namely,
$$I(S; \tilde{S}_A, \tilde{S}_B) = 0, \qquad (3)$$
having in mind the way the $\tilde{S}_A$ and $\tilde{S}_B$ signals were generated. The FIFO buffer is continuously replenished with just-distilled secret keys, while the secret key sequence $K_V$ is synchronously read on the receiving and transmitting side. By modulo-2 addition with the output sequence of the MELPe analyzer $M$,
$$C = M \oplus K_V, \qquad (4)$$
we form a perfectly secret Vernam cipher. A necessary condition for maintaining the system in continuous perfect secrecy is that the filling speed of the FIFO memories on the transmitting and receiving side must not be less than the reading speed, i.e.,
$$KR_{SKD} \ge KR_{V} + KR_{LP}. \qquad (5)$$
In (5), $KR_{SKD}$ is the secret key distillation rate, $KR_{V}$ is the Vernam cipher secret key consumption rate, and $KR_{LP}$ is the consumption rate of the LP synthesizer. Note that $KR_{V}$ must be equal to the output sequence rate of the MELPe vocoder in the main channel,
$$KR_{V} = R_{\mathrm{MELPe}}. \qquad (6)$$
Figure 4 shows the generic scheme of this APS-VCS concept.
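The rate condition for the FIFO memories can be checked with a short Python sketch. The example rates in the comments (1200 b/s for the Vernam cipher and, as an assumption, the same rate for the LSP parameters read by the LP synthesizer) are illustrative; the function names and the one-second simulation step are hypothetical.

```python
def rates_sustainable(kr_skd: float, kr_vernam: float, kr_lp: float) -> bool:
    """True if the FIFO is filled at least as fast as it is read."""
    return kr_skd >= kr_vernam + kr_lp


def fifo_survives(initial_bits: float, kr_skd: float, kr_vernam: float,
                  kr_lp: float, seconds: int) -> bool:
    """Simulate the FIFO fill level in one-second steps.

    Returns False as soon as the buffer would run empty, i.e., when
    continuous perfect secrecy could no longer be maintained.
    """
    level = initial_bits
    for _ in range(seconds):
        level += kr_skd - (kr_vernam + kr_lp)
        if level < 0:
            return False
    return True


# Illustrative values in bits per second (assumed, not measured):
# a 1200 b/s vocoder stream plus 1200 b/s of LSP parameters for the LP synthesizer.
example_ok = rates_sustainable(2500, 1200, 1200)
example_short = rates_sustainable(2000, 1200, 1200)
```

When the distillation rate falls short, the initial content gathered during the open phase only delays the moment at which the buffer empties; it cannot replace the rate condition itself.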
3. Identification and Analysis of Possible Sources of Common Randomness
The previous analysis shows that in the open speech phase, two sources of common randomness, $CR_1 = (S, \hat{S}_B)$ and $CR_2 = (\hat{S}_A, \hat{S}_B)$, are available. In order to evaluate which of these sources is more suitable, we conducted an experimental evaluation in real conditions of communication with the vPCP-V system (see Figure 5) and the Vlatacom True Random Number Generator (vTRNG) [19,33]; see Figure 6.
For experimental evaluation, we formed a test set consisting of 24 speech signals with speakers reading the provided text. In 14 cases, the text was unique, while in the remaining 10 cases, the speakers recorded the repeated text. The signals have durations between 32 s and 59 s, are sampled at a frequency of 8 kHz, and are quantized with 16 bits per sample.
Figure 7 shows the cross-correlation function between the original input speech signal S and the synthesized received signal $\hat{S}_B$ for sample No. 1 from the test set. Figure 8 shows the corresponding cross-correlation function between the locally synthesized speech signal $\hat{S}_A$ and the synthesized received signal $\hat{S}_B$ for the same speech sample. From the examples shown, it is clear that the correlation of the common randomness source $CR_2$ is, by an order of magnitude, higher than that of the $CR_1$ source.
This fact was confirmed across the entire test sample. Figure 9 shows the logarithm of the ratio of the two cross-correlations, $\log (R_{CR_2} / R_{CR_1})$, for all 24 test samples of speech signals. Only in the case of test sample No. 3 is this ratio less than 1. This indicates that the cross-correlation $R_{CR_2}$ is almost always significantly higher than the cross-correlation $R_{CR_1}$. Therefore, we have decided to use the source $CR_2$ for the execution of the SKD protocol in the open phase of communication of the APS-VCS system.
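The comparison of the two candidate sources can be sketched in Python as follows. The sketch assumes that each source is summarized by the peak of its normalized cross-correlation function and that the logarithm is taken to base 10; both choices, as well as the function names, are assumptions made for illustration and may differ from the measure used in the experiments.

```python
import numpy as np


def peak_xcorr(a: np.ndarray, b: np.ndarray) -> float:
    """Peak magnitude of the normalized cross-correlation of two signals."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0
    return float(np.max(np.abs(np.correlate(a, b, mode="full"))) / denom)


def log_ratio(s: np.ndarray, s_hat_a: np.ndarray, s_hat_b: np.ndarray) -> float:
    """log10 of the ratio between the correlation of (s_hat_a, s_hat_b)
    and the correlation of (s, s_hat_b)."""
    return float(np.log10(peak_xcorr(s_hat_a, s_hat_b) / peak_xcorr(s, s_hat_b)))
```

A positive value indicates that the pair of synthesized signals is more strongly correlated than the pair formed by the original and the received synthesized signal.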
4. LSP-Based LP Synthesizer
In the open communication phase, the analyzer and synthesizer of the built-in MELPe vocoder are used to form the $CR_2 = (\hat{S}_A, \hat{S}_B)$ source and distill the secret keys used to initially fill the FIFO memories. For the purposes of protected communication, it is necessary to form a $CR_3 = (\tilde{S}_A, \tilde{S}_B)$ source based on the LP synthesizer, which can be much simpler than the MELPe synthesizer. Namely, the complexity of the MELPe synthesizer originates from the complex process of forming the excitation signal in order to meet the demanding criteria of intelligibility and naturalness of the synthesized speech. This requirement has no significance in the formation of the synthesized signals $\tilde{S}_A$ and $\tilde{S}_B$. Therefore, the excitation is greatly simplified and consists of a periodic train of unit pulses, with possibly controlled jittering and the addition of locally generated purely random noise. For this purpose, in the experimental evaluation of the system, we used a vTRNG based on a natural-process entropy source with a built-in randomness checking system; see Figure 6. Figure 10 and Figure 11 show a generic scheme of the generator of locally synthesized signals based on random LSP parameters, periodic pulse input, and additive noise at the input and at the output of the LP synthesizer filter, respectively.
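The simplified generator described above can be sketched in Python as a periodic unit-pulse train passed through an all-pole filter, with uniform noise added either at the filter input or at its output. The noise is scaled so that the sample SNR at the synthesizer output equals a requested value. The pseudo-random generator stands in for the true random source, and the pulse period, function names, and scaling rule are assumptions made for illustration only.

```python
import numpy as np


def all_pole_filter(a: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Filter x through 1/A(z), with A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p."""
    y = np.zeros(len(x))
    for n in range(len(x)):
        acc = x[n]
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y[n] = acc
    return y


def pulse_train(length: int, period: int) -> np.ndarray:
    """Periodic train of unit pulses."""
    e = np.zeros(length)
    e[::period] = 1.0
    return e


def synthesize(a, length, period, snr_db, rng, noise_at_input):
    """Return (noisy synthesized signal, noise-free synthesized signal)."""
    clean = all_pole_filter(a, pulse_train(length, period))
    w = rng.uniform(-1.0, 1.0, length)  # stand-in for true random noise
    # By linearity, input noise appears at the output as the filtered noise.
    shaped = all_pole_filter(a, w) if noise_at_input else w
    gain = np.sqrt(np.sum(clean ** 2) / (np.sum(shaped ** 2) * 10 ** (snr_db / 10)))
    return clean + gain * shaped, clean
```

Two parties that share the same filter coefficients but draw independent noise obtain correlated yet non-identical signals, which is the property the common randomness source relies on.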
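The LP synthesis filter is provided by the transfer function
$$H(z) = \frac{1}{A(z)}, \qquad (7)$$
where $A(z)$ is the LP filter of order $p$,
$$A(z) = 1 + \sum_{k=1}^{p} a_k z^{-k}. \qquad (8)$$
If the LP filter $A(z)$ is of minimum phase, i.e., if all its zeros are inside the unit circle in the Z plane, the LP synthesis filter is stable.
The coefficients of the LP filter $a_k$, $k = 1, \dots, p$, are obtained based on the set of LSP parameters loaded from the FIFO memory,
$$\mathrm{LSP} = \{\omega_1, \omega_2, \dots, \omega_p\}, \qquad (9)$$
subject to the restriction
$$0 < \omega_1 < \omega_2 < \dots < \omega_p < \pi. \qquad (10)$$
As is known [20], the LP filter $A(z)$ can be decomposed in the form
$$A(z) = \frac{P(z) + Q(z)}{2}, \qquad (11)$$
where $P(z)$ and $Q(z)$ are the so-called even and odd polynomials defined by the LS frequencies (10), which for even $p$ read
$$P(z) = \left(1 + z^{-1}\right) \prod_{k = 1, 3, \dots, p-1} \left(1 - 2\cos(\omega_k)\, z^{-1} + z^{-2}\right), \qquad (12)$$
$$Q(z) = \left(1 - z^{-1}\right) \prod_{k = 2, 4, \dots, p} \left(1 - 2\cos(\omega_k)\, z^{-1} + z^{-2}\right). \qquad (13)$$
If we know the LSP parameters (from (9)), by substituting (12) and (13) into (11), and then into (7), we obtain the LP synthesis filter $H(z)$. This allows for the synthesis of the signals $\tilde{S}_A$ and $\tilde{S}_B$, used in the SKD protocol in the protected phase of system operation.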
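The conversion from randomly chosen LSP parameters to LP filter coefficients can be sketched in Python as follows. The sketch assumes an even filter order and the standard construction in which the odd-indexed line spectral frequencies define the symmetric polynomial (called P here) and the even-indexed ones define the antisymmetric polynomial (called Q here). The function names are hypothetical, and a pseudo-random generator stands in for the secret FIFO content.

```python
import numpy as np


def random_lsp(order: int, rng: np.random.Generator) -> np.ndarray:
    """Draw `order` frequencies from (0, pi) and sort them in ascending order."""
    return np.sort(rng.uniform(0.0, np.pi, order))


def lsp_to_lpc(lsf: np.ndarray) -> np.ndarray:
    """Return [1, a_1, ..., a_p] of A(z) from sorted LSFs (even p assumed)."""
    lsf = np.sort(np.asarray(lsf, dtype=float))
    if len(lsf) % 2 != 0:
        raise ValueError("this sketch assumes an even filter order")
    p_poly = np.array([1.0, 1.0])    # factor (1 + z^-1)
    q_poly = np.array([1.0, -1.0])   # factor (1 - z^-1)
    for w in lsf[0::2]:              # omega_1, omega_3, ...
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:              # omega_2, omega_4, ...
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    a = 0.5 * (p_poly + q_poly)
    return a[:-1]                    # the z^-(p+1) terms cancel


def is_stable(a: np.ndarray) -> bool:
    """True if all zeros of A(z) lie strictly inside the unit circle."""
    return bool(np.all(np.abs(np.roots(a)) < 1.0))
```

A useful sanity check is that equally spaced frequencies, $\omega_k = k\pi/(p+1)$, yield $A(z) = 1$, i.e., a flat spectral envelope.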
Remark 2. If and only if the condition (10) is strictly satisfied, i.e., if all zeros of the polynomials $P(z)$ and $Q(z)$ alternate in the range $(0, \pi)$, it can be shown that the LP filter $A(z)$ generated in this way is of minimum phase [34,35]. Therefore, if we randomly select p LSPs from the range $(0, \pi)$ and sort them in ascending order, then conduct the above procedure to form the polynomials $P(z)$, $Q(z)$, and $A(z)$, the resulting LP synthesis filter will be stable.
5. Information-Theoretic Analysis of the Source of Common Randomness Based on Randomly Selected LSP Parameters
If the proposed system provides negligible leakage of distilled secret keys to Eve (see Figure 4 and Figure 12), then there is simultaneously negligible leakage of Vernam cipher secret keys in the main channel, i.e., the system is able to maintain perfect secrecy.
To justify this claim, an appropriate information-theoretic analysis should answer the following two questions:
Namely, the uncertainty of these signals originates from the uncertainty of the applied LSP coefficients, as well as from the additive, purely random, locally generated noise, regardless of whether it is located at the input or output of the synthesizer. Due to the closed loop that includes the LP synthesizer, the SKD block, and the secret key FIFO buffer, if the local source of pure randomness had a negligible contribution to the entropy of the LP synthesizer output, the effective key rate of the secret keys generated for the Vernam cipher would, over time, tend toward zero. Formally, for noise added at the input and at the output of the synthesis filter, respectively, it is valid
$$\tilde{s}(n) = h(n) * \big(e(n) + w(n)\big), \qquad \tilde{s}(n) = h(n) * e(n) + w(n), \qquad (14)$$
where $h(n)$ is the impulse response of the LP synthesis filter $H(z)$, $e(n)$ is the periodic pulse input, $w(n)$ is the additive pure random noise, and $*$ is the convolution operator. Based on classical results [36,37], since for a minimum-phase filter
$$\frac{1}{2\pi} \int_{-\pi}^{\pi} \log \left| H\!\left(e^{j\omega}\right) \right|^2 d\omega = 0, \qquad (15)$$
it follows that the synthesis filter $H(z)$ preserves the input entropy, i.e., that it holds
$$H(h * w) = H(w), \qquad (16)$$
regardless of whether there is additive noise at the input or output of the filter $H(z)$. Further,
$$H(\tilde{S}) = H(\mathrm{LSP}) + H(w), \qquad (17)$$
since there is a 1–1 correspondence between the LSP and the $a_k$ parameters of the filter $A(z)$ [34].
Remark 3. Considering (17), it is clear that the noise entropy $H(w)$ must be significantly dominant with respect to H(LSP) for the synthesized signal $\tilde{S}$ to maintain a sufficient level of “innovation” entropy necessary for the distillation of perfectly secret keys for the Vernam cipher in the main channel.
The total entropy of synthesized signals (17) can be expressed as a function of the Signal-to-Noise Ratio (SNR).
Lemma 2. Let Signal-to-Noise Ratio (SNR) be provided in dB. Then, the noise entropy per one sample of the signal is equal towhere n is the number of bits used to encode signal samples, is the noise in the interval , is the impulse response of the LP synthesis filter, and has the meaning of the Euclidean norm operator. Proof. Based on Theorem 8.3.1 [
38], the entropy of a continuous Riemann integrable random variable
, of probability density
, quantized with n bits is
Let us first prove (18). Let the noise
be uniformly distributed in the interval
. Then the first term in (20) is equal to
while the second term is equal to the number of bits encoding the noise signal in the range
. Since it is equal to the number of occupied quantization levels, we obtain
We have thus proved the correctness of statement (18). To prove statement (19), it is sufficient to directly follow the definition of SNR [dB], namely
since
From (23), solving for , we obtain (19), which completes the proof. □
The notation specifically refers to , unless explicitly stated otherwise.
Remark 4. Based on Lemma 2, it follows that with the appropriate choice of SNR, we can control the size of the innovative entropy and its dominance in relation to . Note that is a fixed quantity equal to the number of bits used to encode the LSP parameters. In the case of the MELPe vocoder, its corresponding rate (6) is equal to 1.2 kb/s.
In systems with the application of the SKD protocol over classic CR sources, the basic quality criterion of the system is the amount of information $I(K; G, V = v)$ that Eve can obtain about the generated secret keys following the communication over the public channel. Formally,
$$I(K; G, V = v) = r - H(K \mid G, V = v), \qquad K = G(W), \qquad (25)$$
where $W$ is a random n-bit string with uniform distribution over $\{0,1\}^n$ at the output of the optimal Huffman encoder [39], $v$ is a particular value of the optimal Eve’s estimate $V$ of $W$, while $K$ is a distilled secret key of length $r$. The hash function $G$ is chosen at random from a universal class of hash functions from $\{0,1\}^n$ to $\{0,1\}^r$ [40]. According to well-known results from [41], specifically Corollary 4, Eve’s information about $K$ for specific $G$ and $v$ decreases exponentially in the excess compression $c - r$,
$$I(K; G, V = v) \le \frac{2^{-(c - r)}}{\ln 2}, \qquad (26)$$
where $c$ is the lower bound of Eve’s conditional Renyi entropy of order two (so-called collision entropy) about $W$, i.e.,
$$H_2(W \mid V = v) \ge c. \qquad (27)$$
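The exponential dependence of the leakage on the excess compression can be illustrated numerically. The Python sketch below evaluates the collision entropy of a discrete distribution and the privacy amplification bound $2^{-(c-r)}/\ln 2$ from the cited Corollary 4, and derives the longest key length that keeps the bound below a chosen leakage level. The function names and the leakage target are hypothetical choices for illustration.

```python
import math


def collision_entropy(probs) -> float:
    """Renyi entropy of order two, in bits, of a discrete distribution."""
    return -math.log2(sum(p * p for p in probs))


def leakage_bound_bits(c: float, r: int) -> float:
    """Upper bound on Eve's information (in bits) for a key of length r,
    given a lower bound c on her collision entropy about the input string."""
    return 2.0 ** (-(c - r)) / math.log(2)


def max_key_length(c: float, max_leak_bits: float) -> int:
    """Largest integer r for which the bound does not exceed max_leak_bits."""
    r = math.floor(c + math.log2(max_leak_bits * math.log(2)))
    return max(r, 0)
```

For example, with a collision entropy bound of 100 bits, every additional bit of compression halves the bound, so a margin of roughly 20 bits already pushes the leakage to the order of one millionth of a bit.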
However, since, in the APS-VCS system,
and
depend on previously distilled secret keys, it is necessary to examine whether Eve’s information
about the
is also negligibly small. According to (26), it holds
where
is the lower bound of Eve’s conditional Renyi entropy of order two about
, i.e.,
Note that, in (28), by , we denote the distilled secret keys, when Eve possesses some information about . Since this fact will affect her optimal strategy, and thus the length of the distilled keys, in the general case .
The optimal PA strategy must, therefore, rely on Eve’s conditional Renyi entropy, which is
or in terms of their minimal values
Remark 5. If the conditional Renyi entropies and were identical or slightly different , this would mean that Eve’s information about the key does not affect her information about the distilled key , and that, according to (28) and (29), this information decreases exponentially in the excess compression ≈ .
Whether this is true or not for APS-VCS, we tested empirically, estimating these two distributions in an experiment with 1000 locally synthesized signals $\tilde{S}_A$ (Alice) and $\tilde{S}_B$ (Bob) of length 540 samples encoded with 16 bits. The SKD protocol consists of the BP algorithm for the AD phase and the Winnow algorithm for the IR phase, followed by the optimal Huffman encoder and universal hashing; see Figure 12. The selection and optimization of algorithms and parameters for the SKD protocol are thoroughly discussed in [14,42]. In this paper, we use the parameters from those works that resulted in the highest key rate with minimal information leakage. Specifically, we run the AD algorithm for two iterations, after which the error becomes sufficiently small to be corrected by the Winnow algorithm. For the IR phase, we have chosen the 8-bit Winnow algorithm, which has been shown in [
31] to be optimal in terms of minimizing the information that an eavesdropper can gain during the IR phase. For the universal class of hash functions, a binary matrix with a Toeplitz structure is used since its complexity is
.
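Universal hashing with a binary Toeplitz matrix can be sketched in Python as follows. A Toeplitz matrix with $r$ rows and $n$ columns is fully described by $n + r - 1$ seed bits, and the hash is the matrix-vector product over GF(2). The direct matrix product used here is the simplest formulation and not necessarily the one used in the described system; the function names are hypothetical.

```python
import numpy as np


def toeplitz_matrix(seed, rows: int, cols: int) -> np.ndarray:
    """Binary Toeplitz matrix whose entry (i, j) depends only on i - j."""
    seed = np.asarray(seed, dtype=np.uint8)
    if seed.size != rows + cols - 1:
        raise ValueError("seed must contain rows + cols - 1 bits")
    i = np.arange(rows)[:, None]
    j = np.arange(cols)[None, :]
    return seed[i - j + cols - 1]


def toeplitz_hash(bits, seed, out_len: int) -> np.ndarray:
    """Compress `bits` to `out_len` bits by a Toeplitz product over GF(2)."""
    bits = np.asarray(bits, dtype=np.int64)
    t = toeplitz_matrix(seed, out_len, bits.size).astype(np.int64)
    return (t @ bits) % 2
```

Alice and Bob must use the same seed, which may be exchanged over the public channel, since the security of the construction rests on the randomness of the reconciled string rather than on the secrecy of the hash function.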
Figure 13 shows the distributions of conditional Renyi entropies
and
, and
Table 1 shows their means and variances. We can conclude that the distributions are almost identical and that extremely small differences originate from the inherent properties of random experiments on finite samples.
Remark 6. The presented theoretical analysis and experimental verification make it possible to conclude that the proposed APS-VCS system is resistant to attacks on the contents of the FIFO memories. This property logically follows from the properties of PA based on universal hash functions, as well as a sufficient amount of innovative entropy that refreshes the information content of the synthesized signals and . Therefore, the answer to the question of whether there is an Eve strategy that provides it an advantage over SKD systems based on classical CR sources is negative.
6. The New Privacy Amplification Strategy Based on So-Called Huffman–Renyi Difference
As is known [14,15,42], in order to efficiently utilize a particular CR source, it is necessary to adaptively determine the degree of compression of the PA block. In this way, the speed of generated secret keys is adjusted to the side information available to Eve. Complex machine-learning systems developed for these purposes can be replaced by simpler yet still very effective procedures for certain classes of CR sources.
Figure 14 shows a histogram of the difference
between the length of the sequence
at the output of the Huffman encoder and the conditional Renyi entropy
of that same sequence observed by Eve for synthesized signals with SNR = 39.9 dB. We will call the quantity
the Huffman–Renyi difference. We notice that the mean value is very close to 0, more precisely 1.84 bits, and that there is a negligible number of samples outside the range of
bits. Based on (32), we can derive a simple estimator
From (33), we see that with a quality estimate for
and knowing
, we can also obtain a quality estimate for
. If we set
equal to the mean value of
, then with high probability
Bearing in mind that the degree of compression of the PA block is equal to
, we arrive at three possible PA strategies:
The strategies are ordered according to the increasing degree of compression. Strategy (37) allows the security margin to be chosen by the predefined value of the leakage rate.
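The role of the Huffman–Rényi difference in the PA block can be illustrated with a minimal sketch. It is not the authors' implementation, and the exact expressions (35)–(37) are not reproduced here. The following are assumptions made for illustration: Eve's conditional Rényi entropy is estimated as the Huffman output length minus the mean Huffman–Rényi difference; the key length is this estimate reduced by a chosen security margin; and the compression uses a Toeplitz matrix over GF(2), one common universal hash family. All function names and the 512-bit input length are hypothetical.

```python
import math
import secrets


def estimate_renyi_bits(huffman_len_bits, mean_hr_difference):
    """Assumed estimator: Huffman output length minus the mean Huffman-Renyi difference."""
    return huffman_len_bits - mean_hr_difference


def key_length(huffman_len_bits, mean_hr_difference, security_margin_bits=0):
    """Assumed PA output length: entropy estimate reduced by the security margin."""
    estimate = estimate_renyi_bits(huffman_len_bits, mean_hr_difference)
    return max(0, math.floor(estimate - security_margin_bits))


def toeplitz_hash(bits, out_len, seed_bits):
    """Compress `bits` to `out_len` bits with a Toeplitz matrix over GF(2).

    Matrix entry T[i][j] = seed_bits[i - j + n - 1], so the (public) seed
    must contain n + out_len - 1 bits, where n = len(bits).
    """
    n = len(bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must have len(bits) + out_len - 1 bits")
    out = []
    for i in range(out_len):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & bits[j]
        out.append(acc)
    return out


# Illustrative run: 512 is a hypothetical Huffman output length; 1.84 bits is
# the mean difference quoted for SNR = 39.9 dB; 70 bits is the lower end of
# the security-margin range quoted for Figure 15.
huffman_output = [secrets.randbits(1) for _ in range(512)]
m = key_length(len(huffman_output), mean_hr_difference=1.84, security_margin_bits=70)
seed = [secrets.randbits(1) for _ in range(len(huffman_output) + m - 1)]
distilled_key = toeplitz_hash(huffman_output, m, seed)
print(m, len(distilled_key))
```

In this sketch, a larger security margin shortens the output and therefore increases the degree of compression. Alice and Bob would have to use the same seed, which may be exchanged over the public channel.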
7. Experimental Evaluation
In the first step of the synthesis of the APS-VCS system, it is necessary to choose operational values for the main system parameters, such as the secret key rate, SNR of synthesized signals, innovation entropy rate, and security margin.
Figure 15 shows the interdependencies of the operating ranges of these quantities, obtained on a real APS-VCS system, by averaging 100 values for each SNR value in the range from 10 to 50 dB. A distillation of secret keys was performed using three different PA strategies: mean (blue line),
(orange line), and
,
(green line). If the PA strategy
is taken as a reference and KR is required to be at least 2.4 kb/s, the resulting working ranges are: SNR from 29.4 to 47.5 dB, innovative entropy from 65 to 88 kb/s, and security margin from 70 to 460 bits, with KR = 2.5 kb/s. The order of selection is as follows (see Figure 15):
The desired KR is selected. Recall that it must satisfy constraints (5) and (6).
The security margin is chosen in accordance with the requirements of the overall security of the system. Taking into account (26) and (28), as the security margin increases, the degree of compression in the PA block increases, and thus the information Eve can obtain about the generated keys decreases exponentially.
The choice of security margin uniquely determines the SNR.
The obtained value for SNR uniquely determines the innovative entropy.
Remark 7. Since for each pre-fixed KR, the security margin can be in a wide range, the system designer has great freedom to easily choose the parameters of the synthesizer that will simultaneously satisfy the requirements for the rate of generating secret keys, their maximum entropy, non-repeatability, and negligible information leakage to Eve. All these elements confirm the basic requirements that must be fulfilled by the secret keys of the Vernam cipher in order to maintain its perfect secrecy.
Figure 16 shows the functional description of the operation of the APS-VCS system. The subsystem for generating secret keys starts working after the end of the open communication phase. A necessary condition for the functioning of this subblock is the initial successful filling of the FIFO memories on the side of Alice and Bob with distilled secret keys of at least 160 bits. The first 80 bits will be used as a secret key to encrypt the LSP of the first block of the input speech signal, while the next 80 bits will be used for the LP synthesizer in the SKD block. Algorithm 1 provides a detailed explanation of the process of transforming the binary sequence into LSP parameters, while Algorithm 2 offers a detailed explanation of the transformation of LSP parameters into LP synthesis filter H(z) parameters.
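The key consumption described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the class and function names are hypothetical, and bits are represented as Python integers 0 or 1. Only the 160-bit initial fill, the 80/80 split, and the Vernam (XOR) operation are taken from the text.

```python
from collections import deque

BLOCK_KEY_BITS = 80  # from the text: 80 bits for the cipher, 80 for the LP synthesizer


class KeyFifo:
    """FIFO memory holding distilled secret key bits (hypothetical helper)."""

    def __init__(self):
        self._bits = deque()

    def __len__(self):
        return len(self._bits)

    def push(self, bits):
        self._bits.extend(bits)

    def pop(self, n):
        if len(self._bits) < n:
            raise RuntimeError("not enough distilled key bits in the FIFO")
        return [self._bits.popleft() for _ in range(n)]


def next_block_keys(fifo):
    """Consume 160 bits: the first 80 for the Vernam cipher, the next 80 for the synthesizer."""
    if len(fifo) < 2 * BLOCK_KEY_BITS:
        raise RuntimeError("FIFO holds fewer than 160 bits")
    cipher_key = fifo.pop(BLOCK_KEY_BITS)
    synthesizer_bits = fifo.pop(BLOCK_KEY_BITS)
    return cipher_key, synthesizer_bits


def vernam(bits, key_bits):
    """Vernam cipher: bitwise XOR with a key of equal length, used only once."""
    if len(bits) != len(key_bits):
        raise ValueError("key and message must have the same length")
    return [b ^ k for b, k in zip(bits, key_bits)]
```

Because XOR is its own inverse, applying `vernam` with the same key on Bob's side recovers the original bits, provided both FIFO memories hold identical contents.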
Table 2 shows the result of the experimental evaluation of distilled secret keys based on source
,
during the open phase of communication. Since the average KR is 12.88 kb/s, just 1 s of open communication fills the FIFO memories with 12.88 kb on average, which far exceeds the required 160 bits. Note that the Key Acceptance Rate (KAR), an important indicator of the efficiency of the SKD protocol in usual applications [11], does not have to reach its maximum value of 100% here. The measured number of bits exchanged on the public channel (average 6175 bits, maximum 7063 bits) per block of 8640 bits indicates a significant information leakage about the input speech, which is not a security threat because the speech is not encrypted at this stage. The leakage rate of the secret keys used to fill the initial content of the FIFO memories is only 0.0012 b/b. This value can be reduced at the request of the designer by introducing an additional security margin, which can be of the order of several hundred bits; see Figure 15.
After the phase of open communication and successful filling of the FIFO memories with the initial content, the system transitions to secure communication. Synthesis and public channel communication are then performed simultaneously. Secret key distillation based on source
,
now takes place. Table 3 shows the results of this SKD protocol, performed over 1000 blocks and measured on the vPCP-V system. Since these results are presented graphically in Figure 15, Table 3 gives numerical values only for five characteristic SNRs in the range from 10 to 50 dB. In the already mentioned operating range [29.4, 47.5] dB, KAR remains at 100%, while LR is between 0.0011 and 0.0042 b/b. The presented results show that the SKD over the source
,
with large security margins ensures stable refreshment of the FIFO memories with newly generated secret keys, which are then used as secret keys of the Vernam cipher in the main channel.
As observed in
Figure 15, with an increase in the SNR of the synthesizer, the KR increases regardless of the applied PA strategy. This behavior can be easily explained based on the model (14) of synthesized signals, according to which the mutual correlation between
and
increases with increasing SNR; that is, with the decreasing influence of local noise
relative to the deterministic component
. The decrease in innovation entropy with increasing SNR follows the same mechanism: as SNR increases, the proportion of noise
in the total synthesized signal decreases, and, consequently, the corresponding innovation entropy declines.
The increase in security margin
with rising SNR is directly related to its definition as the difference
, where
represents the lower bound of Eve’s conditional Rényi entropy of order two, and
is the actual length of the distilled secret key; see (26). Since
directly determines the maximum KR of the specific system (a higher
leads to a higher maximum distilled KR), it is clear that the variation of the security margin will follow the same dependency with changes in SNR, as also observed in
Figure 15.
Therefore, we can conclude that the experimental evaluation confirms an excellent agreement with the theoretically expected results, both in terms of the key rate and the changes in innovation entropy and security margin with respect to the SNR of the model (14).
The presented order of selecting key system parameters KR, SNR, and
can also be interpreted in the following way. The operating point B in
Figure 15 is obtained at the intersection of the KR dependence on SNR and the desired value of KR. This point determines the lower bound for SNR. Point A in
Figure 15 is obtained by determining the maximum allowable SNR based on the minimum permitted innovation entropy. The allowable SNR variation interval directly dictates the range of possible values for KR, innovative entropy, and security margin
.
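The construction of points A and B can be sketched as a simple search over measured curves. This is an illustrative sketch under stated assumptions: the function name is hypothetical, the curves are given on a discrete SNR grid without interpolation, and the sample values in the usage example are toy numbers, not measurements from Figure 15.

```python
def operating_interval(snr_db, key_rate, innovation_entropy,
                       kr_target, min_innovation_entropy):
    """Return (snr_low, snr_high) of the allowable SNR interval, or None.

    snr_low corresponds to point B (the desired KR is reached) and
    snr_high to point A (the innovation entropy is still above its minimum).
    Assumes, as observed in Figure 15, that KR does not decrease and
    innovation entropy does not increase with SNR.
    """
    feasible = [
        snr
        for snr, kr, h in zip(snr_db, key_rate, innovation_entropy)
        if kr >= kr_target and h >= min_innovation_entropy
    ]
    if not feasible:
        return None
    return min(feasible), max(feasible)


# Toy curves (hypothetical values, for illustration only).
snr_grid = [10, 20, 30, 40, 50]                  # dB
kr_curve = [0.5, 1.5, 2.5, 3.5, 4.5]             # kb/s
entropy_curve = [100, 95, 85, 70, 60]            # kb/s
print(operating_interval(snr_grid, kr_curve, entropy_curve,
                         kr_target=2.4, min_innovation_entropy=65))
```

For the toy curves, the allowable interval is 30 to 40 dB: the lower end is fixed by the KR requirement and the upper end by the minimum permitted innovation entropy.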
Remark 8. Since SKD over the source of synthesized signals is performed independently of the system operation on the main channel, its parameters, such as the sampling rate and resolution of the synthesized signals, can be chosen almost arbitrarily, allowing for secret key generation rates in a much wider range of values. The only limiting factor of the secret key distillation rate is the communication capacity of the public channel. Therefore, the proposed APS-VCS system can operate reliably at other standard vocoder rates (2.4 kb/s, 4.8 kb/s) with the appropriate public channel bandwidth.
The price paid for the perfect secrecy of the APS-VCS system is the establishment and maintenance of a public channel during secure conversation. However, the main and public channels operate in an asynchronous mode, which significantly simplifies practical implementation. The only condition that must be met is to maintain the constant FIFO memory read rate of 1.2 kb/s for the Vernam cipher.
Algorithm 1 Binary sequence to LSP parameters transformation
Input: Binary sequence
Output: LSP parameters
1: Read the random sequence from the FIFO
2: Divide it into subsequences
3: Transform each subsequence into an LSP parameter by rescaling its decimal value with a scaling factor
4: Sort the obtained parameters according to (10)
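The steps of Algorithm 1 can be sketched as follows. The number of subsequences, their length, and the scaling factor are assumptions made for illustration: 80 key bits are split into 10 subsequences of 8 bits (a 10th-order LP model), and each decimal value is mapped into the open interval (0, π). The function name is hypothetical.

```python
import math


def bits_to_lsp(bits, num_params=10, bits_per_param=8):
    """Map a binary sequence to sorted LSP frequencies in (0, pi).

    Assumed layout: num_params subsequences of bits_per_param bits each.
    """
    if len(bits) != num_params * bits_per_param:
        raise ValueError("unexpected number of key bits")
    levels = 1 << bits_per_param
    lsp = []
    for i in range(num_params):
        chunk = bits[i * bits_per_param:(i + 1) * bits_per_param]
        value = int("".join(str(b) for b in chunk), 2)  # decimal value of the chunk
        # Assumed rescaling; the +0.5 offset keeps the result strictly inside (0, pi).
        lsp.append((value + 0.5) * math.pi / levels)
    return sorted(lsp)  # ascending order
```

Two identical subsequences would produce equal LSP values in this sketch, so a practical implementation would need a rule for resolving such ties before the ordering constraint can hold strictly.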
Algorithm 2 Transformation of LSP parameters to LP synthesis filter H(z) parameters
Input: LSP parameters
Output: H(z) parameters
1: Calculate P(z) according to (12)
2: Calculate Q(z) according to (13)
3: Calculate the LP filter according to (11)
4: Calculate the LP synthesis filter according to (7)
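Algorithm 2 can be illustrated with the standard textbook conversion from LSP frequencies to LP coefficients. This is a sketch under assumptions: the naming and sign conventions may differ from those in (7) and (11)–(13), the model order is assumed even, the synthesis filter is taken as H(z) = 1/A(z) without a gain term, and the function names are hypothetical.

```python
import math


def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp):
    """Convert sorted LSP frequencies in (0, pi) to A(z) = [1, a1, ..., ap]."""
    if len(lsp) % 2 != 0:
        raise ValueError("this sketch assumes an even model order")
    p_poly = [1.0, 1.0]    # P(z) contains the factor (1 + z^-1)
    q_poly = [1.0, -1.0]   # Q(z) contains the factor (1 - z^-1)
    for k, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if k % 2 == 0:     # 1st, 3rd, ... frequencies are roots of P(z)
            p_poly = poly_mul(p_poly, factor)
        else:              # 2nd, 4th, ... frequencies are roots of Q(z)
            q_poly = poly_mul(q_poly, factor)
    a = [(x + y) / 2.0 for x, y in zip(p_poly, q_poly)]
    return a[:-1]          # the coefficient of z^-(p+1) cancels to zero


def synthesize(excitation, a):
    """All-pole synthesis, H(z) = 1/A(z): y[n] = x[n] - sum_k a[k] * y[n-k]."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k in range(1, len(a)):
            if n - k >= 0:
                acc -= a[k] * y[n - k]
        y.append(acc)
    return y
```

For sorted, distinct LSP frequencies in (0, π), the roots of P(z) and Q(z) interlace on the unit circle, which is the condition under which A(z) is minimum phase and the synthesis filter is stable.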
Table 4 shows the bit-error rate (BER) results for four typical communication channels, with and without the use of an error-correcting code (ECC). For the ECC, the Golay (24,12) code is used, applied specifically to protect the 15% most sensitive bits of the binary representation of the LSP parameters that are subject to encryption. It is important to note that the first two types of GSM channels produce exceptionally good results, considering the impact of the input compression block in GSM devices. These experimental results confirm the essential functionality that a highly secure voice protection system must provide [21]. The results demonstrate that the BER shown in Table 4 is independent of the SKD system, provided that the synthesizer parameters are selected to guarantee a key generation rate greater than 2.4 kb/s.
7.1. Comparison with State-of-the-Art QKD Methods
Both Quantum Key Distribution (QKD) systems and our LSP-based artificial speech synthesizer provide a foundation for perfectly secret communication by enabling the real-time generation of symmetric keys. However, they differ significantly in implementation, autonomy, and practicality when applied to low-bit-rate secure speech communication. In Tables 5–9, we compare the two approaches across key aspects.
Key Advantage: The LSP-based system provides common randomness generation without specialized hardware, making it significantly more practical and scalable compared to QKD.
Key Advantage: The LSP-based synthesizer ensures continuous key generation synchronized with speech encryption rates, while QKD systems often struggle with lower key refresh rates, requiring buffering or hybrid encryption approaches.
Key Advantage: Our LSP-based solution can be integrated into existing secure voice systems, whereas QKD demands costly and specialized infrastructure, limiting its practical use for low-bit-rate real-time speech encryption.
Key Advantage: While QKD offers provable key exchange security, it does not inherently provide perfect secrecy for real-time speech without additional encryption. However, our approach directly enables perfectly secret communication in real time, eliminating reliance on additional cryptographic layers.
Final Verdict: For a real-time, low-bit-rate, perfectly secret speech communication, the LSP-based artificial speech synthesizer is significantly more practical, autonomous, and scalable than QKD-based approaches. While QKD remains valuable for a high-security key exchange, it is impractical for direct application in real-time speech encryption due to infrastructure constraints and lower key refresh rates.
7.2. Potential Attacks and Countermeasures
Despite its strong theoretical foundation, the APS-VCS system must be resilient to various potential attacks. Below, we discuss major threats and how the proposed approach successfully mitigates them.
Man-in-the-Middle (MitM) Attacks: Since the SKD protocol relies on a public authenticated channel, an adversary could attempt to inject or manipulate messages. The use of authentication mechanisms and error correction ensures that only legitimate parties can participate in key generation, effectively preventing MitM attacks.
Eavesdropping Attacks: The Vernam cipher ensures perfect secrecy, making intercepted ciphertexts indecipherable without the secret key. Additionally, since secret keys are distilled in real time and never reused, an eavesdropper gains no useful information even if past communications are compromised.
Side-Channel Attacks: Attackers may attempt to extract key information by analyzing power consumption, timing variations, or electromagnetic emissions. Implementing countermeasures such as randomized computational delays and hardware shielding can mitigate these risks.
Replay Attacks: To prevent adversaries from capturing and replaying key exchange messages, each SKD session includes time-varying elements and freshness indicators. This ensures that old messages cannot be reused to compromise the system.
Quantum Attacks: While current quantum computers do not threaten the information-theoretic security of the Vernam cipher, they could weaken authentication and key distillation mechanisms. Future enhancements could incorporate quantum-resistant authentication schemes to ensure long-term security.
7.3. Real-World Applications and Performance Advantages
The proposed APS-VCS system offers significant advantages in real-world applications where traditional cryptographic methods fail to provide both autonomy and perfect secrecy. Below are key scenarios where APS-VCS outperforms existing solutions:
Military and government communications. APS-VCS eliminates the need for external trusted key distribution infrastructure, making it ideal for military and government operations where high security and operational autonomy are required. Unlike QKD-based systems, which require specialized optical infrastructure, APS-VCS operates over existing digital and mobile networks, providing real-time perfectly secret voice communication even in remote or hostile environments.
The war in Ukraine can serve as a fresh and relevant example of the potential application and importance of APS-VCS. The combination of Starlink as a resilient and widely available public channel and APS-VCS as a secure communication system enables military units to maintain command coordination even in the most difficult circumstances without fear of eavesdropping or decryption by adversaries.
Security in covert and clandestine missions. Traditional secure communication devices store pre-distributed secret keys, so if a device is captured by the enemy, the entire encryption system could be compromised. In intelligence, counterinsurgency, or clandestine operations, APS-VCS ensures that no sensitive secret key material is stored or carried by field operatives. If an operative is captured or defects, no secret key information can be extracted to compromise ongoing operations.
Adaptability without the need for pre-deployment. Unlike traditional security systems that require prior key distribution (which can be logistically challenging and risky), APS-VCS allows users to establish secure communications dynamically. This makes it ideal for rapidly changing mission parameters where new communication nodes may need to be integrated without physical key exchanges.
Industrial and corporate security. Businesses dealing with sensitive intellectual property or trade secrets often rely on encrypted communication channels that depend on conventional PKI infrastructure. APS-VCS removes the need for key management through external parties, preventing potential insider threats and security breaches associated with centralized encryption key storage.
Tactical and emergency services. Emergency response teams require secure voice communication systems that function independently of centralized infrastructure, especially in disaster scenarios where conventional networks may be compromised. APS-VCS provides a reliable, autonomous encryption system that ensures complete secrecy of communications between first responders, law enforcement, and crisis management teams.
By addressing these practical applications, APS-VCS demonstrates clear advantages over existing cryptographic methods, particularly in scenarios where infrastructure independence, perfect secrecy, and real-time secure voice communication are crucial.
8. Conclusions
The paper presents a perfectly secret voice communication system based on a MELPe vocoder operating at a bit rate of 1.2 kb/s and the Vernam cipher. The generation and distribution of secret keys rely on two sources of common randomness and the SKD protocol, which requires the use of an additional authenticated channel. The primary source of CR is a specially designed LP synthesizer, which ensures the required amount of innovative entropy based on a local source of randomness. By selecting appropriate security margins and synthesizer parameters, such as sample rate and SNR, the system designer can achieve the desired secret key rate of 1.2 kb/s, thereby maintaining the perfect secrecy of the Vernam cipher. The maximum security margin reaches approximately 460 bits, ensuring negligible leakage of the generated secret keys.
The requirement for an additional reliable authenticated public channel can be seen as the trade-off for achieving perfect secrecy. However, the asynchronous operation of the voice coder and cipher system in the main channel relative to the system components utilizing the public channel significantly mitigates implementation challenges, enhancing practical usability. An experimental evaluation of the vPCP-V system demonstrates the robustness of this approach across various communication channels, including the most demanding ones, GSM 3G and VoLTE, for which an acceptable BER is achieved (see Table 4).
While the proposed APS-VCS system meets the stringent requirements of perfect secrecy and autonomy, several directions for enhancement and integration with emerging technologies warrant further exploration. Among the most interesting are the optimization of the SKD protocol, multiplexing of the main and public channels, and refining the synthesizer models by exploring alternative excitation methods, such as neural vocoder-based synthesis. A particularly promising direction for future research is hybrid architectures that combine SKD with QKD. While QKD provides provable security guarantees, its current deployment limitations (e.g., infrastructure requirements) make direct substitution challenging. However, incorporating QKD-based key exchange as an additional security layer for key refreshing could significantly enhance long-term robustness.
By addressing these future directions, the APS-VCS system can continue to evolve as a promising solution in the domain of perfectly secret voice communication. The findings presented in this paper contribute to the broader domain of secure speech transmission and highlight the potential for further innovation in autonomous cryptographic communication systems.