1. Introduction
In modern cryptography, researchers are increasingly exploring non-abelian group-based cryptosystems, due to their intricate algebraic structures and the perceived potential for heightened security against quantum computational methods. This exploration extends beyond traditional cryptographic assumptions like factorization and discrete logarithm problems (DLP), addressing one-way trapdoor functions in non-abelian groups. The emergence of Shor’s algorithm, capable of efficiently factoring integers and computing discrete logarithms, underscores the vulnerability of classical hardness problems to quantum attacks. This motivates the pursuit of post-quantum hardness assumptions in cryptosystems.
Baumslag et al. presented a group-theoretic learning problem termed learning homomorphisms with noise (LHN), generalizing established hardness assumptions, notably learning parity with noise (LPN) and learning with errors (LWE). LWE establishes a quantum hardness assumption rooted in lattice-based cryptography, forming the foundation for diverse constructions in modern cryptographic systems. It asserts the computational difficulty of learning a random linear relationship between secret information and noisy data within this lattice-based paradigm [1,2,3,4,5,6,7]. The LWE hardness assumption is fundamentally based on abelian integer groups. Our study, however, centers on the LHN problem associated with the non-abelian Burnside groups $B_n$ and $B_r$, commonly referred to as learning Burnside homomorphisms with noise ($B_n$-LHN) [8,9]. In this context, the $B_n$-LHN hardness problem asks to recover the homomorphism between the Burnside groups $B_n$ and $B_r$ from polynomially many sample pairs, each consisting of a preimage and its distorted image. Several aspects related to the security and cryptographic use of the $B_n$-LHN problem, such as random self-reducibility, error distribution, and a symmetric cryptosystem, have already been studied extensively. Pandey et al. [10] extended this line of research by introducing a derandomization of the $B_n$-LHN assumption, resulting in a new assumption termed the $B_n$-LHR assumption. That work also discussed the design of a length-preserving weak PRF based on the $B_n$-LHR assumption, leading to a PRF construction. However, the PRF obtained from the derandomized $B_n$-LHR assumption appears to be less efficient, in terms of both secret-key size and performance, than the direct PRF construction from the $B_n$-LHN assumption proposed in this study.
The pseudorandom function (PRF) and pseudorandom generator (PRG) constitute fundamental constructs in theoretical computer science, with implications spanning cryptography, computational complexity theory, and related domains. A PRG is a deterministic algorithm that takes a uniformly sampled seed as input and stretches it into a longer sequence that mimics randomness, indistinguishable from a truly random sequence for any probabilistic polynomial-time (PPT) adversary. Formally, a deterministic function $G: \{0,1\}^{\lambda} \to \{0,1\}^{l(\lambda)}$, with a sufficiently large security parameter $\lambda$ and $l(\lambda) > \lambda$, is considered a PRG if no efficient adversary can distinguish polynomially many outputs of $G$ from truly random outputs [11]. Similarly, a PRF is a deterministic function keyed by a uniformly sampled secret key, producing outputs with random-like characteristics. Despite its deterministic nature, the PRF output depends on both the uniformly sampled secret key and the adaptively chosen input. For a PRF defined by a uniformly sampled secret key, it is computationally infeasible for a PPT adversary to distinguish an oracle returning the PRF outputs from an oracle returning truly random outputs. A well-defined PRF family facilitates easy sampling of functions and efficient evaluation for a given secret key and adaptive input. The adaptive power conferred on the PPT adversary makes designing a PRF challenging. Our primary objective is to construct a PRF family based on a post-quantum hardness assumption, specifically the $B_n$-LHN assumption. This study adheres to the standard PRF definition from [12,13,14] for PRF constructions based on the $B_n$-LHN assumption. Consider a PRF family denoted by $\mathcal{F}$ with a sufficiently large security parameter $\lambda$. A function $F_k$ in $\mathcal{F}$ is defined by a secret key $k$. For a uniformly sampled secret key $k$, the function $F_k$ is deemed a PRF if no PPT adversary can distinguish polynomially many outputs $F_k(x)$ from truly random outputs, where the adversary is granted the ability to make adaptive queries on the inputs $x$.
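To make the adaptive-query security notion concrete, the following Python sketch models the PRF indistinguishability game. The oracle interface and query bound are assumptions made for illustration, and HMAC-SHA256 merely stands in as a hypothetical keyed function; it is not the Burnside-based PRF constructed later in this paper.

```python
import hmac, hashlib, secrets

def keyed_function(key: bytes, x: bytes) -> bytes:
    """Stand-in keyed function (HMAC-SHA256), used only to make the game runnable."""
    return hmac.new(key, x, hashlib.sha256).digest()

class RandomFunction:
    """A truly random function, realized lazily: each new input gets a fresh random output."""
    def __init__(self, out_len: int = 32):
        self.table, self.out_len = {}, out_len
    def query(self, x: bytes) -> bytes:
        if x not in self.table:
            self.table[x] = secrets.token_bytes(self.out_len)
        return self.table[x]

def prf_game(distinguisher, n_queries: int = 16) -> bool:
    """One run of the PRF indistinguishability game: the distinguisher makes
    adaptive queries to a single oracle and must guess whether it is keyed or random."""
    b = secrets.randbelow(2)
    key = secrets.token_bytes(32)
    rand = RandomFunction()
    oracle = (lambda x: keyed_function(key, x)) if b == 0 else rand.query
    guess = distinguisher(oracle, n_queries)
    return guess == b   # a secure PRF keeps the winning probability near 1/2
```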
PRF designs can be broadly categorized into two approaches: theory-based and heuristic-based. The heuristic-based approach relies on practical heuristics to design a PRF family, exemplified by the construction of Rijndael's AES [15,16]. While heuristic-based designs are often efficient and practical, their security lacks rigorous justification. Conversely, the theory-based approach employs well-established hardness assumptions to construct a PRF family with justified security. The foundational exploration of PRF concepts began with the seminal work of Goldreich, Goldwasser, and Micali (GGM) [12]. GGM significantly contributed to pseudorandomness by establishing a critical link between PRGs and PRFs: they introduced the use of a length-doubling PRG as an intermediate function in constructing a PRF.
The outline of the PRF construction proposed by GGM is as follows. Let $G: \{0,1\}^{\lambda} \to \{0,1\}^{2\lambda}$ be a length-doubling PRG, where $\lambda$ is a sufficiently large security parameter. The output $G(s)$ is split into two equal halves, denoted $G_0(s)$ and $G_1(s)$, representing the left and right halves, respectively. For a PRF family $\mathcal{F}$ with a sufficiently large security parameter $\lambda$, a PRF $F_k$ in $\mathcal{F}$ with secret key $k$ and an $m$-bit input $x = x_1 x_2 \cdots x_m$ is defined as in Equation (1). The PRF construction in GGM follows a sequential approach employing PRGs: it requires $m$ invocations of the PRG to compute the output $F_k(x)$. The primary advantage of GGM's PRG-based PRF construction lies in using the secret key as the seed of the initial PRG invocation.
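As a minimal sketch of the tree traversal behind Equation (1), the following Python code evaluates a GGM-style PRF from a generic length-doubling PRG; SHAKE-256 is used only as a hypothetical stand-in for such a PRG, and the 32-byte seed length is an assumed parameter.

```python
import hashlib

LAMBDA = 32  # seed length in bytes (assumed for the sketch)

def length_doubling_prg(seed: bytes) -> bytes:
    """Hypothetical length-doubling PRG: LAMBDA bytes in, 2*LAMBDA bytes out."""
    return hashlib.shake_256(seed).digest(2 * LAMBDA)

def ggm_prf(key: bytes, x_bits: str) -> bytes:
    """GGM evaluation: one PRG call per input bit; each bit selects the
    left or right half of the PRG output as the next seed."""
    state = key
    for bit in x_bits:
        out = length_doubling_prg(state)
        state = out[:LAMBDA] if bit == "0" else out[LAMBDA:]
    return state

# Example: ggm_prf(secret_key, "1011") makes four PRG calls, seeded by the key.
```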
In [17], Naor and Reingold (NR) proposed a groundbreaking design for a length-doubling PRG utilizing the decisional Diffie–Hellman (DDH) assumption. Moreover, the design has a useful property that can be exploited in other settings. The length-doubling PRG based on the DDH assumption is defined as follows. For sufficiently large primes $P$ and $Q$, where $Q$ divides $P - 1$, let $g$ be a generator of a subgroup of $\mathbb{Z}_P^*$ (the multiplicative group modulo $P$) of order $Q$. The DDH assumption holds if no PPT adversary, given $(g, g^a, g^b)$, can distinguish $g^{ab}$ from $g^c$ with non-negligible advantage, where the exponents $a$, $b$, and $c$ are sampled uniformly from $\mathbb{Z}_Q$. Utilizing the DDH assumption, NR designed a length-doubling PRG $G$ with index $a$, as in Equation (2).
Upon initial examination, an apparent paradox arises: $G$ appears to defy efficient computation, owing to the presumed hardness of the Diffie–Hellman (DH) problem; conversely, if $G$ were publicly computable, it would not qualify as a PRG, given the assumed hardness of the DH problem. However, a distinctive attribute of $G$ comes to light, rendering it suitable for incorporation into the GGM construction of a PRF: $G$ is efficiently computable when either exponent $a$ or $b$ is known. The key idea is to use the exponent $a$, taken from the secret key of the resulting PRF, as an index in $G_a$. A PRF $F_k$ built from the length-doubling PRGs $G_a$ is defined as shown in Equation (3). Here, the secret key $k$ consists of the exponents used as indices in the PRGs $G_a$, sampled uniformly from $\mathbb{Z}_Q$. The construction of these length-doubling PRGs relies on the DDH assumption, with each function defined by a component of the PRF secret key. It is important to note that the security of the length-doubling PRG $G_a$ is inherently tied to the security of the DDH assumption. On its own, outside the PRF construction, the length-doubling PRG cannot be evaluated efficiently unless the DH problem is easy. However, when used as an intermediate function in a PRF construction, where its index is part of the secret key, the function acts as a PRG and can be used to construct a PRF.
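The following Python sketch illustrates this point with toy parameters (a small prime and generator chosen purely for illustration, not secure values): a DDH-style doubling map of the kind defined in Equation (2) cannot be evaluated publicly from $g^a$ and the seed alone, yet it is trivial to compute once the index $a$ from the secret key is known, which is exactly how it is consumed inside the GGM tree of Equation (3).

```python
import secrets

# Toy parameters (assumed for illustration only): P prime, Q = (P-1)//2 prime,
# and g generating the order-Q subgroup of Z_P^*.
P, Q, g = 23, 11, 4

def indexed_prg(a: int, y: int) -> tuple:
    """DDH-style doubling map: one group element y in, two elements (y, y^a) out.
    Without the index a, computing y^a from g^a and y is a Diffie-Hellman instance."""
    return (y, pow(y, a, P))

def nr_style_prf(key: list, x_bits: str) -> int:
    """GGM-style evaluation: the i-th input bit selects which half of
    (y, y^{a_i}) becomes the next state."""
    state = g
    for bit, a_i in zip(x_bits, key):
        left, right = indexed_prg(a_i, state)
        state = left if bit == "0" else right
    return state

key = [secrets.randbelow(Q - 1) + 1 for _ in range(4)]  # one exponent per input bit
# nr_style_prf(key, "1011")
```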
Contribution. In this study, we make the following contributions to the field of cryptography. First and foremost, to address the efficiency of cryptographic protocols based on the $B_n$-LHN assumption, we introduce an optimized and parallelizable concatenation operation tailored for Burnside groups. Moreover, we introduce and formulate three progressively refined designs for constructing a PRF family using the GGM approach, rooted in the $B_n$-LHN assumption. In the first attempt, for a given homomorphism $\varphi$ from $B_n$ to $B_r$, a PRG $G$ is defined on a seed consisting of group elements $a_i$ sampled from $B_n$ and errors $e_i$ sampled from a set of errors $E$, producing outputs of the form $\varphi(a_i) \cdot e_i$. This design, termed the direct PRG, applies the $B_n$-LHN assumption by capitalizing on the lower entropy of the set of errors $E$ compared to the Burnside group $B_r$. Moreover, we introduce an adjustment to the direct PRG design, leading to a significant decrease in the secret-key size of the corresponding PRF; we call this design the parameterized PRG. However, the modified construction of the parameterized PRG introduces extra public parameters. In the second attempt, the PRG $G$ takes the errors $e_i$ as its seed, while each $a_i$ becomes a public parameter associated with the error $e_i$, and the outputs retain the form $\varphi(a_i) \cdot e_i$. We further propose a modification to the parameterized PRG that yields a significant decrease in the secret-key size of a PRF. The construction is detailed as follows. Let $\varphi$ be a homomorphism from $B_n$ to $B_r$. An indexed PRG $G_{\varphi}$ with index $\varphi$ is constructed as $G_{\varphi}(e) = \varphi(a) \cdot e$, where the seed $e$ is sampled from a set of errors $E$. Furthermore, $a$ is sampled from a Burnside group $B_n$ and is a public parameter associated with the input seed $e$. Here, the input and output bit sizes of the function $G_{\varphi}$ are the entropies of the set of errors $E$ and of the Burnside group $B_r$, respectively.
Following the GGM construction, we design a PRG-based PRF from the aforementioned PRG family, as follows. Let $k = (\varphi, e)$ be a secret key, where $\varphi$ is a homomorphism and $e$ is an error seed. Let $G_{\varphi}$ be an indexed PRG, as defined above. For $1 \le i \le m$, let $\{a_1, \ldots, a_m\}$ represent a set of public parameters, where each $a_i$ is sampled uniformly from a Burnside group $B_n$. A PRF $F_k$ for an input string $x = x_1 x_2 \cdots x_m$ and secret key $k$ is defined by iterating $G_{\varphi}$ in the GGM manner, where the $i$th iteration of the function call $G_{\varphi}$ uses the associated public parameter $a_i$, for $1 \le i \le m$, and the input bit $x_i$ selects the half of the output that seeds the next iteration; the left and right halves of the output of $G_{\varphi}$ are of equal length. Finally, we establish the security of a PRF construction in which an indexed PRG $G_{\varphi}$ is used as an intermediate function.
Outline. Section 2 introduces the concept of a relatively free group, with a specific focus on the Burnside group. The section provides an in-depth exploration of the $B_n$-LHN hardness assumption, elucidating its significance. Furthermore, it outlines the construction framework for minicrypt incorporating Burnside learning problems, and it clarifies the error distribution, a pivotal component for establishing the post-quantum hardness assumption known as $B_n$-LHN. We also reference a derandomization technique for the $B_n$-LHN assumption and the construction of a pseudorandom function (PRF) from the $B_n$-LHR assumption in [10].
Section 3 presents an optimized concatenation operation within the Burnside groups $B_n$ and $B_r$, emphasizing parallel efficiency.
Section 4 explores three distinct approaches to constructing a pseudorandom function (PRF) from the original $B_n$-LHN assumption without derandomization. Within this context, the section introduces designs for the fundamental underlying primitive, the pseudorandom generator (PRG), and investigates how the PRG-based PRF design significantly reduces the secret-key size compared to alternative designs from the modified $B_n$-LHR assumption.
Section 5 provides a comprehensive analysis of the security and efficiency characteristics of our proposed PRG and PRF schemes.
2. Background
Burnside [18], in 1902, put forward the question of whether a finitely generated group all of whose elements have finite order is necessarily finite. After six decades, the question was answered by Golod and Shafarevich, who exhibited an infinite, finitely generated group all of whose elements have finite order [19]. For free Burnside groups of bounded exponent, subsequent work showed that the group is infinite for any odd exponent greater than 4380; in 1975, Adian improved the result, showing that the group is infinite for any odd exponent greater than 664 [20]. A free Burnside group with $n$ generators and exponent $m$, denoted by $B(n, m)$, is a group in which $w^m = 1$ for every element $w$. Clearly, the group $B(n, 2)$ is abelian and has order $2^n$. In his original paper, Burnside proved that the order of $B(n, 3)$ is finite; later, Levi and van der Waerden showed the exact order of $B(n, 3)$ to be $3^c$, where $c = n + \binom{n}{2} + \binom{n}{3}$ [21]. Furthermore, Burnside also showed that the order of $B(2, 4)$ is finite, and Sanov later strengthened the result by showing that $B(n, 4)$ is finite in general, although its exact order is not known [22]. Similarly, for exponent 6, Marshall Hall showed that $B(n, 6)$ is finite and determined its exact order [23]. For an exponent $m$ other than $1, 2, 3, 4$, and $6$, it is unknown whether $B(n, m)$ is finite for all numbers $n$ of generators.
Notation. Throughout our discussion, the following conventions are consistently applied. The symbols $\lambda$ and $\mathbb{N}$ signify a security parameter and the set of natural numbers, respectively. The term $\log$ denotes the binary logarithm. For a set $S$, we write $a \leftarrow S$ to indicate that $a$ is sampled uniformly from $S$; similarly, for a distribution $\chi$ over a set $S$, $a \leftarrow \chi$ denotes that $a$ is an element of $S$ sampled according to the distribution $\chi$. The notation $s_1 \| s_2 \| \cdots$ represents the bit-string resulting from the concatenation of the strings $s_1, s_2, \ldots$, which may have different lengths. In an algebraic context, however, $\langle X \rangle$ signifies a (relatively) free group $G$ generated by a set of generators $X$. For a polynomial function $t(\lambda)$ of the security parameter $\lambda$, the set $\{s_i\}_{i=1}^{t(\lambda)}$ denotes a set in which $s_i$ is the $i$th element for $1 \le i \le t(\lambda)$.
2.1. Relatively Free Group: Burnside Group
Let $X = \{x_1, x_2, \ldots, x_n\}$ represent an arbitrary set of symbols, where $n \in \mathbb{N}$. Within $X$, each element $x_i$ and its inverse $x_i^{-1}$ are referred to as literals. A word $w$ signifies a finite sequence of literals from $X$. A word $w$ is considered reduced if all occurrences of the sub-words $x_i x_i^{-1}$ or $x_i^{-1} x_i$ are eliminated. A group $G$ is termed a free group with generating set $X$, denoted $G = \langle X \rangle$, if every nontrivial element of $G$ can be expressed as a reduced word in $X$. If $N$ is a normal subgroup of a free group $G$, then the factor group $G/N$ is relatively free if $N$ is fully invariant, that is, if $\sigma(N) \subseteq N$ for every endomorphism $\sigma$ of $G$. A Burnside group $B_n$ is a (relatively) free group with a generating set $X = \{x_1, \ldots, x_n\}$, where the order of every word in $B_n$ is 3 [23,24,25,26]. For the (relatively) free groups $B_n$ and $B_r$, the universal property holds as follows: every mapping from the generating set of $B_n$ into some (relatively) free group $B_r$ extends to a unique homomorphism $\varphi: B_n \to B_r$ (Figure 1).
The group operation, which we shall refer to as the concatenation operation ($\cdot$), between words $w_1, w_2 \in B_n$ is to write $w_1$ and $w_2$ side by side and generate the reduced word in $B_n$; this is denoted by $w_1 \cdot w_2$ (or simply $w_1 w_2$) for any $w_1, w_2 \in B_n$. Since the order of every word in $B_n$ is 3, $w^3 = 1$ for all $w \in B_n$. The empty word is the identity in $B_n$ and is represented by 1. Each word in $B_n$ can also be represented in normal form, as in Equation (4) [8,9]. More comprehensive details are provided in the literature [18,20,22,23,25,27,28].
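To make the concatenation operation concrete, the following Python sketch performs only the elementary simplification of adjacent powers of the same generator (exponents taken modulo 3); the representation of a word as a list of (generator, exponent) pairs is an assumption of the sketch, and the full normal form of Equation (4), which also collects 2- and 3-commutators, is not computed here.

```python
def reduce_word(word):
    """Merge adjacent powers of the same generator and reduce exponents mod 3
    (since w^3 = 1 in B_n).  This is elementary cancellation only, not the
    full Burnside normal form of Equation (4)."""
    stack = []
    for gen, exp in word:
        exp %= 3
        if exp == 0:
            continue
        if stack and stack[-1][0] == gen:
            g, e = stack.pop()
            merged = (e + exp) % 3
            if merged != 0:           # x^a followed by x^{-a} cancels entirely
                stack.append((g, merged))
        else:
            stack.append((gen, exp))
    return stack

def concatenate(w1, w2):
    """Concatenation operation (.): write w1 and w2 side by side, then simplify."""
    return reduce_word(w1 + w2)

# Example: (x1 . x2) . (x2^2 . x3) simplifies to x1 . x3, since x2^3 = 1.
# concatenate([("x1", 1), ("x2", 1)], [("x2", 2), ("x3", 1)])
```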
In the normal representation of a word $w$ in a Burnside group $B_n$, as in Equation (4), the exponents attached to the generators $x_i$, to the 2-commutators $[x_i, x_j]$, and to the 3-commutators $[[x_i, x_j], x_k]$ determine the word. The following example illustrates the transformation of a word in a Burnside group.
Example 1. This is an example of transforming a word in a Burnside group with a generating set to a corresponding normal representation. Properties associated with commutator words in a Burnside group are discussed in Appendix A. The transformation is as follows, where at each step the bold expression from the previous line is simplified using the underlined transformation in the next line:
The order of the group $B_n$ is $3^c$, where $c = n + \binom{n}{2} + \binom{n}{3}$. The abelianization operation is defined in Equation (5); it collects all the generators and their corresponding exponents in a word $w$ given in the normal form of Equation (4).
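As a small illustration of the abelianization in Equation (5), the sketch below collects the exponent of each generator modulo 3 from a word given in the list-of-pairs form assumed above, discarding the commutator contributions of the normal form.

```python
from collections import Counter

def abelianize(word, n):
    """Return the exponent vector over GF(3) of a word, one entry per generator
    x1..xn; commutator parts of the normal form are discarded (Equation (5))."""
    exps = Counter()
    for gen, exp in word:
        exps[gen] = (exps[gen] + exp) % 3
    return [exps[f"x{i}"] for i in range(1, n + 1)]

# Example: abelianize([("x1", 1), ("x2", 2), ("x1", 1)], n=3) == [2, 2, 0]
```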
Finitely generated Burnside groups can be represented geometrically using Cayley graphs. The Cayley graph of a Burnside group $B_n$, defined with respect to a generating set, depicts group words as vertices; an edge connects two vertices if multiplication by a generator (or its inverse) transforms one word into the other. The Cayley distance between two words is the shortest path length between their corresponding vertices in the Cayley graph, and the Cayley norm of a word is its distance from the identity word. Figure 2 illustrates a partial Cayley graph, with the essential edges connecting all words of a Burnside group with its generating set, drawn in breadth-first order.
2.2. Learning Burnside Homomorphisms with Noise
There exists a homomorphism $\varphi: B_n \to B_r$ for any random mapping from the generating set of $B_n$ to a Burnside group $B_r$. Let $\mathcal{H}$ denote the set of homomorphisms from $B_n$ to $B_r$. For each generator in the generating set of $B_n$, there are $|B_r|$ possible images, so the order of the set of all homomorphisms is $|B_r|^n$. The distribution $\chi$ represents the error distribution over a set of errors $E$ (details are given in Section 2.4). For a homomorphism $\varphi \in \mathcal{H}$, the distribution $A_{\varphi}$ outputs pairs $(a, \varphi(a) \cdot e)$, where $a$ is chosen uniformly at random from $B_n$ and $e$ is sampled from $E$ according to $\chi$. On the other hand, the corresponding random distribution $U$ outputs pairs $(a, u)$, where $a$ and $u$ are chosen uniformly from $B_n$ and $B_r$, respectively. Similarly, $\mathcal{O}_{A_\varphi}$ and $\mathcal{O}_U$ represent the oracles with distributions $A_{\varphi}$ and $U$, respectively. The decisional $B_n$-LHN problem is to distinguish the oracles $\mathcal{O}_{A_\varphi}$ and $\mathcal{O}_U$ with a non-negligible advantage, given polynomially many samples. By setting the value of $n$, a desired level of security of $\lambda$ bits can be achieved for the decisional $B_n$-LHN problem; the security parameter $\lambda$ is defined accordingly as a function of $n$. Therefore, the decisional $B_n$-LHN assumption is formally stated as follows:
Definition 1 (Decisional $B_n$-LHN Assumption). For any PPT adversary $\mathcal{A}$ and sufficiently large security parameter $\lambda$, there exists a negligible function $\mathrm{negl}(\lambda)$ such that the advantage of $\mathcal{A}$ in distinguishing the oracles $\mathcal{O}_{A_\varphi}$ and $\mathcal{O}_U$ is at most $\mathrm{negl}(\lambda)$.

2.3. Minicrypt Using Burnside Learning Problem
A secret-key cryptosystem utilizes a single secret key for both encryption and decryption. This shared key is exclusive to the communicating entities and necessitates a secure channel for its distribution. Mathematical functions within symmetric-key algorithms facilitate the transformation of plaintext into ciphertext and vice versa. The use of a symmetric cryptosystem based on the decisional hardness of the $B_n$-LHN problem is explored in [8]. To encrypt a $t$-bit message $m$, we define independent words $w_i$ in $B_r$, one for each possible message. Words $w_i$ and $w_j$ are independent if the sets of generators appearing in them are disjoint for all $i \ne j$. To encrypt the decimal number $m$ that represents a $t$-bit message, a ciphertext $(a, c)$ is generated, where $a$ is sampled uniformly from $B_n$ and $c = \varphi(a) \cdot w_m \cdot e$ for an error $e$. A homomorphism $\varphi$ sampled uniformly from $\mathcal{H}$ represents the shared secret key. To decrypt a ciphertext $(a, c)$, we compute $\varphi(a)^{-1} \cdot c$. The plaintext is recovered as $m$ if the resulting word lies in the set of words obtained by distorting $w_m$ with an error from $E$.
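The following Python sketch mirrors the scheme above at a purely structural level. The ciphertext shape $(a, \varphi(a) \cdot w_m \cdot e)$, the closeness test used for decryption, and every group operation are assumptions supplied as callables for illustration; they are not the paper's concrete algorithms.

```python
from typing import Any

Word = Any  # opaque Burnside group element

def encrypt(m, words, phi, sample_Bn, sample_error, mul):
    """Assumed ciphertext shape: (a, phi(a) * w_m * e)."""
    a = sample_Bn()                      # a <- B_n, uniform
    e = sample_error()                   # e <- E, per the error distribution
    return a, mul(mul(phi(a), words[m]), e)

def decrypt(ct, words, phi, inv, mul, looks_like):
    """Strip phi(a), then test which independent word the remainder is a
    noisy version of; `looks_like` is an assumed predicate for that test."""
    a, c = ct
    d = mul(inv(phi(a)), c)              # d = w_m * e
    for m, w in enumerate(words):
        if looks_like(d, w):
            return m
    raise ValueError("decryption failed")
```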
2.4. Error Distribution
The security of the $B_n$-LHN learning problem relies on the assumed hardness of group-theoretic problems together with the introduced errors. The introduction of errors contributes to making these problems computationally hard, forming the foundation of the security assumptions. Recall that, in the context of the hardness of the $B_n$-LHN problem, we define two Burnside groups $B_n$ and $B_r$. The error distribution $\chi$ over the Burnside group $B_r$ is generated by concatenating the generators of $B_r$ in random order, accompanied by random exponents from a ternary set [8,9,10]. The probability mass function of the errors is precisely defined as follows [8]:
In Equation (7), the $i$th exponent is the $i$th component of a vector sampled uniformly from the field $\mathbb{F}_3$, and the ordering of the generators ranges over the set of all permutations of the generator indices. The probability mass function in Equation (7) generates a multiset of possible errors in $B_r$. The abelianization operation extracts the generators and their corresponding exponents from a word, while discarding any other exponents, as shown in Equation (5). In a sample $(a, \varphi(a) \cdot e)$, $e$ represents an error generated according to the distribution $\chi$ over the set $E$. For the abelianized samples to appear random, a suitable error distribution is required; establishing the error distribution exactly as defined in Equation (7) is therefore essential to prevent abelianization attacks on the $B_n$-LHN hardness assumption [8].
Let $M$ denote the multiset of errors defined by Equation (7), and let $M_l$ represent the sub-multiset of errors with Cayley norm $l$. Correspondingly, let $E_l \subseteq E$, where $E$ is the underlying set of the multiset $M$. A function $f: M \to E$ is defined by simplifying an error in $M$ through multiple concatenation operations in the Burnside group $B_r$. The order of the multiset $M$, and the orders of the subsets $M_l$ and $E_l$, are determined by Equation (7). Since the function $f$ maps errors from $M$ to $E$, each error in $E$ has several preimages in $M$; in other words, several errors in $M$ constitute different representations of the same error in $E$. Considering identical errors in $M$ as a cluster, there are $|E|$ such clusters in $M$. The straightforward approach to sampling errors according to the distribution $\chi$ is to choose a random ordering of the generators and a random exponent vector as indices and exponents, respectively. However, this approach requires multiple concatenation operations to obtain the simplified error, and these concatenations form a bottleneck for cryptosystems based on the $B_n$-LHN assumption. A distribution of errors $\chi$ over an error set $E$ can, however, be realized through two distinct methods. In the first approach, we establish the mapping from the multiset $M$ to the error set $E$ using multiple concatenation operations; this constitutes a one-time precomputation, serving all subsequent error computations from the set $E$ based on the distribution $\chi$. In the second approach, we assign each subset $E_l$ an appropriate weight, ensuring that the induced distribution on $M$ is uniform, a requirement for the $B_n$-LHN cryptosystem. By assigning a suitable distribution weight to each subset $E_l$, we achieve a uniform distribution over $M$, which represents the distribution $\chi$ over $E$ [10].
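A minimal sketch of the straightforward sampling route described above, under the assumption that an unsimplified error is built from a uniformly random ordering of the $r$ generators of $B_r$ with exponents drawn uniformly from GF(3); the list-of-pairs word representation from the earlier sketches is reused, and the simplification map $f$ (full Burnside reduction) is deliberately left out.

```python
import random

def sample_error_naive(r: int):
    """Sample an unsimplified error: the r generators of B_r in a random order,
    each raised to a uniform exponent in {0, 1, 2} (GF(3)).  Mapping this element
    of the multiset M to its simplified form in E would additionally require the
    Burnside concatenation procedure."""
    order = list(range(1, r + 1))
    random.shuffle(order)                        # random generator ordering
    exponents = [random.randrange(3) for _ in order]
    return [(f"x{i}", e) for i, e in zip(order, exponents) if e != 0]

# Example: sample_error_naive(5) might return [("x3", 2), ("x1", 1), ("x5", 2)]
```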
4. PRF Construction
In this section, we design three progressively refined constructions of a PRF family using the GGM approach, grounded in the $B_n$-LHN assumption. The primary challenge in designing a PRF based on the $B_n$-LHN assumption lies in managing the errors associated with it. The key idea in the PRF designs is to extract errors from the secret key of the underlying PRF. The initial PRG design, termed the direct PRG, applies the $B_n$-LHN assumption by capitalizing on the lower entropy of a set of errors $E$ compared to a Burnside group $B_r$. In the second approach, we introduce a PRG design, referred to as the parameterized PRG, where the function description of the PRG is derived from the public parameters and the secret key of the underlying PRF. Although this may seem counterintuitive at first, our findings indicate that the intermediate PRG used in GGM's PRG-based PRF construction places a less strict requirement on the PRG itself, as discussed in the following outlines. In the final approach, we propose a design called the indexed PRG, with a set of public parameters and an index associated with it.
Construction 1 (Direct PRG). Let $\varphi$ be a homomorphism in $\mathcal{H}$. For some $t \in \mathbb{N}$, we define a function $G$ on a seed consisting of $\varphi$, group elements $a_1, \ldots, a_t$, and errors $e_1, \ldots, e_t$, whose output is the list of sample pairs $(a_i, y_i)$, where each $y_i$ is computed as $y_i = \varphi(a_i) \cdot e_i$ for $1 \le i \le t$. Here, the values $a_i$ are sampled from the Burnside group $B_n$, while the $e_i$ are chosen from the error set $E$. The input and output bit sizes of $G$ are $\varepsilon_{\mathcal{H}} + t(p + \varepsilon_E)$ and $t(p + q)$, respectively, where $p$ and $q$ denote the entropy of the Burnside groups $B_n$ and $B_r$, respectively, $\varepsilon_{\mathcal{H}}$ represents the entropy of the homomorphism set $\mathcal{H}$, and $\varepsilon_E$ characterizes the entropy of the error set $E$.
Theorem 1. If the $B_n$-LHN assumption holds and $t > \varepsilon_{\mathcal{H}} / (q - \varepsilon_E)$, then the function $G$ from Construction 1 is a PRG.
Proof. The pseudorandomness of the output follows directly from the $B_n$-LHN assumption; it remains to check that $G$ stretches its input, since a function qualifies as a PRG only if its output bit length exceeds its input bit length. The input bit size of $G$ is $\varepsilon_{\mathcal{H}} + t(p + \varepsilon_E)$, while the output bit size is $t(p + q)$. For $G$ to be a PRG, it must satisfy $t(p + q) > \varepsilon_{\mathcal{H}} + t(p + \varepsilon_E)$, which simplifies to $t(q - \varepsilon_E) > \varepsilon_{\mathcal{H}}$. Solving for $t$, we obtain the required bound $t > \varepsilon_{\mathcal{H}} / (q - \varepsilon_E)$. Since the entropy of the set of errors $E$ (denoted by $\varepsilon_E$) is smaller than the entropy of the Burnside group $B_r$ (denoted by $q$), we can select a sufficiently large $t$ to overcome the entropy of the input of $G$. This ensures that the entropy of the output of $G$ exceeds that of its input, thereby establishing $G$ as a PRG. □
Construction 2 (PRF from Direct PRG). (Outline) We construct a PRF by utilizing the length-stretching PRG $G$ from Construction 1 as follows. First, we construct a length-doubling PRG from the length-stretching PRG $G$ by cascading multiple PRGs in series. Second, we define a PRF $F_k$, for any input $x$ and secret key $k$, using the GGM approach on this length-doubling PRG. The secret key $k$ contains a homomorphism $\varphi$ sampled from the set of homomorphisms $\mathcal{H}$, together with the group elements $a_i$ and the errors $e_i$, for $1 \le i \le t$. Consequently, a disadvantage of this construction appears to be a notably large secret-key size, even for small values of $n$. We suggest an adjustment to the direct PRG, leading to a significant decrease in the secret-key size of the PRF; this reduction is considerable, particularly for large enough $n$. The modified construction introduces extra public parameters and is denoted the parameterized PRG. The construction is detailed as follows:
Construction 3 (Parameterized PRG). Let $\varphi$ be a homomorphism in $\mathcal{H}$. For some $t \in \mathbb{N}$, a function $G$ is defined on a seed of errors $(e_1, \ldots, e_t)$, with output $(y_1, \ldots, y_t)$, where $y_i = \varphi(a_i) \cdot e_i$ for $1 \le i \le t$. Furthermore, each $a_i$ is a public parameter associated with the error $e_i$. Here, the input and output bit sizes of the function $G$ are $t\,\varepsilon_E$ and $t\,q$, respectively.

Claim 1. A length-doubling parameterized PRG is sufficient for the GGM PRG-based PRF approach, as demonstrated in Construction 3 and Theorem 2.
Theorem 2. Let the $B_n$-LHN assumption hold. A function $G$, as in Construction 3, is a PRG if the following holds: for each error $e_i$ in the input seed of $G$, we generate an associated public parameter $a_i$ sampled uniformly from $B_n$, for $1 \le i \le t$.
Proof. The proof becomes straightforward using the argument that we are simply restating the $B_n$-LHN assumption from a different perspective. To illustrate it, consider a scenario in which a PPT adversary aims to distinguish the pseudorandom distribution induced by $G$ from the corresponding random distribution. Consider a function $G$ obtained from Construction 3. With $G$, the pseudorandom distribution produces the output $(y_1, \ldots, y_t)$ from a secret input $(e_1, \ldots, e_t)$, where each $e_i$ is sampled according to the error distribution. Additionally, an adversary having access to this distribution through its oracle also has access to the set of public parameters $(a_1, \ldots, a_t)$, sampled uniformly from $B_n$. The corresponding random distribution is identical, except that each output $y_i$ is replaced with a uniformly random element of $B_r$. By utilizing a hybrid argument over the $t$ output positions, the proof follows directly from the $B_n$-LHN assumption. □
Construction 4 (PRF from parameterized PRG). Let $k = (\varphi, e_1, \ldots, e_t)$ be a secret key, where $\varphi$ is sampled from $\mathcal{H}$ and each $e_i$, for $1 \le i \le t$, is sampled from $E$. Let $G$ be a parameterized PRG, as defined in Construction 3, whose $i$th invocation uses an associated set of public parameters sampled uniformly from $B_n$, for $1 \le i \le m$. A pseudorandom function (PRF) $F_k$, for an input string $x = x_1 x_2 \cdots x_m$ and secret key $k$, is defined in the GGM manner: the $i$th iteration of the function call $G$ uses the $i$th set of public parameters, and the input bit $x_i$ selects the left or right half of the output, which seeds the next iteration; the left and right halves of the output of $G$ are of equal length. We further propose a modification to the parameterized PRG that yields a significant decrease in the secret-key size of a PRF. The construction is detailed as follows:
Construction 5 (Indexed PRG). Let $\varphi$ be a homomorphism in $\mathcal{H}$. An indexed PRG $G_{\varphi}$ with index $\varphi$ is constructed as $G_{\varphi}(e) = \varphi(a) \cdot e$, where the seed $e$ is sampled from a set of errors $E$. Furthermore, $a$ is sampled from a Burnside group $B_n$ and is a public parameter associated with the input seed $e$. Here, the input and output bit sizes of the function $G_{\varphi}$ are the entropies of the set of errors $E$ and of the Burnside group $B_r$, respectively.

Claim 2. In particular, for a Burnside group $B_r$ with suitably chosen $r$, an indexed PRG is a length-doubling PRG, because the entropy of the Burnside group $B_r$ is roughly twice the entropy of the set of errors $E$. Furthermore, the indexed PRG is sufficient for a GGM-based PRF construction, as demonstrated in Construction 6 and Theorem 3.
Theorem 3. Let the $B_n$-LHN assumption hold, and let $\varphi$ be a homomorphism sampled uniformly from $\mathcal{H}$. A function $G_{\varphi}$, as in Construction 5, is a PRG if it is used as an intermediate function in the PRF of Construction 6.
Proof. The proof is similar to that of Theorem 2. Moreover, if $G_{\varphi}$ is used as an intermediate function, as in Construction 6, the following holds: for the input seed $e$ to the function $G_{\varphi}$, we generate an associated public parameter $a$ sampled uniformly from the Burnside group $B_n$. □
Construction 6 (PRF from indexed PRG). Let $k = (\varphi, e)$ be a secret key, where $\varphi$ is sampled from $\mathcal{H}$ and $e$ from the set of errors $E$. Let $G_{\varphi}$ be an indexed PRG as defined in Construction 5. For $1 \le i \le m$, let $\{a_1, \ldots, a_m\}$ represent a set of public parameters, where each $a_i$ is sampled uniformly from a Burnside group $B_n$. A PRF $F_k$, for an input string $x = x_1 x_2 \cdots x_m$ and secret key $k$, is defined by iterating the indexed PRG in the GGM manner, where the $i$th iteration of the function call $G_{\varphi}$ uses the associated public parameter $a_i$, for $1 \le i \le m$, and the input bit $x_i$ selects the half of the output that seeds the next iteration; the left and right halves of the output of $G_{\varphi}$ are of equal length.
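A structural sketch of this construction under the reading given above: the index $\varphi$, the error seed $e$, the public parameters $a_i$, and bit-selected halves. All group operations, the encoding of a $B_r$ element as a bit-string, and the re-interpretation of a half as the next error seed are assumptions supplied as callables; the sketch is illustrative, not the paper's concrete algorithm.

```python
from typing import Any, Callable

Word = Any  # opaque group element or error

def prf_indexed(phi: Callable,           # secret homomorphism (the index)
                seed_error: Word,        # secret initial error seed e
                public_params: list,     # a_1..a_m, uniform in B_n
                x_bits: str,             # m-bit input string
                mul: Callable,           # concatenation in B_r
                encode: Callable,        # B_r element -> bit-string (assumed)
                to_error: Callable) -> str:  # bit-string half -> error seed (assumed)
    """GGM-style evaluation with the indexed PRG G_phi(e) = phi(a_i) * e:
    one indexed-PRG call per input bit, each consuming one public parameter."""
    e, chosen = seed_error, ""
    for bit, a_i in zip(x_bits, public_params):
        out_bits = encode(mul(phi(a_i), e))          # one call to G_phi
        half = len(out_bits) // 2
        chosen = out_bits[:half] if bit == "0" else out_bits[half:]
        e = to_error(chosen)                         # selected half seeds the next call
    return chosen
```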