1. Introduction
Since the work of Euclid, mathematicians have been trying to uncover the mysteries behind prime numbers, which have the unique property of being divisible only by themselves and one [1,2]. The use of large prime numbers in providing information security in this digital age has triggered much research in this direction. Since the advent of the Rivest–Shamir–Adleman (RSA) encryption system in 1978, prime numbers have been combined innovatively to create cryptographic keys that allow secure transmission of private and sensitive information over computer networks [3,4]. Higher security can be enforced with larger prime numbers since prime factorisation is extremely hard, and the RSA system takes advantage of this elegant property [5,6]. However, using very large prime numbers for RSA increases the computational time for encrypting and decrypting the information, which needs to be balanced for real-time applications. Given this limitation, malicious attacks aim to break the RSA system by finding efficient methods of prime factorisation [7,8].
Another advancement in this digital age is the evolution of the Internet of Things (IoT), which connects intelligent devices to work together in providing new personalised capabilities of products and services. However, IoT devices have limited computing capability, storage and connectivity. In this context, the greatest challenge is in securing IoT devices as well as the confidential communication of information over the IoT network [9]. In such an environment, the cryptographic algorithms are scaled down accordingly, and the smaller prime numbers used in the encryption keys give hackers more scope to perform their attacks [10]. Implementing information security capabilities involves several approaches to protect confidential data, such as: (i) off-chip cryptographic memories to store sensitive information, (ii) cryptosystems such as symmetric and asymmetric cryptography, and (iii) hardware-level authentication of peripherals. In many situations, an efficient and fast prime factorisation method makes it possible to break the security algorithm in real time, and therefore serves as a test for establishing the security limits of computing systems against possible attacks.
In recent years, several prime factorisation methods have been proposed, with improved efficiency allowing composite numbers (semi-primes) as large as 250 decimal digits to be factored using sufficiently large computing power [11,12]. However, semi-prime factorisation remains a challenge that draws interest both from the perspective of research in computational number theory and from the practical difficulty of cracking RSA keys used in cryptosystems [13,14].
In this paper, we consider the application of prime factorisation for testing the security of RSA cryptography, which is based on a positive integer $N$, where the encryption and decryption of any message using a pair of public and private keys depends on $N$. In the RSA algorithm, $N$ is a product of two prime numbers ($N = p_1 p_2$) and is a semi-prime [15]. In the secured transmission of a message, $p_1$ and $p_2$ are employed in RSA to generate the key pairs for encryption and decryption. If $p_1$ and $p_2$ are known, then the cracking of the RSA keys becomes possible [16]. Hence, the security of RSA depends on how difficult the factorisation of $N$ is. This motivates research to propose new factorisation methods. Euler’s factorisation is the most popular method that is well suited for finding the prime factors of semi-primes constructed from Pythagorean primes [17]. We identify the limitation of Euler’s method: it is applicable only to semi-prime constructs whose factors are Pythagorean primes. Our aim in this paper is to propose an improved method by considering the parity of the squares. We provide a proof that our proposed method requires far fewer steps for the factorisation of the RSA modulus $N$. Hence, our enhanced method could be applied to test for factorisation attacks, providing insights into choosing the key size and the time period for which an RSA-based public key algorithm is safe from an attack.
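To make the dependence of RSA on the factors of the modulus concrete, the following minimal sketch shows that the private exponent follows directly once the two primes are known. The toy parameters (13, 17 and the public exponent 5) and the helper name `private_exponent` are illustrative assumptions, not values from this paper, and real keys are of course far larger.

```python
from math import gcd

def private_exponent(p1: int, p2: int, e: int) -> int:
    """Recover the RSA private exponent d from the prime factors of N."""
    phi = (p1 - 1) * (p2 - 1)          # Euler's totient of N = p1 * p2
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime to (p1 - 1)(p2 - 1)")
    return pow(e, -1, phi)             # modular inverse (Python 3.8+)

# Toy parameters (illustrative assumption): two small Pythagorean primes.
p1, p2, e = 13, 17, 5
N = p1 * p2                            # public modulus
d = private_exponent(p1, p2, e)        # private exponent, known only via p1, p2

message = 42
cipher = pow(message, e, N)            # encryption with the public key (e, N)
recovered = pow(cipher, d, N)          # decryption with the recovered d
assert recovered == message
```

An attacker who can factor the modulus can therefore run exactly the same computation as the legitimate key owner.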
2. Theory and Proposed Method
A generic Pythagorean triple is denoted as the triple:
$$(a, b, c)$$
where $a$, $b$ and $c$ are positive integers, i.e., $a$ and $b$ denote the sides of a right triangle, and $c$ denotes the length of the hypotenuse [18]. From the fundamental property of right-angled triangles, we have,
$$a^2 + b^2 = c^2$$
which is among the Diophantine equations [19]. The set of all Pythagorean triples is denoted by P.
For every $(a, b, c) \in P$, with the series of odd and even Pythagorean triples defined in terms of $m$ and $n$, it has been proved in previous work by Overmars et al. [17] that:
$$a = (2m-1)(2n+2m-1), \quad b = 2n(n+2m-1), \quad c = n^2 + (n+2m-1)^2$$
The above is referred to as the Overmars triangles, and we are interested in the properties of the hypotenuse of the triangle in this paper. As commonly represented, let us denote the hypotenuse in this paper by $c$, which could be represented as $c = n^2 + (n+2m-1)^2$. From this equation, if $n$ is odd, then $n^2$ is also odd and it follows that $(n+2m-1)$ will be even and $(n+2m-1)^2$ will also be even. Conversely, if $n$ is even, $n^2$ will be even and both $(n+2m-1)$ and $(n+2m-1)^2$ will be odd.
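The parity behaviour of the hypotenuse can be checked numerically. The sketch below assumes the parameterisation $c = n^2 + (n+2m-1)^2$ with legs $(2m-1)(2n+2m-1)$ and $2n(n+2m-1)$; the function name `overmars_triangle` is hypothetical and the formula is a working assumption rather than a quotation from [17].

```python
def overmars_triangle(m: int, n: int) -> tuple[int, int, int]:
    """Return (a, b, c) for positive integers m, n (assumed parameterisation)."""
    k = 2 * m - 1                      # odd offset between the two generators
    a = k * (2 * n + k)                # odd leg:  (n + k)^2 - n^2
    b = 2 * n * (n + k)                # even leg: 2 * n * (n + k)
    c = n * n + (n + k) ** 2           # hypotenuse as a sum of two squares
    return a, b, c

for m in range(1, 4):
    for n in range(1, 4):
        a, b, c = overmars_triangle(m, n)
        assert a * a + b * b == c * c              # Pythagorean relation
        assert (n % 2) != ((n + 2 * m - 1) % 2)    # the two squares differ in parity
```

For example, `overmars_triangle(1, 1)` gives the familiar (3, 4, 5) triangle, whose hypotenuse is $1^2 + 2^2$.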
Fermat’s Christmas theorem [20] showed that a Pythagorean prime $p = 4k + 1$ can be written as a sum of two squares, $p = x^2 + y^2$. This was extended by Overmars [19], taking into consideration the parity of the two squares and noting that, for a particular Pythagorean triangle, the two squares making up the hypotenuse were opposite in parity. For a semi-prime consisting of two such triangles whose hypotenuses are prime, it will be shown here that its two sums of two squares will have the following parity:
If we consider the following differences:
It can be shown that one of the primes
can be represented as:
Euler’s factorisation method is suited to semi-primes whose prime factors are Pythagorean [21,22], and it can be improved further by considering the parity of the squares. The method is limited to semi-prime constructs whose factors are both Pythagorean primes ($4k + 1$). It can also be shown (Section 3) that a product of a Pythagorean prime and a Gaussian prime cannot be represented as the sum of two squares. The implication here is that if the semi-prime construction selects Pythagorean and/or Gaussian primes randomly, only one quarter of the semi-prime constructions avail themselves to this factorisation method, since Pythagorean and Gaussian primes appear in the set of natural numbers with equal probability. A comprehensive description of the Pythagorean and Gaussian primes and their probabilistic distributions is provided by Oliver Knill [23].
Consider a semi-prime number $N = p_1 p_2$ whose construction consists of primes $p_1$ and $p_2$, with both being Pythagorean, and each having a representation on the Cartesian plane as a sum of two squares. It can easily be shown that the product of two such primes can be represented as the sum of four squares, from which two sums of two squares can be derived. For such a semi-prime, if the original construction is unknown and the sum of four squares is known, the original construction can be found. This paper considers the sum of four squares from which two sums of two squares are determined, and hence by Euler’s factorisation the original construction can be found. By considering the parity of each of the squares, a new way of determining the semi-prime construction is described. Our proposed method provides an alternative to Euler’s and uses Overmars triangles. It exploits the relationship between the four squares, from which the two sums of two squares can be determined by considering each square’s parity, and thereby the factorisation is determined. We describe Euler’s factorisation, which forms the foundation of our proposed method, and the related proofs in the next two sections of the paper.
3. Euler’s Factorisation Method
We begin by considering Gaussian primes and Pythagorean primes. From the literature, Gaussian primes are of the form [24,25]:
$$p = 4k + 3$$
and Pythagorean primes of the form:
$$p = 4k + 1$$
According to Fermat’s Christmas theorem on the sum of two squares, we have the following:
$$p = 4k + 1 = x^2 + y^2$$
Gaussian primes are of the form $4k + 3$ and are not representable as the sum of two squares.
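The two classes of primes can be illustrated with a small brute-force check. This is a minimal sketch for small primes only; the helper names `is_prime` and `two_squares` are hypothetical and the search is not intended for cryptographic sizes.

```python
from math import isqrt

def is_prime(p: int) -> bool:
    """Trial-division primality test, adequate for small illustrative values."""
    if p < 2:
        return False
    return all(p % d for d in range(2, isqrt(p) + 1))

def two_squares(p: int):
    """Return (x, y) with x*x + y*y == p and 0 < x <= y, or None."""
    for x in range(1, isqrt(p // 2) + 1):
        y2 = p - x * x
        y = isqrt(y2)
        if y * y == y2:
            return x, y
    return None

# Every odd prime below 200 is a sum of two squares exactly when it is 4k + 1.
for p in filter(is_prime, range(3, 200)):
    assert (two_squares(p) is not None) == (p % 4 == 1)
```

For instance, 13 = 2² + 3² is Pythagorean, whereas 19 = 4·4 + 3 has no such representation.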
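Proposition: A semi-prime whose prime factors are Pythagorean can be expressed as the sum of four squares, from which two sums of two squares can be derived.
Lemma: A semi-prime, $N = p_1 p_2$, with $p_1 = x_1^2 + y_1^2$ and $p_2 = x_2^2 + y_2^2$, is expressed as the sum of four squares, such that:
$$N = (x_1 x_2)^2 + (x_1 y_2)^2 + (y_1 x_2)^2 + (y_1 y_2)^2$$
Proof: Euler’s factorisation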
Let us consider the method of Euler’s factorisation, where a number ($N$) can be factored by writing it as a sum of two squares in two different ways as follows:
$$N = a^2 + b^2 = c^2 + d^2$$
Let us consider the example $N$ = 2137458620009 to find $p_1$ and $p_2$, using the sum of squares as follows:
combining even and odds we get:
using the greatest common divisor (gcd):
The above example illustrates how the prime factors of the semi-prime N can be derived from its two sums of two squares using Euler’s factorisation method.
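The steps of Euler’s method (write $N$ as a sum of two squares in two ways, pair the odd and even parts, then take greatest common divisors) can be sketched as follows. The function names are hypothetical, the representation search is a plain brute force, and the number used is Euler’s classic example 1000009 rather than the $N$ of this paper.

```python
from math import gcd, isqrt

def two_square_representations(N: int):
    """All (a, b) with a*a + b*b == N and 0 < a <= b (brute-force search)."""
    reps = []
    for a in range(1, isqrt(N // 2) + 1):
        b2 = N - a * a
        b = isqrt(b2)
        if b * b == b2:
            reps.append((a, b))
    return reps

def euler_factor(N: int, rep1, rep2):
    """Factor odd N from two distinct representations as a sum of two squares."""
    # Order each pair as (odd, even) so that the sums and differences are even.
    a, b = sorted(rep1, key=lambda v: v % 2 == 0)
    c, d = sorted(rep2, key=lambda v: v % 2 == 0)
    k = gcd(a - c, d - b)              # common factor of the differences
    h = gcd(a + c, d + b)              # common factor of the sums
    f1 = (k // 2) ** 2 + (h // 2) ** 2
    return f1, N // f1

N = 1000009                            # Euler's classic example (assumption)
reps = two_square_representations(N)   # [(3, 1000), (235, 972)]
print(euler_factor(N, *reps))          # (293, 3413)
```

The gcd step itself is cheap; the cost of the method lies almost entirely in finding the two representations.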
Now, express the sum of four squares as two sums of two squares.
Let
■
4. Proposed Semi-Prime Factorisation Using Sum of Squares
Overmars et al. [17] showed that all Pythagorean triples could be represented as Overmars triangles. If the semi-prime is constructed using two Pythagorean primes ($p_1$, $p_2$), then two representations as the sum of two squares can be found and Euler’s factorisation method can be applied. Finding these two representations is non-trivial and computationally intensive for large numbers, even on computers with a high-performance central processing unit (CPU). The equation
provides an elegant search using increments of
and fine convergence using
, and the CPU-intensive square root can be avoided. In this way
is incremented and
is decremented about
to find one of the two solutions along the diagonal of a field of
. It can also be shown (as a future work) that once one sum of the squares is known, this can be used to find the other.
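A search that increments one term and decrements the other, without a square root inside the loop, can be sketched with a generic two-pointer walk. This is an illustrative assumption about the style of search described above, not the authors’ exact procedure; the function name `two_square_search` is hypothetical.

```python
from math import isqrt

def two_square_search(N: int):
    """Find all a <= b with a*a + b*b == N by walking a up and b down."""
    a, b = 0, isqrt(N)                 # a single square root, taken once
    total = a * a + b * b              # running value of a^2 + b^2
    found = []
    while a <= b:
        if total == N:
            found.append((a, b))
        if total < N:
            total += 2 * a + 1         # (a + 1)^2 - a^2
            a += 1
        else:
            total -= 2 * b - 1         # b^2 - (b - 1)^2
            b -= 1
    return found

assert two_square_search(221) == [(5, 14), (10, 11)]
```

Each step costs one addition or subtraction, but the number of steps still grows with $\sqrt{N}$, so this sketch only illustrates the mechanics on small inputs.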
Consider the example of a large number,
.
For completeness,
can be represented as two Pythagorean triangles as shown [
2]:
Once the two sums of two squares have been found, Euler’s factorisation method can be used.
5. Proposed Method Using Gaussian and Pythagorean Primes
According to Fermat’s Christmas theorem, if Pythagorean primes ($4k + 1$) are used to construct a composite of the semi-prime number ($N$), a solution exists as two sums of two squares. However, if $N$ is constructed using Gaussian primes ($4k + 3$), then Euler’s sum of two squares method cannot be used [26,27]. There is a lack of research in this direction [28,29]. This motivates us to investigate in this paper whether there is a test that can show if a composite of the semi-prime number has been constructed using Pythagorean primes.
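Consider the following composite constructions:
- (i) $N = p_1 p_2$ with $p_1 = 4k_1 + 1$ and $p_2 = 4k_2 + 1$, using Pythagorean primes;
- (ii) $N = p_1 p_2$ with $p_1 = 4k_1 + 3$ and $p_2 = 4k_2 + 3$, using Gaussian primes;
- (iii) $N = p_1 p_2$ with $p_1 = 4k_1 + 1$ and $p_2 = 4k_2 + 3$, using mixed Pythagorean and Gaussian primes.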
(i) Pythagorean prime construction
We have verified that two sums of two squares representations exist and Euler’s factorisation can be used.
As an illustration, consider the following example for
Note the parity of the sum of four squares is (odd, even, even, even).
(ii) Gaussian prime construction
Sums of three squares exist.
As an illustration, consider the following example for
(iii) Mixed Pythagorean-Gaussian prime construction
Sums of four squares exist.
Note the parity of the sum of four squares is (even, odd, odd, odd).
In summary, a semi-prime whose composite construction is based upon both Pythagorean and Gaussian primes can easily be identified when $N \equiv 3 \pmod 4$ is true and the sum of four squares parity is (even, odd, odd, odd); in this case Euler’s factorisation cannot be used.
Table 1 provides possible composite constructs of a semi-prime number using Pythagorean and Gaussian primes as the factors. When $N \equiv 1 \pmod 4$ is true, the composite could be constructed using Pythagorean primes or Gaussian primes. When the Pythagorean construct is confirmed, we can verify that: (i) the sum of four squares parity is (odd, even, even, even), (ii) the two sums of two squares can be found, and (iii) Euler’s factorisation can be employed.
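The test on the residue of the semi-prime modulo 4 can be sketched as follows. The sketch assumes the input is an odd semi-prime with two distinct prime factors; the function names are hypothetical, and the brute-force count of representations is only practical for small values.

```python
from math import isqrt

def count_two_square_reps(N: int) -> int:
    """Number of pairs 0 < a <= b with a*a + b*b == N."""
    count = 0
    for a in range(1, isqrt(N // 2) + 1):
        b = isqrt(N - a * a)
        if a * a + b * b == N:
            count += 1
    return count

def classify_semiprime(N: int) -> str:
    """Guess the construction of an odd semi-prime N with distinct factors."""
    if N % 4 == 3:
        # One factor is 4k + 1 and the other is 4k + 3.
        return "mixed Pythagorean-Gaussian"
    # N % 4 == 1: both factors are 4k + 1, or both are 4k + 3.
    return "Pythagorean" if count_two_square_reps(N) == 2 else "Gaussian"

assert classify_semiprime(13 * 17) == "Pythagorean"
assert classify_semiprime(3 * 7) == "Gaussian"
assert classify_semiprime(3 * 13) == "mixed Pythagorean-Gaussian"
```

Only the first outcome leaves the semi-prime open to Euler’s factorisation, which is the one-quarter case discussed in Section 2.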
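Proof: Let $N$ be a semi-prime and let $p_1$ and $p_2$ be its two prime factors so that $N = p_1 p_2$. Assume also that $p_1$ and $p_2$ are distinct. Suppose that the primes $p_1$ and $p_2$ are “Pythagorean” (2-square), that is, they can each be written as the sum of two squares of natural numbers: $p_1 = x_1^2 + y_1^2$, $p_2 = x_2^2 + y_2^2$, then:
$$N = (x_1^2 + y_1^2)(x_2^2 + y_2^2)$$
Therefore, $N$ is also Pythagorean, and can be represented as the sum of two squares in two different ways:
$$N = (x_1 x_2 + y_1 y_2)^2 + (x_1 y_2 - y_1 x_2)^2 = (x_1 x_2 - y_1 y_2)^2 + (x_1 y_2 + y_1 x_2)^2$$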
The problem is rephrased as:
Given
, is known.
Find $p_1$ and $p_2$.
then the factors of
N are:
■
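The passage from the two primes to the four squares and the two sums of two squares can be verified on a small case. The sketch below uses the standard product identity for sums of two squares; the function name `combine` and the example primes 13 and 17 are illustrative assumptions.

```python
def combine(p1_rep, p2_rep):
    """Build N's four squares and its two sums of two squares from its factors."""
    x1, y1 = p1_rep                    # p1 = x1^2 + y1^2
    x2, y2 = p2_rep                    # p2 = x2^2 + y2^2
    four = (x1 * x2, x1 * y2, y1 * x2, y1 * y2)
    rep_a = (abs(x1 * x2 - y1 * y2), x1 * y2 + y1 * x2)
    rep_b = (x1 * x2 + y1 * y2, abs(x1 * y2 - y1 * x2))
    return four, rep_a, rep_b

# 13 = 3^2 + 2^2 and 17 = 1^2 + 4^2, each written as (odd, even).
four, rep_a, rep_b = combine((3, 2), (1, 4))
N = 13 * 17

assert sum(v * v for v in four) == N                 # sum of four squares
assert rep_a[0] ** 2 + rep_a[1] ** 2 == N            # first sum of two squares
assert rep_b[0] ** 2 + rep_b[1] ** 2 == N            # second sum of two squares
assert [v % 2 for v in four] == [1, 0, 0, 0]         # (odd, even, even, even)
```

Writing each prime as (odd, even) is what produces the (odd, even, even, even) parity pattern of the four squares.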
6. Verification with Ordering Ambiguity
In this section, we verify that the ordering of the odd and even pairs does not affect the results.
Consider the following example 1 of ordering of odd and even pairs as follows:
Consider the following example 2 of ordering of odd and even pairs as follows:
Furthermore, we have additional information which can assist in removing this ambiguity. If we consider odd and even pairs when ordering the sums, we can use Overmars triangles to conserve parity and remove this ambiguity.
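Consider the following form:
$$c = n^2 + (n + 2m - 1)^2$$
When $n$ is odd, $n^2$ is odd and $(n + 2m - 1)^2$ is even.
Conversely, when $n$ is even, $n^2$ is even and $(n + 2m - 1)^2$ is odd.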
Odd/even or even/odd parity is thus assured and preserved for each of the sums of squares and this additional information can be used to remove the ordering ambiguity.
Consider the difference between the odd parts and the difference between the even parts of the two sums of two squares; this removes the ordering ambiguity:
one of the primes
can be given as:
Proof: Express
as Overmars triangles.
Let us express two semi-primes
as Overmars triangles as follows:
where
.
Substitute for
⇒
■
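One way to see how parity removes the ordering ambiguity is to normalise each sum of two squares to (odd, even) before taking differences. The closed form used below, $(\Delta_1^2 + \Delta_2^2) / \gcd(\Delta_1, \Delta_2)^2$, follows from Euler’s method and is an assumption about the shape of the expression, not a quotation from this paper; the function name is hypothetical and the two representations must be distinct.

```python
from math import gcd

def factor_from_differences(rep1, rep2):
    """Return one factor of N = a^2 + b^2 = c^2 + d^2 from the two pairs."""
    # Normalise each pair to (odd, even); this removes the ordering ambiguity.
    a, b = sorted(rep1, key=lambda v: v % 2 == 0)
    c, d = sorted(rep2, key=lambda v: v % 2 == 0)
    delta1 = abs(a - c)                # difference of the odd parts
    delta2 = abs(b - d)                # difference of the even parts
    g = gcd(delta1, delta2)
    return (delta1 ** 2 + delta2 ** 2) // (g * g)

# 221 = 10^2 + 11^2 = 5^2 + 14^2; any ordering of the pairs gives the same factor.
assert factor_from_differences((10, 11), (5, 14)) == 13
assert factor_from_differences((11, 10), (14, 5)) == 13
```

The other prime then follows by dividing $N$ by the factor returned.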
Let
and consider the parities given in
Table 2.
From Equation (1) , observe from the table the parities, recalling (),
,
,
7. Application of Proposed Method for RSA Factorisation
Historically, the RSA algorithm was experimentally tested using brute force attacks that try all possible secret keys. When RSA employed shorter keys for encryption and decryption, they were easier to identify using brute force attacks [15]. Larger keys resist brute force attacks since they are exponentially more difficult to crack; hence, the key length indicates how practically feasible a brute force attack is. Therefore, the strength of an RSA cryptosystem is measured theoretically by determining how many steps it would take for a brute force attack to crack the keys. However, larger keys involve more computation, and as the encryption and decryption algorithms then take much longer, there is a limit on the key length used for practical applications. Hence, different factoring algorithms and faster cryptosystems have been researched [29,30,31]. Different implementations of modular exponentiation result in timing variations that can be exploited to attack RSA. In other words, for an encrypted message or cipher text $C$, a timing attack is the ability to find the private exponent $d$ by determining the time taken to compute $C^d \bmod N$.
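The source of such timing variation can be illustrated with a textbook square-and-multiply routine, in which the number of modular multiplications depends on the bits of the exponent. This is a minimal sketch with toy values and a hypothetical function name; it counts operations rather than measuring time and is not a model of any particular RSA implementation.

```python
def modexp_with_count(base: int, exponent: int, modulus: int):
    """Left-to-right square-and-multiply; returns (result, multiplications)."""
    result, mults = 1, 0
    for bit in bin(exponent)[2:]:
        result = (result * result) % modulus     # always square
        mults += 1
        if bit == "1":
            result = (result * base) % modulus   # multiply only for 1-bits
            mults += 1
    return result, mults

# Two 7-bit exponents with different numbers of 1-bits need different work.
value, work_dense = modexp_with_count(7, 77, 221)    # 77 = 0b1001101
_, work_sparse = modexp_with_count(7, 64, 221)       # 64 = 0b1000000
assert value == pow(7, 77, 221)
assert work_dense > work_sparse
```

Constant-time implementations avoid this leak by performing the same sequence of operations regardless of the exponent bits.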
Another method is to perform factorisation attacks mathematically by factoring the modulus $N$, which forms the underlying structure of an RSA function [15]. While there are several factorisation approaches, the common goal is to factor the semi-prime, $N = p_1 p_2$. The key generation algorithm selects the secret primes $p_1$ and $p_2$ to calculate the public modulus $N$. A factorisation attack then factors $N$ to obtain $p_1$ and $p_2$. Hence, new factorisation algorithms have been mathematically derived to recover the public and private keys of an RSA algorithm; however, they take a long period of time for the factorisation of $N$ when $p_1$ and $p_2$ are very large. The RSA numbers were published as a factoring challenge in 1991, and RSA-100 was the first to be factored, in the same year. The earlier numbers, from RSA-100 up to RSA-500, were labelled according to their number of decimal digits, while the later ones, from RSA-576 onwards, were labelled according to their number of bits. The RSA numbers have not always been factored in order of size (for example, RSA-768 was factored before the smaller RSA-210), and many of the bigger numbers have still not been factored. Hence, the factoring challenge was introduced to give an insight into which key length is safe and for how long, so that applications can choose a key length for their RSA encryption algorithm that is expected to remain secure for the required period. This forms the motivation for researchers to mathematically prove RSA factorisation time limits that help in understanding the cryptanalytic strength of commonly adopted cryptosystems in practice. Hence, in this paper, we focus on proposing a new factorisation method that is efficient in terms of the speed of factorisation. We demonstrate the application of our proposed factorisation method for RSA-768 as shown in Table 3.
In general, the computational complexity of a factorisation algorithm can be measured by the number of operations it performs [32]. Factorisation algorithms are also interesting from the computational complexity viewpoint. Since the existing algorithms are not able to solve the factoring problem in polynomial time, encryption keys are chosen so that they cannot be factored in a reasonable amount of computer time [33]. Overall, our proposed factorisation method provides a faster alternative to the commonly used modular exponentiation methods and Euler’s method, by exploiting the relationship between the four squares. Each of the simple arithmetic operations in our method takes a constant number of clock cycles, and hence the method has a reduced computational complexity. The mathematical proof of our new factorisation method and its demonstration for RSA-768 have been established.
Many of the bigger RSA numbers are yet to be factored and are expected to remain unfactored for quite some time. However, such beliefs can be challenged when new factorisation techniques and new technologies are introduced. Since encryption schemes are being used today to protect financial and other confidential data, ways of developing a quantum computer that can factor very large semi-primes quickly and in parallel are under consideration. Shor’s quantum algorithm, developed in 1994, depends on a computer with a large number of quantum bits to calculate the prime factors of a large number. A large semi-prime, with even 232 digits, could take more than two years to factor using hundreds of computers working in parallel. Hence, a major breakthrough in technologies such as quantum computers, along with Shor’s algorithm and other work, including ours, could make this problem domain an interesting space for academic researchers and industry practitioners to explore further.
8. Conclusions and Future Work
In this paper, we proposed a new method for semi-prime factorisation, a problem that forms a cornerstone of security in RSA cryptosystems. By exploiting the relationship between a set of four squares, we provide a relatively simple, fast and scalable factorisation method that is computationally more efficient than the existing, commonly used modular exponentiation methods and Euler’s method. The mathematical proofs behind the development of our simple and reliable algorithm for semi-prime factorisation were presented. In addition, the application of our method to the factorisation of large semi-primes was demonstrated for RSA-768.
Our work in this paper creates new research opportunities. With new technologies such as IoT, blockchain and quantum computers evolving, future work would involve exploring our factorisation method in various cryptographic protocols within such new computing paradigms.