2.1. Finite Fields
Finite fields, also known as Galois fields, are a mathematical structure introduced by Évariste Galois in the early 19th century in the study of solutions to algebraic equations [27]. These fields have widespread applications in cryptography, coding theory, computer science, and combinatorial mathematics. In abstract algebra, a field $F$ is defined as a set closed under the basic operations (addition, subtraction, multiplication, and division) that contains an additive identity and a multiplicative identity, in which every element has an additive inverse and every non-zero element has a multiplicative inverse. If the number of elements in $F$ is finite, then $F$ is called a finite field. The structure and properties of these fields play a key role in understanding and solving various mathematical and engineering problems [28].
Generally speaking, the order of a finite field refers to the total number of elements it contains, and it is always a power of a prime number $p$, denoted as $p^n$; examples include finite fields of order $2^3 = 8$ or $3^2 = 9$. Finite fields are commonly written as $GF(p^n)$, where GF stands for Galois field, $p$ is the prime base, and $n$ is a positive integer exponent. In $GF(p^n)$, each element can be represented as a polynomial of degree less than $n$ with coefficients from the smaller field $GF(p)$. Moreover, $GF(p^n)$ forms an $n$-dimensional linear space over $GF(p)$, with a basis consisting of the $n$ polynomials $\{1, x, x^2, \ldots, x^{n-1}\}$, thereby allowing every element of $GF(p^n)$ to be represented as a linear combination of these basis elements. For instance, $GF(2^3)$ is a field containing eight elements, constituted by the third power of the prime number 2. In this field, each element can be represented as a polynomial of degree less than 3 over $GF(2)$, where $GF(2)$ contains the two elements $\{0, 1\}$. A specific element can be expressed as $a_2 x^2 + a_1 x + a_0$, where $a_2$, $a_1$, $a_0$ take values from $GF(2)$, that is, either 0 or 1. Possible elements therefore include $x^2 + x + 1$ and $x + 1$, among others.
In the prime field $GF(p)$, the additive identity is 0 and the multiplicative identity is 1. For any element $a$ in the field, its additive inverse $-a$ satisfies $a + (-a) = 0$. Similarly, for any non-zero element $b$, its multiplicative inverse $b^{-1}$ satisfies $b \cdot b^{-1} = 1$. All addition and subtraction in this field follows modulo-$p$ arithmetic, i.e., $a + b \equiv (a + b) \bmod p$ and $a - b \equiv (a - b) \bmod p$. Multiplication and division likewise follow modulo-$p$ arithmetic, expressed respectively as $a \cdot b \equiv (a \cdot b) \bmod p$ and $a \div b \equiv (a \cdot b^{-1}) \bmod p$.
In particular, when $p = 2$, i.e., for fields of the form $GF(2^w)$, addition and subtraction are equivalent to the bitwise XOR operation. For example, in $GF(2^3)$, the operation $(x + 1) + (x + 1)$ is equivalent to $011 \oplus 011$ in binary, resulting in 0. Elements of the field can thus be represented directly as binary numbers; for instance, the polynomial $x^2 + 1$ corresponds to the binary number 101. Due to the widespread use of binary numbers in computers, $GF(2^8)$ is particularly important in cryptography, as its elements correspond to 8-bit binary numbers (i.e., 1 byte).
Furthermore, the concept of a generator is crucial for understanding the structure of a finite field. A generator is a special element whose powers produce all non-zero elements of the field. In the field $GF(2^w)$, the element 2 (i.e., the polynomial $x$) often acts as a generator. For example, in $GF(2^3)$, taking 2 as a generator, its successive powers produce all seven non-zero elements of the field: $2^0 = 1$, $2^1 = 2$, $2^2 = 4$, and so on up to $2^6$. These elements can be represented as polynomials over $GF(2)$ and written as three-digit binary numbers.
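To make this cyclic structure concrete, the following C sketch enumerates the powers of the generator $g = 2$ in $GF(2^3)$. The choice of the irreducible polynomial $x^3 + x + 1$ (binary 1011) is an assumption made for illustration; it is one common construction of $GF(2^3)$, and a different polynomial would give a different but equally valid power sequence.

```c
#include <stdio.h>
#include <stdint.h>

/* Multiply by the generator g = 2 (i.e., by x) in GF(2^3), reducing
 * modulo the assumed irreducible polynomial x^3 + x + 1 (binary 1011). */
static uint8_t gf8_times_g(uint8_t a) {
    a <<= 1;            /* multiply by x */
    if (a & 0x8)        /* a degree-3 term appeared: reduce it */
        a ^= 0xB;       /* subtract (XOR) x^3 + x + 1 = 0b1011 */
    return a & 0x7;     /* keep three bits */
}

int main(void) {
    uint8_t e = 1;      /* g^0 = 1 */
    for (int k = 0; k < 7; k++) {   /* the 7 non-zero elements */
        printf("g^%d = %u\n", k, e);
        e = gf8_times_g(e);
    }
    return 0;           /* after 7 steps, e has wrapped back to 1 */
}
```

Under this polynomial, the loop prints the seven non-zero elements 1, 2, 4, 3, 6, 7, 5 before the sequence wraps back to 1, illustrating the cyclic period of $2^3 - 1 = 7$.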
In the finite field $GF(2^w)$, the use of a polynomial generator $g$ is a core concept. Any non-zero element $a$ can be expressed as $a = g^k$, where $k$ is the exponent. Taking $GF(2^4)$ as an example, this field contains 16 elements and is constructed from a primitive polynomial (commonly $x^4 + x + 1$); apart from 0 and 1, the other 14 elements are generated as powers of $g$ reduced by this polynomial. In this context, since no power of $g$ can produce the polynomial 0, there exists a cyclic period, which is $2^w - 1$. Thus, when $k \geq 2^w - 1$, the exponent can be simplified to $k \bmod (2^w - 1)$.
Table 1 shows the relationship between the generated elements of $GF(2^4)$ and their corresponding polynomial, binary, and decimal representations.
In the $GF(2^w)$ field, multiplication can be simplified to the addition of exponents. Specifically, if $a = g^n$ and $b = g^m$, then the following holds true:

$$a \cdot b = g^{n} \cdot g^{m} = g^{(n + m) \bmod (2^w - 1)} \tag{1}$$

Using a lookup-table method, one can find the exponents $n$ and $m$ corresponding to $a$ and $b$, respectively, and then compute $g^{(n + m) \bmod (2^w - 1)}$. The relationship between the elements of the field and their exponents is constructed and used through forward and reverse lookup tables. In the context of $GF(2^4)$, these tables are referred to as gflog and gfilog: gflog maps an element in binary form to its exponent, and gfilog maps an exponent back to the corresponding element.
Table 2 presents the forward and reverse lookup tables of exponents and logarithms (gflog and gfilog) for $GF(2^4)$.
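To illustrate how such tables are derived, the following C sketch builds gflog and gfilog for $GF(2^4)$. It assumes the primitive polynomial $x^4 + x + 1$ (binary 10011), a common choice that the worked examples below also use; a different primitive polynomial would produce different table contents.

```c
#include <stdio.h>
#include <stdint.h>

#define GF_W     4
#define GF_SIZE  (1 << GF_W)   /* 16 elements */
#define GF_POLY  0x13          /* assumed primitive polynomial x^4 + x + 1 */

static uint8_t gfilog[GF_SIZE - 1];  /* gfilog[k] = g^k (exponent -> element) */
static uint8_t gflog[GF_SIZE];       /* gflog[g^k] = k (element -> exponent);
                                        gflog[0] is undefined and must not be used */

static void gf_build_tables(void) {
    uint8_t e = 1;                        /* start from g^0 = 1 */
    for (int k = 0; k < GF_SIZE - 1; k++) {
        gfilog[k] = e;
        gflog[e]  = (uint8_t)k;
        e <<= 1;                          /* multiply by g = x */
        if (e & GF_SIZE)                  /* degree-4 term appeared: reduce */
            e ^= GF_POLY;
    }
}

int main(void) {
    gf_build_tables();
    for (int k = 0; k < GF_SIZE - 1; k++)
        printf("k=%2d  gfilog[k]=%2u\n", k, gfilog[k]);
    return 0;
}
```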
For example, in $GF(2^4)$, multiplication and division can be performed according to Equations (2) and (3):

$$a \cdot b = \mathrm{gfilog}[(\mathrm{gflog}[a] + \mathrm{gflog}[b]) \bmod (2^w - 1)] \tag{2}$$

$$a \div b = \mathrm{gfilog}[(\mathrm{gflog}[a] - \mathrm{gflog}[b]) \bmod (2^w - 1)] \tag{3}$$

Specific examples, using the Table 2 values generated by the primitive polynomial $x^4 + x + 1$, include the following:

$$3 \cdot 7 = \mathrm{gfilog}[(\mathrm{gflog}[3] + \mathrm{gflog}[7]) \bmod 15] = \mathrm{gfilog}[(4 + 10) \bmod 15] = \mathrm{gfilog}[14] = 9$$

and

$$13 \div 5 = \mathrm{gfilog}[(\mathrm{gflog}[13] - \mathrm{gflog}[5]) \bmod 15] = \mathrm{gfilog}[(13 - 8) \bmod 15] = \mathrm{gfilog}[5] = 6$$
Through this method, computational efficiency is significantly improved, which can be quantified using algorithmic complexity theory. Unoptimized direct multiplication or division typically involves polynomial multiplication or computing a multiplicative inverse, with polynomial-level complexity denoted $O(n^k)$, where $n$ represents the size of the operands and $k$ is a constant greater than 1. In contrast, multiplication and division using the lookup-table method rely only on table lookups and simple addition or subtraction, each of which takes constant time, so the overall time complexity of the lookup-table method is $O(1)$. This advantage becomes particularly significant when dealing with large-scale data, as the method reduces the original polynomial time complexity to constant time.
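The constant-time lookups behind Equations (2) and (3) can be sketched in C as follows, reusing the gflog/gfilog tables and macros from the previous listing. The zero-operand guards are needed because 0 has no discrete logarithm.

```c
/* Constant-time multiplication and division in GF(2^4) via the
 * gflog/gfilog tables (Equations (2) and (3)); a sketch only. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    if (a == 0 || b == 0) return 0;       /* 0 has no logarithm */
    int k = (gflog[a] + gflog[b]) % (GF_SIZE - 1);
    return gfilog[k];
}

static uint8_t gf_div(uint8_t a, uint8_t b) {
    /* b must be non-zero: division by zero is undefined in the field */
    if (a == 0) return 0;
    int k = (gflog[a] - gflog[b] + (GF_SIZE - 1)) % (GF_SIZE - 1);
    return gfilog[k];
}
```

Under the assumed polynomial $x^4 + x + 1$, gf_mul(3, 7) returns 9 and gf_div(13, 5) returns 6, matching the worked examples above.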
2.2. Cauchy Reed–Solomon Codes
Reed–Solomon (RS) encoding, based on finite field arithmetic, is a type of error-correcting code widely used in data communication and storage systems, such as CDs, wireless communication, and data center storage. It can correct or recover errors caused by noise, data corruption, or loss [29]. In RS encoding, given $k$ data blocks $D = (d_1, d_2, \ldots, d_k)$, the encoding process generates $m$ additional parity blocks $P = (p_1, p_2, \ldots, p_m)$, forming an RS$(k, m)$ code with $n = k + m$ blocks in total. This coding method is particularly suited for fault tolerance because the original data can be recovered from any $k$ of the $n$ data and parity blocks.
A key step in RS encoding is constructing an $m \times k$ encoding matrix $A$. This matrix contains the coefficients used to generate the parity blocks, and its design must ensure that the row vectors of the matrix are linearly independent, meaning that the $m$ rows span an $m$-dimensional space; this ensures that $\mathrm{rank}(A) = m$. By calculating Equation (4), we obtain the parity blocks $P$, which are stored or transmitted along with the original data blocks for subsequent error detection and correction:

$$P = A \cdot D \tag{4}$$
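As a minimal sketch of Equation (4), assuming one field symbol per block and the gf_mul helper from Section 2.1, the encoder is a matrix-vector product in which the summation is a XOR accumulation, since addition in $GF(2^w)$ is XOR:

```c
/* Sketch of the encoding step P = A * D over GF(2^w) (Equation (4)).
 * A is the m x k encoding matrix stored row-major; D holds the k data
 * symbols and P receives the m parity symbols. */
static void rs_encode(int k, int m, const uint8_t *A,
                      const uint8_t *D, uint8_t *P) {
    for (int i = 0; i < m; i++) {
        uint8_t acc = 0;
        for (int j = 0; j < k; j++)
            acc ^= gf_mul(A[i * k + j], D[j]);  /* sum_j A[i][j] * d_j */
        P[i] = acc;
    }
}
```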
Decoding in RS coding is a more complex process that involves recovering the original information from the received blocks. Provided that at most $m$ data or parity blocks are lost, decoding requires at least $k$ surviving data or parity blocks. The process typically involves constructing a new $k \times k$ decoding matrix $A'$, whose rows are extracted from the original encoding matrix $A$ and the identity matrix $I$ according to which blocks survive. By calculating Equation (5), where $E'$ denotes the selected $k$ blocks, the original data $D$ can be recovered:

$$D = (A')^{-1} \cdot E' \tag{5}$$

This process requires the inverse of $A'$ to exist, which is a condition that the design of RS encoding must satisfy.
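Decoding hinges on inverting $A'$. As an illustrative sketch (not the optimized Cauchy-specific inversion discussed below), a plain Gauss–Jordan elimination over $GF(2^w)$ suffices, using the gf_mul and gf_div helpers from Section 2.1; in a characteristic-2 field, row subtraction is simply XOR.

```c
/* Sketch: Gauss-Jordan inversion of the k x k matrix M over GF(2^w).
 * M and Inv are row-major k*k arrays; M is destroyed in the process.
 * Returns 0 on success, -1 if M is singular. */
static int gf_invert(int k, uint8_t *M, uint8_t *Inv) {
    for (int i = 0; i < k; i++)               /* Inv := identity */
        for (int j = 0; j < k; j++)
            Inv[i * k + j] = (uint8_t)(i == j);

    for (int col = 0; col < k; col++) {
        int piv = col;                        /* find a non-zero pivot */
        while (piv < k && M[piv * k + col] == 0) piv++;
        if (piv == k) return -1;              /* singular matrix */
        for (int j = 0; j < k; j++) {         /* swap pivot row into place */
            uint8_t t;
            t = M[col * k + j];   M[col * k + j]   = M[piv * k + j];   M[piv * k + j]   = t;
            t = Inv[col * k + j]; Inv[col * k + j] = Inv[piv * k + j]; Inv[piv * k + j] = t;
        }
        uint8_t p = M[col * k + col];         /* scale row so pivot = 1 */
        for (int j = 0; j < k; j++) {
            M[col * k + j]   = gf_div(M[col * k + j], p);
            Inv[col * k + j] = gf_div(Inv[col * k + j], p);
        }
        for (int i = 0; i < k; i++) {         /* eliminate other rows */
            uint8_t f = M[i * k + col];
            if (i == col || f == 0) continue;
            for (int j = 0; j < k; j++) {
                M[i * k + j]   ^= gf_mul(f, M[col * k + j]);
                Inv[i * k + j] ^= gf_mul(f, Inv[col * k + j]);
            }
        }
    }
    return 0;
}
```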
Next, we prove the invertibility of the selection matrix. An $n \times n$ Cauchy matrix is defined as follows: let $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_n\}$ be two sets of elements in the finite field $GF(2^w)$ such that $a_i + b_j \neq 0$ for all $1 \leq i \leq n$ and all $1 \leq j \leq n$, with no repeated elements within sets $A$ and $B$. The construction of the Cauchy matrix $X$ is shown in the following Equation (6):

$$X = \begin{pmatrix} \frac{1}{a_1 + b_1} & \frac{1}{a_1 + b_2} & \cdots & \frac{1}{a_1 + b_n} \\ \frac{1}{a_2 + b_1} & \frac{1}{a_2 + b_2} & \cdots & \frac{1}{a_2 + b_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{1}{a_n + b_1} & \frac{1}{a_n + b_2} & \cdots & \frac{1}{a_n + b_n} \end{pmatrix} \tag{6}$$
First, we multiply row $i$ of matrix $X$ by $\prod_{j=1}^{n}(a_i + b_j)$ for each $i$; the original expression can then be written as

$$\det(X) = \frac{D}{\prod_{i=1}^{n} \prod_{j=1}^{n} (a_i + b_j)},$$

where $D$ is an $n$-order determinant whose $(i, j)$ element is $\prod_{l \neq j}(a_i + b_l)$. We can look at $D$ intuitively in the following Equation (7):

$$D = \begin{vmatrix} \prod_{l \neq 1}(a_1 + b_l) & \cdots & \prod_{l \neq n}(a_1 + b_l) \\ \vdots & \ddots & \vdots \\ \prod_{l \neq 1}(a_n + b_l) & \cdots & \prod_{l \neq n}(a_n + b_l) \end{vmatrix} \tag{7}$$
It is noted that if $a_i = a_j$ for some $i \neq j$, then two rows of $D$ are identical; if $b_i = b_j$, then two columns are identical. In either case $D = 0$, implying that $D$ contains the factors $(a_i - a_j)$ and $(b_i - b_j)$ for all $1 \leq i < j \leq n$. In the expansion of $D$, each variable $a_i$ and $b_j$ appears with degree $n - 1$, which is exactly the degree contributed by these factors, so $D$ can differ from their product only by a constant factor $k$, as shown in the following Equation (8):

$$D = k \prod_{1 \leq i < j \leq n} (a_i - a_j)(b_i - b_j) \tag{8}$$
To determine the value of $k$, let $b_j = -a_j$ for every $j$. The $(i, j)$ element of $D$ then becomes $\prod_{l \neq j}(a_i - a_l)$, which contains the factor $(a_i - a_i) = 0$ whenever $i \neq j$; that is, except for the elements on the main diagonal, all other elements are 0, making $D$ a diagonal determinant whose value is the product of the main diagonal elements. At this time, $D = \prod_{i=1}^{n} \prod_{l \neq i}(a_i - a_l)$, which equals the right-hand side of Equation (8) evaluated at $b_j = -a_j$ with $k = 1$, showing that $k = 1$. Thus, Equation (9) can be derived as follows:

$$\det(X) = \frac{\prod_{1 \leq i < j \leq n} (a_i - a_j)(b_i - b_j)}{\prod_{i=1}^{n} \prod_{j=1}^{n} (a_i + b_j)} \tag{9}$$
For an $n$-order Cauchy matrix, as long as the elements of $A$ are pairwise distinct, the elements of $B$ are pairwise distinct, and $a_i + b_j \neq 0$ for all $i, j$, the determinant of the matrix is non-zero, thereby ensuring the matrix's invertibility. More importantly, any $m$-order square submatrix ($m \leq n$) of the Cauchy matrix is itself a Cauchy matrix, so it also has a non-zero determinant and is invertible, providing assurance for the matrix's applicability and flexibility.
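Constructing the matrix of Equation (6) in code is straightforward once field division is available. The following sketch uses gf_div from Section 2.1 and checks the defining condition $a_i + b_j \neq 0$, which in $GF(2^w)$ reduces to $a_i \neq b_j$ because addition is XOR.

```c
/* Sketch: build the n x n Cauchy matrix of Equation (6) over GF(2^w),
 * stored row-major in X. The caller supplies the sets a[] and b[],
 * each with distinct elements. Returns -1 if a_i + b_j = 0 occurs. */
static int cauchy_matrix(int n, const uint8_t *a, const uint8_t *b,
                         uint8_t *X) {
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            uint8_t s = a[i] ^ b[j];      /* a_i + b_j in GF(2^w) */
            if (s == 0) return -1;        /* violates the Cauchy condition */
            X[i * n + j] = gf_div(1, s);  /* 1 / (a_i + b_j) */
        }
    return 0;
}
```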
Additionally, it is worth noting that standard RS encoding may require expensive computation during decoding, especially when inverting the decoding matrix. To optimize this process, Cauchy RS encoding was proposed. Initially, Cauchy RS encoding [30] replaced traditional Vandermonde-based RS encoding, addressing the high computational complexity of the inversion operation by reducing it from $O(n^3)$ to $O(n^2)$. This improvement significantly speeds up the decoding process, especially in applications dealing with large volumes of data. Subsequently, Cauchy RS encoding employs a binary representation of the finite field $GF(2^w)$, converting multiplication into efficient XOR operations, further reducing computational complexity and increasing processing speed. Finally, by implementing optimized Cauchy RS encoding on FPGA (field-programmable gate array) hardware, whose high customizability and parallelism allow binary XOR operations to be executed efficiently, throughput can be significantly improved and latency reduced in large-scale data processing and high-speed communication systems. These optimization steps reflect the evolution of RS encoding from theory to practical application, with each step improving encoding and decoding efficiency and reducing computational complexity, enabling RS encoding to adapt more effectively to modern high-speed, high-volume computing environments.
In the scenario depicted in Figure 1, we observe the process of Reed–Solomon (RS) encoding for $k = 4$, $m = 2$, $w = 3$, obtained by converting the corresponding encoding matrix over $GF(2^3)$ into its binary matrix form. In this binary matrix, gray blocks represent 1, while white blocks represent 0. The encoding operation then amounts to multiplying a $1 \times 12$ binary data vector with a $12 \times 6$ binary matrix.
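The expansion of a $GF(2^w)$ coefficient into a $w \times w$ binary block can be sketched as follows: column $j$ of the block is the bit pattern of $e \cdot x^j$, so multiplying a $w$-bit symbol by $e$ reduces to XORing the rows selected by the symbol's bits. The helper below reuses gf_mul from Section 2.1 (built there for $w = 4$; Figure 1 uses $w = 3$, but the construction is identical), and the function names are illustrative assumptions.

```c
/* Sketch: expand the coefficient e of GF(2^w) into its w x w binary
 * matrix; bit[i][j] is row i, column j, and column j holds the bit
 * pattern of e * x^j. Requires w <= 8. */
static void gf_bitmatrix(uint8_t e, int w, uint8_t bit[][8]) {
    for (int j = 0; j < w; j++) {
        uint8_t col = gf_mul(e, (uint8_t)(1u << j));  /* e * x^j */
        for (int i = 0; i < w; i++)
            bit[i][j] = (uint8_t)((col >> i) & 1);
    }
}

/* The number of 1s in the expanded blocks determines the XOR cost
 * of encoding with the binary matrix. */
static int count_ones(int w, uint8_t bit[][8]) {
    int ones = 0;
    for (int i = 0; i < w; i++)
        for (int j = 0; j < w; j++)
            ones += bit[i][j];
    return ones;
}
```

Summing count_ones over every coefficient block of the encoding matrix yields the total number of 1s, the quantity whose reduction is discussed next.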
In this process, the computational cost of the multiplication depends mainly on the number of 1s (gray blocks) in the binary matrix, since each 1 corresponds to an XOR operation. According to research by Plank et al. [31], for this configuration, optimizing the encoding matrix can reduce the number of 1s in the binary matrix from the original 46 to 34. This optimization significantly reduces the cost of the multiplication while maintaining the efficiency and reliability of the encoding. The method is relatively simple to implement, requires no additional complex optimization measures, and is an effective way to optimize Reed–Solomon encoding in modern communication and storage systems.
2.3. Acceleration of Data Recovery in Distributed Storage Systems
In distributed storage systems, erasure coding (EC) plays a crucial role, particularly in data recovery. The core challenge of erasure coding lies in balancing network resource consumption, encoding and decoding computational efficiency, and disk I/O overhead [32]. With the introduction of high-performance networking technologies such as InfiniBand and RDMA [6,7], the network bottleneck that used to dominate data recovery is gradually being resolved, making decoding efficiency the critical constraint on data recovery performance. In the decoding process, the large number of finite field matrix multiplication operations makes hardware acceleration a key consideration.
Among the specific types of erasure codes, RS (Reed–Solomon) codes are the most widely used. They have the MDS (maximum distance separable) property and meet the Singleton bound with equality [33], thereby achieving the theoretically optimal storage utilization. Furthermore, regenerating codes, including their two main branches, minimum storage regenerating (MSR) codes [34] and minimum bandwidth regenerating (MBR) codes [35], provide different storage and bandwidth optimization strategies. However, these codes are usually not systematic, and compared with RS codes they incur higher overhead when reading the original data. Local reconstruction codes (LRCs) are an improvement on traditional RS codes that implement a local grouping strategy [10,11,12]: data are divided into multiple groups, RS encoding is used within each group to generate local parity blocks, and RS encoding over the global data generates global parity blocks. This reduces the amount of data read during recovery, and thereby the data transmission traffic, especially when recovering with local parity blocks. However, LRC has lower storage efficiency than traditional RS codes. For example, the coding scheme adopted by Meta (Facebook) changed from RS(10,4) to LRC(10,6,5), reducing recovery bandwidth by 50% but also decreasing storage efficiency [36].
Moreover, in the optimization of erasure code computation, hardware acceleration schemes have shown significant potential. Plank et al. [18] improved encoding and decoding efficiency by optimizing the finite field multiplication operations of RS codes using SIMD technology. Liu et al. [19] developed a GPU-based erasure code library, GCRS, which exploits the parallel processing and vector operation capabilities of GPUs and achieves a tenfold performance improvement over the Jerasure open-source library. These developments indicate that, in large-scale data processing, hardware acceleration schemes play a more crucial role than traditional software implementations.
However, current hardware acceleration schemes are mostly confined to theoretical and laboratory studies [37,38,39], with relatively few tests in real distributed storage system environments. As a result, the specific problems and challenges that might arise in practical applications have not been fully identified and resolved. For instance, existing hardware acceleration schemes might not fully consider the effective caching and splitting of data blocks when dealing with large data objects [31,40], which could lead to inefficiency and stability issues in real distributed storage systems. Additionally, while hardware acceleration can improve the computational efficiency of erasure codes, it might in some cases increase storage overhead or reduce storage efficiency. Therefore, when designing data recovery methods for distributed storage systems based on FPGAs and Cauchy RS codes, it is necessary not only to focus on the application of hardware acceleration technology itself but also to consider its applicability in real distributed environments and the overall system performance.