1. Introduction
Despite significant efforts in both industry and academia, achieving high fault tolerance remains a serious problem in the management of large-scale IT systems.
The consequences of failures, regardless of their causes, can be mitigated by introducing data redundancy. Erasure codes, replication, and Resilient Distributed Datasets (RDDs) [
1] are the most important methods for ensuring fault tolerance in modern data storage and processing systems.
Replication is the simplest method to implement and, therefore, the most widespread way of introducing redundancy [
2]. However, replication leads to significant overhead costs expressed in additional resources for duplicating data and functionality. In turn, erasure codes [
3,
4] avoid this drawback. Such codes counteract the partial loss of data and are based on the principle of error-control coding (parity checks). Erasure codes significantly reduce costs and energy consumption [
3]. However, data recovery after failures is associated with high reconstruction costs and network traffic. The transition from replication to erasure codes and RDD is complicated by the need to update the hardware–software base and recode the entire volume of stored data. Nevertheless, there are examples of successful adoption of these technologies by Facebook [
5] and Microsoft Azure [
3].
This paper proposes a new approach to error correction codes based on a redundant residue number system (RRNS) [
6,
7]. Along with the Reed–Solomon codes [
8], the Rabin dispersal algorithm [
9], etc., RRNSs provide the functionality implemented by erasure codes and the functionality implemented by RDD algorithms. In addition, an important property of RNSs is the ability to perform arithmetic operations over encoded data. This property can be used in modern distributed data processing systems [
10,
11,
12] and encrypted data processing schemes [
13]. Therefore, RRNSs are a unique and versatile tool for a wide range of applications.
There are two main strategies for correcting errors in RRNSs. The first one involves syndrome decoding [
14,
15,
16]: erroneous parts of the data are determined by comparing a special numerical syndrome calculated on the obtained data with the reference values. The second strategy is to restore error-free data based on the obtained code [
17,
18,
19] by calculating the error magnitude and correcting the code during the decoding process. The authors of [
19] suggested a maximum likelihood decoding (MLD) approach that simplifies corrective decoding by reducing the number of values enumerated during decoding.
This paper proposes a modified method for corrective data decoding in RRNSs using MLD. Unlike the decoding method of [
19], this approach involves an alternative version of the Chinese Remainder Theorem (CRT) [
20,
21,
22] for more efficient data restoration due to parallel execution of decoding and error correction. The CRT with fractions allows dividing the decoding process into two independent parts: determining the weighted characteristic of a number (a relative estimate of its magnitude) and decoding itself. The weighted characteristic can be directly used for error correction, and data decoding can be performed in parallel. The proposed modified modular projection method with MLD speeds up the procedure of error correction and number restoration in the weighted number system compared to the original method [
19]. At the same time, it requires considerably more hardware resources.
This paper is organized as follows: In
Section 2, the existing research in the field of error correction in RRNSs is surveyed. A detailed introduction to RNSs, including the notations and preliminaries used below, is given in
Section 3. Additionally, this section briefly describes MLD [
19].
Section 4 presents the proposed method and analyzes its complexity.
Section 5 is devoted to the hardware implementation of the proposed method, comparing it with the existing methods. The outcomes of this paper are summarized in
Section 6.
2. Related Works
Error detection and correction codes based on a redundant residue number system (RRNS) are modern and powerful tools to control and correct arithmetic processing and data transmission errors. Algorithms that correct only single errors were considered and improved by several authors [
6,
23,
24,
25]. Decoding the magnitude and location of a single error is significantly less complex than detecting and correcting errors in multiple residual digits of an RRNS. The latter requires verifying a huge number of possible combinations of erroneous residual digit positions in the error localization stage, which consumes the most time in the procedure for correcting multiple errors in RRNSs. All known methods for detecting and correcting multiple errors in RRNSs can be divided into three large groups: continued fractions [
26,
27], syndrome decoding [
14,
15,
16], and modular projections [
17,
18,
19]. These groups of methods have advantages and drawbacks.
The methods with continued fractions are based on the recursive Euclidean algorithm. After localization, erroneous remainders are corrected by expanding the moduli system using error-free remainders. For multiple errors in residual digits of large bit width, the recursive Euclidean algorithm needs many iterations and becomes inefficient in hardware implementation.
The paper [
16] presented an approach to correcting multiple errors in RRNSs using syndromes. Later, it was improved in [
14,
15]. According to [
15], the residual digits in syndrome decoding are divided into three groups; seven categories of error positions are determined among them. The error magnitudes are distributed in three syndromes, which are used to categorize the errors accurately and localize the erroneous residual digits simultaneously from the six lookup tables. Syndromes can be calculated in parallel, and operations are performed in small moduli. This feature is an undoubted advantage of the approach, providing high-speed execution of the error correction procedure for RRNSs. Among the drawbacks of this method, note a rather strong constraint on the choice of redundant moduli:
where $k$ denotes the number of information moduli in an RRNS, and $n$ is the total number of RRNS moduli. A critical shortcoming of methods with syndrome decoding is that the size of the lookup tables grows sharply with the error multiplicity and the bit width of the residual digits. Searching through large lookup tables is time-consuming and negates the benefit of efficiently calculated syndromes. Thus, methods with syndrome decoding are efficient only for small moduli sets.
The method with modular projections was pioneered in [
17] and improved and extended to the case of multiple errors in the subsequent works [
18,
28]. With this approach, erroneous residual digits are localized by reducing the RRNS, i.e., deleting as many remainders as the error multiplicity. After the erroneous residual digits are excluded, the value represented in the RRNS falls into the legitimate range. As in syndrome decoding, each possible combination of residual digit error locations has to be evaluated. Methods with modular projections need no large lookup tables, which is a clear advantage over methods with syndrome decoding. Another advantage is a weaker constraint on the choice of redundant moduli than in the case of syndrome decoding:
where $k$ denotes the number of information moduli in an RRNS, and $n$ is the total number of RRNS moduli. A critical shortcoming of such methods is the number of modular projections, equal to $\binom{n}{t}$, where $t$ denotes the maximum error multiplicity. The number of modular projections grows quickly with the error multiplicity, raising the number of computations accordingly.
The paper [
19] proposed an essentially different method with modular projections: the number of projections was reduced by changing their construction procedure. The idea is to delete
remainders instead of
ones (the conventional approach). The algorithm for constructing modular projections and their number was refined in [
29]. Since
, the number of modular projections is significantly reduced, but it becomes necessary to check the Hamming distances according to maximum likelihood decoding (MLD). This approach eliminates the critical shortcoming of the rapidly growing number of projections and seems an optimal choice for correcting multiple errors in modular codes. The ideas and approaches [
19] underlie the modified modular projection method with MLD; see
Section 4. Using the Chinese Remainder Theorem with fractions [
20,
21,
22] opens up new opportunities for parallelizing the algorithm and speeding up the procedure for correcting RRNS errors.
Section 5 presents the hardware implementations of the original method [
19] and the modified modular projection method with MLD on Field-Programmable Gate Array (FPGA). The hardware implementations partially involve the authors’ computing modules described in the earlier publications [
30,
31].
3. Residue Number System and Multiple Error Correction
3.1. Residue Number System
A residue number system (RNS) [
7,
32,
33] is a non-weighted number system. In an RNS, an integer $X$ is represented by a $k$-dimensional vector $(x_1, x_2, \ldots, x_k)$ of the remainders of its division by pairwise coprime positive integers $m_1, m_2, \ldots, m_k$ called the RNS moduli. The magnitude of $X$ belongs to the range $[0, M)$, where $M = \prod_{i=1}^{k} m_i$, and each remainder $x_i$ is the least nonnegative residue of $X$ modulo $m_i$:
$$x_i = |X|_{m_i}, \quad i = 1, 2, \ldots, k. \qquad (1)$$
Note that when obtaining the RNS representation of $X$, the remainders for the individual moduli are calculated independently and in parallel. The paper [
30] considered the most effective methods for calculating the remainder of the division and presented schematic diagrams for their hardware implementation.
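The forward conversion described above can be sketched in a few lines of Python. This is an illustration only: the function name `to_rns` and the moduli set (3, 5, 7) are assumptions chosen for the example and are not taken from the paper or from the hardware schemes of [30].

```python
from math import gcd, prod


def to_rns(x, moduli):
    """Return the residues of x for a set of pairwise coprime moduli."""
    # The RNS representation is unique only for pairwise coprime moduli.
    for i, a in enumerate(moduli):
        for b in moduli[i + 1:]:
            if gcd(a, b) != 1:
                raise ValueError("moduli must be pairwise coprime")
    if not 0 <= x < prod(moduli):
        raise ValueError("x lies outside the RNS range")
    # Each residue depends on one modulus only, so the loop body could run in parallel.
    return tuple(x % m for m in moduli)


print(to_rns(50, (3, 5, 7)))  # (2, 0, 1)
```

Because every residue is computed from the input and a single modulus, the comprehension has no data dependencies between iterations, which mirrors the parallelism noted in the text.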
The inverse conversion, i.e., restoring the weighted representation of a number from its RNS representation, is a much more difficult task than the direct conversion. The weighted representation of a number (its weighted characteristic) can be found using the Chinese Remainder Theorem (CRT) [
7], the CRT with fractions (CRTf) [
20,
21,
22], or conversion to a mixed radix number system (MRNS) [
34,
35].
The modified multiple error correction method proposed below partially involves the CRT and CRTf.
3.2. Chinese Remainder Theorem
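According to the Chinese Remainder Theorem (CRT), an integer $X$ is calculated from its RNS representation $(x_1, x_2, \ldots, x_k)$ as follows [7]:
$$X = \left| \sum_{i=1}^{k} M_i \left| M_i^{-1} \right|_{m_i} x_i \right|_M, \qquad (2)$$
or equivalently,
$$X = \sum_{i=1}^{k} M_i \left| M_i^{-1} \right|_{m_i} x_i - r_X M, \qquad (3)$$
where $M_i = M / m_i$, $\left| M_i^{-1} \right|_{m_i}$ denotes the multiplicative inverse of $M_i$ modulo $m_i$, and $r_X$ is the number rank (a value showing how many times the sum in (3) exceeds the RNS range).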
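As an illustration, the following sketch evaluates the CRT sum and splits it into the number and its rank with a single integer division. The function name `crt_with_rank` and the moduli (3, 5, 7) are hypothetical. The rank is taken here as the integer quotient of the unreduced CRT sum by the RNS range, which is one reading of Equation (3). The three-argument `pow` with exponent -1 requires Python 3.8 or later.

```python
from math import prod


def crt_with_rank(residues, moduli):
    """Return (X, rank): the decoded number and the rank of its CRT sum."""
    M = prod(moduli)
    total = 0
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        inv = pow(M_i, -1, m_i)  # multiplicative inverse of M_i modulo m_i
        total += M_i * inv * x_i
    # total = X + rank * M, so one divmod yields both quantities.
    rank, value = divmod(total, M)
    return value, rank


print(crt_with_rank((2, 0, 1), (3, 5, 7)))  # (50, 1)
```

For the residues (2, 0, 1) the unreduced sum equals 155 = 50 + 1 * 105, so the decoded number is 50 and the rank is 1.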
3.3. Chinese Remainder Theorem with Fractions
A number
is converted to a weighted number system (WNS) using the Chinese Remainder Theorem with fractions (CRTf) [
4,
24,
31]:
where
and
denotes taking the fractional part of a number.
The value is called the approximate weighted characteristic of in the RNS representation.
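A minimal sketch of the CRT with fractions, using exact rational arithmetic, may help to fix the idea. It assumes the usual form of the method, in which the constants are the inverses of $M_i$ modulo $m_i$ divided by $m_i$ and the fractional part of the weighted sum equals $X / M$. The function name `crtf` and the moduli are illustrative assumptions.

```python
from fractions import Fraction
from math import prod


def crtf(residues, moduli):
    """Return (characteristic, X), where characteristic = X / M lies in [0, 1)."""
    M = prod(moduli)
    s = Fraction(0)
    for x_i, m_i in zip(residues, moduli):
        inv = pow(M // m_i, -1, m_i)       # inverse of M_i modulo m_i
        s += Fraction(inv, m_i) * x_i      # constant inv / m_i times the residue
    characteristic = s - s // 1            # keep only the fractional part
    return characteristic, int(characteristic * M)


print(crtf((2, 0, 1), (3, 5, 7)))  # (Fraction(10, 21), 50)
```

The characteristic 10/21 is a relative estimate of the magnitude of the number within the range, and multiplying it by M = 105 restores the value 50.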
We propose using a binary shift of fractions to implement this algorithm on a hardware base not supporting fractions (e.g., FPGA). The accuracy of the constants
in (5), sufficient for correctly reconstructing the weighted representation of
, was thoroughly estimated in [
20]. A number
X is converted to a WNS using the CRTf with the binary shift as follows:
where
and
[
4].
The CRTf allows converting RNS representations of numbers to the WNS ones efficiently on any computing platform: all calculations are reduced to operations well implemented in hardware. These operations include addition, multiplication, binary shifts, and discarding the most significant digits of a number [
31].
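The integer variant with a binary shift can be sketched as follows. Two choices in this sketch are assumptions and not the estimate from [20]: the constants are rounded up, and the shift width `N` is chosen so that $2^N$ exceeds the product of the range and the sum of $(m_i - 1)$, which is a sufficient bound for exact decoding. The function names are hypothetical.

```python
from math import prod


def crtf_shift_constants(moduli):
    """Precompute integer constants k_i = ceil(2**N * inv_i / m_i)."""
    M = prod(moduli)
    rho = sum(m - 1 for m in moduli)
    N = (M * rho).bit_length()             # guarantees 2**N > M * rho
    ks = []
    for m_i in moduli:
        inv = pow(M // m_i, -1, m_i)
        ks.append(-(-(inv << N) // m_i))   # ceiling division
    return ks, N, M


def crtf_shift(residues, ks, N, M):
    """Decode using only multiplication, addition, masking, and shifts."""
    acc = sum(k * x for k, x in zip(ks, residues))
    acc &= (1 << N) - 1                    # discard the most significant bits
    return (acc * M) >> N                  # discard the N least significant bits


ks, N, M = crtf_shift_constants((3, 5, 7))
print(crtf_shift((2, 0, 1), ks, N, M))     # 50
```

With this bound the accumulated rounding error stays below $2^N / M$, so the final shift truncates to the exact value for every number in the range.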
3.4. Redundant Residue Number System
The residual representation of an error-free integer is unique for the range . This range is used for detecting and correcting errors in the RNS. Expanding the code space of the residual representation with additional residual bits, we form an -redundant residue number system (RRNS) with correcting properties.
An RRNS with is based on coprime moduli and the full range . The additional moduli, , are called the redundant moduli of the RRNS. The range corresponding to the first k information moduli from the n-dimensional vector is called the legitimate range. The range corresponding to the additional redundant (control) moduli is called the illegitimate range. The legitimate range is the required computational range of the number system, while the illegitimate range is necessary for error detection, localization, and correction.
When an error occurs, an integer within the legitimate range is converted to , where E is the error. The value falls into the illegitimate range if
- 1)
, ,
- 2)
the number of erroneous remainders does not exceed .
Thus, to detect an error, we should restore the number X from its RRNS representation . If the resulting value is smaller than the dynamic range of the RRNS (M), the number contains no errors of multiplicity and below; otherwise, the RRNS representation of X contains 1 or 2 or 3,..., or erroneous residual digits.
The number of erroneous residual digits detected and corrected is determined by the number of redundant moduli added. In the general case without special restrictions, the $(n, k)$-RRNS with $r = n - k$ redundant moduli detects $r$ and corrects $\lfloor r/2 \rfloor$ erroneous residual digits [7].
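Error detection by range checking can be illustrated with a small sketch. The moduli set (3, 5, 7, 11, 13) with three information moduli, the helper names, and the injected error are all assumptions made for the example. The redundant moduli are chosen larger than the information moduli.

```python
from math import prod


def crt(residues, moduli):
    """Plain CRT decoding over the given moduli."""
    M = prod(moduli)
    total = sum((M // m) * pow(M // m, -1, m) * x for x, m in zip(residues, moduli))
    return total % M


def has_error(residues, moduli, k):
    """Flag an error if the decoded value leaves the legitimate range."""
    legitimate_range = prod(moduli[:k])    # product of the k information moduli
    return crt(residues, moduli) >= legitimate_range


MODULI = (3, 5, 7, 11, 13)
clean = tuple(50 % m for m in MODULI)      # (2, 0, 1, 6, 11)
corrupted = (2, 3, 1, 6, 11)               # residue for modulus 5 changed from 0 to 3

print(has_error(clean, MODULI, k=3))       # False
print(has_error(corrupted, MODULI, k=3))   # True: decodes to 3053 >= 105
```

The corrupted vector decodes to 3053, far outside the legitimate range [0, 105), which is how the redundancy exposes the error.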
3.5. Modular Projection Method
The modular projection method is used to correct erroneous residual digits of numbers in the RRNS representation. Let an error be detected in a number
written in the RRNS. The
-RRNS allows correcting
erroneous residual digits [
7]. According to the classical modular projection method,
or fewer errors are corrected in the following steps:
- 1)
Constructing modular projections. A modular projection is obtained by deleting remainders in the original RRNS representation of the number. Deleting different sets of remainders, we construct different modular projections.
- 2)
Obtaining the weighted representation (weighted characteristic) of each modular projection. The weighted representation of a number is its magnitude in the WNS. The weighted characteristic of a number depends on the inverse RNS-to-WNS conversion method.
- 3)
Comparing the weighted representation (weighted characteristic) of each modular projection with the value (weighted characteristic) of the dynamic range. A modular projection whose weighted representation (weighted characteristic) does not exceed that of the dynamic range contains the correct remainders. Note that there can be correct modular projections, where is the number of erroneous digits, . In this case, the remainders deleted when constructing each correct projection will be erroneous.
- 4)
Calculating the remainders on dividing the correct modular projection’s weighted representation by the moduli corresponding to the erroneous remainders. Note that correcting erroneous remainders is not required to decode correct data from the RRNS: the weighted representation of a correct modular projection is the corrected number written in the WNS. When using the weighted characteristic of a modular projection in step 3), it is necessary to perform an additional step for decoding: obtain the weighted representation of the number from its weighted characteristic.
For each projection, calculations are performed in parallel, which significantly reduces the total time to correct errors in RRNS codes. A critical drawback of this method is the fast-growing number of modular projections with an increase in the error multiplicity. An increase in the number of modular projections does not affect the total execution time of the error-correction procedure in RRNS codes due to the parallel computations for each projection. However, it sharply raises hardware costs [
18]. This critical drawback can be overcome using the concept of maximum likelihood decoding (MLD) [
19].
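The four steps of the classical method can be sketched as a software loop. This is a simplified illustration under stated assumptions: projections are decoded with the plain CRT, they are processed sequentially (the hardware scheme processes them in parallel), and the first projection that falls into the legitimate range is returned. The function names and the example RRNS are hypothetical.

```python
from itertools import combinations
from math import comb, prod


def crt(residues, moduli):
    M = prod(moduli)
    total = sum((M // m) * pow(M // m, -1, m) * x for x, m in zip(residues, moduli))
    return total % M


def correct_by_projections(received, moduli, k, t):
    """Return (value, deleted_positions) for the first correct projection, or None."""
    legitimate_range = prod(moduli[:k])
    n = len(moduli)
    for deleted in combinations(range(n), t):          # step 1: build a projection
        keep = [i for i in range(n) if i not in deleted]
        value = crt([received[i] for i in keep],       # step 2: weighted representation
                    [moduli[i] for i in keep])
        if value < legitimate_range:                   # step 3: range comparison
            return value, deleted
    return None


MODULI = (3, 5, 7, 11, 13)
value, deleted = correct_by_projections((2, 3, 1, 6, 11), MODULI, k=3, t=1)
repaired = tuple(value % m for m in MODULI)            # step 4: recompute the remainders
print(value, deleted, repaired)                        # 50 (1,) (2, 0, 1, 6, 11)
print(comb(5, 1), comb(9, 3))                          # 5 84
```

The last line hints at the drawback discussed above: the number of candidate projections is a binomial coefficient, so it rises from 5 to 84 when moving from single errors in a 5-modulus system to triple errors in a 9-modulus system.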
3.6. Modular Projection Method with MLD
The paper [
19] improved the classical modular projection method, reducing the number of projections by changing their construction. When constructing a modular projection, the idea is to delete
residual bits of the
-RRNS instead of
ones, where
is the maximum multiplicity of an error corrected by the
-RRNS, and
is the number of redundant moduli. The algorithm for constructing modular projections was described in detail in [
29].
Such an approach to constructing modular projections allows for reducing their number to
[
29]. However, some incorrect projections fall into the legitimate RRNS range. To separate the correct projections, maximum likelihood decoding (MLD) is used [
19]. According to the MLD concept, an additional step is to calculate the Hamming distances between the erroneous number in the RRNS representation and each modular projection. In the RNS, the Hamming distance is defined as the number of distinct corresponding remainders. Let
and
be numbers written in the RNS with bases
. Then the Hamming distance is given by
Therefore, we should find the remainders of the moduli deleted when constructing each projection.
Two conditions determine a correct projection:
- -
The modular projection in the WNS (its weighted characteristic ) is smaller than the RRNS dynamic range (its weighted characteristic ).
- -
The Hamming distance between the distorted number X′ and the modular projection in the RRNS does not exceed the value of the maximum multiplicity of errors corrected by this RRNS.
The following formula combines these conditions:
The remainders of the moduli deleted when constructing each projection can be calculated by extending the RNS moduli system [
32] or restoring the number in the WNS and calculating the remainders of these moduli. These operations are resource-consuming and significantly increase the total time for correcting errors and restoring the number in the WNS.
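The two conditions for a correct projection can be sketched as a single predicate. The sketch takes the simpler of the two routes mentioned above: the candidate is already restored in the WNS and its remainders are recomputed directly. The function names, the example RRNS, and the parameter names `k` (information moduli) and `t` (maximum corrected multiplicity) are assumptions made for illustration.

```python
from math import prod


def hamming_distance(a, b):
    """Number of positions in which two residue vectors differ."""
    return sum(1 for u, v in zip(a, b) if u != v)


def is_correct_projection(candidate, received, moduli, k, t):
    """Combine the range check with the Hamming-distance check."""
    legitimate_range = prod(moduli[:k])
    re_encoded = tuple(candidate % m for m in moduli)  # includes the deleted moduli
    return candidate < legitimate_range and hamming_distance(re_encoded, received) <= t


MODULI = (3, 5, 7, 11, 13)
received = (2, 3, 1, 6, 11)                            # 50 with one corrupted residue

print(is_correct_projection(50, received, MODULI, k=3, t=1))  # True
print(is_correct_projection(3, received, MODULI, k=3, t=1))   # False
```

The candidate 3 lies in the legitimate range but differs from the received vector in four positions, so the distance check rejects it. This is the situation that the range comparison alone cannot resolve.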
4. Modified Modular Projection Method with MLD
Now consider a new method combining the advantages of the CRT, CRTf, and maximum likelihood decoding [
19]. The proposed modification concerns error localization and restoration of a correct number in the WNS; error detection is identical for the original method [
19] and the modified method proposed below. In this regard, the description of error detection is omitted.
According to [
19], each projection is restored in the WNS using the CRT. The resulting projections are employed to calculate the values of the missing remainders. Next, the Hamming distances are calculated for each projection, and the correct projection is selected (
Figure 1a).
Reconstructing the weighted representation of projections using the CRT incurs long delays, mainly because the operations are executed modulo a large number comparable to the dynamic range of the RRNS.
In the original method [
19], the reconstructed weighted representations of modular projections are used, on the one hand, to calculate the missing remainders subsequently and, on the other hand, to obtain the correct number in the WNS. (One of the projections will be the correct number represented in the WNS.) In the modified method, we propose using (a) the ranks of the modular projections to obtain the missing remainders and (b) the approximate weighted characteristics of the CRTf to restore the modular projections in the WNS. This approach allows for calculating the missing remainders and reconstructing projections in the WNS in parallel (
Figure 1b), not sequentially, as was proposed in [
19].
The efficiency of the proposed modification largely depends on the efficiency of calculating the rank of a number represented in the RRNS. The choice of the CRTf for reconstructing modular projections in the WNS is not accidental: the approximate weighted characteristic closely relates to the rank of the number written in the RRNS.
According to (3), the rank
is given by
Recall that
denotes taking the fractional part of a number. Hence, the rank of
is given by
Note that
and
are the integer and fractional parts of the same value. We introduce the notation
Calculating , we simultaneously find the weighted characteristic (Equation (5)) and the rank of (Equation (10)).
Similar to the case of the CRTf (Equations (6) and (7)), we propose using a binary shift to implement the calculation of
on a hardware base not supporting fractions. The accuracy of the constants
in (11) is estimated by analogy with [
20]. Then
where
[
20].
The value
is especially convenient for the hardware implementation of the error correction procedure for modular codes. Indeed, the
least significant bits of the binary representation of
calculated by Equation (12) will be the shifted weighted characteristic
(Equation (7)):
where
[
20].
The other (most significant) bits will be equal to the rank
of
:
where
[
20].
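The bit-level split described above can be illustrated with a short sketch: one integer accumulation yields a word whose low bits hold the shifted weighted characteristic and whose remaining high bits hold the rank. The rounding of the constants and the choice of the shift width `N` are assumptions of this sketch (a sufficient bound, not the estimate of [20]), and all names are hypothetical.

```python
from math import prod


def extended_characteristic(residues, moduli):
    """Return (rank, shifted_characteristic, N) from one integer accumulation."""
    M = prod(moduli)
    rho = sum(m - 1 for m in moduli)
    N = (M * rho).bit_length()                 # 2**N > M * rho
    total = 0
    for x_i, m_i in zip(residues, moduli):
        inv = pow(M // m_i, -1, m_i)
        k_i = -(-(inv << N) // m_i)            # ceil(2**N * inv / m_i)
        total += k_i * x_i                     # note: no reduction modulo 2**N
    low = total & ((1 << N) - 1)               # N least significant bits
    rank = total >> N                          # remaining most significant bits
    return rank, low, N


rank, low, N = extended_characteristic((2, 0, 1), (3, 5, 7))
print(rank, low, N)                            # 1 977 11
print((low * 105) >> N)                        # 50
```

Both outputs come from the same accumulated word, so extracting them costs only wiring in hardware. The rank can then feed the calculation of the missing remainders while the low bits are used to restore the number.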
After obtaining the shifted approximate weighted characteristic by Equation (13), the standard CRTf Equation (6) is used for restoration. We discuss in detail the calculation of the missing remainders using the rank .
This operation extends the system of RNS moduli [
7] and can be performed efficiently provided that the rank of a number is calculated efficiently.
Introducing the compact notation
, we write
Due to (15), for each modular projection, the missing remainders on dividing the number
by the moduli
deleted when constructing the projection are given by
Performing trivial transformations, we obtain
and, finally,
Note that for each modular projection, the values and differ. However, these values are completely determined by the set of RRNS moduli. Therefore, they are calculated in advance and stored in the memory of the computing device.
Thus, the missing remainders are calculated using multiplication by a constant and addition on a small modulus comparable to the RRNS moduli. Both operations are well implemented in hardware. The key feature of the proposed approach is the ability to calculate the first term
in the sum (16) independently and in parallel with the second one
. We denote them by
and
, respectively:
Then Equation (16) becomes
Since the values are calculated in parallel to and , the total execution time of the procedure for correcting errors in modular codes and restoring the correct number in the WNS is reduced.
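The base extension through the rank can be sketched as follows. The sketch assumes the standard identity in which the missing remainder equals the CRT sum reduced modulo the new modulus minus the rank times the range, also reduced modulo the new modulus. For brevity the rank is obtained here by exact integer division, whereas the proposed method takes it from the high bits of the extended characteristic. The function name and the moduli are hypothetical.

```python
from math import prod


def extend_base(residues, moduli, new_modulus):
    """Compute X mod new_modulus without restoring X in the WNS."""
    M = prod(moduli)
    total = 0        # unreduced CRT sum, used only to obtain the rank in this sketch
    first = 0        # first term: small-modulus sum of constants times residues
    for x_i, m_i in zip(residues, moduli):
        M_i = M // m_i
        w = pow(M_i, -1, m_i) * x_i
        total += M_i * w
        first = (first + (M_i % new_modulus) * w) % new_modulus
    rank = total // M
    # Second term: rank times the precomputable constant (-M mod new_modulus).
    second = ((-M % new_modulus) * rank) % new_modulus
    return (first + second) % new_modulus


print(extend_base((2, 0, 1), (3, 5, 7), 11))   # 6
print(extend_base((2, 0, 1), (3, 5, 7), 13))   # 11
```

The first term depends only on the residues, while the second depends only on the rank, so the two can be evaluated independently and combined with a single modular addition, as the text describes.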
An example demonstrating an approach to extending the system of RNS moduli using the number rank is presented in
Appendix A (see Example A1).
Another example demonstrating the proposed approach to error correction and restoration of the correct number in the WNS is presented in
Appendix B (see Example A2).
The proposed modified modular projection method with MLD for correcting multiple errors in modular codes allows for achieving a high level of parallelism and significantly reducing the total execution time of the multiple error correction procedure.
The proposed method requires calculating the rank (Equation (14)) for base extension. CRT-I and CRT-II are not directly applicable for this purpose, whereas the function introduced in Equation (11) yields the relative magnitude of the number (its weighted characteristic) and its rank simultaneously.
5. Hardware Implementation of Modified Modular Projection Method with MLD
The original [
19] and modified modular projection methods for correcting multiple errors in modular codes (
Section 4) share an identical error detection stage. For both methods, the implementation variability of this stage lies in choosing an appropriate algorithm for calculating the weighted characteristic of the corrected number. The hardware implementation and comparative analysis of such algorithms were presented in [
31]. Let us compare the efficiency of error localization and correct data restoration in a WNS.
Using various computing devices for implementing methods and algorithms is often dictated by the need to adapt them to the architectural features of a given hardware base. Support of calculations in the ring of integers is a characteristic feature of FPGA computing devices that increases speed but severely restricts the class of executed tasks.
The original method [
19] can be implemented on FPGA without any restrictions. The modified method contains operations with fractions and, therefore, requires adaptation when implemented on FPGA. Equations (12)–(14) are used for passing to integer calculations. This transition is illustrated by an example in
Appendix C (see Example A3).
The result of this example coincides with that of an example in
Appendix B. Therefore, the transition to integer calculations by Equations (12)–(14) is correct.
The efficiency of the original [
19] and modified methods for correcting multiple errors in modular codes was compared by implementing them in the Very High Speed Integrated Circuit Hardware Description Language (VHDL) using Xilinx Vivado 2019.1, a software tool for the design and analysis of HDL designs. For testing, we used a Kintex-7 FPGA (core Xilinx xc7k70tfbg676-2) of a sufficient area without DSP blocks.
This software–hardware base has hardware parallelization at the level of blocks implementing individual procedures and functions and, moreover, at the level of operations within these blocks.
Figure 2 shows the schemes for parallelizing the compared methods at the block level: vertical lines separate parallel executed blocks, whereas horizontal lines indicate the synchronization of upper blocks. The schemes in
Figure 2 specify the corresponding schemes in
Figure 1, considering their hardware implementation.
The original method [
19] contains no parallel blocks (
Figure 2a); however, calculations for each modular projection are performed in parallel within each block, except for the first and the last ones. In the modified method, calculations for each modular projection are also performed in parallel within each block, except for the first and last ones; in addition, there is parallelism at the block level (
Figure 2b).
Let us consider the hardware implementation of each block in
Figure 2.
The original modular projection method with MLD [
19] is presented in
Appendix D.
Modified Modular Projection Method with MLD
Block . When an error is detected, this block receives at the input remainders representing the distorted number in the -RRNS.
Block
. This block constructs modular projections. It is implemented in
parallel computational threads, where
. Each thread corresponds to a modular projection and receives at the input
remainders of the distorted number
(the outputs of the block
) obtained by the modular projection algorithm [
29].
Block
. This block calculates the extended weighted characteristics of modular projections
in the WNS. According to (12), it is implemented by
parallel multiplications of the remainders (the outputs of the block
) by constants modulo
and a
-operand adder [
31].
Block . This block calculates the ranks for each modular projection . It is implemented by extracting the most significant bits with numbers not less than (14) from the extended weighted characteristic (the output of the block ).
Block . This block calculates weighted characteristics for each modular projection . It is implemented by extracting the least significant bits with numbers less than (13) from the extended weighted characteristic (the output of the block ).
Block
. This block calculates the missing remainders (the first term) for each modular projection
. It is implemented by
parallel multiplications of the remainders (the outputs of the block
) by constants modulo
and a
-operand modulo adder [
31]; see Equation (17).
Block
. This block calculates the missing remainders (the second term) for each modular projection
. It is implemented by multiplying the rank (the output of the block
) by a constant modulo
[
31]; see Equation (18).
Block
. This block calculates the missing remainders for each modular projection
. It is implemented by modulo adding the first term (the output of the block
) to the second term (the output of the block
) [
31]; see Equation (19).
Block (6). This block calculates the modular projections in the WNS. It is implemented by multiplying the weighted characteristic of the modular projection (the output of the block ) by the dynamic range . Subsequently, the least significant bits with numbers less than are discarded; see Equation (6).
Block . This block calculates Hamming distances between the distorted number and each modular projection . It is implemented by parallel comparisons of the corresponding remainders of the distorted number (the outputs of the block ) and the modular projection (the outputs of the block ). Subsequently, the mismatches are counted by an -operand adder.
Block X (9). This block chooses a correct modular projection. It is implemented by the conjunction of two comparisons: the modular projection in the WNS (the output of the block CRT: ) is smaller than the dynamic range , and the Hamming distance (the output of the block ) is not greater than the maximum multiplicity of errors corrected by the -RRNS. The results for each thread (conjunctions for different modular projections) are glued together into a -bit number, and the number of the first nonzero bit corresponds to the correct modular projection.
Block X. This block outputs the correct number in the WNS. It is implemented by a -input multiplexer with one control input. The common inputs are the modular projections in the WNS (the outputs of the block CRT: ). The control input is the number of the correct modular projection (the output of the block (9)).
In the course of comparative analysis, we obtained data on the time and hardware costs for correcting single and double errors in modular codes. We used the
-RRNS (single error) and
-RRNS (double error) for numbers of different bit widths (4 bits, 8 bits, 16 bits, 24 bits, 32 bits, 48 bits, and 64 bits). The moduli sets, the time and hardware costs for correcting single and double errors and restoring the number in the WNS using the original modular projection method with MLD [
19], and the modified one proposed in this paper are presented in
Appendix E (see
Table A7 and
Table A8).
Figure 3 shows the operating times of the methods depending on the bit width of the numbers under correction for a single error (see
Figure 3a) and double errors (see
Figure 3b).
According to
Figure 3, the modified modular projection method with MLD proposed in this paper is faster than the original method [
19]. Based on the simulation results on FPGA (see
Appendix E), we conclude that the proposed method speeds up the procedure of error correction and number restoration in the WNS by an average of 1.23 times (18%) compared to the original approach [
19]. On average, the modified modular projection method with MLD requires 1.55 times (35%) more resources for single errors and 1.76 times (43%) more resources for double errors.