1. Introduction
Efficient distributed storage systems (DSSs) are considered to be crucial infrastructure for handling big data. These systems must be able to reliably store data over a long duration by introducing redundancy and storing data in a distributed manner across several storage nodes, which may be individually unreliable and prone to failure. Large data centers and peer-to-peer storage systems such as OceanStore [1] from Berkeley and BigTable from Google [2] are famous examples of distributed storage systems.
Owing to cost issues, large data centers also use many commercial hardware storage devices such as hard disk drives and solid-state drives (HDDs/SSDs). As a result, device failure occurs regularly, rather than as an exception. The data are therefore typically stored in a redundant manner to protect them against potential failures. The traditional storage method for large storage services such as cloud storage is triplication, i.e., triple replication of each symbol. For example, the Google file system [3] and Hadoop [4] adopt this approach. However, given that triplication requires thrice the storage space, Facebook deploys a Reed–Solomon (RS) code in its warehouse cluster [5]. Although RS codes are efficient for handling specified numbers of erasures, all of the code symbols must be communicated and reconstructed to repair erasures. Thus, more efficient storage methods have been actively researched, including regenerating codes (RCs), fractional repetition codes (FRCs), and locally repairable codes (LRCs) [6,7,8,9,10,11,12]. RCs attempt to minimize the number of transmitted symbols, while the objective of LRCs is to optimize the number of disk reads required to repair a single lost node. In some respects, an LRC is essentially a block code with an additional parameter referred to as locality. There have been excellent reviews on distributed storage codes (e.g., [13,14,15,16]). Moreover, a review article on this topic has recently been published [17]. However, to the best of the authors' knowledge, no review paper deals only with binary LRC (BLRC) constructions, which are practically useful.
In most of the early suggestions for LRC constructions, the alphabet size of the stored symbols is very large. However, for efficient and convenient hardware implementation, the construction of codes over a small alphabet size for the stored symbols is of particular interest. For example, BLRCs are of special interest because multiplication is not necessary during the encoding, decoding, and repair processes.
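As a rough illustration of why a binary alphabet is convenient, the following minimal Python sketch encodes and repairs with XOR only. The grouping scheme (one XOR parity per group of r data bits) and all function names are illustrative assumptions, not a construction taken from the surveyed literature.

```python
from functools import reduce
from operator import xor

def encode_with_local_parities(data_bits, r):
    """Split data into groups of r bits and append one XOR parity per group.

    Hypothetical toy scheme: every symbol has locality r, but no global
    parities are added, so only one erasure per group can be repaired.
    """
    if len(data_bits) % r != 0:
        raise ValueError("number of data bits must be a multiple of r")
    groups = []
    for start in range(0, len(data_bits), r):
        group = list(data_bits[start:start + r])
        group.append(reduce(xor, group))  # local parity bit
        groups.append(group)
    return groups

def repair(group, erased_index):
    """Recover one erased bit of a group by XORing the other r bits."""
    return reduce(xor, (b for i, b in enumerate(group) if i != erased_index))

codeword = encode_with_local_parities([1, 0, 1, 1, 1, 0], r=3)
lost = codeword[0][1]                  # pretend this bit is erased
assert repair(codeword[0], 1) == lost  # three reads, XOR only
```

No field multiplication appears anywhere in the sketch; the repair of one symbol reads only the r other symbols of its group.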
This paper summarizes the recently proposed constructions of BLRCs and their features. The code construction methods discussed in this paper are categorized as in Figure 1. The construction methods of BLRCs are explained using cyclic code based, bipartite graph based, anticode based, partial spread based, and generalized Hamming code based approaches. In addition, the construction of BLRCs using modification methods for linear codes such as extending, shortening, expurgating, augmenting, and lengthening is discussed. This paper is organized as follows. In Section 2, the basic concepts used in the coding techniques for distributed storage systems are introduced. In addition, the characteristics of RC, LRC, and FRC are explained, including the meaning of locality and availability. In Section 3, construction methods of LRCs are summarized with respect to individual types and features, with a focus on BLRCs. Finally, the main conclusions are summarized in Section 4.
3. Binary Locally Repairable Codes
When LRCs were first introduced, there was no restriction on the field size. For the Singleton-like bound in [
31], there is an optimal construction matching for the bound of field size
, where the optimal LRCs are constructed using an algebraic structure. However, the coding complexity can be significantly reduced using BLRC.
Compared to
q-ary LRCs, BLRCs are known to be advantageous in terms of implementation in practical systems. In [
43], the advantages of
BLRC are discussed and compared with
non-binary LRC, a (14,10) RS code, and three-replication in terms of four metrics: encoding complexity, repair complexity, mean time to data loss, and storage capacity. The authors of [
43] further analyzed the advantages of BLRCs with a high Hamming distance and average locality [
44,
45]. In this section, we introduce bounds for BLRCs and various construction methods of BLRCs.
3.1. Bounds for the Binary Locally Repairable Codes
The bounds and constructions of BLRCs are quite different from those of q-ary LRCs. Regarding the bounds, the maximum code dimension of a BLRC is smaller than that of a q-ary LRC, and the corresponding optimal constructions are driven by different motivations, such as ease of implementation. We first discuss the useful bounds for BLRCs.
Let us start with a general bound on LRC that shows a tradeoff relationship between rate
, minimum distance
d, and locality
r [
23]. For linear LRCs with information locality
r, there are tradeoffs among
n,
k,
d, and
r. Let
be an
LRC. Assuming that
and
, the rate is bounded as follows:
In addition, the minimum distance is bounded by [
31]
which is called a Singleton-like bound because it generalizes the classical Singleton bound for linear codes; it reduces to the Singleton bound if
. It is well-known that a
q-ary
MDS code achieves the Singleton bound. An optimal
LRC achieves the bound with equality. We can consider two extreme cases when
and
. For
, we have
and an
RS code is an
optimal LRC. For
, we have
and the duplication of an
RS code is an
optimal LRC. Therefore, we are interested in the case of
.
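The two extreme cases can be checked numerically. The sketch below assumes that the displayed bounds referred to above are the standard ones for an (n, k, r) LRC, namely the rate bound k/n ≤ r/(r+1) and the Singleton-like bound d ≤ n − k − ⌈k/r⌉ + 2; the function names are hypothetical.

```python
from fractions import Fraction

def rate_upper_bound(r):
    """Assumed standard rate bound for locality r: k/n <= r / (r + 1)."""
    return Fraction(r, r + 1)

def singleton_like_bound(n, k, r):
    """Assumed standard distance bound: d <= n - k - ceil(k / r) + 2."""
    if not (1 <= r <= k <= n):
        raise ValueError("expected 1 <= r <= k <= n")
    ceil_k_over_r = -(-k // r)  # integer ceiling, no floating point
    return n - k - ceil_k_over_r + 2

# r = k recovers the classical Singleton bound d <= n - k + 1.
assert singleton_like_bound(14, 10, 10) == 14 - 10 + 1
# r = 1 gives d <= n - 2k + 2, which a duplicated MDS code meets.
assert singleton_like_bound(12, 5, 1) == 12 - 2 * 5 + 2
```

For intermediate localities the bound lies strictly between these two extremes, for example `singleton_like_bound(16, 10, 5)` evaluates to 6.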
For the bounds of BLRCs, Cadambe–Mazumdar (C-M) [
33], linear programming [
46], and
-space bounds [
47,
48] are introduced. The first bound, considering the alphabet size, is given as
where
denotes the largest possible dimension of an
linear code over
. The C-M bound is often used to determine whether the given BLRC with short code length is optimal [
32]. However, because the exact value of
can only be obtained in a limited case with relatively short code length, it is difficult to apply the C-M bound to evaluate the optimality of general BLRCs.
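For reference, the C-M bound is usually stated in the following form. This is a sketch of the standard statement as we understand it from [33], with the notation assumed rather than copied from the display above: k is the dimension, d the minimum distance, r the locality, and the last term is the largest possible dimension of a length-(n − t(r+1)) linear code over the q-ary field with minimum distance d.

```latex
k \;\le\; \min_{t \in \mathbb{Z}_{+}}
  \Bigl[\, t\,r \;+\; k_{\mathrm{opt}}^{(q)}\bigl(n - t(r+1),\, d\bigr) \Bigr]
```

For BLRCs, q = 2. Evaluating the right-hand side requires the optimal-dimension term for every admissible t, which is why the bound is hard to apply at long code lengths.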
In addition, a linear programming bound was proposed using the Delsarte linear programming method, which is known to be tighter than the C-M bound for BLRCs for some parameters [
49]. However, both bounds are expressed in implicit form and are thus difficult to apply to BLRCs with long code lengths.
For an
linear LRC
,
-space bound was recently proposed using sphere packing [
47,
48]. The
-space is defined as the dual of the linear space generated by a minimum set of local parity checks of
with overall support covering all coordinates. For an
BLRC with disjoint repair groups, where
and
, the following bound holds for the parity of
[
50].
- (i)
- (ii)
If
is even, we have
These bounds are advantageous in two ways compared to the previous bounds. First, the
-space bound is known to be tighter than the C-M bound for BLRCs with long code lengths. Second, the bound is expressed in an explicit form, i.e., its value is easily computed for BLRCs with long code lengths. Furthermore, the improved
-space bound is derived using a refined packing radius for BLRCs with
[
50].
A bound in an explicit form for
is given in [
48]. For an
linear BLRC with locality
r, such that
and
, it follows that
In the next subsection, we introduce the construction of BLRCs with various parameters and motivations, some of which are optimal or near-optimal with respect to the aforementioned bounds.
3.2. Classification of Binary Locally Repairable Codes
For the construction of BLRCs, various methods have been proposed based on the following:
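- (i)
cyclic code [51,52,54,60];
- (ii)
random vectors [42];
- (iii)
bipartite and expander graphs [44,55,56];
- (iv)
anticode [57];
- (v)
partial spread [50,58];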
- (vi)
generalized Hamming code [
47,
48]; and
- (vii)
modification of codes [
53,
59].
In the following subsections, the various types of constructions of BLRCs are summarized.
3.3. BLRCs from Cyclic Codes
Goparaju and Calderbank proposed several constructions of BLRCs from cyclic codes [
51]. Cyclic codes inherently enjoy efficient structures for encoder and decoder implementation. The
q-cyclotomic coset
is defined as
where
a is the smallest positive integer that satisfies
. The defining set of an
cyclic code
is defined as
where
has roots in the splitting field
,
. Using optimal cyclic codes in terms of the Singleton bound, three BLRC constructions are suggested as follows.
Construction (CC1) [51]: Let , be a factor of n and α be a primitive element of . Let be a cyclic code with the generator polynomial with the defining set as Then, is an LRC with locality r and dimension .
Construction (CC2) [51]: Let with even m, and locality . Let be a cyclic code in which the generator polynomial has the defining set Then, is an LRC of dimension and a distance .
Construction (CC2) is shown to be distance-optimal among the set of linear codes that have disjoint locality parity checks.
Construction (CC3) [51]: Let . Let α be a primitive element of . The generator polynomial with the defining set can construct a BLRC that satisfies the following inequality for even k, , and .
The BLRC construction from the binary Hamming code is expressed in the following construction.
Construction (CC4) [51]: For , we have when . Let be a cyclic code in which the generator polynomial has the defining set Then, is a three-available two-local LRC with dimension and minimum distance . The corresponding parity check polynomial is then given as
Extending the results in [51], Zeh and Yaakobi proposed several construction methods for BLRC in [52]. These constructions generate BLRCs with locality 2. Construction (CC5) was based on binary reversible codes. Let
be the set given as
. Let
be the defining set of
single parity check code with single-erasure correction capability in a block of length
. Then, a BLRC can be obtained as in Construction (CC5).
Construction (CC5) [52]: For odd m, let and . Let be a single parity check code with , where the defining set is given as: The corresponding code is then an BLRC, where , , and .
In addition, Construction (CC4) was extended to obtain codes with a higher Hamming distance at the cost of a small reduction of the rate as follows:
Construction (CC6) [52]: Let and (i.e., ). Let be the defining set given as Then, the corresponding code is a BLRC with , , locality , and availability .
This construction was further extended to a simplex code with availability and locality 2 as follows.
Construction (CC7) [52]: Let , which is divisible by (i.e., ). Let be a cyclic simplex code with the defining set given as The corresponding code is then a BLRC with , , , and dimension .
Another example of BLRCs was proposed by Tamo, Barg, Goparaju, and Calderbank in 2016 as in the following construction.
Construction (CC8) [54]: Let α be an nth root of unity and let z be an integer such that and . Then, is an binary cyclic code with the defining set D with the coset of the group . Then, the locality of is bounded as . Moreover, each symbol of the codewords in has at least recovery sets of size .
A BLRC that can satisfy the explicit bound given in Equation (4) is also proposed in [60] as follows:
Construction (CC9) [60]: For , let and , where and . Let be a generator polynomial of the cyclic BLRC and be the uth root of unity. Then, BLRC can be constructed using the generator polynomials given by
- (i)
For , , where is the minimal polynomial of over .
- (ii)
For , , where m is a positive integer.
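The cyclotomic cosets that make up the defining sets in this subsection are easy to enumerate by machine. The following sketch is illustrative only: the function names are hypothetical, gcd(q, n) = 1 is assumed, and the dimension helper relies on the standard fact that a cyclic code of length n with defining set D has dimension n − |D|.

```python
from math import gcd

def cyclotomic_cosets(q, n):
    """Return the q-cyclotomic cosets modulo n (requires gcd(q, n) == 1)."""
    if gcd(q, n) != 1:
        raise ValueError("q and n must be coprime")
    seen = set()
    cosets = []
    for i in range(n):
        if i in seen:
            continue
        coset = []
        x = i
        while x not in coset:      # multiply by q until the orbit closes
            coset.append(x)
            x = (x * q) % n
        seen.update(coset)
        cosets.append(coset)
    return cosets

def cyclic_code_dimension(n, defining_cosets):
    """Dimension n - |D| of a cyclic code whose defining set D is a union of cosets."""
    defining_set = set().union(*map(set, defining_cosets))
    return n - len(defining_set)

cosets = cyclotomic_cosets(2, 7)
# The coset of 1 is the defining set of the binary [7, 4] Hamming code.
hamming_dimension = cyclic_code_dimension(7, [cosets[1]])
```

Here `cosets` is `[[0], [1, 2, 4], [3, 6, 5]]` and `hamming_dimension` is 4.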
3.4. BLRCs from Random Vectors
A family of high-rate BLRCs with locality two and uneven availabilities was proposed in [
42], which requires intermediate procedures. The uneven availability is represented as an availability profile. For its construction, a
k-tuple binary column vector
with a nonzero element at a random position is required. Let
be a random function that converts
x into a binary vector with the same length by changing a zero element into a nonzero element. From
,
square matrices
for
are constructed individually by increasing
l as follows:
where
is generated from
by the lexicographical order of construction, and
is the
i circularly downward-cyclic-shifted vector of
. Then, a
matrix
for the parity part of the generator matrix in a systematic form is generated by concatenating the matrix
as follows:
Construction (RV) [42]: Let denote the generator matrix of the proposed BLRC in a systematic form. Then, a systematic generator matrix is constructed as It should be noted that the generator matrix has a code rate of .
An
BLRC code
from Construction (RV) has an all-symbol locality equal to
and the all-symbol availability profile is given by
where the numbers of
s, 2s, and 1s are
k,
, and
k, respectively, and each value denotes the availability for local repair of the
ith symbol of a codeword in
.
3.5. BLRCs from Bipartite Graph
In coding theory, a Tanner graph is a bipartite graph with two sets of vertices, a set of n variable nodes and a set of check nodes, that represents the parity check constraints of an error-correcting code. Suppose that the n variable nodes are partitioned into groups. All variable nodes in each group are linked to a unique check node called the local check node, and the remaining check nodes are called the global check nodes. Then, the constructed BLRC can achieve maximum locality r for all symbols.
Construction (BG) [44]: Let and , where ⊗ denotes the Kronecker product, denotes the all-one vector of length and is the parity check matrix of an Hamming code such as . Then, the parity check matrix of BLRC based on a bipartite graph of parameters is given as
The minimum distance of the code defined by the parity check matrix H in Construction (BG) is 4. This BLRC is optimal in some cases. Even when it is not optimal, it is shown that this code has a near-optimal code rate with a rate gap of .
In addition, an expander graph based construction of BLRC exists [
55,
56]. Suppose we have two sets
V and
C that satisfy the following conditions:
- –
, ;
- –
the degree of is t; and
- –
the degree of c is .
For , the bipartite graph is a -expander if for any subset , implies the size of the subset of C connected to is greater than . In addition, the length of the shortest cycle of the graph G is greater than 4. As such, we can have the following construction:
Construction (EG) [55,56]: Let be an parity check matrix where and , whose columns correspond to the vertices of V and whose rows correspond to the vertices of C. Then, is equal to one if the corresponding vertices and are connected by an edge. For , the code constructed from is an BLRC.
In Construction (EG),
is chosen from the range
and
is determined as a solution of the following equation:
where
. The probability that
G is a
expander is greater than
for
. In addition, the code rate is bounded by
where equality holds when
is a full rank matrix.
3.6. BLRCs from Anticode
An anticode
of length
n is a code that may contain repeated codewords in
and has an upper bound on the distance between codewords [
61]. Contrary to the minimum distance in generic error correcting codes, the maximum distance
is defined as the maximum Hamming distance between any pair of codewords in
. This anticode is a core ingredient of the following BLRC.
The generator matrix of the anticode is a matrix, and all codewords in can be expressed by a linear combination of k rows of . If the rank of is , then each codeword in occurs times. Let be an anticode of length and Hamming weight of 2 and the columns of its generator matrix are all weight-2 vectors of length s.
Construction (AC1) [57]: Let be a binary simplex code of length , dimension m, and minimum Hamming distance . Let be the generator matrix of , and let its columns consist of all possible nonzero vectors in . We prepend zeros to every column of of to construct an matrix . By deleting the columns in from , we can construct a generator matrix G of BLRC, , with parameters and locality 2.
For
, the code
satisfies the C-M bound in Equation (
1). Moreover, three instances with locality
of Construction (AC1) are listed in [
57]:
- –
The code from the anticode is a LRC.
- –
The code from the anticode is a LRC.
- –
The code from the anticode is a LRC.
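The column-deletion idea behind Construction (AC1) can be tried out on a small case. The sketch below is a brute-force illustration under stated assumptions: the parameters m = 4 and s = 3 are chosen only for demonstration, the function names are hypothetical, and locality is checked through the sufficient condition that every remaining column is the sum of two other remaining columns.

```python
from itertools import combinations, product

def simplex_columns(m):
    """All nonzero binary m-tuples: the columns of a simplex generator matrix."""
    return [v for v in product((0, 1), repeat=m) if any(v)]

def anticode_columns(m, s):
    """All weight-2 vectors of length s, each prepended with m - s zeros."""
    cols = []
    for i, j in combinations(range(s), 2):
        v = [0] * m
        v[m - s + i] = 1
        v[m - s + j] = 1
        cols.append(tuple(v))
    return cols

def punctured_columns(m, s):
    """Generator-matrix columns left after deleting the anticode columns."""
    deleted = set(anticode_columns(m, s))
    return [c for c in simplex_columns(m) if c not in deleted]

def minimum_distance(columns, m):
    """Minimum codeword weight over all nonzero messages (exhaustive)."""
    best = len(columns)
    for msg in product((0, 1), repeat=m):
        if any(msg):
            weight = sum(sum(a & b for a, b in zip(msg, c)) % 2 for c in columns)
            best = min(best, weight)
    return best

def has_locality_two(columns):
    """True if every column is the XOR of two other remaining columns."""
    col_set = set(columns)
    return all(
        any(a != c and tuple(x ^ y for x, y in zip(a, c)) in col_set for a in columns)
        for c in columns
    )

cols = punctured_columns(4, 3)
n, d = len(cols), minimum_distance(cols, 4)
```

For this toy choice the result is a length-12, dimension-4 code with minimum distance 6, and `has_locality_two(cols)` holds.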
Construction (AC2) [57]: Let , , be an anticode such that its generator matrix consists of all columns of weight in . Then, zeros are prepended to every column of to form an matrix whose columns will be deleted from to obtain a generator matrix G for the code , which becomes a LRC with locality .
This code achieves the Griesmer bound [
62].
Construction (AC3) [57]: Let be an anticode with generator matrix given by where is the generator matrix of the simplex code . Let be a code obtained based on the Farrell construction using the simplex code and the anticode . Then, is a BLRC with locality .
It is also shown that this code can satisfy the bound in Equation (1).
3.7. BLRCs from Partial Spread
To introduce BLRCs constructed from partial spread, the definition of partial t-spread is given.
Definition [50]: A partial t-spread of is a collection of t-dimensional subspaces of such that for . Moreover, S is maximal if it has the largest possible size. In particular, if , then S is a t-spread. If , a t-spread of exists.
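A small concrete instance may help to fix the definition. The sketch below builds a 2-spread of the 4-dimensional binary vector space from the multiplicative cosets of GF(4) inside GF(16) and verifies the defining property. The primitive polynomial x^4 + x + 1, the encoding of vectors as 4-bit integers, and the function names are assumptions made for illustration.

```python
def gf16_powers():
    """Powers alpha^0..alpha^14 of a primitive element of GF(16) as 4-bit integers.

    Assumes the primitive polynomial x^4 + x + 1 (0b10011).
    """
    powers, x = [], 1
    for _ in range(15):
        powers.append(x)
        x <<= 1
        if x & 0b10000:
            x ^= 0b10011
    return powers

def two_spread_of_f2_4():
    """Five 2-dimensional subspaces of F_2^4 that pairwise meet only in 0."""
    alpha = gf16_powers()
    # alpha^0, alpha^5, alpha^10 are the nonzero elements of the subfield GF(4);
    # each of their multiplicative cosets, together with 0, is a 1-dimensional
    # GF(4)-subspace and therefore a 2-dimensional F_2-subspace.
    return [{0} | {alpha[(i + 5 * j) % 15] for j in range(3)} for i in range(5)]

def is_subspace(vectors):
    """Closed under XOR and containing 0 (vectors encoded as integers)."""
    return 0 in vectors and all((a ^ b) in vectors for a in vectors for b in vectors)

def is_partial_spread(subspaces, t):
    """All members are t-dimensional subspaces with pairwise trivial intersection."""
    if not all(is_subspace(s) and len(s) == 2 ** t for s in subspaces):
        return False
    return all(
        subspaces[i] & subspaces[j] == {0}
        for i in range(len(subspaces))
        for j in range(i + 1, len(subspaces))
    )

spread = two_spread_of_f2_4()
```

Because the five subspaces also cover every vector of the space, this partial 2-spread is in fact a 2-spread, consistent with t = 2 dividing m = 4.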
Now, we can define a BLRC
with parity check matrix given by
Then, a BLRC of parameters can be constructed in the following way:
Construction (PS1) [50]: Let be the all-one vector of length n. Let and be a matrix that has binary expansions of the vectors as its columns, where and are distinct elements of the finite field . Then, the parity check matrix of a BLRC is given as in Equation (5).
For the further extension of Construction (PS1), the parity check matrix can be given as
where
. For
,
is an
matrix, whose
ith row is the all-one vector of length
and the other rows are all-zero vectors. Moreover,
is the
ith
submatrix of
. It is well-known that if any
columns of the parity check matrix
H are linearly independent, the minimum distance of a linear code is greater than or equal to
d. Furthermore, for a collection of any
columns
of
, if
, then
, where
satisfy the following two conditions:
- (i)
For , is even, where ; and
- (ii)
.
Then, we can construct two k-optimal BLRCs with disjoint repair groups as in the following construction.
Construction (PS2) [50]: Let and be the maximum partial -spread of . In addition, let be a basis of . For , there exists a binary linear code with the parity check matrix . Let be the set of indices corresponding to nonzero coordinates of a vector x. For , let be the set , where and is the ith column of . When , . Let be an matrix whose columns consist of the vectors in . Then, we can define a BLRC with a parity check matrix H as in Equation (6), where . A set is -wise weakly independent over if no set , where , has the sum of its elements equal to zero. Then, we have , if the columns of satisfy the following conditions:
- (i)
for ;
- (ii)
for ; and
- (iii)
for .
Construction (PS3) [50]: Let , and be a maximum partial -spread of and the basis of is . When , there is a binary linear code. Let be the same set in Construction (PS2) for . For , is defined as . Let be an matrix whose columns consist of the vectors in . Then, a BLRC can be constructed using a parity check matrix H in Equation (6) for . Let be the maximal cardinality of subspace codes over with minimum distance d and dimension k. Then, we can construct a BLRC as follows:
Construction (PS4) [50]: Let such that for . Then, there exists an BLRC with dimension given as where it is optimal with respect to the bound in Equation (2).
The following construction is nearly optimal with respect to the bound in Equation (2).
Construction (PS5) [50]: Let be a maximum partial two-spread of . The basis of is given as . Then, a BLRC with parity check matrix H of the form in Equation (6) for can be constructed using the submatrices for , which is given as
Another construction based on the partial
t-spread is also proposed in [
58]. Let
q be a prime power and
be the vector space of dimension
m over
.
Construction (PS6) [50]: Given an integer , determine the smallest integer t such that . An integer m such that can be chosen, and there exists a partial t-spread with a size of at least l of . Let be a basis of and be a set whose elements are defined as for and . Finally, let for . Let s be an integer such that , and we use any vectors in to fill each submatrix as its columns for . Then, the BLRC has length , dimension , minimum distance , and locality r.
Then, the BLRCs and obtained from Construction (PS6) are optimal. In addition, for , the BLRCs from Construction (PS6) are almost optimal in terms of the C-M bound and for , the BLRCs from Construction (PS6) are almost optimal with respect to the C-M bound.
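The column-independence criterion used in this subsection (if any d − 1 columns of the parity check matrix are linearly independent, the minimum distance is at least d) can be checked exhaustively for short codes. The following sketch is an illustration, not part of any cited construction; it encodes each column as an integer and uses the fact that, over the binary field, a set of columns is dependent exactly when some nonempty subset XORs to zero.

```python
from itertools import combinations

def minimum_distance_from_parity_check(columns):
    """Smallest number of columns of H that XOR to zero.

    Each column is an integer whose bits are the entries of that column.
    Exhaustive search, so only suitable for short codes.
    """
    n = len(columns)
    for weight in range(1, n + 1):
        for subset in combinations(columns, weight):
            acc = 0
            for c in subset:
                acc ^= c
            if acc == 0:
                return weight
    return None  # full column rank: the only codeword is the zero word

# Columns of the [7, 4] Hamming parity check matrix: all nonzero 3-bit vectors.
hamming_d = minimum_distance_from_parity_check(list(range(1, 8)))
# Extended Hamming code: add an all-one row (bit 3) on top of the columns 0..7.
extended_d = minimum_distance_from_parity_check([8 | c for c in range(8)])
```

The two examples give minimum distances 3 and 4, respectively.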
3.8. BLRCs from Generalized Hamming Code
Suppose that s and t are two positive integers such that and . Let A be a binary parity check matrix such that any four columns of this matrix are linearly independent. For , A can be chosen as the identity matrix. For , A is the parity check matrix of a binary linear code that can be built from non-primitive cyclic codes with length . Let be the primitive root of , and let denote the minimal polynomial of . The degree of is . is a parity check matrix defining the binary cyclic code with parameters that is generated by . Then, the set forms a subset of the roots of . By deleting one coordinate of , we can construct the parity check matrix A of the punctured code with parameters . In addition, B is defined as a matrix such that the columns are all nonzero -tuples from , with the first nonzero element equal to 1. Then, B is an parity check matrix of a -ary Hamming code. Using the matrices A and B, a BLRC construction is provided as follows.
Construction (GH1) [47,48]: Suppose that are the elements corresponding to the columns of A, and the ith column of B is denoted by a vector for . Let be a binary linear code with the parity check matrix given as where and for , is an matrix whose ith row is an all-one vector, the other rows are all-zero vectors, and is an matrix over whose columns are binary expansions of the vectors .
It is shown that this construction can satisfy the bound given in Equation (4).
Shortening an LRC also yields another LRC. Let be an BLRC with locality r such that and . Then, an BLRC with locality r can be obtained by shortening C, where the parameters of satisfy , , and .
Construction (GH2) [48]: By applying the shortening of the times to C, we have an BLRC.
This kind of code modification approach can be extended to the well-known code modification methods such as extending, shortening, expurgating, augmenting, and lengthening [53], as in the following subsection.
3.9. BLRCs from Code Modification
It is well-known that there are various code modification methods for linear codes. For BLRC, we can also use these modification methods to generate codes with new parameters [
53]. Let
be an
binary code with locality
r and let
be the minimum distance of its dual code,
. By adding a parity bit to each codeword in a
with parameters
, the
extended code
with parameters
can be obtained. This can be formally presented as
where
for odd
d and
for even
d [
53]. For BLRCs, we are interested in the locality of the derived codes for a given
with locality
r. Let
be the dual code of
. If the maximum Hamming weight among codewords in the code
is
, then the locality of the extended code
is
. If the maximum Hamming weight among codewords in
is
, then the locality of the extended code
is
. Finally, if
is an
cyclic code with an odd minimum distance
d, then the locality of the dual code
in the extended code of
is
[
53].
Shortening can also be applied to derive new BLRCs. By deleting the codewords in
with nonzero values in the last coordinates and removing the last coordinates from the remaining codewords, we can find the
shortened code
of
. This can be formally represented as
For an original
binary linear code, it is known that the parameters of the shortened code are given as
. Moreover, if the original code is a BLRC with locality
, then the locality of the shortened code
is
r or
. Let
be the dual of
and let
be the minimum distance of the dual code
. Then, for an
cyclic code
, the locality of code
is either
or
[
53].
Next, expurgation can also be used to generate a new BLRC from an
BLRC
with odd weight codewords. As such, the
expurgated code
of
can be generated as a subcode of
by selecting only even weight codewords such that
The corresponding parameters of
are given as
, where
is the minimum Hamming weight of the nonzero codewords in
. Let
be the dual code of
. Then, we have
[
53].
As the inverse of the expurgation described above, the
augmented code
of an
code
without the all-one codeword
is defined as the code
whose parameters are given as
, where
is the maximum Hamming weight of codewords in
. If the code
is cyclic, then the expurgated and augmented codes of
are also cyclic [
53].
Another example of BLRC from the code modification methods is presented in [
60] using the shortened expurgated Hamming code.
Construction (SE-Hamming) [60]: Let β be a primitive element of and n be a positive integer divisible by 3 such that . Let be an expurgated Hamming code with the generator polynomial , where is the minimal polynomial of β over . Then, a shortened expurgated Hamming code can be generated by shortening the first information bits of . The concatenation of and an cyclic code with parity check polynomial as an inner code then yields an LRC .
3.10. Summary of BLRC Constructions
We summarize the discussed BLRC construction methods in
Table 1. Generally, in
Table 1, X denotes the case that the equality of the bound is not achieved for all parameters. For the case of C-M bound,
is assumed to satisfy the Singleton bound for given
n and
d.
4. Conclusions
This paper summarizes the recently proposed constructions for BLRCs and their features. To achieve efficient hardware implementation, the codes are constructed over the binary field because the need for multiplications is obviated during the encoding, decoding, and repair processes. We explain the various construction methods of BLRCs using cyclic code based, random vector based, bipartite or expander graph based, anticode based, partial spread based, and generalized Hamming code based approaches. In addition, construction methods of BLRCs using code modification methods for linear codes such as extending, shortening, expurgating, and augmenting are introduced.
We selectively review important achievements on BLRCs from the authors' perspective, and thus the authors' bias is inevitably reflected. The omission of a result from this review does not mean that it is unimportant. We also apologize in advance for any missing citations or recent results, because this area is being actively researched and many papers have appeared in a relatively short period of time.