On the Single-Parity Locally Repairable Codes with Multiple Repairable Groups

Lu, Yanbo; Liu, Xinji; Xia, Shutao

doi:10.3390/info9110265

by

Yanbo Lu

^1,2,

Xinji Liu

^1,2 and

Shutao Xia

^1,2,*

¹

Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China

²

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

^*

Author to whom correspondence should be addressed.

Information 2018, 9(11), 265; https://doi.org/10.3390/info9110265

Submission received: 25 September 2018 / Revised: 17 October 2018 / Accepted: 20 October 2018 / Published: 24 October 2018

(This article belongs to the Section Information Theory and Methodology)

Abstract

Locally repairable codes (LRCs) are a new family of erasure codes used in distributed storage systems which have attracted a great deal of interest in recent years. For an

[n, k, d]

linear code, if a code symbol can be repaired by t disjoint groups of other code symbols, where each group contains at most r code symbols, it is said to have availability-

(r, t)

. Single-parity LRCs are LRCs with a constraint that each repairable group contains exactly one parity symbol. For an

[n, k, d]

single-parity LRC with availability-

(r, t)

for the information symbols, the minimum distance satisfies

d \leq n - k - ⌈ k t / r ⌉ + t + 1

. In this paper, we focus on the study of single-parity LRCs with availability-

(r, t)

for information symbols. Based on the standard form of generator matrices, we present a novel characterization of single-parity LRCs with availability

t \geq 1

. Then, a simple and straightforward proof for the Singleton-type bound is given based on the new characterization. Some necessary conditions for optimal single-parity LRCs with availability

t \geq 1

are obtained, which might provide some guidelines for optimal coding constructions.

Keywords:

Erasure codes; locally repairable codes; locality; availability; data reliability

1. Introduction

In the era of big data, more and more large-scale distributed storage systems (DSSs) use erasure codes rather than simple replication as their redundancy scheme [1,2]. Storage systems such as the Hadoop Distributed File System (HDFS) [3] first divide a data file into data blocks (chunks) of equal size (typically 128 MB in HDFS) and then store replicas of these blocks on many storage nodes. The storage cost of such a replication scheme equals the number of replicas (the replication ratio) of the files stored in the DSS. When a storage node fails, the system uses an available replica to repair (regenerate) the node in order to maintain data reliability and system performance. This repair process consumes disk I/O and network bandwidth. Replication provides optimal repair performance; however, it incurs a high storage cost.

DSSs which use erasure codes can achieve higher storage efficiency and better data reliability than systems using replication. The most widely adopted erasure codes in DSSs are the well-known Reed-Solomon codes (RS codes), owing to their optimal coding property, the Maximum Distance Separable (MDS) property. Although RS codes greatly improve the storage performance of DSSs compared with replication schemes, they suffer from a high repair cost for a failed storage node. Locally repairable codes (LRCs) [4] were proposed to improve the repair efficiency and have attracted a lot of attention in the coding theory community. LRCs can greatly improve the repair efficiency of DSSs while maintaining relatively high storage efficiency and system reliability. Much research in recent years has been devoted to the theoretical bounds and constructions of LRCs, as well as to their application in storage systems.

LRCs introduced the concept of locality to improve the repair efficiency of DSSs compared with conventional erasure codes in DSSs. For an

[n, k, d]

linear code, a code symbol with locality r indicates that it can be repaired by at most r other code symbols. For conventional erasure codes, such as

[n, k]

RS codes with length n and dimension k, adopted in DSSs, repairing a failed symbol requires reading k other symbols [5]. LRCs with

r ≪ k

[4] can achieve a great reduction in the repair cost.

Figure 1 illustrates a comparison of [

n = 8, k = 4

] RS code and [

n = 8, k = 4, r = 2

] LRC [1]. The [

n = 8, k = 4

] RS code has four data (information) blocks {

c_{1}, c_{2}, c_{3}, c_{4}

} and four parity blocks {

c_{5}, c_{6}, c_{7}, c_{8}

}. Based on coding theory of RS codes,

k = 4

blocks are required to be fetched for repairing a failed block.

Thus, the repair cost of the RS code is

4 \times

. For the [

n = 8, k = 4, r = 2

] code in [1], as shown in Figure 1, there are two kinds of parity blocks: Local parity blocks {

c_{5}, c_{6}

} and global parity blocks {

c_{7}, c_{8}

}. Each local parity block belongs to a single repairable group and is generated from the data blocks in that repairable group.

c_{1}, c_{2}

and

c_{5}

form a repairable group and local parity

c_{5}

is computed from

c_{1}, c_{2}

.

c_{3}, c_{4}

and

c_{6}

form a repairable group and local parity

c_{6}

is obtained from

c_{3}, c_{4}

. The global parity blocks {

c_{7}, c_{8}

} are generated from all the data blocks {

c_{1}, c_{2}, c_{3}, c_{4}

}. When repairing a failed block (e.g.,

c_{3}

), instead of reading 4 other blocks in the repair of RS codes, the LRC is more efficient since it reads only

r = 2

blocks [

c_{4}, c_{6}

] in its group to reconstruct

c_{3}

. Therefore, the repair cost of the LRC with

r = 2

is

2 \times

. We can see that the repair cost of conventional RS codes depends on the code dimension k, while the locality r of LRCs improves the repair cost of failed storage blocks. Therefore, bounds and constructions of LRCs have attracted interest from many researchers.
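The local repair described for Figure 1 can be sketched in a few lines. This is a minimal illustration, assuming each local parity is the bytewise XOR of the data blocks in its group; the actual coefficients of the code in [1] are not given here, and the helper names `xor_blocks` and `repair_c3` are hypothetical. The global parities are omitted because the local repair does not read them.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four toy data blocks c1..c4 (4 bytes each).
c1 = b"\x01\x02\x03\x04"
c2 = b"\x10\x20\x30\x40"
c3 = b"\xaa\xbb\xcc\xdd"
c4 = b"\x0f\x0e\x0d\x0c"

# Assumed local parities: one XOR parity per repairable group.
c5 = xor_blocks(c1, c2)  # group {c1, c2, c5}
c6 = xor_blocks(c3, c4)  # group {c3, c4, c6}


def repair_c3(c4_block: bytes, c6_block: bytes) -> bytes:
    """Rebuild c3 from the r = 2 surviving blocks of its group."""
    return xor_blocks(c4_block, c6_block)


repaired = repair_c3(c4, c6)
blocks_read = 2  # r = 2, versus k = 4 blocks for the RS code
```

Under this assumption the repair touches only the two blocks of the affected group, which is the 2× versus 4× difference discussed above.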

In this paper, the codeword of a q-ary

[n, k, d]

linear code C is denoted as

c = (c_{1}, c_{2}, \dots, c_{n})

[5]. If a code symbol

c_{i} (1 \leq i \leq n)

can be repaired by at most r other code symbols, then this symbol is considered to have locality r. It is not hard to see that a code symbol with locality

r ≪ k

implies that the repair cost of this code symbol is lower than that of

[n, k]

RS codes. Gopalan et al. [4] considered

[n, k, d]

LRCs with locality r for all information symbols and obtained the following well-known Singleton-type bound,

d \leq n - k - ⌈\frac{k}{r}⌉ + 2 .

(1)

Optimal codes achieving the above bound have been proposed in [6,7]. Moreover, as to LRCs with locality r for all the code symbols, Cadambe et al. [8] proposed an improved theoretical bound. This upper bound takes the field size q into consideration,

k \leq \min_{s \in \mathbb{Z}_{+}} [s r + k_{opt}^{(q)} (n - s (r + 1), d)],

(2)

where

s \leq \min \{⌈\frac{n}{r + 1}⌉, ⌈\frac{k}{r}⌉\}

, and the largest possible dimension of a q-ary linear code with length n, distance d is denoted by

k_{opt}^{(q)} (n, d)

. It is clear that this bound (2) proposed in [8] is better than bound (1), especially when the size q is small.

Further research on LRCs with locality r revealed that locality alone has limitations when multiple nodes fail. For example, if two blocks

c_{3}, c_{4}

in a repairable group in Figure 1 are lost, then the system cannot repair the failed blocks locally, because there is only one block

c_{6}

available in this repairable group. To deal with the multiple-node failure issue [9], the concept of availability t is introduced to support t disjoint local repairable groups. A code symbol with availability t has t disjoint choices of local repairable groups from which it can be repaired. Hence, a code symbol with availability t can be locally repaired even if there are

t - 1

node failures. Many recent works on LRCs focused on the study of availability-

(r, t)

, which is the key to fault-tolerance in coding theory, system reliability and computation architecture [10,11,12]. In this paper, our work focuses on single-parity LRCs with t available disjoint repairable groups (availability

t \geq 1

).
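The fault tolerance that availability provides can be checked by brute force on a small case. The sketch below assumes a hypothetical symbol with t = 3 pairwise disjoint repair groups of size r = 2 (the node indices and the helper `intact_group` are illustrative, not taken from any construction in this paper) and confirms that every pattern of t - 1 failures among the other nodes leaves at least one group intact.

```python
from itertools import combinations


def intact_group(groups, failed):
    """Return the first repair group that avoids all failed nodes, or None."""
    failed = set(failed)
    for group in groups:
        if not (set(group) & failed):
            return group
    return None


# Hypothetical symbol with t = 3 pairwise disjoint repair groups of size r = 2.
groups = [{2, 3}, {4, 5}, {6, 7}]
t = len(groups)
other_nodes = sorted(set().union(*groups))

# Every pattern of t - 1 failures among the other nodes leaves a usable group.
survives_all = all(
    intact_group(groups, failed) is not None
    for failed in combinations(other_nodes, t - 1)
)
```

With t failures, one per group, no intact group remains, so t - 1 is the most this argument can guarantee.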

Organization

Section 1 shows the concept of locality r and availability t (or availability-

(r, t)

) of LRCs and compares RS codes and LRCs with the same [

n, k

] in order to discuss the reduction of repair cost of LRCs compared with RS codes.

Section 2 presents some related work. In Section 3, we introduce the main storage redundancy schemes of DSSs and the formal definitions of

(n, k, r, t)

LRCs, together with our contribution. In Section 4, we study single-parity LRCs on the basis of the generator matrices with the standard form under the condition of availability

t \geq 1

, and give the novel proof of the bound (3) and some necessary conditions of optimal codes. Section 5 discusses the scalability of the novel characterization by considering local multiple-parity case of LRCs. Section 6 concludes the paper.

2. Related Work

To reduce the repair cost of conventional erasure codes like RS codes, a new family of erasure codes, called Regenerating codes [13], has been presented and formed a new field in coding theory. Regenerating codes focus on the cost of network bandwidth during the repair process in DSSs. Regenerating codes introduce the concept of the degree of helpers and sub-packetization to improve the repair cost of network bandwidth [13,14,15,16,17], while LRCs consider the concept of locality r and availability t (or availability-

(r, t)

) to reduce the repair cost of failed nodes in DSSs. An overview of erasure codes including LRCs with availability-

(r, t)

for distributed storage was given in [18].

Recently, a novel method motivated by parity-check matrix for LRCs was proposed by Hao and Xia [19], where a unified proof of bound (1) and (2) was provided. Wang and Zhang [20] presented a Singleton-type bound for LRCs with

(n, k, r, t)

. Constructions of

(n, k, r, t)

LRCs were considered in [21,22], and the binary constructions were proposed by Su [23] and Wang et al. [24]. Here, we propose a novel proof of the theoretical bounds of

(n, k, r, t)

LRCs motivated by a novel characterization of the standard form of generator matrices.

Lu et al. [25] studied a special case of single-parity LRCs with a single repairable group. Specifically, that work considered only the case

t = 1

. Here, in contrast, we focus on LRCs with multiple repairable groups and study the general case of availability

t \geq 1

.

3. Background

DSSs employ several storage strategies to achieve high data reliability. In this section, we introduce replication schemes and erasure codes, and then focus on LRCs with parameters

(n, k, r, t)

. The notations are shown in Table 1.

3.1. Replication Scheme

A replication scheme is the simplest storage strategy for large distributed storage systems: a DSS stores several replicas of each file, as determined by a given replication ratio (factor) [3]. Replication has obvious advantages, including easy implementation and flexible deployment. Its main disadvantage is a high storage cost, which grows in proportion to the replication ratio and becomes significant with the rapid growth of data stored in DSSs.

3.2. Erasure Codes

In DSSs, erasure codes are a better choice compared to a replication storage strategy for data reliability and system storage efficiency. As for erasure codes, all files need to be coded and then stored in DSSs. HDFS Erasure Coding [26] and HDFS-RAID [27], based on HDFS architecture, provide typical erasure codes for the reduction of storage overhead. HDFS Erasure Coding uses Intel ISA-L library [28] to implement erasure codes in order to improve the coding and repairing performance. Specifically, a DSS with an

[n, k]

erasure code needs to divide a file into k data blocks (information blocks), which are then encoded into

n - k

coded blocks (parity blocks). If a block is lost, the system needs to repair (regenerate) it for maintaining system reliability by fetching other information or parity blocks, based on coding theory of the

[n, k]

erasure code.

For Reed-Solomon codes (RS codes), due to their MDS property, a DSS with an

[n, k]

RS code has to read and transfer k information/parity blocks to regenerate (repair) a failed block. However, a DSS with a replication scheme needs to fetch only one data block for repairing the lost block. We can see that the repair process of erasure codes needs higher network and disk I/O cost, compared to a simple replication scheme.

To improve the repair efficiency of conventional erasure codes, LRCs are introduced in DSSs. The aim of applying LRCs is to reduce the repair cost of DSSs while retaining high storage efficiency and data reliability.

3.3. Locally Repairable Codes

LRCs consider the concept of locality, denoted by r (

r < k

), in order to improve the repair efficiency of conventional erasure codes with parameters

n, k

. To be specific, LRCs in DSSs [1] use locality r to generate local parity blocks and form local repairable groups. If a block in some local repairable group is lost, the system can use r other blocks in its local repairable group to regenerate it. Thus, it is clear that LRCs with locality r reduce the network and disk I/O cost during the repair process. Distributed storage systems like Windows Azure Storage [1] have adopted LRCs to provide high repair performance, storage efficiency, and data reliability. With the development of theory research and system application of LRCs, another important concept of availability t has been presented to provide many repair choices for failed blocks and then improve local repair performance. A lot of works on LRCs with availability-

(r, t)

or similar concepts have been proposed in [9,29,30]. Table 2 compares the repair cost and storage cost of the three redundancy schemes: replication, RS codes and LRCs.
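The trade-off between the three schemes can be made concrete with a small calculation. This sketch is only an illustration under stated assumptions: storage cost is measured as stored blocks per data block (the replication ratio, or n/k for a code), repair cost as the number of blocks read to repair one lost block, and the replication ratio 3 is an assumed sample value. The function names are hypothetical and the numbers are not taken from Table 2.

```python
from fractions import Fraction


def replication_costs(replicas: int):
    """(storage overhead, blocks read per repair) for replication."""
    return Fraction(replicas), 1


def rs_costs(n: int, k: int):
    """(storage overhead, blocks read per repair) for an [n, k] RS code."""
    return Fraction(n, k), k


def lrc_costs(n: int, k: int, r: int):
    """(storage overhead, blocks read per local repair) for an LRC with locality r."""
    return Fraction(n, k), r


for name, costs in [
    ("replication x3", replication_costs(3)),
    ("RS [8, 4]", rs_costs(8, 4)),
    ("LRC [8, 4], r = 2", lrc_costs(8, 4, 2)),
]:
    print(name, "storage:", costs[0], "repair reads:", costs[1])
```

For the codes of Figure 1, the RS code and the LRC have the same storage overhead, and the LRC halves the number of blocks read per repair.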

Here we firstly introduce the concept of single-parity, and availability-

(r, t)

of LRCs. Then the formal definitions of

(n, k, r, t)

LRCs and single-parity LRCs are presented.

Single-parity in LRCs denotes the property that each local repairable group has exactly one parity symbol [9]. As for availability-

(r, t)

of LRCs, if there exist t disjoint local repairable groups, each of which can reconstruct (repair) a code symbol from at most r other symbols, we say that this code symbol possesses availability-

(r, t)

. LRCs with length n, dimension k, locality r and availability t (

(n, k, r, t)

LRCs for simplicity), are defined as LRCs with all information symbols having availability-

(r, t)

. Let

[n] = {1, 2, \dots, n}

. Here we then let

{c_{o}, o \in [k]}

denote k information symbols of an

(n, k, r, t)

LRC and

{c_{p}, p \in [n]}

denote its n symbols.

The formal expression is as follows:

Definition 1.

[9,29] An

(n, k, r, t)

LRC has its k information symbols

{c_{o}, o \in [k]}

satisfying the following conditions:

∃t subsets $Ψ_{1} (o), \dots, Ψ_{t} (o) \subset [n] ∖ {o}$ , such that the code symbols denoted by $Ψ_{p} (o), p \in [t]$ can reconstruct the symbol $c_{o}$ .
∀ $p \in [t]$ , that $| Ψ_{p} (o) | \leq r$ .
∀ $p \neq q \in [t]$ , that $Ψ_{p} (o) \cap Ψ_{q} (o) = \emptyset$ .
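The combinatorial part of Definition 1 is easy to check mechanically. The following sketch, with the hypothetical helper `has_availability`, tests the size and disjointness conditions for given candidate sets; it does not verify that each set can actually reconstruct the symbol, which depends on the code itself.

```python
from itertools import combinations


def has_availability(o, groups, r, t, n):
    """Check the set conditions of Definition 1 for symbol index o.

    groups: exactly the t candidate repair sets (sets of indices in 1..n).
    Reconstruction itself is not verified here.
    """
    if len(groups) != t:
        return False
    universe = set(range(1, n + 1))
    for g in groups:
        # Each set lies in [n] \ {o} and has at most r elements.
        if o in g or not g <= universe or len(g) > r:
            return False
    # The t sets are pairwise disjoint.
    return all(a.isdisjoint(b) for a, b in combinations(groups, 2))


# Repair sets listed for c_1 in Example 1 of Section 4.3.
print(has_availability(1, [{2, 3, 4, 11}, {5, 6, 7, 12}], r=4, t=2, n=17))
```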

Much research has focused on

(n, k, r, t)

LRCs. Wang and Zhang [20] proposed a Singleton-type bound for

(n, k, r, t)

LRCs. Constructions of

(n, k, r, t)

LRCs can be found in [21,22]. Wang et al. [24] and Su [23] considered the constructions of binary LRCs. Here, we propose a novel proof of the theoretical bounds of

(n, k, r, t)

LRCs motivated by a novel characterization of the standard form of generator matrices.

The following is the definition of single-parity LRCs.

Definition 2.

Single-parity LRCs are considered as

(n, k, r, t)

LRCs with exactly one parity symbol in each repairable group.

In this paper, we focus on single-parity LRCs with availability-

(r, t)

for all information symbols. As shown in [9], Rawat et al. considered single-parity LRCs, and then obtained the following achievable bound on code minimum distance,

d \leq n - k - ⌈\frac{k t}{r}⌉ + t + 1 .

(3)
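As a quick numerical aid, bounds (1) and (3) can be evaluated directly. The sketch below uses hypothetical function names and integer ceiling division to avoid floating-point rounding; the sample parameters are those of Examples 1 and 2 in Section 4.3.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers, without floating point."""
    return -(-a // b)


def bound_locality(n: int, k: int, r: int) -> int:
    """Bound (1): d <= n - k - ceil(k / r) + 2."""
    return n - k - ceil_div(k, r) + 2


def bound_availability(n: int, k: int, r: int, t: int) -> int:
    """Bound (3): d <= n - k - ceil(k * t / r) + t + 1."""
    return n - k - ceil_div(k * t, r) + t + 1


print(bound_availability(17, 10, 4, 2))  # parameters of Example 1
print(bound_locality(10, 6, 3))          # parameters of Example 2
```

Setting t = 1 in bound (3) gives n - k - ⌈k/r⌉ + 2, so bound (3) reduces to bound (1) for a single repairable group.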

Here we focus on the study of single-parity LRCs. More specifically, we study single-parity LRCs with availability

t > 1

.

3.4. Contribution

For the discussion of single-parity LRCs, we have the following contributions:

Motivated by a novel perspective on generator matrices with the standard form, we propose a novel characterization of single-parity LRCs.
Based on this novel characterization, we give a simple novel proof of the bound (3).
Some necessary conditions of the optimal codes are proposed based on the new proof, which might provide some guidelines for the optimal code constructions.
Our novel characterization has high scalability and can be applied to local multiple-parity case of LRCs.

4. Single-Parity LRCs

This section proposes a novel characterization of single-parity LRCs based on the standard form of generator matrices. A novel proof of the bound (3) and some necessary conditions on the structure of optimal single-parity LRCs are obtained.

4.1. Terminology

Let G be the corresponding generator matrix of a linear q-ary

[n, k, d]

code C. Suppose G has the standard form

G = [I_{k}, g_{1}, g_{2}, \dots, g_{n - k}]

. It is clear that information symbols of code C can be represented as the columns in

I_{k}

of the generator matrix G, while parity symbols of code C can be denoted as the columns

{g_{i}, i \in [n - k]}

of G. For a vector

c

, let the weight

w_{H} (c)

denote the number of its non-zero elements, and let

supp (c)

denote the support of

c

, that is, the set of indices of its non-zero elements.
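In code, the two notions are one-liners. The sketch below represents a column as a Python list with integer entries standing in for field elements (an assumption made only for illustration) and uses 1-based indices to match the notation of the paper.

```python
def supp(c):
    """Support: 1-based indices of the non-zero entries of c."""
    return {i for i, x in enumerate(c, start=1) if x != 0}


def w_H(c):
    """Hamming weight: number of non-zero entries of c."""
    return len(supp(c))


g = [1, 0, 3, 0, 2]  # hypothetical column
print(supp(g), w_H(g))
```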

4.2. Novel Characterization

Let code C denote single-parity

(n, k, r, t)

LRCs with length n, dimension k, availability-

(r, t)

and its generator matrix G. Let G be the standard form and

c_{i} (1 \leq i \leq k)

be the information symbols of C. Each information symbol

c_{i}

of C has t repairable groups, and each of these groups has at most r code symbols. The t parity symbols, one in each of these groups, are represented as

g_{j_{1}}, \dots, g_{j_{t}}

.

The following proposition holds for single-parity LRCs.

Proposition 1.

The generator matrix of a single-parity LRC must have the standard form in which each information symbol

{c_{i}, i \in [k]}

meets the following conditions:

∃t parity symbols $g_{j_{1}}, \dots, g_{j_{t}}$ , where ${j_{m} \in [n - k]}$ , such that $i \in supp (g_{j_{m}})$ for any $m \in [t]$ .
$w_{H} (g_{j_{m}}) \leq r$ for any $m \in [t]$ .
$supp (g_{j_{m}}) \cap supp (g_{j_{l}}) = {i}$ for any $m \neq l \in [t]$ .

Proof.

Consider the first repairable group of the i-th information symbol. It contains a single parity symbol

g_{j_{1}}

, and all the other code symbols in this repairable group are information symbols. Suppose that there are

\hat{r}

information symbols in this repairable group. Each of these information symbols corresponds to a column in

I_{k}

which has weight one. Since the parity symbol

g_{j_{1}}

is a linear combination of these

\hat{r}

information columns and the group has size at most r, we have that

w_{H} (g_{j_{1}}) \leq r

. Similarly, for

2 \leq m \leq t

, we can obtain that

w_{H} (g_{j_{m}}) \leq r

. Since the t repairable groups for the i-th information symbol are pairwise disjoint, it follows that

supp (g_{j_{m}}) \cap supp (g_{j_{l}}) = {i}

, for any

m \neq l \in [t]

. An information symbol

{c_{i}, i \in [k]}

with availability-

(r, t)

must satisfy the above three conditions. Since all the k information symbols satisfy availability-

(r, t)

, the conclusion follows. □
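The three conditions of Proposition 1 depend only on the supports of the parity columns, so they can be tested by a small search. The sketch below is a brute-force check with a hypothetical helper `satisfies_prop1`; the toy supports (k = 4, r = 2, t = 2, plus one full-weight column) are assumed for illustration and do not come from the paper.

```python
from itertools import combinations


def satisfies_prop1(i, parity_supports, r, t):
    """Search for t parity columns meeting the three conditions for symbol i.

    parity_supports: supports (sets of row indices in 1..k) of g_1, ..., g_{n-k}.
    """
    # Conditions 1 and 2: columns that cover i and have weight at most r.
    candidates = [s for s in parity_supports if i in s and len(s) <= r]
    # Condition 3: some t of them pairwise intersect exactly in {i}.
    for choice in combinations(candidates, t):
        if all(a & b == {i} for a, b in combinations(choice, 2)):
            return True
    return False


# Hypothetical toy supports: four weight-2 columns and one full-weight column.
supports = [{1, 2}, {3, 4}, {1, 3}, {2, 4}, {1, 2, 3, 4}]
ok = all(satisfies_prop1(i, supports, r=2, t=2) for i in range(1, 5))
print(ok)
```

The search is exponential in t in the worst case, which is acceptable for checking small examples by hand.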

In the remainder of this paper, we call the columns of G with weight at most r locality-columns or locality-symbols. For a fixed information symbol of a single-parity LRC, its t local repairable groups have to be pairwise disjoint. Now let us relax this condition and present a novel proof of the theoretical bound (3).

Theorem 1.

For an

(n, k, r, t)

optimal single-parity LRC with all information symbols having t repairable groups (disjoint or not disjoint), each of which has size at most r and contains exactly one parity symbol, the minimum distance satisfies

d \leq n - k - ⌈ k t / r ⌉ + t + 1 .

Proof.

Note that

G = (I_{k}, B)

and the locality-symbols are in B. Now, we show how to select these locality-symbols. Let L be a set of coordinates, initialized as the empty set. For the first information symbol, we add its t locality-symbols into L. Then for the second information symbol, suppose it already has

t_{2}

locality-symbols in L, where

0 \leq t_{2} \leq t

, we add its

t - t_{2}

locality-symbols into L. Repeat this process until all information symbols have at least t locality-symbols in L. Note that at the end of this process, some information symbols might have more than t locality-symbols in L, but there exists at least one information symbol which has exactly t locality-symbols in L. Suppose L contains l locality-symbols in total. Without loss of generality, after reordering the columns of G, we may write

G = [\overbrace{I_{k}}^{k}, \overbrace{g_{1}, \dots, g_{l}}^{\text{locality-symbols}}, \overbrace{g_{l + 1}, \dots, g_{n - k}}^{\text{other-symbols}}] .

(4)

Let

G_{L} = (g_{1}, \dots, g_{l})

and

Δ

be the number of nonzero elements in

G_{L}

. Firstly, let us count

Δ

from the view of rows. Since each information symbol

{c_{i}, i \in [k]}

belongs to t local repairable groups, every row of

G_{L}

has weight

\geq t

, we have

Δ \geq k t .

Then let us count

Δ

from the view of columns. Since

w_{H} (g_{j}) \leq r

, we obtain that

Δ \leq l r .

Hence, it follows that

k t \leq Δ \leq l r

, which implies

⌈\frac{k t}{r}⌉ \leq l \leq n - k .

(5)

By the selection process, there must exist a row with weight exactly t in the submatrix

G_{L}

, and all rows in

I_{k}

have weight one and the weight of all rows in

[g_{l + 1}, \dots, g_{n - k}]

is no more than

n - k - l

. Then, there must exist a row with weight at most

1 + t + (n - k - l)

in G. Thus, it follows that

\begin{matrix} d & \leq & n - k - l + t + 1 \\ & \leq & n - k - ⌈\frac{k t}{r}⌉ + t + 1, \end{matrix}

(6)

which completes the proof. □
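The selection of L and the double counting of Δ can be traced on a small instance. The sketch below is one possible reading of the procedure, under these assumptions: a locality-symbol of information symbol i "in L" is any selected column whose support contains i, and the toy supports and the `assigned` map (k = 4, r = 2, t = 2, n = 9) are hypothetical.

```python
def select_locality_columns(supports, assigned, t):
    """Greedy selection of the set L from the proof of Theorem 1.

    supports[j]: support (set of rows) of parity column j.
    assigned[i]: indices of the t locality columns of information symbol i.
    """
    L = []
    for i in sorted(assigned):
        covered = sum(1 for j in L if i in supports[j])  # t_i in the proof
        fresh = [j for j in assigned[i] if j not in L]
        L.extend(fresh[: max(0, t - covered)])           # add t - t_i columns
    return L


# Hypothetical toy code with k = 4, r = 2, t = 2 and n - k = 5 parity columns.
k, r, t = 4, 2, 2
supports = [{1, 2}, {3, 4}, {1, 3}, {2, 4}, {1, 2, 3, 4}]
assigned = {1: [0, 2], 2: [0, 3], 3: [1, 2], 4: [1, 3]}
n = k + len(supports)

L = select_locality_columns(supports, assigned, t)
l = len(L)
delta = sum(len(supports[j]) for j in L)  # non-zero entries of G_L
row_weights = [sum(1 for j in L if i in supports[j]) for i in range(1, k + 1)]

# A row of G has weight at most 1 (identity part) + its weight in G_L
# + (n - k - l) for the remaining columns.
row_bound = 1 + min(row_weights) + (n - k - l)
print(l, delta, row_weights, row_bound)
```

In this instance the chain kt ≤ Δ ≤ lr holds with equality, and the row bound coincides with the value of bound (3).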

The above proof employs a new characterization of single-parity LRCs based on the standard form of the generator matrices. The technique appears to be a more straightforward way to study single-parity LRCs. The following result can be naturally obtained from the proof.

Corollary 1.

For an optimal single-parity

(n, k, r, t)

LRC attaining the bound (3), let G denote the corresponding generator matrix with the standard form (4). The number of the locality-symbols in G is exactly

⌈\frac{k t}{r}⌉

. Furthermore, if

r ∣ k t

,

G_{L} = (g_{1}, \dots, g_{l})

is a regular

k \times \frac{k t}{r}

matrix, which has the uniform column weight r and uniform row weight t.

The above result is a necessary condition on the structure of optimal single-parity

(n, k, r, t)

LRCs attaining the bound (3). Hence, it might provide guidelines for the constructions of optimal single-parity LRCs.
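The step from the proof of Theorem 1 to Corollary 1 can be written out explicitly. The following derivation is a sketch that uses only inequalities (5) and (6) and the two counts of Δ from the proof.

```latex
% Equality in bound (3) forces equality throughout (5) and (6).
\begin{align*}
n - k - \lceil kt/r \rceil + t + 1 = d &\le n - k - l + t + 1
  && \Rightarrow\quad l \le \lceil kt/r \rceil, \\
\lceil kt/r \rceil &\le l
  && \Rightarrow\quad l = \lceil kt/r \rceil, \\
\text{if } r \mid kt:\qquad kt \le \Delta &\le l r = kt
  && \Rightarrow\quad \Delta = kt = l r .
\end{align*}
```

Since every row of G_L has weight at least t and the total is exactly kt, every row has weight exactly t; since every column has weight at most r and the total is exactly lr, every column has weight exactly r.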

4.3. Illustration and Discussion

Now we give some illustrations of the new technique in this section by the following typical instances.

Example 1.

Given a

(17, 10, 4, 2)

single-parity LRC with the following generator matrix

where *’s denote nonzero entries of the generator matrix G. Thus, the number of the locality-symbols of G is

l = ⌈k t / r⌉ = 5

. Each row of the generator matrix G has weight 5. It is clear that

d \leq n - k - ⌈ k t / r ⌉ + t + 1 = 5

according to the bound (3).

This example considers multiple repairable groups with availability-

(r = 4, t = 2)

, and other coding parameters

n = 17, k = 10

. For an information symbol (e.g.,

c_{1}

), there exist

t = 2

(multiple) local repairable groups (

g_{1}

: [

c_{1}, c_{2}, c_{3}, c_{4}, c_{11}

];

g_{2}

: [

c_{1}, c_{5}, c_{6}, c_{7}, c_{12}

]), in which each local repairable group has

r = 4

other symbols (

g_{1}

: [

c_{2}, c_{3}, c_{4}, c_{11}

];

g_{2}

: [

c_{5}, c_{6}, c_{7}, c_{12}

]). According to this construction, it is clear that

c_{2}, c_{3}, c_{4}, c_{11}

can reconstruct

c_{1}

and

c_{5}, c_{6}, c_{7}, c_{12}

can also reconstruct symbol

c_{1}

. Thus, the repair cost of this construction is

(r = 4) \times

.
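The generator matrix of Example 1 is not reproduced here, but a support pattern for its locality columns that is consistent with the text can be sketched. The construction below is an assumption: rows are identified with the 10 edges of the complete graph on 5 vertices and locality columns with its vertices. It matches the two groups listed for c_1 and the regularity stated in Corollary 1, although the matrix in the original example may differ.

```python
from itertools import combinations

k, r, t = 10, 4, 2
l = 5  # number of locality columns, ceil(k * t / r)

# Assumed pattern: row i corresponds to edges[i - 1] of the complete graph K_5,
# column v to vertex v; an entry is non-zero when the edge touches the vertex.
edges = list(combinations(range(1, l + 1), 2))
col_supports = {
    v: {i for i, e in enumerate(edges, start=1) if v in e}
    for v in range(1, l + 1)
}
row_weights = [
    sum(1 for v in col_supports if i in col_supports[v]) for i in range(1, k + 1)
]

print(col_supports[1], col_supports[2])
```

Every column has weight r = 4, every row has weight t = 2, and any two columns share exactly one row, so the two groups of each information symbol intersect only in that symbol.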

As to optimal coding constructions attaining the Singleton-type bound (1), the next result follows directly by Theorem 1.

Corollary 2.

Let G denote the standard form of generator matrix of an

(n, k, r)

optimal single-parity LRC attaining the bound (1). The number of the locality-symbols in G is equal to

⌈k / r⌉

. Additionally, we can see that

l = k / r

, when

r ∣ k

. The locality-columns

g_{1}, \dots, g_{l}

of G have the following properties. They have pairwise disjoint supports and the same column weight r.

Remark 1.

As shown in [4], Theorem 9 states that when

r < k

and

r ∣ k

, together with

d < r + 3

, every systematic

[n, k, d]

LRC with information locality r satisfies similar conditions in Corollary 2. Note that Corollary 2 does not require that

r ∣ k

and

d < r + 3

.

According to the above discussion, we can also see that the codes in [7] have the same structure in Corollary 2.

Example 2.

Given an

(n = 10, k = 6, r = 3)

single-parity LRC with the following generator matrix G,

where *’s denote nonzero entries of G. The number of the locality-symbols is

l = k / r = 2

. We can see that each row of the generator matrix G has weight four. It is clear that

d \leq n - k - ⌈ k / r ⌉ + 2 = 4

, according to the bound (1).

Note that this example considers single repairable group with locality

r = 3

. As to each information symbol (e.g.,

c_{2}

), there exists one local repairable group (

g_{1}

: [

c_{1}, c_{2}, c_{3}, c_{7}

]), and the local repairable group has

r = 3

other symbols (

g_{1}

: [

c_{1}, c_{3}, c_{7}

]). It is obvious that

c_{2}

can be reconstructed by

c_{1}, c_{3}

and

c_{7}

. In this example, the system with this construction has only one choice (

t = 1 -

availability) of local-repair strategy with low repair cost (

r = 3 -

locality) for each lost information symbol.

Remark 2.

The necessary conditions in Corollary 1 and Corollary 2 characterize the structures of generator matrices of optimal single-parity LRCs attaining the bound (3) and (1), respectively. Structured constructions based on these structure properties with relatively small q are more interesting.

The above discussion focuses on single-parity LRCs, that is, LRCs in which each local repairable group contains exactly one parity symbol. Next, we show that our method can be generalized to the local multiple-parity case of LRCs.

5. Scalability of the Novel Characterization

In this section, we will show high scalability of this novel characterization by applying it to the case that each local repairable group has multiple parity symbols, which is usually denoted as

(r, δ)

-locality in the literature.

5.1. Local Multiple-Parity Case

Prakash et al. [31] considered “local-correction” codes. Their codes can efficiently reconstruct a lost symbol locally by accessing at most r code symbols in case of

δ - 1

symbol failures since each information symbol is contained in an inner

δ

-distance code meeting the condition that

(r + δ - 1) \geq

length. This concept of locality is denoted as

(r, δ)

-locality for the lost code symbol. As to an

[n, k]

LRC with

(r, δ)

-locality for information symbols, Prakash et al. proposed the following Singleton-type bound

d \leq n - k - (⌈\frac{k}{r}⌉ - 1) (δ - 1) + 1

(7)

In fact, if we add a restriction that there are exactly

δ - 1

parity symbols in each local repairable group of an

[n, k]

LRC with

(r, δ)

-locality for information symbols, the above Singleton-type bound can also be derived by using the technique in this section.

5.2. Discussion and Proof

As discussed above, each information symbol is contained in an inner

δ

-distance code meeting the condition that the length is at most

r + δ - 1

. Here we suppose this inner code has length

n^{'}

, dimension

k^{'}

and minimum distance

d^{'}

. That is

n^{'} \leq (r + δ - 1)

and

d^{'} = δ

. Based on the Singleton bound

d^{'} \leq n^{'} - k^{'} + 1

, it is clear that

\begin{matrix} k^{'} \leq n^{'} - d^{'} + 1 \leq (r + δ - 1) - δ + 1 = r . \end{matrix}

(8)

Hence, the inner code contains at most r information symbols. In other words,

w_{H} (g_{j}) \leq r

, where

g_{j}

denotes a locality-symbol in

G_{L}

. Moreover, every local parity symbol in the same local repairable group has the same support. By the condition that there are exactly

δ - 1

parity symbols in each local repairable group, there are

δ - 1

locality-columns in G that have the same support.

Proof.

Now we select locality-symbols for all information symbols to construct the generator matrix with standard form. Let

Ω

be a set of coordinates of information symbols. Let L be a set of coordinates of parity symbols. Both

Ω

and L are initialized by empty sets.

For the first information symbol, there are

r^{'} - 1

(

1 \leq r^{'} \leq r

) other information symbols and

δ - 1

parity symbols to form an inner code (repairable group).

Without loss of generality, by interchanging the rows or the columns, let these

r^{'}

information symbols be the first

r^{'}

information symbols. Add the coordinates of these

r^{'}

information symbols to

Ω

. Let these

δ - 1

parity symbols be the first

δ - 1

parity symbols, which are denoted as

g_{1}, \dots, g_{δ - 1}

, and the first

r^{'} \leq r

components of these parity symbols are non-zero. Add the coordinates of these parity symbols to L. Then, for the first uncovered information symbol in

[k] ∖ Ω

, choose the corresponding information symbols and

δ - 1

parity symbols to

Ω

and L respectively. Repeat the above process for the remaining information symbols until no information symbol is left. Suppose that the resulting columns in L are

g_{1}, g_{2}, \dots, g_{l}

, i.e.,

G_{L} = (g_{1}, \dots, g_{l})

. Since the weight of each column in

G_{L}

is at most r, we can see that

\begin{matrix} ⌈\frac{k}{r}⌉ (δ - 1) \leq l \leq n - k . \end{matrix}

(9)

The submatrix containing the parity symbol in generator matrix has the following form

In the above generator matrix, *’s denote non-zero elements of

G_{L}

. Note that on condition that the weight of columns of

G_{L}

is at most r, notation

\tilde{0}

in

G_{L}

can be zero or not.

Note that the elements of the lower left corner of the submatrix of

G_{L}

are all zeros. The last row of

G_{L}

has weight at most δ - 1, so the last row of G has weight at most

1 + (δ - 1) + (n - k - l)

. Therefore, based on (9), it follows that

\begin{array}{rl} d & \leq (n - k - l) + (δ - 1) + 1, \\ & \leq n - k - ⌈\frac{k}{r}⌉ (δ - 1) + δ - 1 + 1, \\ & = n - k - (⌈\frac{k}{r}⌉ - 1) (δ - 1) + 1, \end{array}

(10)

which completes the proof. □
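Bound (7) and the lower bound on l in (9) can be evaluated in the same way as bound (3). The function names below are hypothetical; the sample parameters are those of Example 3 in Section 5.3.

```python
def ceil_div(a: int, b: int) -> int:
    """Ceiling of a / b for positive integers, without floating point."""
    return -(-a // b)


def bound_r_delta(n: int, k: int, r: int, delta: int) -> int:
    """Bound (7): d <= n - k - (ceil(k / r) - 1) * (delta - 1) + 1."""
    return n - k - (ceil_div(k, r) - 1) * (delta - 1) + 1


def min_locality_columns(k: int, r: int, delta: int) -> int:
    """Lower bound on l from (9): ceil(k / r) * (delta - 1)."""
    return ceil_div(k, r) * (delta - 1)


n, k, r, delta = 14, 8, 4, 3  # parameters of Example 3
l_min = min_locality_columns(k, r, delta)
print(l_min, bound_r_delta(n, k, r, delta))
```

For δ = 2, that is, one parity symbol per group, bound (7) becomes n - k - ⌈k/r⌉ + 2, which is bound (1).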

Remark 3.

Let C be an optimal LRC attaining the bound (7) with exactly

δ - 1

parity symbols in each local repairable group, and let G be its generator matrix in the standard form (4). Then the number of locality-symbols in G is exactly

⌈\frac{k}{r}⌉ (δ - 1)

.

5.3. Illustration

Example 3.

Given an

(n = 14, k = 8, r = 4)

LRC with

δ = 3

, its generator matrix G and locality-symbols matrix

G_{L}

respectively are

where *’s denote nonzero entries. The number of the locality symbols of

G_{L}

is

l = ⌈k / r⌉ (δ - 1) = 4

. Each row of the above generator matrix G has weight 5, and according to the bound (7), it follows that

\begin{matrix} d \leq n - k - (⌈k / r⌉ - 1) (δ - 1) + 1 = 5 . \end{matrix}

In this example, there are

δ - 1 = 2

locality-columns per repairable group in

G_{L}

. That is, each repairable group has

δ - 1 = 2

local parities, which have the same support.

c_{i} (1 \leq i \leq 8)

denotes an information symbol, while

c_{j} (9 \leq j \leq 12)

represents a local parity symbol. There are two inner codes (repairable groups).

c_{1}

,

c_{2}

,

c_{3}

,

c_{4}

,

c_{9}

and

c_{10}

form one repairable group, where

c_{9}

and

c_{10}

are local parity symbols with the same support. Similarly,

c_{5}

,

c_{6}

,

c_{7}

,

c_{8}

,

c_{11}

and

c_{12}

form the other repairable group, where

c_{11}

and

c_{12}

are local parity symbols with the same support.
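The structure of this example can be checked with a short script. The support pattern below is a reconstruction that is merely consistent with the description (row weight 5, two local parities with the same support per group); it is not the authors' matrix. In particular, the assumption that the two remaining parity columns c_13 and c_14 are fully non-zero is ours.

```python
n, k, r, delta = 14, 8, 4, 3

# Support pattern of G = (I_k | G_L | G_rest); 1 marks a '*' (non-zero) entry.
identity = [[1 if i == j else 0 for j in range(k)] for i in range(k)]
# Columns c_9, c_10 cover c_1..c_4; columns c_11, c_12 cover c_5..c_8.
G_L = [[1, 1, 0, 0] if i < 4 else [0, 0, 1, 1] for i in range(k)]
# Assumption: the remaining parity columns c_13, c_14 have full support.
G_rest = [[1, 1] for _ in range(k)]
G = [identity[i] + G_L[i] + G_rest[i] for i in range(k)]

l = len(G_L[0])                                   # number of locality-columns
row_weights = [sum(row) for row in G]             # weight of each row of G
col_weights_L = [sum(G_L[i][j] for i in range(k)) for j in range(l)]

assert l == -(-k // r) * (delta - 1)              # l = ceil(k/r) * (delta - 1)
assert all(w <= r for w in col_weights_L)         # locality-columns have weight <= r
assert all(w == 5 for w in row_weights)           # each row of G has weight 5
```

Since every row of G is a codeword, the common row weight 5 is an upper bound on d for any code with this support pattern, in agreement with the value given by the bound (7).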

6. Conclusions

By considering the generator matrices with the standard form under the condition of availability

t \geq 1

, this paper focuses on the study of LRCs with multiple local parities. The paper provides a novel proof of the theoretical bound (3). Motivated by this proof, we studied optimal code constructions with availability-

(r, t)

meeting the Singleton-like bounds and obtained necessary conditions for optimal codes from the perspective of the characterized generator matrices. The characterization also scales to other classes of LRCs, as shown by applying it to the local multiple-parity case. Optimal constructions attaining the theoretical bound (3) are left for future work.

Author Contributions

Conceptualization, Y.L. and S.X.; Proof, Y.L. and S.X.; Methodology, Y.L., X.L. and S.X.; Writing Review, X.L.

Funding

This research was supported in part by the National Natural Science Foundation of China under grant No. 61371078, and the R&D Program of Shenzhen under grant Nos. JCYJ20140509172959977, JSGG20150512162853495, ZDSYS20140509172959989, JCYJ20160331184440545.

Acknowledgments

The authors would like to express their sincere gratitude to the anonymous reviewers and the editor for their valuable suggestions and comments, which greatly helped to improve the presentation of this manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Huang, C.; Simitci, H.; Xu, Y.; Ogus, A.; Calder, B.; Gopalan, P.; Li, J.; Yekhanin, S. Erasure coding in windows azure storage. In Proceedings of the 2012 USENIX Annual Technical Conference (USENIX ATC 12), Boston, MA, USA, 13–15 June 2012; pp. 15–26.
2. Sathiamoorthy, M.; Asteris, M.; Papailiopoulos, D.; Dimakis, A.G.; Vadali, R.; Chen, S.; Borthakur, D. Xoring elephants: Novel erasure codes for big data. In Proceedings of the VLDB Endowment, Riva del Garda, Italy, 26–30 August 2013; pp. 325–336.
3. Hadoop-HDFS Architecture. Available online: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html (accessed on 6 June 2018).
4. Gopalan, P.; Huang, C.; Simitci, H.; Yekhanin, S. On the locality of codeword symbols. IEEE Trans. Inf. Theory 2012, 58, 6925–6934.
5. MacWilliams, F.J.; Sloane, N.J.A. The Theory of Error-Correcting Codes (3rd Printing); North-Holland: Amsterdam, The Netherlands, 1981.
6. Tamo, I.; Barg, A. A family of optimal locally recoverable codes. IEEE Trans. Inf. Theory 2014, 60, 4661–4676.
7. Huang, C.; Chen, M.; Li, J. Pyramid codes: Flexible schemes to trade space for access efficiency in reliable data storage systems. In Proceedings of the 6th IEEE International Symposium on Network Computing and Applications, Cambridge, MA, USA, 12–14 July 2007; pp. 79–86.
8. Cadambe, V.R.; Mazumdar, A. Bounds on the size of locally recoverable codes. IEEE Trans. Inf. Theory 2015, 61, 5787–5794.
9. Rawat, A.S.; Papailiopoulos, D.S.; Dimakis, A.G.; Vishwanath, S. Locality and Availability in Distributed Storage. IEEE Trans. Inf. Theory 2016, 62, 4481–4493.
10. Liu, T.; Chen, C.C.; Kim, W.; Milor, L. Comprehensive reliability and aging analysis on srams within microprocessor systems. Microelectron. Reliab. 2015, 55, 1290–1296.
11. Zhang, F.; Zhai, J.; He, B.; Zhang, S.; Chen, W. Understanding co-running behaviors on integrated cpu/gpu architectures. IEEE Trans. Parallel Distrib. Syst. 2017, 28, 905–918.
12. Liu, T.; Chen, C.C.; Cha, S.; Milor, L. System-level variation-aware aging simulator using a unified novel gate-delay model for bias temperature instability, hot carrier injection, and gate oxide breakdown. Microelectron. Reliab. 2015, 55, 1334–1340.
13. Dimakis, A.G.; Godfrey, P.B.; Wainwright, M.J.; Ramchandran, K. Network Coding for Distributed Storage Systems. In Proceedings of the IEEE INFOCOM 2007, Anchorage, AK, USA, 6–12 May 2007; pp. 2000–2008.
14. Rashmi, K.V.; Shah, N.B.; Kumar, P.V. Optimal exact-regenerating codes for distributed storage at the msr and mbr points via a product-matrix construction. IEEE Trans. Inf. Theory 2011, 57, 5227–5239.
15. Shah, N.B.; Rashmi, K.V.; Kumar, P.V.; Ramchandran, K. Interference alignment in regenerating codes for distributed storage: Necessity and code constructions. IEEE Trans. Inf. Theory 2012, 58, 2134–2158.
16. Kamath, G.M.; Prakash, N.; Lalitha, V.; Kumar, P.V. Codes with local regeneration and erasure correction. IEEE Trans. Inf. Theory 2014, 60, 4637–4660.
17. Gligoroski, D.; Kralevska, K.; Jensen, R.E.; Simonsen, P. Repair Duality with Locally Repairable and Locally Regenerating Codes. arXiv, 2017; arXiv:1701.06664.
18. Balaji, S.B.; Krishnan, M.N.; Vajha, M.; Ramkumar, V.; Sasidharan, B.; Kumar, P.V. Erasure coding for distributed storage: An overview. arXiv, 2018; arXiv:1806.04437.
19. Hao, J.; Xia, S.-T. Bounds and constructions of locally repairable codes: Parity-check Matrix Approach. arXiv, 2016; arXiv:1601.05595.
20. Wang, A.; Zhang, Z. Repair locality with multiple erasure tolerance. IEEE Trans. Inf. Theory 2014, 60, 6979–6987.
21. Pamies-Juarez, L.; Hollmann, H.D.L.; Oggier, F. Locally repairable codes with multiple repair alternatives. In Proceedings of the IEEE International Symposium on Information Theory (IEEE ISIT 2013), Istanbul, Turkey, 7–12 July 2013; pp. 892–896.
22. Huang, P.; Yaakobi, E.; Uchikawa, H.; Siegel, P.H. Linear locally repairable codes with availability. In Proceedings of the IEEE International Symposium on Information Theory (IEEE ISIT 2015), Hong Kong, China, 14–19 June 2015; pp. 1871–1875.
23. Su, Y.-S. On constructions of a class of binary locally repairable codes with multiple repair groups. IEEE Access 2017, 5, 3524–3528.
24. Wang, A.; Zhang, Z.; Liu, M. Achieving arbitrary locality and availability in binary codes. In Proceedings of the IEEE International Symposium on Information Theory (IEEE ISIT 2015), Hong Kong, China, 14–19 June 2015; pp. 1866–1870.
25. Lu, Y.; Hao, J.; Xia, S.-T. On the Single-Parity Locally Repairable Codes. IEICE Trans. Fundament. Lett. 2017, E100-A, 1342–1345.
26. HDFS Erasure Coding. Available online: http://hadoop.apache.org/docs/r3.1.0/hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html (accessed on 7 June 2018).
27. HDFS-RAID. Available online: https://wiki.apache.org/hadoop/HDFS-RAID (accessed on 8 June 2018).
28. Intel(R) Intelligent Storage Acceleration Library. Available online: https://github.com/01org/isa-l/ (accessed on 9 June 2018).
29. Hao, J.; Xia, S.-T. Constructions of optimal binary locally repairable codes with multiple repair groups. IEEE Commun. Lett. 2016, 20, 1060–1063.
30. Kralevska, K.; Gligoroski, D.; Øverby, H. Balanced locally repairable codes. In Proceedings of the International Symposium on Turbo Codes and Iterative Information Processing (ISTC 2016), Brest, France, 5–9 September 2016; pp. 280–284.
31. Prakash, N.; Kamath, G.M.; Lalitha, V.; Kumar, P.V. Optimal linear codes with a local-error-correction property. In Proceedings of the IEEE International Symposium on Information Theory (IEEE ISIT 2012), Cambridge, MA, USA, 1–6 July 2012; pp. 2776–2780.

Figure 1. Comparison of the [n = 8, k = 4] Reed-Solomon (RS) code and the [n = 8, k = 4, r = 2] locally repairable code (LRC) [1].

Table 1. Notations.

Notations	Descriptions
n	code length
k	code dimension
d	code minimum distance
$c_{o}, o \in \{1, 2, \dots, n\}$	code symbols
$c_{p}, p \in \{1, 2, \dots, k\}$	information symbols
r	locality of locally repairable codes (LRC)
t	availability of LRC
Singleton bound	$d \leq n - k + 1$
G	generator matrix
locality-columns / locality-symbols	the columns with weight at most r in G
$w_{H} (\cdot)$	weight of vectors, the number of non-zero elements of vectors
$supp (\cdot)$	support of vectors, the set of the indices of non-zero elements
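
The last three notations of Table 1 translate directly into code. The sketch below is only illustrative: the function names `supp`, `w_H` and `is_locality_column` are our own, vectors are plain Python lists, and indices are 0-based rather than 1-based as in the text.

```python
def supp(v):
    """Support of a vector: the set of indices of its non-zero elements."""
    return {i for i, x in enumerate(v) if x != 0}

def w_H(v):
    """Hamming weight of a vector: the number of its non-zero elements."""
    return len(supp(v))

def is_locality_column(column, r):
    """A column of G is a locality-column if its weight is at most r."""
    return w_H(column) <= r

v = [0, 3, 0, 1]
support, weight = supp(v), w_H(v)   # {1, 3} and 2
```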

Table 2. Comparisons of Replication, RS Codes and LRC.

Storage Scheme	Repair Cost	Storage Cost
[f] Replication	1 ×	f ×
[ $n, k$ ] RS	k ×	$n / k$ ×
[ $n, k, r$ ] LRC	r ×	$n / k$ ×
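
The entries of Table 2 can be instantiated for concrete parameters. The following sketch uses the [n = 8, k = 4] RS code and the [n = 8, k = 4, r = 2] LRC of Figure 1; the replication factor f = 3 and all function names are assumptions chosen for illustration.

```python
from fractions import Fraction

def replication(f):
    """f-fold replication: repair reads 1 symbol, storage cost is f times."""
    return {"repair_cost": 1, "storage_cost": Fraction(f)}

def reed_solomon(n, k):
    """[n, k] RS code: repair reads k symbols, storage cost is n/k times."""
    return {"repair_cost": k, "storage_cost": Fraction(n, k)}

def lrc(n, k, r):
    """[n, k, r] LRC: repair reads r symbols, storage cost is n/k times."""
    return {"repair_cost": r, "storage_cost": Fraction(n, k)}

schemes = {
    "3-replication": replication(3),   # f = 3 is an assumed example value
    "[8, 4] RS": reed_solomon(8, 4),
    "[8, 4, 2] LRC": lrc(8, 4, 2),
}

for name, cost in schemes.items():
    print(f"{name}: repair {cost['repair_cost']}x, storage {cost['storage_cost']}x")
```

With these parameters the LRC keeps the 2× storage cost of the RS code while halving the repair cost from 4 to 2 symbols.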

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
