Design of Regenerating Code Based on Security Level in Cloud Storage System

Zhang, Fan; Xu, Jian; Yang, Gangqiang

doi:10.3390/electronics12112423

Open Access Article


by Fan Zhang ¹,†, Jian Xu ²,*,† and Gangqiang Yang ³,*

¹ School of Information Engineering, Shandong Management University, Jinan 250357, China

² School of Intelligent Engineering, Shandong Management University, Jinan 250357, China

³ School of Information Science and Engineering, Shandong University, Qingdao 226237, China

* Authors to whom correspondence should be addressed.

† Fan Zhang and Jian Xu contributed equally to this work and Jian Xu is the co-first author.

Electronics 2023, 12(11), 2423; https://doi.org/10.3390/electronics12112423

Submission received: 30 April 2023 / Revised: 23 May 2023 / Accepted: 25 May 2023 / Published: 26 May 2023

(This article belongs to the Special Issue Physical Layer Security in Future IoT Networks: Theories, Technologies, and Applications)


Abstract

Cloud storage is an indispensable part of cloud computing solutions, and the security of stored data has become a key issue in the research and application of cloud storage systems. To address this problem, this paper studies anti-eavesdropping regenerating code technology for cloud storage systems from an information-theoretic perspective. Unlike existing research on regenerating codes, which aims to give the system strong or weak security, this paper focuses on quantifying the relationship between security and system performance parameters, evaluating the performance gains that can be obtained by appropriately reducing security, and designing regenerating code schemes with different information security levels to meet the personalized requirements of cloud storage customers. This paper puts forward a generalized matrix transposition method and applies it to the construction of fractional repetition codes. The proposed scheme provides new ideas and methods for research on secure regenerating code technology in cloud storage systems.

Keywords:

cloud storage; regenerating code; security level; fractional repetition codes

1. Introduction

As an indispensable part of cloud computing solutions, cloud storage is a system that provides external data storage and business access functions based on distributed storage systems (DSSs). Today, public and large private clouds such as OceanStore and Google Drive hold large amounts of sensitive and private data worldwide. Adopting a DSS that stores data in different geographical locations can improve stability. Unfortunately, it also exposes more targets to attack and increases the risk of sensitive personal data being eavesdropped. Therefore, information security in cloud storage is particularly important. According to statistics, there were over 1000 public data breaches worldwide in 2022, resulting in the leakage of 4 billion records of personal information. Improving data security while ensuring storage efficiency is thus a key issue in cloud storage applications.

Research has shown that regenerating code (RC) technology plays a crucial role in addressing information security issues in DSSs [1]. When there are eavesdroppers in the system, regenerating codes can prevent them from restoring the original data. This protection is information-theoretic: it assumes that the eavesdropper is familiar with the code design principles and has unlimited computing power. How to use regenerating code technology to resist eavesdropping and ensure the data security of DSSs has become a challenging problem in cloud storage research in recent years [2].

To date, scholars have proposed various regenerating code structures to ensure the security of DSSs. These structures mostly focus on achieving strong or weak security for DSSs under different system models and eavesdropping models. Strong security requires that, when there is eavesdropping in the system, no information about the stored data be leaked to the eavesdropper. Weak security allows partial information leakage, provided that the original message symbols cannot be decoded. Note that strong security and weak security are only two of the security attributes a DSS may possess. Existing research has shown that the security level (SL) of a DSS can be further quantified by a value between 0 and 1, defined as the probability that the system prevents eavesdroppers from restoring the original data file [3]. In practice, different applications (or customers), such as government cloud platforms, enterprise clouds, and other public storage services, may have different security level requirements. At the same time, appropriately reducing the security level can give the system storage performance gains, thereby reducing deployment costs [4].

Motivated by this, this paper studies regenerating code technology with different levels of information security in DSSs. By quantitatively analyzing the relationship between the security level and system performance parameters, we evaluate the system performance gains that can be achieved by appropriately reducing the security level. On this basis, a regenerating code scheme that meets personalized security level requirements is proposed using fractional repetition (FR) codes and a generalized matrix transposition mechanism. This research provides a theoretical basis and a coding implementation for building new, efficient, and secure cloud storage systems.

The remainder of this paper is organized as follows. Section 2 reviews current research on secure regenerating codes. Section 3 introduces the implementation mechanism of FR coding and the system model used in this paper. Section 4 describes the proposed encoding structure of the regenerating code. Section 5 discusses the relationship between the security level and the system parameters. Section 6 concludes the paper.

2. Related Work

Since Dimakis et al. [5] first introduced the idea of network coding [6] into DSSs and proposed regenerating codes to solve the problem of high repair bandwidth consumption in erasure-correcting codes [7], the security of cloud storage systems based on regenerating codes has received extensive attention from researchers. Oliveira et al. [8] used a Vandermonde matrix to design codes that increase the security capacity of a system, which indicates that regenerating codes can improve the security of a storage system. Data security strategies based on regenerating codes provide information-theoretic security: they admit simple encoding and decoding methods and do not rely on limiting the computing power of eavesdroppers. Hence, regenerating codes are well suited for wide application in DSSs [9].

Recent research on using regenerating codes to ensure the security of DSSs can be roughly divided into two categories. The first category focuses on deriving the corresponding upper bound of security capacity for different system models and eavesdropping models, and providing a regenerating code scheme that can achieve this upper bound, enabling the DSS to enhance its security. For example, Pawar et al. [10] used the max-flow min-cut theorem of graph theory [11] to solve the problem of node eavesdropping in homogeneous storage systems, and further constructed an encoding structure that can achieve the upper bound of the security capacity at the minimum bandwidth regeneration (MBR) point, enabling the system to achieve strong security. Rashmi et al. [12] used product matrix (PM) theory [13] and FR codes [14] to develop encoding schemes that meet strong security at MBR and minimum storage regeneration (MSR) points, respectively. Rawat et al. [15] and Goparaju et al. [16] used the linear subspace analysis method to give a new upper bound estimate of security capacity at MSR points, and constructed a secure storage code based on the maximum rank distance (MRD) codes [17,18]. Tandon et al. [19] studied the tradeoff between secure storage capacity and repair bandwidth, and obtained an improved upper bound on the security capacity that general regenerating codes can achieve. A team from Shanghai Jiao Tong University [20] considered the problem of multidimensional and multi-level secure regenerating codes, providing security constraints that can obtain MBR points. A team from Shandong University [21] gave a strong security coding design under the generalized cloud storage model. A team from the National University of Defense Technology [22] proposed the concept of stationary MSR codes and obtained a fixed upper bound of the security capacity in linear MSR code scenarios.

The main objective of the above studies is to achieve strong security for the system. Although strong security reveals no information about the original file, it requires introducing a large number of random keys into the data symbols. As a result, storage capacity is sacrificed, which is costly for cloud storage providers. This motivates the second research direction on using regenerating codes to ensure data security: weakly secure regenerating codes. Weak security allows eavesdroppers to obtain partial information about the original file, but they cannot decode any meaningful information about any single symbol of the original file [23]. Essentially, weak security does not introduce random keys and therefore causes no loss of storage capacity. Regarding weakly secure regenerating codes, Kadhe et al. [24,25] proposed two outer encoding structures using nested codes that weakly protect PM-MBR and PM-MSR codes against eavesdropping. For MSR codes, Kadhe et al. [26] further proposed a generalized weakly secure encoding suited to practical application scenarios, where the outer encoding can be designed independently of the inner encoding. Liu et al. [27] designed two types of weakly secure regenerating code schemes against eavesdropping attacks by combining the all-or-nothing transformation with an exact-repair regenerating code strategy. Xu et al. [28] designed a heterogeneous encoding scheme that satisfies weak security constraints to address link eavesdropping in heterogeneous DSSs. A team from Beijing University of Posts and Telecommunications [29] analyzed the block security of PM-MSR codes based on the Cauchy matrix and proposed an improved MSR coding scheme that achieves optimal weak security.

Research on strongly and weakly secure regenerating codes has thus achieved a series of results in recent years. However, strong security and weak security are only two properties of a DSS: the former is too strict and the latter too lenient. In practice, different applications may have different security level requirements [30]. With this in mind, this paper puts forward a novel regenerating code structure with different security levels. Applying such a coding structure in cloud storage systems not only ensures the availability and repairability of data, but also meets the personalized security level requirements of cloud storage customers.

3. FR Coding and System Model

In homogeneous DSS, there is a tradeoff between node storage α and repair bandwidth γ. The minimum storage and minimum bandwidth regenerating (MSBR) point refers to the point in the system where both α and γ reach the theoretical minimum values. If α_MSBR and γ_MSBR are labeled as node storage and repair bandwidth for MSBR points, then the parameters of an MSBR code need to meet the following constraints.

(\alpha_{MSBR}, \gamma_{MSBR}) = \left(\frac{F}{k}, \frac{F}{k}\right), \qquad (1)

where F is the original file size and k is the number of nodes in a data recovery set. From a practical perspective, the MSBR point is the ideal operating point, since it is optimal in both storage and repair bandwidth. Unlike the regular regenerating code (R-RC) model, in which any k nodes can recover the data and any d surviving nodes can repair a faulty node, the irregular regenerating code (I-RC) model only requires that a user (data collector, DC) can recover the original data file from specific sets of k nodes, and that the replacement of a faulty node can repair its data exactly from specific sets of d nodes. Previous studies have shown that the MSBR point is not achievable in the R-RC model but can be achieved in the I-RC model [31].

We are interested in designing RC structures with different security levels at MSBR points, which are implemented based on FR coding.

3.1. FR Coding

FR codes are a type of MBR code that have the minimum repair bandwidth and an additional repair property: a faulty node can be repaired exactly by simply downloading stored data symbols from surviving nodes. The surviving nodes therefore incur no computational cost during repair, which greatly reduces the complexity of the repair process. FR codes adopt a two-layer encoding principle. The data file is first encoded with an outer maximum distance separable (MDS) code, and each encoding symbol is then repeated a certain number of times according to an inner FR code and distributed across n storage nodes. Specifically, a data file of F symbols is first encoded with a [θ, F] MDS code, which outputs θ (≥ F) encoding symbols labeled 1 to θ. Any F of the θ encoding symbols can recover the F source information symbols. Define the set [θ] = {1, 2, …, θ} and let N₁, …, N_n be n subsets of [θ]. An (n, α, ρ) FR code is a collection of n subsets {N₁, …, N_n} that satisfies the following conditions:

|N_i| = α for every 1 ≤ i ≤ n;
each element of the set [θ] belongs to ρ (ρ ≥ 2) different subsets.

The parameter ρ denotes the repeatability. Counting the stored symbols in two ways (n nodes holding α symbols each, or θ symbols stored ρ times each) shows that these parameters satisfy the basic constraint

n\alpha = \theta\rho. \qquad (2)

The FR code defined above can be used in a DSS with n storage nodes, each with a storage capacity of α: each storage node corresponds to a fixed subset of the FR code and stores the encoding symbols indexed by the elements of that subset.
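As a quick sanity check of the definition, the sketch below tests the two FR conditions and the counting identity (2) on the (6, 2, 2) FR code of Example 1. It is a minimal illustration under our own assumptions, not the authors' code: the helper name `is_fr_code` is hypothetical, and coded symbols are represented only by their indices 1 to θ rather than by finite-field elements.

```python
from collections import Counter


def is_fr_code(subsets, theta, alpha, rho):
    """Check the two defining conditions of an (n, alpha, rho) FR code on [theta]."""
    universe = set(range(1, theta + 1))
    # Condition 1: every subset N_i has exactly alpha elements taken from [theta].
    if any(len(s) != alpha or not set(s) <= universe for s in subsets):
        return False
    # Condition 2: every element of [theta] lies in exactly rho subsets.
    counts = Counter(x for s in subsets for x in s)
    return all(counts[x] == rho for x in universe)


# Example 1: N_1, ..., N_6 of the (6, 2, 2) FR code on [6]
example1 = [{6, 1}, {1, 2}, {2, 3}, {3, 4}, {4, 5}, {5, 6}]
n, theta, alpha, rho = len(example1), 6, 2, 2
print(is_fr_code(example1, theta, alpha, rho))  # True
print(n * alpha == theta * rho)                 # True (12 == 12), Equation (2)
```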

3.1.1. Encoding Example

Example 1.

Here we use the example provided in [31] to illustrate the FR code structure, as shown in Figure 1. Consider an FR code with repeatability ρ = 2. A data file of size F = 4 is first encoded with a [6, 4] MDS code, which outputs θ = 6 encoding symbols c₁–c₆; these six symbols are then distributed over six storage nodes V_i (1 ≤ i ≤ 6) according to a (6, 2, 2) FR code, with each node storing α = 2 symbols. This FR code consists of the six subsets N₁ = {6, 1}, N₂ = {1, 2}, N₃ = {2, 3}, N₄ = {3, 4}, N₅ = {4, 5}, and N₆ = {5, 6}. By mapping each subset to a storage node, the (6, 2, 2) FR code can be applied to a distributed storage system: the i-th node V_i corresponds to the i-th subset N_i and stores the encoding symbols indexed by the elements of N_i.

3.1.2. The Data Recovery and Data Repair Properties of FR Code

Because of the outer [θ, F] MDS code, any F of the θ encoding symbols can recover the original F message symbols. With a suitably designed inner FR structure, a user (DC) can obtain at least F different encoding symbols from certain sets of k nodes, and thereby restore the original F message symbols. A set of k nodes that can be used to recover the original message symbols is called a data recovery set, and the number of data recovery sets, denoted by ξ_R, is called the recovery selectivity [32].

Since each encoding symbol is stored at least twice, a faulty node can be repaired by downloading another copy of each lost symbol from a specific set of d surviving nodes, each of which sends exactly one stored symbol to the faulty node. A set of d nodes that can repair the faulty node V_i in this way is called a data repair set of V_i, and the number L_i of data repair sets is called the repair selectivity.

Here, we use the example in Figure 1 to illustrate the properties of data recovery and data repair for FR codes. For data recovery, the recovery selectivity ξ_R = 9 indicates that there are nine data recovery sets K(i) (1 ≤ i ≤ 9) containing k = 2 nodes, listed as K(1) = {V₁, V₃}, K(2) = {V₁, V₄}, K(3) = {V₁, V₅}, K(4) = {V₂, V₄}, K(5) = {V₂, V₅}, K(6) = {V₂, V₆}, K(7) = {V₃, V₅}, K(8) = {V₃, V₆}, K(9) = {V₄, V₆}. A user (DC) can connect to any of these nine data recovery sets and download the data stored on its k = 2 nodes to recover the original data file of F = 4 message symbols.

For data repair, suppose node V₂ fails and the encoding symbols c₁ and c₂ are lost. Its data repair set is D₂ = {V₁, V₃}, which consists of d = 2 nodes; the faulty node V₂ restores its missing symbols by downloading c₁ from V₁ and c₂ from V₃. The data repair sets D_i of the nodes V_i (1 ≤ i ≤ 6) are D₁ = {V₆, V₂}, D₂ = {V₁, V₃}, D₃ = {V₂, V₄}, D₄ = {V₃, V₅}, D₅ = {V₄, V₆}, and D₆ = {V₅, V₁}.
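The recovery and repair sets of Figure 1 can be reproduced by brute force. The sketch below uses hypothetical helper names of our own (not taken from [31] or [32]) and assumes that any F distinct coded symbols suffice for recovery (the MDS property of the outer code) and that each helper sends β = 1 symbol, so it only handles d = α.

```python
from itertools import combinations, permutations

# Example 1: node V_i stores the coded symbols indexed by N_i.
nodes = {1: {6, 1}, 2: {1, 2}, 3: {2, 3}, 4: {3, 4}, 5: {4, 5}, 6: {5, 6}}
F, k, d = 4, 2, 2


def recovery_sets(nodes, k, F):
    """k-node sets whose union holds at least F distinct coded symbols (MDS outer code)."""
    return [K for K in combinations(sorted(nodes), k)
            if len(set().union(*(nodes[v] for v in K))) >= F]


def repair_sets(nodes, failed, d):
    """d-sets of surviving nodes that can each send one distinct lost symbol (beta = 1)."""
    lost = sorted(nodes[failed])
    if len(lost) != d:
        raise ValueError("this sketch assumes d = alpha")
    others = [v for v in sorted(nodes) if v != failed]
    return [D for D in combinations(others, d)
            # some ordering of the helpers gives each lost symbol its own helper
            if any(all(s in nodes[h] for s, h in zip(lost, order))
                   for order in permutations(D))]


print(len(recovery_sets(nodes, k, F)))                 # 9
print({i: repair_sets(nodes, i, d) for i in nodes})
# {1: [(2, 6)], 2: [(1, 3)], 3: [(2, 4)], 4: [(3, 5)], 5: [(4, 6)], 6: [(1, 5)]}
```

Each node has exactly one repair set, which is the L_i = 1 value quoted for Figure 1.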

3.2. System Model

In the I-RC model for specified data recovery and repair requirements, a DSS can be formally defined as:

A DSS (n, k, d, ξ_R, L_i) is a distributed storage system consisting of n storage nodes that meets the following requirements:

The collection Ҡ = {K(1), …, K(ξ_R)} consists of ξ_R data recovery sets K(i), each with k = |K(i)| nodes (1 ≤ i ≤ ξ_R), and the original data can be recovered from any data recovery set in Ҡ.
For a faulty node V_i (1 ≤ i ≤ n), the collection Ð_i = {D_i(1), …, D_i(L_i)} consists of the L_i data repair sets D_i(j) of V_i, each with d = |D_i(j)| nodes (1 ≤ j ≤ L_i), and the data symbols lost by V_i can be retrieved exactly from any repair set in Ð_i.

The number of data recovery sets ξ_R = |Ҡ| and the number of data repair sets L_i = |Ð_i| satisfy

1 \le \xi_{R} \le \binom{n}{k}, \qquad 1 \le L_{i} \le \binom{n-1}{d}. \qquad (3)

Here, the parameter L_i, called the repair selectivity, characterizes the flexibility of repairing faulty nodes. In actual distributed storage systems, failures are sometimes correlated, and two or more nodes may fail simultaneously. It is therefore desirable that a faulty node can be repaired not only through one data repair set, but through two or more data repair sets. Similarly, the parameter ξ_R, called the recovery selectivity, characterizes the flexibility of recovering data files, and a sufficiently large ξ_R is desirable [31].

For the example in Figure 1, the recovery selectivity is ξ_R = 9: Ҡ consists of the nine data recovery sets K(1)–K(9) listed in Section 3.1.2, each containing k = 2 nodes. A user (DC) that connects to the k = 2 nodes of any recovery set in Ҡ can recover the data file of F = 4 symbols.

Suppose an eavesdropper in the system can steal the information stored on k nodes. Then the security level (SL) increases as ξ_R decreases [21], and SL can be defined as

SL = \frac{\binom{n}{k} - \xi_{R}}{\binom{n}{k} - 1}. \qquad (4)

According to (3), 0 ≤ SL ≤ 1. When ξ_R = \binom{n}{k}, any k of the n nodes can recover the original data file, as in the R-RC model. In this case, an eavesdropper that steals the data of any k nodes obtains the original file, so the system has no security and SL = 0. When ξ_R = 1, the system has a unique data recovery set, and an eavesdropper can recover the original file only by eavesdropping on the k nodes of this set; eavesdropping on any other k nodes does not leak the original data file. Hence, the system has the highest level of security, i.e., SL = 1. SL can therefore be regarded as the security gain (i.e., the degree of SL improvement) of the I-RC model over the R-RC model.

In addition to SL, the code rate R_C is also an important performance parameter, defined as the ratio between the number of source data symbols in the storage system and the total number of encoded symbols [32]. That is,

R_{C} = \frac{F}{n\alpha}. \qquad (5)

Since ρ ≥ 2 and θ ≥ F, it follows from (2) that R_C = F/(θρ) ≤ 1/2. The performance parameters of a code can thus be summarized as the set {θ, ρ, ξ_R, SL, L_i, R_C}; for the example in Figure 1, {θ, ρ, ξ_R, SL, L_i, R_C} = {6, 2, 9, 42.86%, 1, 1/3}.
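The code-rate bound can be spelled out in one line: (2) replaces nα by θρ, then θ ≥ F and ρ ≥ 2 give the two inequalities, and the Figure 1 value follows from F = 4, n = 6 and α = 2.

```latex
R_C = \frac{F}{n\alpha} = \frac{F}{\theta\rho} \le \frac{1}{\rho} \le \frac{1}{2},
\qquad \text{Figure 1: } R_C = \frac{4}{6 \cdot 2} = \frac{1}{3}.
```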

4. Encoding Structure

We use an irregular matrix transposition method and FR code to achieve encoding structures that meet different security levels at MSBR points.

4.1. Irregular Matrix Transposition Method

The irregular matrix transposition method proposed in this section is inspired by the RSKR repetition code in [10], which fills the storage nodes sequentially by repeating the stored data both horizontally (i.e., within a single node) and vertically (i.e., across nodes). Our coding structure divides the n nodes of the system into groups and stores the symbols of each pair of adjacent groups using the following irregular matrix transposition method.

Definition 1.

For a given a × b matrix C_{a×b} (b ≤ a) whose entries c_1, …, c_{ab} are arranged row by row,

C_{a \times b} = \begin{bmatrix} c_{1} & c_{2} & \cdots & c_{b} \\ c_{b+1} & c_{b+2} & \cdots & c_{2b} \\ \vdots & \vdots & & \vdots \\ c_{(a-1)b+1} & c_{(a-1)b+2} & \cdots & c_{ab} \end{bmatrix}, \qquad (6)

its irregular transposed matrix (C_{a×b})^Ŧ is defined as

(C_{a \times b})^{Ŧ} = \begin{bmatrix} c_{1} & c_{a+1} & \cdots & c_{(b-1)a+1} \\ c_{2} & c_{a+2} & \cdots & c_{(b-1)a+2} \\ \vdots & \vdots & & \vdots \\ c_{b} & c_{a+b} & \cdots & c_{(b-1)a+b} \\ c_{b+1} & c_{a+b+1} & \cdots & c_{(b-1)a+b+1} \\ \vdots & \vdots & & \vdots \\ c_{a} & c_{2a} & \cdots & c_{ba} \end{bmatrix}, \qquad (7)

where (C_{a×b})^Ŧ is also an a × b matrix, obtained by writing the same sequence c_1, …, c_{ab} column by column instead of row by row. The irregular transposed matrix has the following two important properties:

When b = a, the irregular transposed matrix (C_{a×b})^Ŧ reduces to the ordinary transpose (C_{a×b})^T; that is, if b = a, then (C_{a×b})^Ŧ = (C_{a×b})^T.
The b elements of each row of C_{a×b} (or (C_{a×b})^Ŧ) (b ≤ a) can be found in b different rows (one element per row) of (C_{a×b})^Ŧ (or C_{a×b}).

In this paper, this irregular matrix transposition method is referred to as generalized matrix transposition.
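One way to read Definition 1 is that C_{a×b} and (C_{a×b})^Ŧ contain the same sequence c_1, …, c_{ab}, written row by row and column by column respectively. The sketch below implements that reading with hypothetical helper names of our own, represents each c_m by its index m, and checks properties 1 and 2 on the matrix of Equation (10) and on a square matrix.

```python
def generalized_transpose(C):
    """Irregular (generalized) transpose of an a x b matrix C with b <= a (Definition 1).

    C holds c_1, ..., c_ab row by row; the result has the same a x b shape but
    holds the same sequence column by column, as in Equation (7).
    """
    a, b = len(C), len(C[0])
    if b > a:
        raise ValueError("Definition 1 requires b <= a")
    flat = [x for row in C for x in row]  # c_1, ..., c_ab in order
    # Row r (0-based) of the result is c_{r+1}, c_{a+r+1}, ..., c_{(b-1)a+r+1}.
    return [[flat[j * a + r] for j in range(b)] for r in range(a)]


def one_per_row(A, B):
    """Property 2: every row of A shares at most one element with every row of B."""
    return all(len(set(ra) & set(rb)) <= 1 for ra in A for rb in B)


C = [[1, 2], [3, 4], [5, 6]]           # indices of c_1..c_6, Equation (10)
T = generalized_transpose(C)
print(T)                                # [[1, 4], [2, 5], [3, 6]], Equation (11)

S = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # square case, b = a
print(generalized_transpose(S) == [list(col) for col in zip(*S)])  # True (property 1)
print(one_per_row(C, T), one_per_row(T, C))                        # True True (property 2)
```

Because all entries are distinct and each appears once in both matrices, "at most one shared element per pair of rows" is equivalent to the "one element per row" wording of property 2.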

4.2. Encoding Structure Based on Generalized Matrix Transposition Method

Consider a DSS (n, k, d, ξ_R, L_i) using a [θ, F] MDS code combined with an (n, α, ρ) FR code. Let the size q of the finite field F_q be a prime number satisfying q ≥ F. In this section, we provide a deterministic FR code structure that attains the MSBR point. Its parameters satisfy Equation (1), so the code is optimal in both storage and repair bandwidth; in other words, the FR code provided in this paper is a type of MSBR code. Therefore, the parameters in our encoding structure satisfy

F = k\alpha, \qquad \alpha = \gamma, \qquad \beta = 1, \qquad (8)

where γ = dβ and β is the number of symbols downloaded from each helper node during repair; a surviving node that participates in the repair of a faulty node is called a helper node. Because data repair imposes a workload on the helper nodes, practical storage systems prefer codes with a small d. Hence, our coding structure always satisfies d ≤ k, and we let ƙ = θ/α be an integer. Since θ ≥ F = kα, it follows from (8) that α = d ≤ k ≤ ƙ. The n nodes in the system are divided into ρ groups of ƙ nodes each, so that n = ρƙ, in agreement with (2).
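The parameter relations above can be written out step by step from (2) and (8); this only restates the paper's relations (d ≤ k itself is a design choice, not a consequence):

```latex
\alpha \overset{(8)}{=} \gamma = d\beta = d, \qquad
k\alpha \overset{(8)}{=} F \le \theta = ƙ\,\alpha \;\Rightarrow\; k \le ƙ, \qquad
n\alpha \overset{(2)}{=} \theta\rho = ƙ\,\alpha\,\rho \;\Rightarrow\; n = \rho\,ƙ.
```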

Since each node stores α symbols, each group of ƙ nodes stores ƙα = θ symbols, i.e., exactly one copy of each of the θ distinct coded symbols. The θ symbols are therefore stored ρ times across the n nodes, so the parameter ρ represents both the number of groups and the repeatability of the code. In practical DSSs, ρ is usually small. Let the arrangement of the θ symbols stored by the ƙ nodes of a group be represented either by a ƙ × α matrix C_{ƙ×α} or by its generalized transposed matrix (C_{ƙ×α})^Ŧ. Our code is then represented by a ρƙ × α codeword matrix (C_{ρƙ×α})^MSBR, formed by stacking ρ submatrices that alternate between C_{ƙ×α} and (C_{ƙ×α})^Ŧ:

(C_{\rho ƙ \times \alpha})^{MSBR} = \begin{bmatrix} C_{ƙ \times \alpha} \\ (C_{ƙ \times \alpha})^{Ŧ} \\ C_{ƙ \times \alpha} \\ \vdots \end{bmatrix}, \qquad (9)

where the i-th row of (C_{ρƙ×α})^MSBR corresponds to node V_i (1 ≤ i ≤ n), and its α elements are the α encoding symbols stored on node V_i.
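Equation (9) can be sketched in a few lines. The function below is our own illustration, not the authors' implementation: it fills C_{ƙ×α} row by row with the indices 1 to θ, builds its generalized transpose, and stacks ρ blocks alternately. Coded symbols are shown as indices rather than F_q elements.

```python
def generalized_transpose(C):
    """Generalized transpose (Definition 1): same entries, refilled column by column."""
    a, b = len(C), len(C[0])
    flat = [x for row in C for x in row]
    return [[flat[j * a + r] for j in range(b)] for r in range(a)]


def msbr_codeword_matrix(theta, alpha, rho):
    """Stack rho blocks C, C^T-bar, C, ... as in Equation (9).

    Row i (0-based) lists the indices of the alpha coded symbols on node V_{i+1}.
    Assumes theta is a multiple of alpha and alpha <= theta // alpha.
    """
    if theta % alpha:
        raise ValueError("theta / alpha must be an integer")
    kk = theta // alpha                      # number of nodes per group
    if alpha > kk:
        raise ValueError("the generalized transpose needs alpha <= theta / alpha")
    C = [[r * alpha + j + 1 for j in range(alpha)] for r in range(kk)]  # row-wise fill
    CT = generalized_transpose(C)
    rows = []
    for g in range(rho):                     # alternate C and its generalized transpose
        rows.extend(list(row) for row in (C if g % 2 == 0 else CT))
    return rows


M = msbr_codeword_matrix(theta=6, alpha=2, rho=2)   # parameters of Example 2
for i, row in enumerate(M, start=1):
    print(f"V{i}: " + ", ".join(f"c{s}" for s in row))
```

For the Example 2 parameters this prints V1: c1, c2 through V6: c3, c6, i.e., the rows of Equation (12); with rho=3 the first block is repeated as the third group.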

Example 2.

This example has the same parameters as Example 1. Hence, ƙ = θ/α = 3. Choose q = 5, which means we operate on the finite field F₅. Fill matrix C_{3 × 2} with θ = 6 encoding symbols c₁–c₆ as

C_{3 \times 2} = \begin{bmatrix} c_{1} & c_{2} \\ c_{3} & c_{4} \\ c_{5} & c_{6} \end{bmatrix}. \qquad (10)

Thus, its generalized transposed matrix is

(C_{3 \times 2})^{Ŧ} = \begin{bmatrix} c_{1} & c_{4} \\ c_{2} & c_{5} \\ c_{3} & c_{6} \end{bmatrix}. \qquad (11)

According to (9), the codeword matrix (C_{6 × 2})^MSBR with size (6 × 2) is

(C_{6 \times 2})^{MSBR} = \begin{bmatrix} C_{3 \times 2} \\ (C_{3 \times 2})^{Ŧ} \end{bmatrix} = \begin{bmatrix} c_{1} & c_{2} \\ c_{3} & c_{4} \\ c_{5} & c_{6} \\ c_{1} & c_{4} \\ c_{2} & c_{5} \\ c_{3} & c_{6} \end{bmatrix}, \qquad (12)

where the i-th row of (C_{6 × 2})^MSBR corresponds to node V_i (1 ≤ i ≤ 6), and α = 2 elements in the i-th row correspond to 2 encoding symbols stored in node V_i. For example, V₁ corresponds to the first row of (C_{6 × 2})^MSBR and stores the symbols c₁ and c₂.

4.3. Data Repair and Data Recovery Properties of the Proposed Encoding Structure

Regarding data recovery, any F = kα of the θ encoding symbols suffice to recover the original data file of size F. Since each group stores every one of the θ symbols exactly once, a DC can recover the original data file by connecting to any k (≤ ƙ) nodes within a single group; that is, in each group, any k of the ƙ nodes form a data recovery set. More generally, because each node stores α symbols, a set of k nodes holds kα = F distinct symbols, and therefore forms a data recovery set, if and only if no two of its nodes store a common symbol. For a finite number n of nodes, the data recovery collection Ҡ and the recovery selectivity ξ_R can be obtained by a traversal search algorithm [33].

In Example 2, it can be verified that ξ_R = 9. These 9 data recovery sets are listed as K(1) = {V₁, V₂}, K(2) = {V₁, V₃}, K(3) = {V₂, V₃}, K(4) = {V₄, V₅}, K(5) = {V₄, V₆}, K(6) = {V₅, V₆}, K(7) = {V₁, V₆}, K(8) = {V₂, V₅}, K(9) = {V₃, V₄}. For example, a DC can connect nodes V₁ and V₂ in the recovery set K(1) and obtain the 4 encoding symbols c₁–c₄ to recover the original data symbols with size F = 4.
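The traversal search can be sketched as a brute-force scan over all k-subsets of nodes. We do not know the exact algorithm of [33], so this is only an assumed exhaustive version (helper name ours); it also assumes that any F distinct coded symbols suffice for recovery. Applied to the codeword matrix (12), it returns the nine recovery sets listed above.

```python
from itertools import combinations

# Codeword matrix (12) of Example 2: node V_i -> indices of its two coded symbols.
nodes = {1: {1, 2}, 2: {3, 4}, 3: {5, 6}, 4: {1, 4}, 5: {2, 5}, 6: {3, 6}}
k, F = 2, 4


def recovery_sets(nodes, k, F):
    """Exhaustive search: keep every k-subset whose nodes jointly hold >= F distinct symbols."""
    return [K for K in combinations(sorted(nodes), k)
            if len(set().union(*(nodes[v] for v in K))) >= F]


K_sets = recovery_sets(nodes, k, F)
print(len(K_sets))  # 9
print(K_sets)       # [(1, 2), (1, 3), (1, 6), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6), (5, 6)]
```

The scan visits all C(n, k) subsets, which is cheap for the small n considered here but grows combinatorially with n.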

Regarding data repair, by property 2 of the irregular transposed matrix, the α elements in each row of C_{ƙ×α} (or (C_{ƙ×α})^Ŧ) can be found in α different rows (one element per row) of (C_{ƙ×α})^Ŧ (or C_{ƙ×α}). Hence, any failed node in a group arranged as C_{ƙ×α} (or (C_{ƙ×α})^Ŧ) can recover its α missing symbols by connecting to α = d helper nodes in an adjacent group arranged as (C_{ƙ×α})^Ŧ (or C_{ƙ×α}) and downloading one symbol from each. In Example 2, if node V₁ in group C_{3×2} fails, its α = 2 symbols c₁ and c₂ are lost. Its repair set is D₁ = {V₄, V₅}, which consists of the d = 2 nodes V₄ and V₅ in group (C_{3×2})^Ŧ; the failed node V₁ is repaired by downloading c₁ from V₄ and c₂ from V₅, without any additional computation. Note that ρ usually takes the value 2 or 3. From property 2 of the generalized transposed matrix, the repair selectivity L_i of node V_i (1 ≤ i ≤ n) is given by

if ρ = 2, L_i = 1;
if ρ = 3 and V_i belongs to a group arranged as C_{ƙ×α}, L_i = α + 1;
if ρ = 3 and V_i belongs to the group arranged as (C_{ƙ×α})^Ŧ, L_i = 2^α.

For every node V_i (1 ≤ i ≤ 6) in Example 2, ρ = 2, so L_i = 1.
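The closed-form values of L_i can be checked by brute force. The sketch below is our own and assumes d = α helpers, each sending one distinct lost symbol (β = 1); it rebuilds the codeword matrix of (9) and counts the repair sets of every node. The enumeration also shows where the formulas come from: a node of a C_{ƙ×α} group either takes all α symbols from the (C_{ƙ×α})^Ŧ group or takes exactly one of them from its identical copy in the other C_{ƙ×α} group (α + 1 choices), whereas each symbol of a (C_{ƙ×α})^Ŧ node has two copies in distinct rows of the two C_{ƙ×α} groups (2^α choices).

```python
from itertools import combinations, permutations


def generalized_transpose(C):
    a, b = len(C), len(C[0])
    flat = [x for row in C for x in row]
    return [[flat[j * a + r] for j in range(b)] for r in range(a)]


def codeword_rows(theta, alpha, rho):
    """Rows of the codeword matrix (9): rho groups alternating C and its generalized transpose."""
    kk = theta // alpha
    C = [[r * alpha + j + 1 for j in range(alpha)] for r in range(kk)]
    CT = generalized_transpose(C)
    return [list(row) for g in range(rho) for row in (C if g % 2 == 0 else CT)]


def repair_selectivity(rows, failed):
    """Count d-sets (d = alpha) of surviving nodes that can each supply one distinct lost symbol."""
    lost = rows[failed]
    others = [v for v in range(len(rows)) if v != failed]
    return sum(1 for D in combinations(others, len(lost))
               if any(all(s in rows[h] for s, h in zip(lost, order))
                      for order in permutations(D)))


theta, alpha = 6, 2
kk = theta // alpha
rows3 = codeword_rows(theta, alpha, rho=3)
L = [repair_selectivity(rows3, i) for i in range(len(rows3))]
print(L)  # [3, 3, 3, 4, 4, 4, 3, 3, 3]
# Groups 0 and 2 are arranged as C (alpha + 1), group 1 as its transpose (2 ** alpha).
expected = [alpha + 1 if (i // kk) % 2 == 0 else 2 ** alpha for i in range(len(rows3))]
print(L == expected)  # True
```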

5. Performance Analysis

In the proposed encoding structure based on the generalized matrix transposition method, new FR codes can be obtained simply by adjusting the encoding parameters. These codes all achieve the MSBR point but differ in their other performance parameters. Specifically, with the parameters (k, d, α, β, F) held fixed, the values of n, L_i, and R_C can be changed by adjusting θ and ρ, as described in the following three cases.

1. Keep ρ fixed and decrease θ. By (2) and (5), n = ρθ/α decreases and R_C increases, while L_i, which depends only on ρ and α, remains unchanged.
2. Keep θ fixed and increase ρ from 2 to 3. By (2), (5), and the analysis above, n and L_i increase and R_C decreases.
3. Change ρ and θ simultaneously while keeping n (= ρθ/α) fixed. Increasing ρ from 2 to 3 then decreases θ, leaves n and R_C unchanged, and increases L_i.
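The three cases can be checked numerically. The sketch below is our own illustration and is not a reproduction of Table 1: it keeps (k, d, α, β, F) = (2, 2, 2, 1, 4), the setting the paper uses for Table 1, builds the codeword matrix (9) for a few (θ, ρ) pairs that we chose ourselves, and evaluates n, R_C from (5), ξ_R by exhaustive search (assuming any F distinct symbols suffice for recovery), SL from (4), and the distinct L_i values.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import comb


def generalized_transpose(C):
    a, b = len(C), len(C[0])
    flat = [x for row in C for x in row]
    return [[flat[j * a + r] for j in range(b)] for r in range(a)]


def codeword_rows(theta, alpha, rho):
    kk = theta // alpha
    C = [[r * alpha + j + 1 for j in range(alpha)] for r in range(kk)]
    CT = generalized_transpose(C)
    return [list(row) for g in range(rho) for row in (C if g % 2 == 0 else CT)]


def repair_count(rows, failed):
    lost = rows[failed]
    others = [v for v in range(len(rows)) if v != failed]
    return sum(1 for D in combinations(others, len(lost))
               if any(all(s in rows[h] for s, h in zip(lost, p)) for p in permutations(D)))


def metrics(theta, rho, k=2, alpha=2, F=4):
    """n, R_C (5), xi_R, SL (4) and the distinct L_i values for one (theta, rho) choice."""
    rows = codeword_rows(theta, alpha, rho)
    n = len(rows)
    xi_r = sum(1 for K in combinations(rows, k) if len(set().union(*K)) >= F)
    sl = Fraction(comb(n, k) - xi_r, comb(n, k) - 1)
    L = sorted({repair_count(rows, i) for i in range(n)})
    return {"n": n, "R_C": Fraction(F, n * alpha), "xi_R": xi_r, "SL": sl, "L_i": L}


# Baseline (Example 2), then cases 1, 2 and 3 of the list above.
for theta, rho in [(6, 2), (4, 2), (6, 3), (4, 3)]:
    m = metrics(theta, rho)
    print(f"theta={theta} rho={rho}: n={m['n']} R_C={m['R_C']} "
          f"xi_R={m['xi_R']} SL={float(m['SL']):.2%} L_i={m['L_i']}")
# theta=6 rho=2: n=6 R_C=1/3 xi_R=9 SL=42.86% L_i=[1]
# theta=4 rho=2: n=4 R_C=1/2 xi_R=2 SL=80.00% L_i=[1]
# theta=6 rho=3: n=9 R_C=2/9 xi_R=21 SL=42.86% L_i=[3, 4]
# theta=4 rho=3: n=6 R_C=1/3 xi_R=5 SL=71.43% L_i=[3, 4]
```

Exact fractions are used for R_C and SL so that the comparisons between configurations are not affected by rounding.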

From the above analysis, FR codes with different performance can be obtained by changing some of the encoding parameters. In addition, according to (4), different security levels can be obtained by adjusting ξ_R appropriately. Reducing ξ_R improves the security level, but it also reduces the values of n, ρ, and θ; improving the security level therefore comes at the cost of storage performance. To quantitatively analyze the relationship between ξ_R and the security level (SL), several coding examples are given in Table 1. In these examples, n, ρ, and θ are varied to obtain different codes with the same parameters (k, d, α, β, F) = (2, 2, 2, 1, 4). The quantitative analysis shows that SL increases as ξ_R (expressed as the percentage ξ_R/\binom{n}{k}) decreases.

For different cloud storage customers, the parameters can be adjusted to meet their personalized needs, which gives greater flexibility to the application of regenerating codes.

6. Conclusions

This paper provides coding designs with different security levels that minimize both system storage consumption and repair bandwidth consumption. Considering an eavesdropper in the system that can steal information from k nodes, a generalized matrix transposition method is proposed on the basis of existing matrix transposition theory and applied to the construction of FR codes. Combined with a grouping design under the conditions n = ρƙ, α = d ≤ k ≤ ƙ, and ƙ = θ/α, it yields a deterministic, optimal FR code structure that simultaneously achieves minimum storage and minimum repair bandwidth. By simply changing the encoding parameters, the proposed FR codes can attain different security levels and different values of other performance parameters, including the number of storage nodes, recovery selectivity, repair selectivity, and code rate, which makes them attractive for practical dynamic storage systems. In addition, the coding design is based on the I-RC model, and the tradeoff between data recovery flexibility and security level in this generalized model is characterized, which offers useful guidance for practical system design.

Author Contributions

Conceptualization, F.Z. and J.X.; methodology, G.Y.; software, J.X.; validation, F.Z., G.Y. and J.X.; formal analysis, F.Z.; investigation, G.Y.; resources, J.X.; writing—original draft preparation, F.Z.; writing—review and editing, J.X.; supervision, F.Z.; project administration, F.Z. and J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was jointly supported by the Major Basic Research Program of the Shandong Provincial Natural Science Foundation (ZR2020ZD01) and in part by the Research Program of the Shandong Provincial Natural Science Foundation (ZR2022QF033), the Doctoral Research Funds of Shandong Management University (Grant No. SDMUD201906), and the QiHang Research Project Funds of Shandong Management University (Grant No. QH2020Z01).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflict of interest.

References

Elmahdy, A.; Kleckler, M.; Mohajer, S. Secure Determinant Codes for Distributed Storage Systems. IEEE Trans. Inf. Theory 2023, 69, 1966–1987.
Lavauzelle, J.; Tajeddine, R.; Freij-Hollanti, R.; Hollanti, C. Private Information Retrieval Schemes with Product-Matrix MBR Codes. IEEE Trans. Inf. Forensics Secur. 2021, 16, 441–450.
Gaeta, R. On the Impact of Pollution Attacks on Coding-Based Distributed Storage Systems. IEEE Trans. Inf. Forensics Secur. 2022, 17, 292–302.
Holzbaur, L.; Kruglik, S.; Frolov, A. Secure Codes with Accessibility for Distributed Storage. IEEE Trans. Inf. Forensics Secur. 2021, 16, 5326–5337.
Dimakis, A.G.; Godfrey, P.B.; Wu, Y.; Wainwright, M.J.; Ramchandran, K. Network coding for distributed storage systems. IEEE Trans. Inf. Theory 2010, 56, 4539–4551.
Ahlswede, R.; Cai, N.; Li, S.-Y.R.; Yeung, R.W. Network information flow. IEEE Trans. Inf. Theory 2000, 46, 1204–1216.
Rodrigues, R.; Liskov, B. High Availability in DHTs: Erasure Coding vs. Replication. In Proceedings of the 4th International Workshop on Peer-to-Peer Systems, New York, NY, USA, 24–25 February 2005.
Oliveira, P.F.; Lima, L.; Vinhoza, T.; Barros, J.; Médard, M. Coding for trusted storage in untrusted networks. IEEE Trans. Inf. Forensics Secur. 2012, 7, 1890–1899.
Zhang, Z.; Zhou, L. A Vertical-Horizontal Framework for Building Rack-Aware Regenerating Codes. IEEE Trans. Inf. Theory 2023, 69, 2874–2885.
Pawar, S.; Rouayheb, E.S.; Ramchandran, K. Securing dynamic distributed storage systems against eavesdropping and adversarial attacks. IEEE Trans. Inf. Theory 2012, 58, 6734–6753.
Bondy, A.; Murty, U.S.R. Graph Theory; Springer: Berlin, Germany, 2011.
Rashmi, K.V.; Shah, N.B.; Ramchandran, K.; Kumar, P. Information-Theoretically Secure Erasure Codes for Distributed Storage. IEEE Trans. Inf. Theory 2018, 64, 1621–1646.
Rashmi, K.V.; Shah, N.B.; Kumar, P.V. Optimal Exact-Regenerating Codes for Distributed Storage at the MSR and MBR Points via a Product-Matrix Construction. IEEE Trans. Inf. Theory 2011, 57, 5227–5239.
Silberstein, N.; Etzion, T. Optimal Fractional Repetition Codes Based on Graphs and Designs. IEEE Trans. Inf. Theory 2015, 61, 4164–4180.
Rawat, A.S.; Koyluoglu, O.O.; Silberstein, N.; Vishwanath, S. Optimal locally repairable and secure codes for distributed storage systems. IEEE Trans. Inf. Theory 2014, 60, 212–236.
Goparaju, S.; Rouayheb, S.E.; Calderbank, R.; Poor, H.V. Data secrecy in distributed storage systems under exact repair. In Proceedings of the International Symposium on Network Coding, Calgary, AB, Canada, 7–9 June 2013.
Tamo, I.; Wang, Z.; Bruck, J. Zigzag codes: MDS array codes with optimal rebuilding. IEEE Trans. Inf. Theory 2013, 59, 1597–1616.
Sengupta, B.; Dixit, A.; Ruj, S. Secure Cloud Storage with Data Dynamics Using Secure Network Coding Techniques. IEEE Trans. Cloud Comput. 2022, 10, 2090–2101.
Tandon, R.; Amuru, S.; Clancy, T.C.; Buehrer, R.M. Toward optimal secure distributed storage systems with exact repair. IEEE Trans. Inf. Theory 2016, 62, 3477–3492.
Shuo, S.; Tie, L.; Chao, T.; Cong, S. Multilevel Diversity Coding with Secure Regeneration: Separate Coding Achieves the MBR Point. Entropy 2018, 20, 751.
Xu, J.; Cao, Y.; Wang, D. Generalised Regenerating Codes for Securing Distributed Storage Systems against Eavesdropping. J. Inf. Secur. Appl. 2017, 34, 225–232.
Huang, K.; Parampalli, U.; Xian, M. On Secrecy Capacity of Minimum Storage Regenerating Codes. IEEE Trans. Inf. Theory 2017, 63, 1510–1524.
Chen, J.; Sung, C.W. Weakly Secure Coded Distributed Computing with Group-based Function Assignment. In Proceedings of the IEEE Information Theory Workshop (ITW), Mumbai, India, 6–9 November 2022.
Kadhe, S.; Sprintson, A. Weakly secure regenerating codes for distributed storage. In Proceedings of the International Symposium on Network Coding, Aalborg Oest, Denmark, 27–28 June 2014.
Kadhe, S.; Sprintson, A. On a weakly secure regenerating code construction for minimum storage regime. In Proceedings of the 52nd Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 30 September–3 October 2014.
Kadhe, S.; Sprintson, A. Universally Weakly Secure Coset Coding Schemes for Minimum Storage Regenerating (MSR) Codes. In Proceedings of the 55th Annual Allerton Conference on Communication, Control, and Computing, Monticello, IL, USA, 3–6 October 2017.
Liu, J.; Wang, H.; Xian, M.; Huang, K. Weakly Secure Regenerating Codes for Cloud Storage against Eavesdropper. J. Electron. Inf. Technol. 2014, 36, 1221–1228.
Xu, J.; Cao, Y.; Wang, D.; Wu, C.; Yang, G. Optimal Heterogeneous Distributed Storage Regenerating Code at Minimum Remote-Repair Bandwidth Regenerating Point. ETRI J. 2016, 38, 529–539.
Bian, J.; Luo, S.; Li, Z.; Yang, Y. Optimal Weakly Secure Minimum Storage Regenerating Codes Scheme. IEEE Access 2019, 7, 151120–151130.
Dau, H.; Song, W.; Sprintson, A.; Yuen, C. Secure Erasure Codes with Partial Reconstructibility. IEEE Trans. Inf. Theory 2020, 66, 6809–6822.
Yu, Q.; Sung, C.W.; Chan, T.H. Irregular Fractional Repetition Code Optimization for Heterogeneous Cloud Storage. IEEE J. Sel. Areas Commun. 2014, 32, 1048–1060.
Zhu, B.; Shum, K.W.; Li, H.; Hou, H. General Fractional Repetition Codes for Distributed Storage Systems. IEEE Commun. Lett. 2014, 18, 660–663.
Barnawi, A.; Alharbi, M. Performance evaluation and simulation of the traversal algorithms for robotic agents in advanced search and find (ASAF) system. Int. Arab. J. Inf. Technol. 2021, 18, 611–623.

Figure 1. FR coding for DSS with n = 6, α = 2, ρ = 2.

Table 1. Encoding examples that indicate the trade-off between ξ_R and security level (SL).

Parameters	Codewords	$\xi_R / \binom{n}{k}$	SL
(n, ρ, θ) = (6, 2, 6)	{{c₁, c₂}, {c₃, c₄}, {c₅, c₆}, {c₁, c₄}, {c₂, c₅}, {c₃, c₆}}	60%	42%
(n, ρ, θ) = (9, 3, 6)	{{c₁, c₂}, {c₃, c₄}, {c₅, c₆}, {c₁, c₄}, {c₂, c₅}, {c₃, c₆}, {c₁, c₂}, {c₃, c₄}, {c₅, c₆}}	48%	60%
(n, ρ, θ) = (4, 2, 4)	{{c₁, c₂}, {c₃, c₄}, {c₁, c₃}, {c₂, c₄}}	33%	80%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

