Review

Blockchain-Based Decentralized Storage Systems for Sustainable Data Self-Sovereignty: A Comparative Study

by Mpyana Mwamba Merlec 1 and Hoh Peter In 1,2,*
1 Department of Computer Science and Engineering, Korea University, Seoul 02841, Republic of Korea
2 DAO Solution Inc., Seoul 06247, Republic of Korea
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(17), 7671; https://doi.org/10.3390/su16177671
Submission received: 24 July 2024 / Revised: 28 August 2024 / Accepted: 31 August 2024 / Published: 4 September 2024

Abstract

In the digital age, data sovereignty has emerged as a critical concern due to the increasing demand for privacy, security, and user control. In this context, decentralized storage infrastructure is reshaping how data are stored and managed, leading the transition from traditional centralized models to a more decentralized and user-driven approach to data sovereignty, known as data self-sovereignty (DSS). This paper presents a systematic comparative analysis of decentralized storage systems, emphasizing their potential to enhance sustainable DSS. By highlighting the integral role of blockchain technology, this study critically examines various decentralized storage platforms, including Arweave, BitTorrent, Dat Protocol, Filecoin, Hypercore Protocol, IPFS, MaidSafe, Sia, Storj, and Swarm. The analysis covers the key architectural features of these systems, their performance metrics, and their contribution to user data sovereignty. This study aims to comprehensively explain how these decentralized storage solutions allow users to maintain complete control over their data, thus offering a viable alternative to traditional centralized storage methods. This paper therefore contributes to ongoing data sovereignty research and guides future developments in decentralized storage technologies.

1. Introduction

Data sovereignty has become a critical issue for many digital platforms due to increasing concerns over privacy, security, and user control [1,2,3,4,5,6,7,8]. Traditionally, data sovereignty has been founded on the principle that data are subject to the laws and governance structures of the country where they are collected or processed [2,3]. This concept relies on governmental control and data localization policies, where data must remain within specific geographic boundaries to comply with local regulations [2,3,4,5,6]. However, because digital interactions transcend national borders, the limitations of this traditional approach have become apparent, with individuals and organizations seeking greater autonomy and control over their data [3,4,5,6,7]. Similarly, traditional centralized storage systems, where third-party entities manage and control data, have received greater scrutiny due to their vulnerability to breaches, censorship, and unauthorized access [4,5,6,7,8,9,10]. These issues highlight the need for alternative approaches that allow users to retain control over their data, leading to data self-sovereignty (DSS) [11,12,13,14,15,16,17].
DSS extends data sovereignty by offering individuals and organizations complete control over their data, regardless of where they are stored or processed [12,13,14]. DSS emphasizes user autonomy, allowing individuals to decide how their data are stored, accessed, and shared without relying on centralized authorities or intermediaries [13,14,15]. This shift toward a user-driven data control model aligns with a broader movement toward decentralized digital infrastructure, where trust is distributed among participants rather than concentrated in a single entity [12,13,14,15,16,17,18].
Decentralized storage systems, many of which are based on blockchain technology, are an essential component of this transition toward DSS [12,13,14,15,16,17,18,19,20,21,22]. The inherent decentralization, transparency, immutability, and cryptographic security of blockchain ensure that data remain secure, tamper-proof, and under the user’s control [19,20,21,22,23]. An essential element of blockchain technology that supports DSS is the use of smart contracts, which are self-executing contracts running on the blockchain with the terms of the agreement directly written into the code [19,20,21,22,23]. These contracts automate and enforce the rules governing data access, usage, and sharing without intermediaries, ensuring that data remain controlled by their rightful owners [24,25,26]. Thus, decentralized storage systems offer an alternative to traditional centralized storage solutions by distributing data across multiple nodes in a network, thereby reducing reliance on a single point of failure and enhancing security, privacy, reliability, and user control [20,21,22,23,24,25,26].
However, a critical research gap has arisen from growing global concerns regarding security, privacy, and data control in the move toward DSS, amplified by regulations such as the European Union’s General Data Protection Regulation (GDPR) [27,28]. The volume of data created, collected, and consumed globally has increased dramatically since 2010, with forecasts up to 2025 predicting continued growth (Figure 1a) [29]. Big data analytics has emerged as an essential component of this data-driven ecosystem, with its market expected to grow substantially and reach billions of U.S. dollars (Figure 1b) [30]. This rapid growth in both the volume of data and the big data analytics market highlights the urgent need for robust, scalable, and secure data storage solutions that can accommodate this surge while protecting data sovereignty.

1.1. Contributions

This paper presents a systematic comparative analysis of decentralized storage systems, focusing on their potential to support sustainable DSS. In an era when data privacy, security, and ownership control are critical issues, decentralized storage solutions offer an alternative to conventional centralized data repositories. While many existing storage solutions claim to provide decentralized control [31,32,33,34,35,36,37,38,39,40,41], not all fully harness the potential of blockchain technology and smart contracts. This study critically examines blockchain-based and other decentralized storage models to evaluate their effectiveness in enabling DSS. The leading platforms analyzed are Arweave [32], BitTorrent [33], Dat Protocol [34], Filecoin [35], Hypercore Protocol [36], InterPlanetary File System (IPFS) [37], MaidSafe [38], Sia [39], Storj [40], and Swarm [41]. This study assesses how these platforms address the key challenges associated with implementing DSS, such as ensuring data immutability, security, accessibility, and the integration of blockchain technology while maintaining scalability and efficiency. This paper analyzes each platform’s key architectural features and performance metrics and highlights blockchain integration’s opportunities and challenges in balancing decentralization, security, and user control. This analysis can assist users and organizations in identifying the most suitable platform for their data management requirements.
This paper is motivated by the growing demand for sustainable DSS in blockchain-based decentralized storage systems. While many reviews have addressed data sovereignty and blockchain technology, our paper uniquely focuses on the intersection of blockchain technology and decentralized storage systems for achieving sustainable DSS. The aim is to enhance the understanding of how these solutions allow users to maintain complete control over their data while offering a viable alternative to traditional centralized storage methods, thus redefining self-sovereign data storage. By evaluating these platforms, this paper seeks to contribute to ongoing research on data sovereignty and inform future developments in decentralized storage technologies.
The present study also identifies current research gaps and the limitations of these systems, offering insights into the utility of decentralized storage technologies for building sustainable DSS. By providing a comprehensive overview of the strengths and weaknesses of each platform, this paper aims to inform developers, researchers, and policymakers of the potential of decentralized storage systems to redefine data management practices and promote a more secure and user-centric digital environment to achieve sustainable DSS.

1.2. Methodology

This study employed a systematic qualitative and quantitative analysis approach consisting of three primary stages:
  • Data Collection: A comprehensive review of the existing literature, technical documents, whitepapers, and user forums related to decentralized storage platforms was conducted. This diversity of data sources provided a robust foundation for understanding the current state of these systems and their real-world applications.
  • Comparison of Key Features: The key features of the selected decentralized storage platforms were systematically compared, focusing on the underlying technology, primary use cases, data redundancy, security, privacy, incentivization, payment models, data control, versioning, and community adoption. Particular emphasis was placed on assessing the support for DSS and integrating blockchain technology to enhance user control and data integrity.
  • Analysis of Performance Metrics: Performance metrics, including upload/download speed, latency, throughput, data redundancy, resource efficiency, and scalability, were analyzed. This analysis evaluated each platform’s operational efficiency and practical viability in supporting sustainable DSS.
The rest of this paper is organized as follows. Section 2 reviews data sovereignty, DSS, self-sovereign identity (SSI), and decentralized storage architectures, setting the stage for the comparative analysis. Section 3 then compares selected decentralized storage platforms, focusing on critical architectural characteristics and performance metrics. Finally, Section 4 concludes this paper with a summary of essential insights and discusses directions for future research.

2. Research Background

This section presents the core concepts, principles, and technological mechanisms of decentralized storage systems. This is followed by a more detailed analysis of the evolution and significance of these platforms for use in data management and self-sovereignty.

2.1. Centralized, Decentralized, and Distributed Storage Systems

As illustrated in Figure 2, storage system architectures can be classified into three main categories: centralized, decentralized, and distributed [4,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,31,32,33,34,35,36,37,38,39,40,41,42,43]. The characteristics of these architectures determine their suitability for specific applications, including those involving DSS, depending on their control, resilience, and scalability requirements.

2.1.1. Centralized Architectures

In a centralized model, all nodes are connected to a single central node, creating a single point of failure; if the central server goes down, the entire system infrastructure can become inoperative [4,9,10]. This poses significant data integrity and privacy risks because attacks on the central server can compromise the system [9,10]. Furthermore, centralized systems often place control in the hands of a single entity, which can lead to concerns over data ownership and user autonomy [11,12]. While efficient for centralized control and resource management, this architecture is increasingly considered inadequate for modern privacy and data sovereignty requirements.

2.1.2. Decentralized Architectures

Decentralized architectures address some disadvantages of centralized systems by spreading responsibility across multiple authoritative nodes, improving reliability, and removing the risks associated with single points of failure [19,20,21,22,23,24,25]. In a decentralized system, each node may manage a specific region or a subset of services, communicating with other nodes to synchronize data and balance loads [20,21]. This model enhances resilience and autonomy because control is distributed among multiple nodes. However, decentralized systems can face significant challenges related to coordination and consistency as the network increases in size [22,23,24,25,26].

2.1.3. Distributed Architectures

Distributed architectures eliminate central nodes, with each node connecting to several others to share data and process tasks across a vast network [31,32,33,34,35,36,37,38,39,40,41,42]. This peer-to-peer (P2P) topology significantly enhances fault tolerance and load distribution, making the system robust against individual node failures [32,33,34,35,36,37,38,39,40,41]. Distributed systems are particularly well suited for applications requiring high levels of resilience and scalability because they can efficiently handle large volumes of data and traffic using the collective resources of the network [37,38,39,40,41,42]. However, managing distributed systems can be complicated, particularly in terms of ensuring data consistency and security across all nodes.
For DSS applications, distributed and decentralized systems have become increasingly favored because they give users greater control over their data while protecting against failures and attacks.
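The single-point-of-failure argument above can be illustrated with a small reachability check. The following is a minimal sketch, not taken from any of the surveyed systems: the topologies `star` and `mesh` and the helper `reachable` are hypothetical names introduced only for this illustration.

```python
from collections import deque


def reachable(adjacency: dict, start: str, failed: set) -> set:
    """Return the nodes that `start` can still reach when `failed` nodes are down."""
    if start in failed:
        return set()
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in failed and neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical centralized topology: every node talks only to the hub.
star = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}

# Hypothetical peer-to-peer topology: each node has two neighbors (a ring).
mesh = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}

# Losing the hub isolates every client in the star topology ...
isolated = reachable(star, "a", failed={"hub"})
# ... whereas the ring routes around a single failed peer.
still_connected = reachable(mesh, "a", failed={"b"})
```

In this toy example, node `a` can reach only itself once the hub fails, while in the ring it still reaches `c` and `d` after `b` fails. Real networks are far larger and use more elaborate routing, so this shows the principle only.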

2.2. Data Sovereignty, Data Self-Sovereignty, and Self-Sovereign Identity

In digital data management, data sovereignty, DSS, and SSI define how personal and organizational data are controlled, stored, and accessed [5,6,7,8,9,10,11,12,13,14,15,16,17,18,44,45,46]. Each concept addresses data ownership and control elements, particularly within decentralized storage systems.

2.2.1. Data Sovereignty

Data sovereignty is based on the principle that data are subject to the laws and governance structures of the nation or territory where they are stored or processed [1,2,3,4,5,6,7]. Thus, their handling must comply with local regulations, such as data protection laws and privacy standards [2,3,4,5,6,7,8]. The GDPR is an example of such a regulation, giving the citizens of European Union member states greater control over their personal data [27,28]. In centralized storage systems, data sovereignty often concerns data localization, where data must reside within a specific geographic area to ensure they remain under local control [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26]. However, this traditional view of data sovereignty has become less relevant in the digital age, where data are increasingly stored across borders, making it challenging to enforce jurisdictional control effectively.

2.2.2. Data Self-Sovereignty

DSS extends the concept of data sovereignty from state or organizational control to individual and organizational autonomy [2,12,13,14,15]. Unlike traditional data sovereignty, which is often tied to legal jurisdictions and state governance, DSS emphasizes the empowerment of users—whether individuals, organizations, or communities—by granting them complete control over how their data are collected, stored, accessed, processed, and shared [5,6,7,8,9,10,11,12,13,14,15,16,17]. This paradigm shift from centralized authority to user autonomy reflects the growing demand for privacy, security, and personal agency in the digital age.
In a DSS framework, users retain ownership of their data, making all decisions regarding its use without requiring external approval [7,8,9,10,11,12,13,14,15,16,17], which is particularly relevant in contexts where data privacy and control are essential, such as healthcare, finance, and personal identity management [8,9,10,11,12,13,14,15,16,17,18]. The authors in [12] highlight the potential of DSS in enabling self-sovereign storage managed by executable choreographies, where users can define and manage how their data are shared and used across different platforms. In addition, Liang et al. [13] explore the application of DSS in healthcare systems, advocating for decentralized accountability and self-sovereignty to protect patient data from unauthorized access and to ensure that individuals maintain control over their health information. In this context, decentralized storage systems are critical to realizing DSS. These systems, particularly those employing blockchain and distributed ledger technologies (DLTs), facilitate data distribution across a network of nodes, reducing reliance on a single point of control and enhancing user autonomy [14,15,16,17,18,19,20,21,22,23,24,25].
Kim [15] discusses the design of a self-sovereign data distribution platform to create a reliable data economy, highlighting the importance of establishing trust and reliability in systems that support DSS. In line with this, the study in [16] presents personal data stores (PDSs) as a potential solution for managing personal data in a way that aligns with the principles of DSS, emphasizing the need for interoperable systems that can be seamlessly integrated with existing technologies while maintaining user control and privacy. Abbas et al. [17] propose a broader framework for understanding data sovereignty beyond the traditional view, focusing on a social contract perspective. They emphasize that data sovereignty should focus on individual autonomy and consider communities’ and states’ collective rights and responsibilities in managing and governing data [17]. Additionally, they argue that data sovereignty is inherently tied to societal norms, legal frameworks, and collective agreements that define how data are used and protected within society [17].

2.2.3. Self-Sovereign Identity

SSI is closely related to DSS, focusing on digital identity management [44,45]. SSI empowers individuals to create, manage, and control their digital identities without relying on centralized entities such as governments or corporations [44]. In an SSI framework, identity data are stored on decentralized networks, often utilizing blockchain or DLTs to ensure security, privacy, and user control [44,45,46]. Unlike traditional identity systems, where centralized authorities handle identity verification and management, SSI enables individuals to manage their identities, controlling who has access to their personal information and under what conditions.
Figure 3 illustrates the typical architecture of an SSI system, which operates on a blockchain or DLT foundation [45,46]. In this model, the owner or holder of the identity has complete control over their credentials. An issuer, such as the government or an organization, issues these credentials to the owner and stores them on a verifiable data registry powered by blockchain or a DLT. This registry ensures that the credentials are secure, tamper-proof, and verifiable by any authorized party. When a verifier (such as a service provider) needs to verify the owner’s identity, the owner presents their credentials directly from the verifiable data registry, and the verifier confirms the validity of these credentials via the registry. This process ensures that the owner retains complete control over who can access their identity information and under what circumstances, reinforcing the principles of DSS.
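The issuer, holder, and verifier interaction described above can be sketched in a few lines. This is a deliberately simplified illustration under stated assumptions: the registry is an in-memory dictionary standing in for a blockchain or DLT, credentials are anchored by hash only, and issuer authentication is omitted. Deployed SSI systems rely on digital signatures and decentralized identifiers instead. The class `VerifiableDataRegistry`, the helper functions, and the `did:example:` identifiers are hypothetical.

```python
import hashlib
import json
from typing import Optional


def credential_digest(credential: dict) -> str:
    """Hash a credential in a stable key order so that equal content gives equal digests."""
    encoded = json.dumps(credential, sort_keys=True).encode()
    return hashlib.sha256(encoded).hexdigest()


class VerifiableDataRegistry:
    """Stand-in for a blockchain/DLT registry: a write-once map of digest -> issuer."""

    def __init__(self) -> None:
        self._entries = {}

    def anchor(self, issuer: str, credential: dict) -> str:
        digest = credential_digest(credential)
        # First write wins, mimicking the immutability of a ledger entry.
        self._entries.setdefault(digest, issuer)
        return digest

    def issuer_of(self, digest: str) -> Optional[str]:
        return self._entries.get(digest)


def verify_presentation(registry: VerifiableDataRegistry,
                        credential: dict, expected_issuer: str) -> bool:
    """Verifier side: recompute the digest and check who anchored it."""
    return registry.issuer_of(credential_digest(credential)) == expected_issuer


registry = VerifiableDataRegistry()
credential = {"holder": "did:example:alice", "claim": {"over_18": True}}
registry.anchor("did:example:gov", credential)   # issuer anchors the credential
ok = verify_presentation(registry, credential, "did:example:gov")  # holder presents it
```

The holder decides when and to whom the credential is presented; the verifier only needs the registry to confirm that the presented content matches what the issuer anchored.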
Implementing SSI within decentralized storage systems represents a significant step toward establishing DSS. Using blockchain or DLTs in SSI systems ensures that identity data are secure and tamper-proof [44,45]. This decentralization of identity management aligns with the principles of data sovereignty by preventing unauthorized access and ensuring that identity data are only shared with the user’s explicit consent [44,45,46]. Moreover, integrating smart contracts within an SSI framework further enhances the principles of DSS. Smart contracts ensure that users’ data and identities are managed according to predefined rules, which are transparently executed across the network, further reinforcing DSS and trust in the system.
In summary, while data sovereignty emphasizes control by the nation–state, DSS and SSI shift this control to the individual or organization. The use of blockchain and DLTs in decentralized storage systems and identity management frameworks plays a crucial role in realizing these concepts by providing the technological infrastructure necessary to ensure that data and identity credentials remain under the user’s control, secure from unauthorized access, and free from a centralized authority.

2.3. Blockchain, Consensus, and Smart Contracts

2.3.1. Blockchain Definition

Blockchain is a decentralized and distributed ledger technology that enables secure, transparent, and immutable data recording across a network of nodes [47,48]. It records a growing list of transactions in a chronological chain of blocks, timestamped and cryptographically secure [47,48,49]. Each block uses cryptographic algorithms to ensure the data remain tamper-proof and verifiable. A key feature of blockchain technology is its decentralized nature, which means that the data are synchronized across all nodes in the network [48,49]. Figure 4 presents the structure of a blockchain, with each block linked to the previous one through cryptographic hashes. This chaining of blocks ensures that altering one block would require changes to all subsequent blocks, making unauthorized modifications virtually impossible. The primary data fields within each block include the block number, previous and current block hashes, transaction data, and timestamps.
The first block in any blockchain, known as the genesis block or block number 0, forms the foundation of the chain. Unlike subsequent blocks, the genesis block does not reference any previous block and typically contains minimal or no transaction data. Once a transaction is recorded in a block, it cannot be altered without retroactively modifying all subsequent blocks, a process that requires the consensus of the majority of nodes in the network [48,49]. This inherent immutability is a cornerstone of blockchain technology, providing high security and trust.
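The hash-linking described in this subsection can be made concrete with a short sketch. It uses the block fields named above (block number, previous and current hashes, transaction data, and timestamp). The function names and the JSON-based serialization are assumptions made for illustration and do not correspond to any particular blockchain's block format.

```python
import hashlib
import json
import time
from typing import Optional


def block_hash(block: dict) -> str:
    """Hash every field except the block's own hash, in a stable key order."""
    payload = {k: v for k, v in block.items() if k != "hash"}
    return hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()


def make_block(number: int, prev_hash: str, transactions: list,
               timestamp: Optional[float] = None) -> dict:
    block = {
        "number": number,
        "prev_hash": prev_hash,
        "transactions": transactions,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    block["hash"] = block_hash(block)
    return block


def is_valid_chain(chain: list) -> bool:
    """Check each block's own hash and its link to the previous block."""
    for i, block in enumerate(chain):
        if block["hash"] != block_hash(block):
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True


# The genesis block (number 0) references no real predecessor.
genesis = make_block(0, "0" * 64, [])
block_1 = make_block(1, genesis["hash"], ["alice->bob:5"])
block_2 = make_block(2, block_1["hash"], ["bob->carol:2"])
chain = [genesis, block_1, block_2]
```

Changing the transactions in `block_1` without recomputing its hash makes `is_valid_chain` fail; recomputing it breaks the link from `block_2`, which is why a modification would have to be repeated in every subsequent block.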

2.3.2. Blockchain Classification

Blockchain networks are generally classified as permissionless and permissioned [47,48,49,50,51,52,53,54,55,56,57]. A permissionless blockchain allows anyone with the necessary software and configurations to access and participate in the network [47,48]. In contrast, a permissioned blockchain requires prior authorization, restricting participation to selected entities [49,50,51]. Based on governance models, these classifications can be further divided into three main types of blockchain networks: public, private, and consortium [47,48,49,50,51,52,53,54,55,56,57,58,59] (Table 1).
  • Public blockchains are permissionless networks where participants can securely engage in transactions without intermediaries [47,48,49]. These networks operate in a trustless environment, meaning transactions are validated and recorded without requiring trust between the parties involved. Notable examples include Bitcoin [47] and Ethereum [48], widely recognized for their open and decentralized nature. In these public blockchains, participants can join the network anonymously, and any node with the requisite software can create, verify, and validate transactions. The key advantages of public blockchains include independence, decentralization, transparency, and trust. However, due to the computational demands of consensus mechanisms such as Proof-of-Work (PoW), these networks often face problems related to performance, scalability, and high energy and resource consumption [47,48,49].
  • Private blockchains are closed, permissioned networks managed by a single organization or a select group of entities [50,51,52,53,54]. Access is restricted to authorized participants, allowing greater control over the network and its operations. Hyperledger Fabric [50,51], R3 Corda [51,52], and Ripple [53] are prominent examples of private blockchains. While private blockchains offer improved performance, scalability, and privacy compared to public blockchains, they sacrifice some of the core principles of decentralization, although some private blockchains may be controlled by a small group rather than a single entity [50,51,53]. The centralized control in private blockchains may limit transparency and trust because the managing organization strongly influences the network’s operations.
  • Consortium blockchains represent a hybrid model that combines elements of both public and private blockchains [50,51,52,53,54,55,56]. These networks are managed by a group of organizations that share a common ledger, with governance distributed among the consortium members [50,51,54]. Consortium blockchains balance the openness of public blockchains and the control of private blockchains, offering enhanced security and decentralization compared to private networks. However, establishing and maintaining a consortium blockchain can be complex, requiring coordination and collaboration among multiple entities. Examples of consortium blockchain platforms include Hyperledger Fabric [50,51], R3 Corda [52,53], Hyperledger Sawtooth [55], and Quorum [56]. These networks are particularly well-suited for industries where multiple organizations must collaborate while maintaining control over their data and transactions.
The choice of blockchain type strongly impacts the design and functionality of decentralized storage systems. Blockchain’s inherent decentralization and reliance on consensus mechanisms ensure data integrity and security by enabling multiple network nodes to agree on the validity of transactions [47,48,49,50,51,52,53,54,55,56]. This decentralized control is fundamental to enhancing data sovereignty because it mitigates the risks associated with centralized data management and reduces the likelihood of data breaches or unauthorized access [12,13,14,15,16,17,18]. Thus, understanding the nuances of different blockchain types and their respective advantages and drawbacks is crucial for developers and organizations aiming to implement decentralized storage solutions that align with their specific needs for DSS, security, and scalability.

2.3.3. Consensus Protocols

Consensus protocols are the mechanisms by which blockchain networks maintain immutability and ensure that all nodes agree on the current state of the ledger [57,58,59]. These protocols are required to maintain the network’s decentralized nature by ensuring that no single entity controls the data.
One of the most widely known consensus protocols is Proof-of-Work (PoW), which is used by Bitcoin and was used by Ethereum until its transition to Proof-of-Stake in September 2022 [47,48]. In the PoW protocol, network participants known as miners compete to solve computationally expensive cryptographic puzzles. The first miner to solve the puzzle earns the right to add a new block of transactions to the blockchain. While PoW is highly secure due to its resistance to tampering, it has also been criticized for its high energy consumption and the computational resources required [47,48,49].
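The puzzle that miners compete to solve can be sketched as a brute-force search for a nonce. The version below is a simplified illustration: it asks for a SHA-256 digest with a given number of leading zero hexadecimal digits, whereas production networks compare the hash against a numeric target and use their own block serialization. The function names `mine` and `verify` are hypothetical.

```python
import hashlib
from typing import Tuple


def mine(block_data: str, difficulty: int) -> Tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the digest starts with `difficulty` zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Checking a solution costs one hash, however long the search took."""
    digest = hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


nonce, digest = mine("block-1: alice->bob:5", difficulty=3)
```

Each additional zero digit multiplies the expected search effort by 16 while verification stays constant, which is the asymmetry behind both the tamper resistance and the energy cost of PoW.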
The Proof-of-Stake (PoS) consensus mechanism selects validators based on the number of cryptocurrency tokens they hold and are willing to stake as collateral [57,58]. This method is more energy-efficient than PoW and secures the network by penalizing malicious validators, who risk losing part of their stake. Delegated Proof-of-Stake (DPoS) is a variation in which token holders vote for a small group of nodes to act as validators on their behalf [58]. This approach introduces an element of governance into the network because validators are selected through a democratic process and can be replaced if they fail to act in the network’s best interests.
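Stake-weighted selection and penalties can be sketched as follows. This is an illustrative model only: actual PoS protocols derive randomness from the chain itself and define their own penalty rules, and the functions `select_validator` and `slash` are hypothetical names.

```python
import random


def select_validator(stakes: dict, rng: random.Random) -> str:
    """Pick one validator with probability proportional to its stake."""
    validators = list(stakes)
    weights = [stakes[v] for v in validators]
    return rng.choices(validators, weights=weights, k=1)[0]


def slash(stakes: dict, validator: str, fraction: float) -> None:
    """Penalize a misbehaving validator by removing a fraction of its stake."""
    stakes[validator] -= stakes[validator] * fraction


stakes = {"validator_a": 90.0, "validator_b": 10.0}
rng = random.Random(42)  # fixed seed so the sketch is reproducible
chosen = select_validator(stakes, rng)
slash(stakes, "validator_a", fraction=0.5)  # validator_a now holds 45.0
```

Because selection probability follows stake, a slashed validator is also chosen less often afterwards, which ties the economic penalty to reduced influence over block production.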
Another important consensus mechanism is Practical Byzantine Fault Tolerance (PBFT), commonly used in permissioned blockchain networks where participants are pre-approved and trusted [50,57]. PBFT requires agreement among more than two-thirds of the nodes to validate transactions, which allows the network to tolerate fewer than one-third faulty or malicious nodes while reaching consensus quickly. Although PBFT confirms transactions quickly and is less resource-intensive than PoW, it assumes a higher level of trust among participants, which might not be suitable for all blockchain networks [57,58,59].
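The two-thirds requirement follows from the standard Byzantine fault tolerance bound: a network of n nodes tolerates at most f faulty nodes when n ≥ 3f + 1. The sketch below computes the resulting quorum size. It covers only the vote-counting arithmetic, not the message phases of PBFT, and the function names are hypothetical.

```python
def max_faulty(n: int) -> int:
    """Largest number of Byzantine nodes f that n nodes can tolerate (n >= 3f + 1)."""
    if n < 1:
        raise ValueError("the network needs at least one node")
    return (n - 1) // 3


def quorum(n: int) -> int:
    """Smallest vote count such that any two quorums share at least one honest node.

    Equals 2f + 1 when n = 3f + 1.
    """
    return (n + max_faulty(n)) // 2 + 1


def is_committed(matching_votes: int, n: int) -> bool:
    return matching_votes >= quorum(n)
```

For example, a four-node network tolerates one faulty node and needs three matching votes, while a seven-node network tolerates two and needs five.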
Protocol choice often depends on specific factors such as network scalability, security requirements, and environmental impact. By understanding the advantages and disadvantages of these approaches, developers can design and implement blockchain networks tailored to their applications’ needs while maintaining robustness, reliability, and security.

2.3.4. Smart Contracts

Smart contracts are self-executing programs that run on a blockchain, automatically enforcing the terms of an agreement when predefined conditions are met [60,61]. These contracts operate in a decentralized environment, eliminating the need for intermediaries and enabling trustless, transparent transactions. The automation provided by smart contracts ensures that all parties involved adhere to the agreement terms without manual enforcement, thus enhancing both efficiency and security. A prominent example of a blockchain platform that supports smart contracts is Ethereum [48], which offers a Turing-complete programming language that allows developers to create complex and customizable contracts. These contracts cover various agreements, from financial transactions and asset transfers to deploying decentralized applications (dApps) and automated business processes [26,43,60,61,62,63].
Figure 5 illustrates the architecture of a blockchain system with an integrated smart contract execution environment. The blockchain ledger starts with a genesis block and continues with subsequent blocks, each containing a set of transactions. The system includes an execution environment in which smart contracts are written in code and executed when specific conditions are fulfilled. These contracts interact with a state database (StateDB), which records changes in real-time. Miners or validators in the network play a crucial role in this system by validating and adding new blocks to the blockchain through a consensus process. This ensures that once a block is added to the blockchain, it becomes a permanent part of the ledger, securing the encoded terms within the smart contracts across the network. This architecture provides a decentralized, transparent, and tamper-proof framework for executing and enforcing agreements.
In blockchain-based decentralized storage systems, smart contracts are critical in governing data access, storage, and transaction management [15,16,17,18,19,20,21,22,23,24]. These contracts ensure that data storage and retrieval processes adhere to the predefined conditions set by the data owner, thus offering users complete control over their data, which is a fundamental element of DSS [17,18,19,20,21,22,23,24,25,26]. Through smart contracts, users can define who has access to their data, under what conditions, and for how long without relying on centralized entities to enforce these rules.
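The rule set described above (who may access the data, under what conditions, and for how long) can be modeled in ordinary Python to show the logic a contract would enforce. This is a plain simulation rather than contract code for any specific platform. The class `AccessContract` and its methods are hypothetical, and the current time is passed in explicitly where an on-chain contract would read the block timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class AccessContract:
    owner: str
    # Maps each grantee to the time (in seconds) at which its access expires.
    grants: dict = field(default_factory=dict)

    def grant(self, caller: str, grantee: str, expires_at: float) -> None:
        if caller != self.owner:
            raise PermissionError("only the data owner can grant access")
        self.grants[grantee] = expires_at

    def revoke(self, caller: str, grantee: str) -> None:
        if caller != self.owner:
            raise PermissionError("only the data owner can revoke access")
        self.grants.pop(grantee, None)

    def can_read(self, requester: str, now: float) -> bool:
        if requester == self.owner:
            return True
        expiry = self.grants.get(requester)
        return expiry is not None and now < expiry


contract = AccessContract(owner="alice")
contract.grant("alice", "bob", expires_at=100.0)
allowed_early = contract.can_read("bob", now=50.0)   # within the grant window
allowed_late = contract.can_read("bob", now=150.0)   # grant has expired
```

Only the owner can change the grant table, and expiry is checked on every read, so no third party is needed to enforce or withdraw access.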

2.4. Decentralized Storage Systems

Decentralized storage systems differ from traditional centralized storage models in distributing data across a network of peer nodes rather than relying on a single centralized server [19,20,21,22,23]. In these systems, each node contributes storage capacity and computational resources, and together the nodes form a robust, distributed storage infrastructure. The core principle of decentralized storage is eliminating single points of failure and enhancing data resilience and availability. By dispersing data across multiple nodes, decentralized storage systems ensure high data redundancy, maintaining data availability even if some nodes fail or go offline [31,32,33,34,35,36,37,38,39,40,41].
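The redundancy argument can be illustrated with a toy replication scheme. In the sketch below, a file is split into chunks, each chunk is identified by its SHA-256 hash, and every chunk is copied to several distinct nodes, so the file can be rebuilt while some nodes are offline. The placement rule, the function names, and the in-memory `nodes` dictionary are assumptions made for illustration. The surveyed platforms use their own schemes, such as erasure coding, and encryption is omitted here.

```python
import hashlib


def split_chunks(data: bytes, size: int) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def place_replicas(chunk_id: str, node_ids: list, replicas: int) -> list:
    """Pick `replicas` distinct nodes, starting at a position derived from the chunk ID."""
    start = int(chunk_id, 16) % len(node_ids)
    return [node_ids[(start + i) % len(node_ids)] for i in range(replicas)]


def store(data: bytes, nodes: dict, chunk_size: int, replicas: int) -> list:
    """Replicate every chunk and return the ordered list of chunk IDs (the manifest)."""
    if replicas > len(nodes):
        raise ValueError("cannot place more replicas than there are nodes")
    node_ids = sorted(nodes)
    manifest = []
    for chunk in split_chunks(data, chunk_size):
        chunk_id = hashlib.sha256(chunk).hexdigest()
        for node_id in place_replicas(chunk_id, node_ids, replicas):
            nodes[node_id][chunk_id] = chunk
        manifest.append(chunk_id)
    return manifest


def retrieve(manifest: list, nodes: dict, offline=frozenset()) -> bytes:
    """Rebuild the file from any online node holding an intact copy of each chunk."""
    parts = []
    for chunk_id in manifest:
        for node_id, storage in nodes.items():
            chunk = storage.get(chunk_id)
            if node_id in offline or chunk is None:
                continue
            if hashlib.sha256(chunk).hexdigest() == chunk_id:  # integrity check
                parts.append(chunk)
                break
        else:
            raise LookupError(f"chunk {chunk_id[:8]} is unavailable")
    return b"".join(parts)
```

With three replicas spread over five nodes, any two nodes can go offline and every chunk still has at least one reachable copy. The hash check on retrieval also rejects corrupted copies.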
An essential component of decentralized storage systems is their integration with blockchain or DLTs to ensure the integrity, security, and trustworthiness of the data stored within these networks [19,20,21,22,23,24,25,32,33,34,35,36,37,38,39,40,41]. In these systems, data are distributed across multiple nodes and encrypted to protect user privacy. Blockchain strengthens these features by providing an immutable record of all data transactions, ensuring that data remain tamper-proof and trustworthy [20,31,35]. This integration enhances data security by making the storage process transparent and verifiable while protecting against unauthorized access and tampering. Each transaction or data entry is cryptographically secured and validated by a consensus mechanism, ensuring that most of the network agrees upon all changes to the data [19,20,21,22,23,24,25].
With its immutability, transparency, and decentralization characteristics, blockchain supports DSS by ensuring that data ownership and control remain with the user. For example, Yan et al. [14] propose a blockchain-based privacy-preserving data storage (BC-PDS) system that protects privacy and upholds self-sovereignty, allowing users to control their data even when they are shared across different entities. In blockchain-based decentralized storage systems, trust is not placed in a central authority but is instead distributed across the network, where the consensus of multiple independent nodes maintains it. This trustless environment is critical to the system’s ability to protect data from breaches and unauthorized access, making it particularly relevant for DSS.

2.4.1. Decentralized Storage Architecture

Decentralized storage systems typically operate on a P2P network architecture, where participants can exchange unused storage space for incentives, such as tokens or cryptocurrency [32,33,34,35,36,37,38,39,40,41]. Blockchain technology facilitates this incentivized model by enabling the creation and management of digital tokens that reward participants for their contributions to the network [32,35,39]. This incentivization promotes active participation and ensures the storage ecosystem’s sustainability and scalability [35,40]. By aligning the interests of participants with the overall health and efficiency of the network, decentralized storage systems promote continuous availability, reliability, and security of the stored data.
Figure 6 presents an overview of a typical blockchain-based decentralized storage system backed by a P2P network. The process of storing data in this system involves four key steps:
  • Data Uploading: The user uploads a plaintext data file to the decentralized storage system.
  • Data Encryption: Once the data file is uploaded, it is encrypted using secure cryptographic algorithms. This transformation of the plaintext data into ciphertext ensures security, privacy, and confidentiality. The encryption process protects the data from unauthorized access, ensuring that only those with the correct decryption key can access the original content.
  • Data Fragmentation (Sharding or Partitioning): After encryption, the data file is split into smaller fragments known as shards or chunks. This partitioning process enhances the storage system’s scalability, flexibility, and performance. The system improves data security and efficiency by distributing these encrypted fragments across the network and storing their hashes on the blockchain. Because each fragment can be stored and accessed independently, sharding also improves data retrieval speed and overall network performance.
  • Data Chunk Distribution: The final step involves distributing the encrypted fragments across multiple nodes within the P2P network. Each node in the network is a computer or device that contributes storage space and participates in data storage and retrieval operations. Storing each fragment on several nodes ensures redundancy and high availability: even if some nodes go offline or fail, the data remain accessible and intact because copies exist on other nodes in the network.
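The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any platform discussed here: the helper names (`keystream_xor`, `store`, `retrieve`) are hypothetical, each peer is modeled as an in-memory dictionary, the returned manifest stands in for the hashes recorded on the blockchain, and the toy SHA-256 keystream only stands in for a real authenticated cipher.

```python
import hashlib


def keystream_xor(data: bytes, key: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode); a stand-in for a real cipher."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


def store(plaintext, key, nodes, chunk_size=64, replicas=2):
    """Encrypt, fragment, and distribute; return the ordered list of chunk hashes."""
    ciphertext = keystream_xor(plaintext, key)
    manifest = []  # models the chunk hashes that would be recorded on-chain
    for i in range(0, len(ciphertext), chunk_size):
        chunk = ciphertext[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        manifest.append(digest)
        first = (i // chunk_size) % len(nodes)
        for r in range(replicas):  # place copies on consecutive peers
            nodes[(first + r) % len(nodes)][digest] = chunk
    return manifest


def retrieve(manifest, key, nodes):
    """Fetch each chunk from any peer holding it, verify its hash, and decrypt."""
    ciphertext = bytearray()
    for digest in manifest:
        chunk = next(n[digest] for n in nodes if digest in n)
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk does not match its recorded hash")
        ciphertext.extend(chunk)
    return keystream_xor(bytes(ciphertext), key)


nodes = [{} for _ in range(4)]        # each dict models one peer's local storage
key = b"demo-key-not-for-production"
data = b"self-sovereign data " * 20   # 400 bytes of sample content
manifest = store(data, key, nodes)
```

Because every chunk is placed on two of the four peers in this sketch, `retrieve(manifest, key, nodes)` still reconstructs the original file after any single peer is emptied, which mirrors the redundancy argument made for the distribution step.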

2.4.2. Decentralized Storage Characteristics

Decentralized storage systems exhibit several key features that distinguish them from traditional, centralized storage solutions.
  • Decentralization: Unlike traditional storage solutions that store data on centralized servers controlled by a single entity, decentralized storage systems distribute data across a network of nodes governed by consensus protocols [31,32,33,34,35,36,37,38,39,40,41]. This distribution is enabled by blockchain or other DLTs, which ensure that no single point of control exists within the system [47,48,49,50,51,52,53,54]. This decentralization enhances system resilience and reduces the risk of data tampering or loss due to central server failures.
  • User Control and Censorship Resistance: One of the fundamental principles of decentralized storage systems is that users have full ownership and control over their data. These systems inherently resist censorship because no central authority can enforce data removal [32,33,34,35,36,37,38,39,40,41]. Users retain the ability to determine how their data are shared and used, ensuring that they maintain sovereignty over their digital assets. User control is essential in environments where freedom of information and privacy are vital.
  • Enhanced Security and Privacy: Decentralized storage systems significantly enhance security and privacy by spreading data across multiple nodes in the network [34,35,36,37,38,39,40]. This dispersion makes it more difficult for attackers to compromise the entire system because gaining control of a single node does not provide access to the complete dataset. Additionally, these systems employ advanced encryption and cryptographic techniques to protect data from unauthorized access [32,33,34,35,36,37,38,39,40,41]. As a result, even if a node is compromised, the data it holds remain unreadable without the proper decryption keys.
  • Redundancy and Reliability: One of the key advantages of decentralized storage systems is their inherent redundancy. Data are replicated across various nodes within the network, ensuring that they remain accessible even if some nodes fail or go offline [35,36]. This redundancy enhances the system’s overall reliability, giving users confidence that their data will be available when needed.
  • Data Portability: Decentralized storage systems offer greater data portability, allowing users to move their data easily from one service provider to another [10,18,20]. This is possible because no single provider has a lock-in over the user’s data. Users can migrate their data without the restrictions often imposed by centralized storage providers, thus avoiding vendor lock-in and enhancing user autonomy.
  • Interoperability and Scalability: Decentralized storage systems are designed to be highly interoperable and scalable [32,33,34,35,36,37,38,39,40]. They can be easily integrated with different systems and services, accommodating various use cases and applications [18,19,20]. As the network of nodes grows, the system’s storage capacity and processing power can scale accordingly, allowing it to handle increasing amounts of data without compromising performance. This scalability supports large-scale applications and ensures the system remains efficient as demand grows [32,33,35].
These characteristics align closely with the principles of DSS, where users retain control over their data, ensuring security, privacy, and freedom from censorship. Section 3 explores how commercially available decentralized storage solutions implement these features and contribute to sustainable DSS.

3. Existing Decentralized Storage Solutions

Several decentralized storage solutions [19,20,21,22,23,24,25,31,32,33,34,35,36,37,38,39,40,41] have been recently developed, emphasizing DSS and providing alternatives to traditional centralized data storage approaches. We compare the key features and performance metrics of Arweave [32], BitTorrent [33], Dat [34], Filecoin [35], Hypercore Protocol [36], IPFS [37], MaidSafe [38], Sia [39], Storj [40], and Swarm [41] decentralized storage platforms (Table 2).

3.1. Key Feature Comparison

Our comparison of the decentralized storage systems focuses on the underlying technology, primary use cases, security features, privacy, blockchain utilization, incentivization and payment models, data control, versioning support, and community adoption. The qualitative criteria listed in Table 2 are based on assessments of characteristics, features, and general observations rather than precise measurements.
The underlying technology criterion describes the core technology each platform is built upon, such as blockchain, DLT, or P2P networks, which influences performance, scalability, and adherence to decentralized principles [64,65,66,67,68,69]. Use case identifies the primary purpose of the platform (e.g., file sharing, permanent data storage, secure data management, or real-time data sharing) so that users can select a platform that best suits their needs [18,19,20,21,22,23,24,25,26]. Security features are evaluated based on the mechanisms employed to protect the data, with ratings reflecting the comprehensiveness of encryption, redundancy, and secure access protocols [9,20,21,22,23,24,25,26,63,64,65,66,67,68,69]. Privacy assesses the level of privacy protections, with a low rating indicating limited controls and potential data exposure, moderate signifying basic protections with some vulnerabilities, and high denoting robust privacy features that restrict access to authorized users only. Blockchain utilization measures how extensively blockchain technology is integrated into the platform’s operations [19,20,21,22,23,24,25,26]. A low rating suggests minimal or no blockchain use, moderate indicates partial integration for specific functions, and high indicates that blockchain is central to the platform’s core functionalities. Data control evaluates the degree of user control over data, with low indicating centralized control by a single entity, moderate reflecting some user control but with third-party dependencies, and high signifying complete user control, which is typical of a decentralized system [17,18,19,20,21,22,23,24,25,26]. Versioning support is a binary criterion indicating whether the platform supports data versioning. Yes means the platform allows users to access and manage different versions of their data, while no means it does not retain previous versions of data. 
Finally, community adoption assesses how widely the platform has been adopted, ranging from niche, where the platform is used by a small, specialized group, to very wide, indicating extensive adoption across various sectors and geographies. Emerging indicates that the platform is gaining interest but is still early in its adoption. Growing means it is seeing significant and rapid adoption, and established denotes that the platform is widely recognized with a stable and substantial user base.
  • Arweave [32] is a decentralized platform that provides permanent data storage, focusing primarily on privacy, censorship resistance, and long-term data durability. It employs a blockchain-like structure called Blockweave, which ensures that data are retained indefinitely once stored [31,32]. This permanence is achieved through high redundancy, where copies of the data are distributed across the network and stored forever. The platform employs advanced encryption techniques to ensure high security and privacy. Transactions within the Arweave network are facilitated by its native cryptocurrency AR tokens [32], which users use to pay for storage services. Despite its strong focus on data permanence and security, Arweave does not support file versioning, which might be a limitation for users requiring historical data tracking [31,32]. Arweave has been particularly appealing for use in projects that must store critical data permanently, such as archives, academic records, and web content requiring tamper-proof preservation [25,32,34,54].
  • BitTorrent [33], one of the most well-known P2P file-sharing protocols, efficiently distributes large amounts of data over the internet. Due to its robust scalability and widespread adoption, it is especially effective for sharing popular files that many users want to access simultaneously. Although its security is basic and largely depends on the file source, BitTorrent remains widely used due to its ease of use [31,33]. BitTorrent has evolved into a suite of commercial products, including the BitTorrent File System (BTFS) [64], a scalable decentralized storage system that aims to reduce storage costs, improve fault tolerance, and avoid censorship. BTFS leverages the large user base of the BitTorrent protocol to create a distributed file storage network that supports dApps [64]. However, BitTorrent and BTFS do not integrate blockchain technology directly and lack file versioning capabilities, which may limit their use in scenarios that require blockchain’s trustless features.
  • Dat [34] is an open-source, decentralized data-sharing protocol specifically tailored to the needs of the scientific and academic communities. It enables a secure, versioned, and distributed data storage network that allows researchers to share datasets using a P2P protocol [31,34]. Dat is designed to foster collaboration among researchers via the efficient sharing of large datasets with built-in version control, which allows users to access previous versions of the data [31,34]. This makes it an invaluable tool for collaborative scientific research, which relies on the integrity and reproducibility of data. The focus on privacy and user control is essential for its niche community, which values the ability to maintain autonomy over shared data without relying on centralized authorities [31,34]. Dat is backed by a community of developers and researchers dedicated to improving the transparency and accessibility of scientific data.
  • Filecoin [35] is a decentralized storage network that transforms cloud storage into an algorithmic market. Built on IPFS [37], Filecoin allows users to rent out their unused storage space in a decentralized storage marketplace; users can choose from various storage providers based on price, redundancy, and geographical location. The platform operates on a blockchain foundation, ensuring transaction transparency and security [31,35,65]. Filecoin uses strong encryption and file contracts to guarantee data integrity and privacy, making it a highly scalable solution for decentralized storage needs. Although Filecoin offers robust privacy protections, it does not support file versioning, which is a limitation for users who require access to historical versions of their data [31,35]. The Filecoin network is powered by its native cryptocurrency, also named Filecoin [35], which incentivizes users to provide and maintain storage capacity on the network. Filecoin is widely regarded as one of the most promising decentralized storage solutions, hosting a growing ecosystem of applications and services.
  • Hypercore Protocol [36] is an open-source P2P protocol for fast, scalable, secure, and real-time data sharing. It is built on Hypercore logs [36], signed append-only logs similar to lightweight blockchains without a consensus algorithm. Hypercore Protocol is optimized for efficiently sharing large datasets and streaming data, which makes it suitable for various applications, such as collaborative environments and real-time data streams [31,36]. The protocol provides high redundancy and scalability, ensuring data remain accessible and consistent across the network. Hypercore also supports file versioning, allowing users to track changes and access previous versions of their data [31,36]. The platform is backed by an emerging community of developers and users exploring its potential use in various applications.
  • IPFS [37] is a decentralized P2P file-sharing network designed for securely storing and sharing data in a distributed file system. IPFS uses a content-addressing scheme that uniquely identifies each file based on its content rather than its location, creating a more efficient and resilient internet [31,37]. The system employs a distributed hash table (DHT) and a Merkle directed acyclic graph (Merkle DAG) data structure to ensure data integrity and availability across the network [37]. IPFS is particularly well suited for applications that require decentralized web hosting, distributed data sharing, and content distribution [31,37]. The platform balances privacy with user control, offering high scalability and file versioning support. IPFS has been widely adopted and has a large and active community of users and developers who continue to expand its capabilities. Other decentralized storage solutions, such as Filecoin [35,65], use IPFS as their underlying technology, demonstrating its versatility and importance.
  • MaidSafe [38] is a decentralized file system operating on the Secure Access For Everyone (SAFE) network [31,70,71] without centralized control or indexing. The SAFE network is designed for secure and decentralized data management services with support for file versioning, thus ensuring data privacy, security, and user autonomy [38,70,71]. It employs a distributed locking mechanism to ensure data integrity, even when multiple users access the same file simultaneously [70,71]. While its scalability is moderate, MaidSafe benefits from an established community of supporters and contributors. Safecoin [31,72] is the native cryptocurrency of the SAFE Network, which is used to incentivize users to contribute resources. Safecoin enables secure, private transactions and facilitates micropayments within the network without intermediaries [31,72]. MaidSafe’s focus on privacy and security makes it a popular choice for users who prioritize these features in their data storage solutions.
  • Sia [39] is a blockchain-based decentralized cloud storage platform that splits, encrypts, and distributes files across a decentralized network. This platform ensures high redundancy and security, allowing users to maintain control over their data [39,65]. The storage model of Sia enables users to rent out unused storage space to others, creating a decentralized storage marketplace. The platform uses strong encryption to protect data and offers high redundancy by storing multiple copies of each file across the network [39]. Sia’s scalability is moderate, and its storage and retrieval services are powered by its native cryptocurrency, Siacoin [39,65]. Siacoin facilitates transactions within the network, incentivizing storage providers and ensuring the continued availability of storage resources. Sia is particularly well suited for users who require a secure and decentralized alternative to traditional cloud storage services.
  • Storj [40] is a decentralized cloud storage platform that leverages blockchain technology to provide secure, efficient, and cost-effective data storage services. All data stored on Storj are encrypted, fragmented into small pieces, and distributed across a global cloud network [31,40,65]. The platform ensures enhanced privacy and security through encryption and sharding, restricting data access to owners or authorized users. The platform supports file versioning, allowing users to track changes and access previous versions of their data [40]. Storj has gained widespread community adoption due to its high scalability, ease of use, and support for various applications, including web hosting, data archiving, and content distribution [40,65,73]. The platform uses STORJ tokens as its payment model [73] to incentivize storage providers and facilitate transactions within the network.
  • Swarm [41] is a distributed storage platform and content distribution service that is part of the Ethereum Web3 Stack [48]. It provides decentralized public record storage with high redundancy and security, ensuring that data remain accessible and tamper-proof [31,41]. Swarm’s integration with the Ethereum network allows it to use smart contracts and other blockchain-based features to enhance data management and distribution [41,48]. Despite its moderate scalability, Swarm has a growing community behind it. The platform uses a token-based model (BZZ) to incentivize participants and support its operations, making it a vital component of the Ethereum ecosystem’s decentralized infrastructure [41,48].
Each platform, while contributing to the broader goal of decentralized storage, offers features and operational paradigms that cater to the various requirements of self-sovereign data management.
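Several of the platforms above rely on hash-linked data structures for integrity, for example the Merkle DAG used by IPFS. The Python sketch below illustrates the underlying idea with a simplified binary Merkle tree. It is an assumption-laden illustration: `merkle_root` is a hypothetical helper, and the layout does not reproduce the actual block or log formats of IPFS, Hypercore, or any other platform.

```python
import hashlib


def merkle_root(chunks):
    """Root hash of a simple binary Merkle tree built over the given chunks."""
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [hashlib.sha256(c).digest() for c in chunks]
    while len(level) > 1:
        if len(level) % 2:             # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0].hex()


original = [b"chunk-0", b"chunk-1", b"chunk-2"]
tampered = [b"chunk-0", b"chunk-X", b"chunk-2"]
```

Changing any single chunk changes the root, so comparing `merkle_root(original)` with `merkle_root(tampered)` detects the modification while only one short hash needs to be stored or published.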

3.2. Comparison of Performance Metrics

Table 3 presents a summary of quantitative (upload/download speed, latency, throughput, data redundancy, consistency, scalability, data reliability, resource efficiency, and cost efficiency) and qualitative performance metrics (security, complexity, ease of integration, and user adaptation) for the selected decentralized storage systems.

3.2.1. Speed and Latency

Speed refers to the data transfer rate at which data can be uploaded to or downloaded from the system [55,56]. A high rating indicates fast data transfer rates, typically measured in megabits per second (Mbps) or gigabits per second (Gbps). A moderate rating indicates average data transfer rates suitable for most applications, while low indicates slower rates that might cause delays. Latency is the delay experienced in the system [55,56]. A low latency implies minimal delay, often measured in milliseconds (ms), which is crucial for real-time applications. A moderate latency suggests an acceptable delay that may not significantly affect most use cases, whereas high latency indicates a more significant delay that could impact performance in applications requiring immediate data access.
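The interplay between the two metrics can be made concrete with a first-order estimate in which the time to fetch an object is the latency plus the object size divided by the transfer rate. The Python sketch below assumes this simplified model and ignores protocol overhead, peer discovery, and congestion; the function name and the sample figures are illustrative, not measurements of any platform.

```python
def transfer_time_s(size_mb: float, rate_mbps: float, latency_ms: float) -> float:
    """Estimated seconds to move size_mb megabytes: latency plus size / rate."""
    size_megabits = size_mb * 8        # megabytes to megabits
    return latency_ms / 1000 + size_megabits / rate_mbps


large_file = transfer_time_s(size_mb=100, rate_mbps=100, latency_ms=50)
small_file = transfer_time_s(size_mb=0.01, rate_mbps=100, latency_ms=50)
```

Under these assumed figures, the large transfer is dominated by the transfer rate, whereas the small one is dominated by latency, which is why low latency matters most for real-time applications that exchange many small objects.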
BitTorrent [33] excels with its high speed and low latency, particularly for widely shared files, offering rapid transfer rates and swift responses, which makes it a popular choice for content distribution. Arweave [32], Dat [34], IPFS [37], MaidSafe [38], and Swarm [41] exhibit moderate speeds, which, while not groundbreaking, serve the needs of applications where the speed of data retrieval is secondary to other factors such as data permanence or integrity. Their latency is also moderate, indicative of a consistent but not instantaneous data access model. This level of performance is particularly suited to use cases that can tolerate some delays, such as archival storage or applications where data are accessed infrequently. On the other hand, Filecoin [35], Hypercore [36], and Storj [40] offer both high speeds and low latency. These characteristics reveal that these platforms are optimized for quick access and employ advanced networking and data distribution techniques. These attributes make these platforms particularly adept at handling dynamic content delivery, which requires fast response times and data transfer. Sia [39,65], while achieving high speeds, has moderate latency. This could be due to the mechanisms in place for data splitting and encryption, which increase security but may introduce slight access delays [39].
These platforms are selected based on specific end-user requirements. For cases where speed is important, such as streaming or real-time collaboration, platforms such as Filecoin [35], Hypercore [36], and Storj [40] are advantageous. Conversely, for users whose priority is secure, long-term storage, the moderate speeds of Arweave [32], Dat [34], and MaidSafe [38] are sufficient. The performance profile of these platforms is thus tailored to particular niches within the broader decentralized data storage ecosystem.

3.2.2. Throughput and Scalability

Throughput measures the amount of data a system can process within a given time, with high throughput indicating the system’s efficiency in handling large volumes of data, typically measured in Mbps or Gbps [55,56]. A moderate rating reflects an average data processing capacity sufficient for standard workloads, while platforms with low throughput may experience bottlenecks under heavy usage. Arweave [32], BitTorrent [33], Filecoin [35], Hypercore [36], IPFS [37], Sia [39], and Storj [40] are noted for their high throughput, meaning that they are well equipped to handle the exchange of large volumes of data. This makes them suitable for applications with heavy data demands, such as video streaming services, large-scale backup solutions, and active content delivery networks.
Scalability is crucial for platforms that need to anticipate a growth in user demands and data volumes [74,75]. Scalability evaluates the system’s ability to grow and handle increased data or user loads without performance degradation [74,75]. A high rating indicates the system scales efficiently, managing significant increases in data volume or user numbers with minimal impact. Moderate scalability indicates that the system can handle some load increases but may require optimization or additional resources, while low scalability indicates a limited capacity to scale, with performance likely to degrade under increased load.
Due to its P2P structure, BitTorrent [33] readily allows for expansion as more users join the network and share resources. Filecoin [35] and Storj [40] also exhibit high scalability, which can be attributed to their blockchain foundation, allowing them to employ a distributed network of nodes for data storage. Their scalability ensures that an increase in data or users does not detrimentally impact performance, an attractive feature for enterprise-level applications. Arweave [32], with its high throughput, also shows high scalability, meaning it can support increasing levels of permanent data storage without losing access speed. This makes it particularly appealing for archival purposes, where the volume of data is expected to increase over time. IPFS [37], while offering high throughput, maintains moderate scalability. This suggests that, while it can efficiently handle high volumes of data, there are limitations to how much the system can grow before encountering potential bottlenecks or performance degradation. With moderate throughput and scalability, MaidSafe [38] and Swarm [41] offer a balanced approach suitable for a range of dApps, though possibly not those requiring intense or rapid scaling.
Overall, platforms such as BitTorrent [33], Filecoin [35], and Storj [40] stand out for both their high throughput and scalability, making them robust choices for users with intensive data transfer needs and growth expectations. In contrast, platforms such as MaidSafe [38] and Swarm [41], with their moderate ratings, might be suitable for users with stable and predictable scalability requirements.

3.2.3. Data Redundancy and Availability

Redundancy assesses the degree to which data are replicated across a system to prevent loss in the case of node failure [76,77]. A very high redundancy rating means data are replicated across multiple nodes and locations, providing strong protection against data loss. A high redundancy indicates adequate replication, ensuring good protection, while moderate suggests some replication, but it may not be entirely reliable in the event of multiple failures. Availability measures the system’s uptime and accessibility, ensuring data are consistently available to users [78,79].
Arweave’s architecture offers very high redundancy to support its promise of permanent data storage [32]. The data stored on Arweave are replicated across multiple nodes, ensuring that even in the case of node failures, the data remain intact and perpetually accessible. This level of redundancy is essential for use cases that require archival storage where data must remain unaltered and retrievable indefinitely. BitTorrent [33], Dat [34], Filecoin [35], Hypercore [36], IPFS [37], Sia [39], Storj [40], and Swarm [41] all demonstrate high redundancy, which ensures that data are reliably available when users request them. For example, BitTorrent’s performance is founded on widespread file distribution, particularly for popular files, which inherently creates multiple copies across different nodes, enhancing data availability [33,64]. In addition, Sia [39] and Storj [40,65], which operate on blockchain technology, distribute and encrypt data across numerous hosts, ensuring that no single entity holds the entire dataset. This improves data security and ensures that the system can quickly recover and provide data even if some parts of the network are down. Swarm [41] is also designed to store Ethereum’s public records and user data across a distributed network, ensuring that data are widely available and resistant to censorship and outages.
The level of redundancy and data availability are critical considerations when choosing a decentralized storage solution. Platforms such as Arweave [32] stand out for use cases that demand data permanence and historical preservation, while others such as BitTorrent [33], Sia [39], and Storj [40] are well suited for applications where high availability and data resilience are important.
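The link between replication and availability can be illustrated with a simple probabilistic estimate. The Python sketch below assumes that nodes fail independently with the same probability, which real networks only approximate; the function name and the sample failure probability are illustrative assumptions, not figures reported for any platform.

```python
def availability(node_failure_prob: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies is reachable."""
    if not 0 <= node_failure_prob <= 1 or replicas < 1:
        raise ValueError("need 0 <= node_failure_prob <= 1 and replicas >= 1")
    return 1 - node_failure_prob ** replicas


# Illustrative values: each node is assumed to be offline 10% of the time.
by_replicas = {r: availability(0.10, r) for r in (1, 2, 3, 5)}
```

Under these assumptions, one copy is reachable about 90% of the time and three copies about 99.9% of the time, which shows why even modest replication sharply reduces the risk of data being unavailable.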

3.2.4. Resource Efficiency and Network Dependence

Resource efficiency determines how well a system uses its storage capacity, bandwidth, and computational power [76,80,81]. Efficient resource use minimizes costs and ensures the system’s sustainability, particularly in decentralized networks where resources are provided by individual network participants [80,81]. Network dependence refers to the reliance of a storage system on the stability and performance of its network infrastructure [65,66,67,68,69,82].
BitTorrent [33,64] demonstrates high resource efficiency when distributing popular files because the load is shared among numerous peers, which reduces bandwidth and storage requirements for individual nodes. However, this efficiency can decrease with less popular content due to a lack of sharing nodes, increasing network dependence on file availability. Arweave’s unique Blockweave structure has the potential to lead to a higher demand for resources due to its permanent data storage [32]. Nevertheless, its design ensures that once data are stored, they remain accessible without continuous replication, reducing long-term network dependence. Filecoin [35], Sia [39], and Storj [40], which employ blockchain technologies, distribute data across multiple nodes. Their resource efficiency is maintained through algorithms that dynamically allocate storage and bandwidth, ensuring no single node is overburdened. However, they exhibit a certain degree of network dependence for data integrity and availability because their performance and reliability depend on the network’s overall health. IPFS [37] and Swarm [41] prioritize resource efficiency by de-duplicating data and ensuring content is distributed close to requesters, which minimizes latency and bandwidth usage [80,82]. However, their network dependence is moderate because performance partly relies on the number of nodes hosting the content and the network topology. Dat [34] and Hypercore [36] offer efficient data streaming and versioning, utilizing resources effectively for real-time data sharing and updates. Their network dependence is moderate, balancing distributed data ownership and peer availability for optimal performance. MaidSafe [38], with its SAFE network [70,71,72], focuses on autonomous network operation, aiming to reduce resource waste and reliance on centralized infrastructure. This approach creates a self-healing network that remains efficient and less dependent on the contribution of any single node.
Thus, decentralized storage platforms utilizing blockchain technology optimize resource efficiency through sophisticated algorithms, though they retain some network dependence for overall system robustness. In contrast, P2P file-sharing systems such as BitTorrent [33,64] demonstrate high resource efficiency in proportion to user participation, with performance that scales with the popularity of the content. Users must consider the balance between resource efficiency and network dependence in the context of their specific operational demands and choose a platform that aligns with their performance and sustainability goals.

3.2.5. Consistency, Reliability, and Security

All platforms in this comparison provide reliable data storage solutions, but their approaches to consistency, reliability, and security vary. Consistency measures how reliably the system delivers consistent data and performance outcomes across different scenarios [66,69,80,82]. A high rating indicates the system provides uniform performance and data delivery under various conditions, moderate means occasional variability, and low indicates frequent inconsistencies in data delivery or system performance. Data reliability refers to the system’s consistency in storing and retrieving data accurately over time [66,78,79]. A very high rating implies the system consistently delivers accurate data storage and retrieval with minimal errors, while high reliability indicates general dependability with occasional minor issues. Moderate reliability suggests average performance, with potential risks of data retrieval errors under certain conditions.
Filecoin [35], Sia [39], and Storj [40] maintain consistency through the inherent design of blockchain, which ensures that every transaction or data operation is verified and agreed upon by multiple nodes before being executed. This provides consistency in operations and adds to the system’s overall reliability because the blockchain ledger provides a transparent and verifiable record of all transactions. Arweave’s data storage is immutable, meaning data cannot be changed once stored. This gives a very high level of consistency and reliability, particularly for applications where data must remain unchanged over time, such as digital archives or legal documents [32]. BitTorrent’s consistency can vary because it relies on the availability of file fragments across a P2P network [33,64]. Its reliability is high for popular files, which many peers share and of which many copies exist across the network, but can decrease when the number of peers is low.
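The dependence of reliability on peer count can be made concrete with a simple model. The sketch below assumes, purely for illustration, that each peer holding a complete copy is online independently of the others and with the same probability. Neither assumption comes from the cited measurements, and the uptime value of 0.3 is hypothetical.

```python
def availability(num_peers: int, peer_uptime: float) -> float:
    """Probability that at least one of num_peers independent peers is online."""
    if num_peers < 0 or not 0.0 <= peer_uptime <= 1.0:
        raise ValueError("num_peers must be >= 0 and peer_uptime must lie in [0, 1]")
    # All peers are offline with probability (1 - uptime) ** n.
    return 1.0 - (1.0 - peer_uptime) ** num_peers


# Hypothetical uptime of 0.3 per peer, evaluated for swarms of increasing size.
for n in (1, 3, 10, 50):
    print(f"{n:>3} peers -> availability {availability(n, 0.3):.4f}")
```

Under these assumptions, availability rises quickly with the number of peers and approaches 1 for large swarms, which is consistent with the qualitative observation that widely shared files are the most reliably retrievable.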
Dat [34] and Hypercore [36] focus on version control, a form of consistency, by maintaining a history of file changes, allowing users to access and revert to previous versions as needed. This feature is valuable for collaborative projects, where tracking changes and maintaining consistency over time are critical. IPFS uses content addressing to ensure that the data retrieved are consistent with what was stored because each file is accessed through a unique hash [37]. This contributes to consistency and reliability because the hash will always retrieve the same content. MaidSafe’s SAFE network [38,70] provides consistency and reliability by autonomously managing data in a decentralized network, where data are continuously moved and replicated to ensure they remain accessible even if parts of the network go offline.
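The idea of keeping a verifiable history of changes can be sketched as an append-only log in which every entry commits to its predecessor through a hash. This is a simplified illustration and not the actual Dat or Hypercore data structure; the names `Entry` and `VersionedLog` and the all-zero genesis value are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    version: int
    data: bytes
    prev_hash: str
    entry_hash: str


class VersionedLog:
    """Append-only log in which each entry commits to the one before it."""

    GENESIS = "0" * 64  # placeholder predecessor for the first entry

    def __init__(self) -> None:
        self._entries: list[Entry] = []

    @staticmethod
    def _hash(prev_hash: str, data: bytes) -> str:
        return hashlib.sha256(prev_hash.encode() + data).hexdigest()

    def append(self, data: bytes) -> int:
        """Record a new version and return its version number."""
        prev = self._entries[-1].entry_hash if self._entries else self.GENESIS
        entry = Entry(len(self._entries), data, prev, self._hash(prev, data))
        self._entries.append(entry)
        return entry.version

    def get(self, version: int) -> bytes:
        """Read any recorded version; earlier versions are never overwritten."""
        return self._entries[version].data

    def verify(self) -> bool:
        """Recompute the hash chain to detect altered history."""
        prev = self.GENESIS
        for entry in self._entries:
            if entry.prev_hash != prev or entry.entry_hash != self._hash(prev, entry.data):
                return False
            prev = entry.entry_hash
        return True


log = VersionedLog()
v0 = log.append(b"draft")
v1 = log.append(b"draft, revised")
```

Reverting to an earlier state amounts to reading `log.get(v0)`, and `log.verify()` fails if any stored entry has been modified after the fact.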
Security evaluates the robustness of security measures implemented to protect data within a system [8,9,61]. A high rating indicates advanced security features, such as strong encryption and access controls, ensuring robust data protection. Moderate security suggests standard features that provide reasonable protection but may have vulnerabilities, while low security indicates the use of basic measures that could expose data to higher risks of breaches. In terms of security, all platforms implement measures to protect data, although their security models differ. Filecoin [35], Sia [39], and Storj [40] use encryption and distribute data across multiple nodes, so no single node holds a complete copy, which significantly enhances data security. IPFS, MaidSafe, and Swarm also use encryption to secure data, ensuring only authorized users can access the content [37,38,41,71]. Arweave [32] and BitTorrent [33,64] differ slightly in their security models. Arweave’s permanent storage model means security must be stringent from the outset because data cannot be altered once stored. BitTorrent’s security is variable and depends on the trustworthiness of the peers sharing a file.
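The property that no single node holds a complete copy can be illustrated with a small placement sketch. The code below uses plain round-robin replication of chunk indices, which is a stand-in chosen for brevity rather than the scheme of any particular platform (production systems typically combine client-side encryption with erasure coding, both omitted here). The node names and function names are hypothetical.

```python
def place_chunks(num_chunks: int, nodes: list[str], replicas: int = 2) -> dict[str, set[int]]:
    """Assign each chunk index to `replicas` distinct nodes, round-robin."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement: dict[str, set[int]] = {node: set() for node in nodes}
    for chunk in range(num_chunks):
        for r in range(replicas):
            placement[nodes[(chunk + r) % len(nodes)]].add(chunk)
    return placement


def any_node_holds_everything(placement: dict[str, set[int]], num_chunks: int) -> bool:
    """True if some single node could reconstruct the whole file alone."""
    return any(len(chunks) == num_chunks for chunks in placement.values())


def recoverable_after_failure(placement: dict[str, set[int]], num_chunks: int, failed: set[str]) -> bool:
    """True if the surviving nodes together still hold every chunk."""
    surviving = set().union(*(c for node, c in placement.items() if node not in failed))
    return surviving == set(range(num_chunks))


nodes = ["n1", "n2", "n3", "n4"]
placement = place_chunks(num_chunks=8, nodes=nodes, replicas=2)
```

With eight chunks, four nodes, and two replicas, each node holds only half of the chunks, and the file survives the loss of any single node but not necessarily the loss of two adjacent ones.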
Because these platforms all prioritize consistency, reliability, and security, the choice between them will depend on the specific needs of users. Arweave or blockchain-based solutions will be preferred if immutability and historical accuracy are essential. For dynamic and collaborative environments with frequent changes, systems with robust version control mechanisms such as Dat or Hypercore could be more suitable. Users concerned with security will find most platforms a robust option, but they may lean towards those offering additional privacy and data protection layers, such as Filecoin or MaidSafe.

3.2.6. Cost Efficiency, Complexity, and Ease of Integration

Cost efficiency refers to the balance between a data storage platform’s performance and its cost [66,80,81]. A high rating indicates strong performance relative to cost, representing good value for money. Moderate cost efficiency reflects reasonable costs given the performance, though not necessarily the most economical, while variable indicates fluctuating costs depending on usage patterns, making overall expenses harder to predict. For cost efficiency, BitTorrent [33,64] emerges as a standout choice due to its free, ad-supported business model, making it highly accessible for general file-sharing purposes. In contrast, Arweave [32], which offers a permanent storage solution, might incur higher costs due to the infrastructure required for long-term data preservation. Filecoin [35], Sia [39], and Storj [40], which integrate cryptocurrency-based transactions, offer variable cost efficiency influenced by market fluctuations and token values. Open-source platforms such as Dat [34], Hypercore [36], IPFS [37], and MaidSafe [38] typically offer cost-effective solutions, though operational expenses may vary based on the network infrastructure and the volume of data managed.
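A short worked calculation shows why token-denominated pricing leads to variable cost efficiency. All figures below are hypothetical and are not taken from any platform’s price list; the function name `monthly_cost_fiat` is likewise an assumption for this sketch.

```python
def monthly_cost_fiat(stored_gb: float, tokens_per_gb_month: float, token_price_fiat: float) -> float:
    """Fiat cost of one month of storage that is priced in a platform token."""
    return stored_gb * tokens_per_gb_month * token_price_fiat


# Hypothetical contract: 500 GB stored at 0.01 tokens per GB per month.
STORED_GB = 500
TOKENS_PER_GB_MONTH = 0.01

# The same contract valued at three hypothetical token prices (fiat units per token).
costs = {
    price: monthly_cost_fiat(STORED_GB, TOKENS_PER_GB_MONTH, price)
    for price in (2.0, 4.0, 8.0)
}
for price, cost in costs.items():
    print(f"token price {price:.2f} -> monthly cost {cost:.2f}")
```

Because the fiat cost is linear in the token price, a fourfold change in the token’s market value changes the user’s storage bill by the same factor even though the amount of data stored is unchanged.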
Complexity evaluates how complicated it is to set up, manage, and use a system [20,31,66]. A high complexity rating indicates the system is difficult to configure and operate, possibly requiring specialized knowledge or significant effort. Moderate complexity implies that the system has a learning curve but is manageable for most users with some experience, while low complexity indicates a straightforward system. In terms of complexity, BitTorrent is relatively user-friendly, especially for users acquainted with P2P file-sharing [33]. However, blockchain-based solutions such as Filecoin, Sia, and Storj tend towards higher complexity due to their decentralized nature and the intricacies of blockchain technology. Arweave’s approach to data storage introduces additional complexity, particularly in understanding its blockweave structure for permanent data storage [32]. Similarly, IPFS and Swarm, though offering innovative solutions in content addressing and decentralized public record storage, add complexity to network management. Dat and Hypercore, with their focus on data versioning and scientific data sharing, fall in the moderate range of complexity.
Ease of integration assesses how easily a system can be integrated with other platforms, software, or infrastructure [65,66,83]. A related metric, user interface, assesses the ease of use and accessibility of the platform’s interface [66,83]. An easy rating indicates a user-friendly interface that is intuitive and easy to navigate, moderate reflects a functional interface that may have a steeper learning curve, and difficult indicates a challenging interface requiring technical expertise or extensive training. BitTorrent’s widespread recognition and the availability of numerous types of client software make it relatively easy to integrate across various applications [33,64]. Filecoin and Storj, despite their complexity, offer comprehensive tools and APIs to facilitate integration, but the learning curve can be steep due to the use of blockchain [35,40]. IPFS, with its extensive APIs and libraries, supports smoother integration, although its unique data storage approach may require additional adaptation [37]. MaidSafe’s SAFE network presents a unique set of challenges and opportunities in integration, especially for applications that prioritize high security and privacy [38,70,71]. Arweave, Dat, Hypercore, and Swarm, catering to more specialized use cases, demand a stronger technical understanding for effective integration [32,34,36,41].
The selection of a decentralized storage platform is dictated by the balance between financial considerations and technical demands, alongside specific project or application requirements. Platforms such as BitTorrent offer simplicity and cost-effectiveness but may lack the advanced features of more complex systems such as Filecoin or Arweave. The best platform for users depends on a combination of financial and technical elements and their particular data storage and management requirements.
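One way to make this trade-off explicit is a weighted decision matrix over the qualitative ratings used in this comparison. The sketch below is an assumed selection aid rather than a method proposed in this study: the candidate names `platform_a` and `platform_b`, their ratings, and the weights are placeholders and do not describe any real platform. Criteria are phrased so that a higher rating is always better (for example, simplicity instead of complexity).

```python
RATING_SCALE = {"low": 1, "moderate": 2, "high": 3}


def weighted_score(ratings: dict[str, str], weights: dict[str, float]) -> float:
    """Weighted average of ordinal ratings; higher is better for every criterion."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * RATING_SCALE[ratings[c]] for c, w in weights.items()) / total_weight


def rank(candidates: dict[str, dict[str, str]], weights: dict[str, float]) -> list[str]:
    """Candidate names ordered from best to worst weighted score."""
    return sorted(candidates, key=lambda name: weighted_score(candidates[name], weights), reverse=True)


# Placeholder ratings for two hypothetical platforms.
candidates = {
    "platform_a": {"cost_efficiency": "high", "simplicity": "high",
                   "ease_of_integration": "high", "security": "moderate"},
    "platform_b": {"cost_efficiency": "moderate", "simplicity": "low",
                   "ease_of_integration": "moderate", "security": "high"},
}

# Two hypothetical user profiles expressed as criterion weights.
security_first = {"cost_efficiency": 1, "simplicity": 1, "ease_of_integration": 1, "security": 5}
budget_first = {"cost_efficiency": 3, "simplicity": 3, "ease_of_integration": 1, "security": 1}
```

With these placeholder inputs, `rank(candidates, security_first)` places `platform_b` first and `rank(candidates, budget_first)` places `platform_a` first, showing how the same ratings lead to different choices under different priorities.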

4. Conclusions and Directions for Future Research

4.1. Conclusions

This paper provided an in-depth exploration of decentralized storage systems, focusing on their capabilities, performance metrics, and suitability for sustainable DSS. Decentralized storage systems, particularly those powered by blockchain technology, represent a promising solution for achieving DSS principles by offering users greater control over their data with enhanced privacy and security. This study highlighted the importance of aligning these features with user requirements to facilitate optimal platform selection. Key findings indicate that while blockchain-based systems offer enhanced security and data integrity, they vary significantly in complexity, cost efficiency, and performance. This variability requires users to evaluate the options carefully to ensure that the chosen decentralized storage solution meets their needs. Blockchain-powered decentralized storage systems have become increasingly crucial to establishing self-sovereign data management, offering secure, resilient, and user-centric solutions. As the global focus on data ownership, privacy, and security strengthens, these platforms are becoming more relevant, providing a robust framework for achieving data sovereignty in the digital age. The present study serves as a resource for users, developers, and researchers, helping them make informed decisions when selecting and implementing decentralized storage platforms that best meet their data sovereignty and operational requirements.

4.2. Directions for Future Research

While this study has laid the groundwork for understanding the current state of decentralized storage systems, several avenues for future research can enhance the development and adoption of these technologies:
  • Interoperability and Standardization: Future research should focus on developing protocols and standards that facilitate interoperability across decentralized storage solutions. Achieving seamless integration between various platforms will be crucial for fostering a cohesive decentralized storage ecosystem.
  • Data Migration and Portability: As the adoption of decentralized storage systems grows, there will be an increasing need for methods and tools that enable seamless data migration across platforms. Ensuring data portability without compromising security, privacy, and performance will be critical for future innovation.
  • Security and Privacy Enhancements: Continued development of advanced encryption techniques, privacy-preserving algorithms, and robust security mechanisms is essential for maintaining the confidentiality and integrity of data stored in decentralized systems. Research should also explore approaches such as post-quantum (quantum-resistant) algorithms, zero-knowledge proofs, homomorphic encryption, and AI-driven threat detection to mitigate emerging threats in this space.
  • Usability and User Experience: Improving the usability and accessibility of decentralized storage platforms is vital for broader adoption. Future work should include user studies and usability testing to refine user interfaces and experiences, making these systems more accessible to both technical and non-technical users.
  • Environmental Impact and Sustainability: As blockchain-based storage solutions become more widespread, it is essential to assess their environmental impact. Research should explore eco-friendly approaches to reduce energy consumption and carbon emissions, ensuring that these technologies are sustainable in the long term.
  • Community and Governance Structures: The decentralized nature of these systems calls for innovative community-driven governance models. Future research should examine decentralized incentives, decision-making mechanisms, and governance structures that promote inclusivity, transparency, and democratic participation within decentralized storage networks.
  • Regulatory and Legal Implications: As decentralized storage systems challenge traditional data management paradigms, there is a pressing need to explore the regulatory and legal implications of data sovereignty and privacy in these environments. Future studies should address how these systems can comply with existing regulations while preserving their decentralized nature.
In conclusion, decentralized storage systems hold great promise for the future of data management, offering users unprecedented control over their digital assets. However, realizing this potential will require ongoing research and development to address the challenges identified in this paper. By focusing on these future directions, the academic and technical communities can contribute to the maturation of decentralized storage technologies, paving the way for more secure, private, and user-empowered digital platforms.

Author Contributions

Conceptualization, M.M.M. and H.P.I.; methodology, M.M.M.; validation, M.M.M. and H.P.I.; formal analysis, M.M.M.; investigation, M.M.M. and H.P.I.; resources, M.M.M.; data curation, M.M.M.; writing—original draft preparation, M.M.M.; writing—review and editing, M.M.M. and H.P.I.; visualization, M.M.M.; supervision, H.P.I.; project administration, M.M.M.; funding acquisition, H.P.I. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the Korea University Computer Science Brain Korea 21 (BK21) FOUR research funding; the National Research Foundation of Korea (NRF), under Grant No. NRF-2021R1A2C2012476 (Blockchain Technology Research for Personal Data Right Assurance); and Seoul-type private investment linked technology commercialization support, Seoul Business Agency (SBA), under Grant No. VC230019.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

Author Merlec M.M. was employed by the Department of Computer Science and Engineering, Korea University. Author H.P.I. was employed by the Department of Computer Science and Engineering, Korea University, and the DAO Solution, Seoul, South Korea. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest.

References

  1. De Mooy, M. Rethinking Privacy Self-Management and Data Sovereignty in the Age of Big Data: Considerations for Future Policy Regimes in the United States and the European Union; Bertelsmann Stiftung; Springer: Berlin/Heidelberg, Germany, 2017; Available online: https://cdt.org/insights/rethinking-privacy-self-management-and-data-sovereignty-in-the-age-of-big-data/ (accessed on 20 July 2024).
  2. Hummel, P.; Braun, M.; Tretter, M.; Dabrock, P. Data sovereignty: A review. Big Data Soc. 2021, 8, 1.
  3. Duisberg, A. Legal Aspects of IDS: Data Sovereignty—What Does It Imply? In Designing Data Spaces; Springer: Cham, Switzerland, 2022; pp. 61–90.
  4. De Filippi, P.; McCarthy, S. Cloud computing: Centralisation and data sovereignty. Eur. J. Law Technol. 2012, 3, 1–18.
  5. Irion, K. Government cloud computing and national data sovereignty. Policy Internet 2012, 4, 40–71.
  6. Hummel, P.; Braun, M.; Augsberg, S.; Dabrock, P. Sovereignty and data sharing. ITU J. ICT Discov. Spec. Issue 2018, 25, 1–10.
  7. Vaile, D. The cloud and data sovereignty after Snowden. Aust. J. Telecommun. Digit. Econ. 2014, 2, 31–41.
  8. Sun, Y.; Zhang, J.; Xiong, Y.; Zhu, G. Data Security and Privacy in Cloud Computing. Int. J. Distrib. Sens. Netw. 2014, 10, 190903.
  9. Yang, P.; Xiong, N.; Ren, J. Data security and privacy protection for cloud storage: A survey. IEEE Access 2020, 8, 131723–131740.
  10. Pasupulati, R.P.; Shropshire, J. Analysis of centralized and decentralized cloud architectures. In Proceedings of the SoutheastCon 2016, Norfolk, VA, USA, 30 March–3 April 2016; pp. 1–7.
  11. Park, D.B.; Li, X.; Shahhosseini, A.M.; Tsay, L.S. Data ownership in cloud: Legal issues. Int. J. Forensic Eng. Manag. 2021, 1, 125–148.
  12. Alboaie, S.; Cosovan, D. Private Data System Enabling Self-Sovereign Storage Managed by Executable Choreographies. In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Springer: Neuchâtel, Switzerland, 2017; Volume 10320, pp. 83–98.
  13. Liang, X.; Shetty, S.; Zhao, J.; Bowden, D.; Li, D.; Liu, J. Towards decentralized accountability and self-sovereignty in healthcare systems. In Proceedings of the 19th International Conference on Information and Communications Security, Beijing, China, 6–8 December 2017; pp. 387–398.
  14. Yan, Z.; Gan, G.; Riad, K. BC-PDS: Protecting Privacy and Self-Sovereignty through BlockChains for OpenPDS. In Proceedings of the IEEE Symposium on Service-Oriented System Engineering (SOSE), San Francisco, CA, USA, 6–9 April 2017; pp. 138–144.
  15. Kim, G.-H. A Design of Self-Sovereign Data Distribution Platform for a Reliable Data Economy. J. Digit. Contents Soc. 2021, 22, 483–490.
  16. Fallatah, K.U.; Barhamgi, M.; Perera, C. Personal Data Stores (PDS): A Review. Sensors 2023, 23, 1477.
  17. Abbas, A.E.; van Velzen, T.; Ofe, H.; van de Kaa, G.; de Reuver, M. Beyond control over data: Conceptualizing data sovereignty from a social contract perspective. Electron. Mark. 2024, 34, 20.
  18. Merlec, M.M.; In, H.P. DataMesh+: A Blockchain-Powered Peer-to-Peer Data Exchange Model for Self-Sovereign Data Marketplaces. Sensors 2024, 24, 1896.
  19. Sharma, P.; Jindal, R.; Borah, M.D. Blockchain-based decentralized architecture for cloud storage system. J. Inf. Secur. Appl. 2021, 62, 102970.
  20. Khalid, M.I.; Ehsan, I.; Al-Ani, A.K.; Iqbal, J.; Hussain, S.; Ullah, S.S.; Nayab. A comprehensive survey on blockchain-based decentralized storage networks. IEEE Access 2023, 11, 10995–11015.
  21. Zang, H.; Kim, H.; Kim, J. Blockchain-Based Decentralized Storage Design for Data Confidence Over Cloud-Native Edge Infrastructure. IEEE Access 2024, 12, 50083–50099.
  22. Ali, S.; Wang, G.; White, B.; Cottrell, R.L. A Blockchain-Based Decentralized Data Storage and Access Framework for PingER. In Proceedings of the 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), New York, NY, USA, 1–3 August 2018; pp. 1303–1308.
  23. Jariwala, M.P.; Obaidat, M.S.; Wazid, M.; Mishra, A.K.; Singh, D.P. Designing Blockchain-Based Decentralized Scheme for Secure File Storage System. In Proceedings of the 2024 International Conference on Computer, Information and Telecommunication Systems (CITS), Girona, Spain, 17–19 July 2024.
  24. Wang, S.; Zhang, Y.; Zhang, Y. A blockchain-based framework for data sharing with fine-grained access control in decentralized storage systems. IEEE Access 2018, 6, 38437–38450.
  25. Li, G.; Sato, H. A privacy-preserving and fully decentralized storage and sharing system on blockchain. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; Volume 2, pp. 694–699.
  26. Merlec, M.M.; Lee, Y.K.; Hong, S.-P.; In, H.P. A smart contract-based dynamic consent management system for personal data usage under GDPR. Sensors 2021, 21, 7994.
  27. Albrecht, J.P. How the GDPR will change the world. Eur. Data Prot. Law Rev. 2016, 2, 287.
  28. Voigt, P.; Von dem Bussche, A. The eu general data protection regulation (gdpr). In A Practical Guide, 1st ed.; Springer International Publishing: Cham, Switzerland, 2017; Volume 10, pp. 10–5555.
  29. IDC & Statista. Volume of Data/Information Created, Captured, Copied, and Consumed Worldwide from 2010 to 2020, with Forecasts from 2021 to 2025 (in Zettabytes). Statista, Statista Inc. 7 June 2021. Available online: https://www.statista.com/statistics/871513/worldwide-data-created/ (accessed on 20 July 2024).
  30. Fortune Business Insights. Size of The Big Data Analytics Market Worldwide from 2021 to 2029 (in Billion U.S. Dollars); Statista ID 1336002; Statista Inc.: New York, NY, USA, 2022.
  31. Daniel, E.; Tschorsch, F. IPFS and friends: A qualitative comparison of next generation peer-to-peer data networks. IEEE Commun. Surv. Tutor. 2022, 24, 31–52.
  32. Williams, S.; Diordiiev, V.; Berman, L.; Uemlianin, I. Arweave: A Protocol for Economically Sustainable Information Permanence. Arweave Yellow Paper. 2019. Available online: https://arweave.org/yellow-paper.pdf (accessed on 20 January 2024).
  33. Pouwelse, J.; Garbacki, P.; Epema, D.; Sips, H. The bittorrent p2p file-sharing system: Measurements and analysis. In International Workshop on Peer-to-Peer Systems; Springer: Berlin/Heidelberg, Germany, 2005; pp. 205–216.
  34. Ogden, M.; McKelvey, K.; Madsen, M.B. Dat—Distributed dataset synchronization and versioning. Open Science Framework, 10(2.2). 2017. Available online: http://slides.kevinmarks.com/dat-paper.pdf (accessed on 20 July 2024).
  35. Protocol Labs. Filecoin: A decentralized storage network. Protocol Labs. July 2017. Available online: https://filecoin.io/filecoin.pdf (accessed on 20 July 2024).
  36. Hypercore Protocol—GitHub. Nov. 2023. Available online: https://github.com/hypercore-protocol (accessed on 20 July 2024).
  37. Benet, J. IPFS—Content addressed versioned P2P file system (DRAFT 3). arXiv 2014, arXiv:1407.3561.
  38. Irvine, D. Maidsafe Distributed File System. 2010. Available online: https://docs.maidsafe.net/Whitepapers/pdf/MaidSafeDistributedFileSystem.pdf (accessed on 20 July 2024).
  39. Vorick, D.; Champine, L. Sia: Simple Decentralized Storage. 2014. Available online: https://sia.tech/sia.pdf (accessed on 20 July 2024).
  40. Wilkinson, S.; Boshevski, T.; Brandoff, J.; Prestwich, J.; Hall, G.; Gerbes, P.; Hutchins, P.; Pollard, C. Storj: A Decentralized Cloud Storage Network Framework v3.0. White Paper. 2018. Available online: https://www.storj.io/storjv3.pdf (accessed on 20 July 2024).
  41. Swarm. SWARM: Storage and Communication Infrastructure for a Self-Sovereign Digital Society. 2021. Available online: https://www.ethswarm.org/swarm-whitepaper.pdf (accessed on 20 July 2024).
  42. Raval, S. Decentralized Applications: Harnessing Bitcoin’s Blockchain Technology, 1st ed.; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2016.
  43. Pincheira, M.; Donini, E.; Vecchio, M.; Kanhere, S. A Decentralized Architecture for Trusted Dataset Sharing Using Smart Contracts and Distributed Storage. Sensors 2022, 22, 9118.
  44. Mühle, A.; Grüner, A.; Gayvoronskaya, T.; Meinel, C. A survey on essential components of a self-sovereign identity. Comput. Sci. Rev. 2018, 30, 80–86.
  45. Schardong, F.; Custódio, R. Self-Sovereign Identity: A Systematic Review, Mapping and Taxonomy. Sensors 2022, 22, 5641.
  46. Ding, Y.; Sato, H. Self-sovereign identity as a service: Architecture in practice. In Proceedings of the 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), Los Alamitos, CA, USA, 27 June–1 July 2022; pp. 1536–1543.
  47. Nakamoto, S. Bitcoin: A Peer-to-Peer Electronic Cash System; BN Publishing: La Vergne, TN, USA, 2008.
  48. Wood, G. Ethereum: A secure decentralised generalised transaction ledger. Ethereum Proj. Yellow Pap. 2014, 151, 1–32.
  49. Alzhrani, F.E.; Saeedi, K.A.; Zhao, L. A Taxonomy for Characterizing Blockchain Systems. IEEE Access 2022, 10, 110568–110589.
  50. Androulaki, E.; Barger, A.; Bortnikov, V.; Cachin, C.; Christidis, K.; De Caro, A.; Enyeart, D.; Ferris, C.; Laventman, G.; Manevich, Y.; et al. Hyperledger Fabric: A Distributed Operating System for Permissioned Blockchains. In Proceedings of the Thirteenth EuroSys Conference, EuroSys ’18; Association for Computing Machinery: New York, NY, USA, 2018; pp. 30:1–30:15.
  51. Valenta, M.; Sandner, P. Comparison of Ethereum, Hyperledger Fabric and Corda; Frankfurt School Blockchain Center: Hessen, Germany, 2017; Volume 8, pp. 1–8.
  52. Brown, R.G.; Carlyle, J.; Grigg, I.; Hearn, M. Corda: An Introduction. Available online: https://www.smallake.kr/wp-content/uploads/2016/10/corda-introductory-whitepaper-final.pdf (accessed on 20 July 2024).
  53. Benji, M.; Sindhu, M. A study on the Corda and Ripple blockchain platforms. In Advances in Big Data and Cloud Computing; Springer: Berlin/Heidelberg, Germany, 2019; pp. 179–187.
  54. Merlec, M.M.; Islam, M.M.; Lee, Y.K.; In, H.P. A Consortium Blockchain-Based Secure and Trusted Electronic Portfolio Management Scheme. Sensors 2022, 22, 1271.
  55. Ampel, B.; Patton, M.; Chen, H. Performance modeling of hyperledger sawtooth blockchain. In Proceedings of the 2019 IEEE International Conference on Intelligence and Security Informatics (ISI), Shenzhen, China, 1–3 July 2019; pp. 59–61.
  56. Mazzoni, M.; Corradi, A.; Nicola, V.D. Performance evaluation of permissioned blockchains for financial applications: The ConsenSys Quorum case study. Blockchain Res. Appl. 2021, 3, 100026.
  57. Lashkari, B.; Musilek, P. A Comprehensive Review of Blockchain Consensus Mechanisms. IEEE Access 2021, 9, 43620–43652.
  58. Pan, J.; Song, Z.; Hao, W. Development in Consensus Protocols: From PoW to PoS to DPoS. In Proceedings of the 2021 2nd International Conference on Computer Communication and Network Security (CCNS), Xining, China, 30 July–1 August 2021; pp. 59–64.
  59. Islam, M.M.; Merlec, M.M.; In, H.P. A comparative analysis of proof-of-authority consensus algorithms: Aura vs. Clique. In Proceedings of the 2022 IEEE International Conference on Services Computing (SCC), Barcelona, Spain, 10–16 July 2022; pp. 327–332.
  60. Hewa, T.; Ylianttila, M.; Liyanage, M. Survey on blockchain based smart contracts: Applications, opportunities and challenges. J. Netw. Comput. Appl. 2021, 177, 102857.
  61. Rouhani, S.; Deters, R. Security, performance, and applications of smart contracts: A systematic survey. IEEE Access 2019, 7, 50759–50779.
  62. Wang, S.; Ding, W.; Li, J.; Yuan, Y.; Ouyang, L.; Wang, F.Y. Decentralized Autonomous Organizations: Concept, Model, and Applications. IEEE Trans. Comput. Soc. Syst. 2019, 6, 870–878.
  63. Chishti, M.S.; Sufyan, F.; Banerjee, A. Decentralized On-Chain Data Access via Smart Contracts in Ethereum Blockchain. IEEE Trans. Netw. Serv. Manag. 2021, 174–187.
  64. Assawamekin, N.; Kijsipongse, E. Design and implementation of BitTorrent file system for distributed animation rendering. In Proceedings of the 2013 International Computer Science and Engineering Conference (ICSEC), Nakhonpathom, Thailand, 4–6 September 2013; pp. 68–72.
  65. Zhu, Y.; Lv, C.; Zeng, Z.; Wang, J.; Pei, B. Blockchain-based Decentralized Storage Scheme. J. Phys. Conf. Ser. 2019, 1237, 042008.
  66. Benisi, N.Z.; Aminian, M.; Javadi, B. Blockchain-based decentralized storage networks: A survey. J. Netw. Comput. Appl. 2020, 102656.
  67. Bacis, E.; di Vimercati, S.D.C.; Foresti, S.; Paraboschi, S.; Rosa, M.; Samarati, P. Securing Resources in Decentralized Cloud Storage. IEEE Trans. Inf. Forensics Secur. 2020, 15, 286–298.
  68. Shah, M.; Shaikh, M.; Mishra, V.; Tuscano, G. Decentralized Cloud Storage Using Blockchain. In Proceedings of the 2020 4th International Conference on Trends in Electronics and Informatics (ICOEI) (48184), Tirunelveli, India, 16–18 April 2020; pp. 384–389.
  69. Zhu, Z.; Qi, G.; Zheng, M.; Sun, J.; Chai, Y. Blockchain based consensus checking in decentralized cloud storage. Simul. Model. Pract. Theory 2020, 102, 101987.
  70. Lambert, N.; Bollen, B. The SAFE Network—A New Decentralised Internet. 2014. Available online: http://docs.maidsafe.net/Whitepapers/pdf/TheSafeNetwork.pdf (accessed on 20 July 2024).
  71. Irvine, D. MaidSafe Distributed Hash Table. 2010. Available online: https://docs.maidsafe.net/Whitepapers/pdf/MaidSafeDistributedHashTable.pdf (accessed on 20 July 2024).
  72. Nick, L.; Ma, Q.; Irvine, D. Safecoin: The Decentralised Network Token. Maidsafe Tech. Rep. 2015. Available online: https://docs.maidsafe.net/Whitepapers/pdf/Safecoin.pdf (accessed on 20 July 2024).
  73. Bush, R.; Choi, S. Forecasting Ethereum STORJ Token Prices: Comparative Analyses of Applied Bitcoin Models. In Proceedings of the 2019 International Conference on Data Mining Workshops (ICDMW), Beijing, China, 8–11 November 2019; pp. 216–223.
  74. Gong, F.; Kong, L.; Lu, Y.; Qian, J.; Min, X. An Overview of Blockchain Scalability for Storage. In Proceedings of the 2023 26th International Conference on Computer Supported Cooperative Work in Design (CSCWD), Rio de Janeiro, Brazil, 24–26 May 2023; pp. 516–521.
  75. Liu, W.; Huang, H.; Yin, H.; Min, G.; Yuan, Y.; Wu, D. Scalable Blockchain-Based Data Storage in Internet of Things. IEEE Commun. Mag. 2024, 62, 40–45.
  76. Pinheiro, E.; Bianchini, R.; Dubnicki, C. Exploiting redundancy to conserve energy in storage systems. In Proceedings of the Joint Intl. Conference on Measurement and Modeling of Computer Systems, New York, NY, USA, 15–26 June 2006.
  77. Brinkmann, A.; Effert, S. Redundant data placement strategies for cluster storage environments. In International Conference On Principles of Distributed Systems; Springer: Berlin/Heidelberg, Germany, 2008; pp. 551–554.
  78. Kermarrec, A.-M.; Le Merrer, E.; Straub, G.; van Kempen, A. Availability-Based Methods for Distributed Storage Systems. In Proceedings of the 2012 IEEE 31st Symposium on Reliable Distributed Systems, Irvine, CA, USA, 8–11 October 2012; pp. 151–160.
  79. Povyshev, A.A.; Sokolov, A.N. Ensuring the Integrity and Availability of Information in the Model of Decentralized Data Storage System. In Proceedings of the 2024 International Russian Smart Industry Conference (SmartIndustryCon), Sochi, Russia, 25–29 March 2024; pp. 517–521.
  80. Ismail, A.; Toohey, M.; Lee, Y.C.; Dong, Z.; Zomaya, A.Y. Cost and Performance Analysis on Decentralized File Systems for Blockchain-Based Applications: State-of-the-Art Report. In Proceedings of the 2022 IEEE International Conference on Blockchain (Blockchain), Espoo, Finland, 22–25 August 2022; pp. 230–237.
  81. Shah, S.C. An Energy-Efficient Resource Management System for a Mobile Ad Hoc Cloud. IEEE Access 2018, 6, 62898–62914.
  82. Li, J.; Wu, J.; Chen, L.; Li, J. Blockchain-based secure and reliable distributed deduplication scheme. In Proceedings of the Algorithms and Architectures for Parallel Processing: 18th International Conference, ICA3PP 2018, Guangzhou, China, 15–17 November 2018; Proceedings, Part I 18. Springer: Cham, Switzerland; Newark, NJ, USA, 2018; pp. 393–405.
  83. Schweiger, P. Improving Usability of Blockchain-Based Decentralized Applications. Master’s Thesis, University of Applied Sciences Technikum Wien, Wien, Austria, 2021.
Figure 1. (a) Volume of data created, collected, and consumed worldwide in zettabytes from 2010 to 2017, with predictions to 2025 [29]. (b) Global big data analytics market size in billions of U.S. dollars for 2021 and forecasts up to 2029 [30].
Figure 2. (a) Centralized, (b) decentralized, and (c) distributed storage system networks [42].
Figure 3. Typical architecture of a self-sovereign identity (SSI) system.
Figure 4. Representative blockchain structure.
Figure 5. Overview of a blockchain system architecture with smart contracts.
Figure 6. Typical architecture of a blockchain-based decentralized storage system.
Table 1. Types of blockchain networks.

| | Public Blockchain | Private Blockchain | Consortium Blockchain |
|---|---|---|---|
| Network Topology | [diagram] | [diagram] | [diagram] |
| Key Features | Permissionless; Trustless environment; Pseudo-anonymity; Unknown parties; Public transactions | Permissioned; Trusted environment; Well-known parties; Private transactions; Single organization | Permissioned; Trusted environment; Federated consensus; Well-known parties; Consortium of organizations |
| Advantages | Independence; Decentralization; Transparency; Trust | Access control; Performance; Scalability; Privacy | Access control; Performance; Scalability; Privacy |
| Drawbacks | Performance and scalability; Security at high cost; Resource consumption; Transaction cost | Trust and transparency (limited); No public auditability; (Centralized) governance | Transparency (limited); High complexity |
Table 2. Comparison of key features of selected decentralized storage systems.

| Platform | Underlying Technology | Use Case | Security Features | Privacy | Blockchain Utilization | Payment Model | Data Control | Versioning Support | Community/Adoption |
|---|---|---|---|---|---|---|---|---|---|
| Arweave [32] | Blockweave | Permanent data storage | Encryption, permanence | High | Yes | Cryptocurrency (AR tokens) | User | No | Growing |
| BitTorrent [33] | P2P file sharing | File sharing | Basic (depends on file source) | Moderate | No | Free (Ad-supported) | User | No | Very Wide |
| Dat [34] | P2P data sharing | Scientific data sharing | Encryption, version control | High | No | Free (Open-source) | User | Yes | Niche |
| Filecoin [35] | Blockchain | Decentralized storage | Encryption, file contracts | High | Yes | Cryptocurrency (Filecoin) | User | No | Growing |
| Hypercore Protocol [36] | Distributed ledger | Secure, real-time data sharing | Encryption, real-time updates | High | No | Free (Open-source) | User | Yes | Emerging |
| IPFS [37] | P2P network | Decentralized file sharing | Content addressing, file identifiers | Moderate | No | Free (Open-source) | User | Yes | Wide |
| MaidSafe [38] | Decentralized network | Secure data management | Encryption, secure network | High | No | Free (Open-source) | User | Yes | Established |
| Sia [39] | Blockchain | Decentralized storage | File splitting, encryption | High | Yes | Cryptocurrency (Siacoin) | User | Yes | Niche |
| Storj [40] | Decentralized cloud storage | Decentralized storage | Encryption, sharding | High | No | Cryptocurrency (STORJ token) | User | Yes | Wide |
| Swarm [41] | Ethereum Web3 stack | Decentralized record storage | Encryption, redundancy | High | Yes | Token-based (BZZ) | User | Yes | Emerging |
Table 3. Comparison of performance metrics of selected decentralized storage systems.

| Platform | Speed (Upload/Download) | Latency | Throughput | Data Redundancy | Security | Scalability | User Interface | Data Reliability | Cost Efficiency |
|---|---|---|---|---|---|---|---|---|---|
| Arweave [32] | Moderate | Moderate | High | Very High | High | High | Moderate | Very High | Moderate |
| BitTorrent [33] | High (for popular files) | Low | High | High | Moderate | High (for popular files) | Easy | High | High |
| Dat Protocol [34] | Moderate | Moderate | Moderate | High | High | Moderate | Moderate | High | High |
| Filecoin [35] | High | Low | High | High | High | High | Moderate | High | Variable |
| Hypercore Protocol [36] | High | Low | High | High | High | High | Moderate | High | High |
| IPFS [37] | Moderate | Moderate | High | High | High | High | Moderate | High | High |
| MaidSafe [38] | Moderate | Moderate | Moderate | High | High | Moderate | Moderate | High | High |
| Sia [39] | High | Moderate | High | High | High | Moderate | Moderate | High | Variable |
| Storj [40] | High | Low | High | High | High | High | Moderate | High | Variable |
| Swarm [41] | Moderate | Moderate | Moderate | High | High | Moderate | Moderate | High | Variable |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Merlec, M.M.; In, H.P. Blockchain-Based Decentralized Storage Systems for Sustainable Data Self-Sovereignty: A Comparative Study. Sustainability 2024, 16, 7671. https://doi.org/10.3390/su16177671