*3.1. Storage Size*

As the storage capacity of computing systems has grown, much paper documentation has been digitized in domains such as healthcare, intellectual property, real estate, and legislative contracts. Media and social network platforms have likewise become more flexible, giving users more options for storing their files (documents, photos, videos, etc.). DLTs have attracted interest in these domains because they can ensure immutability (legislative contracts and real estate), provenance (intellectual property), and security (healthcare). However, the greatest challenge in integrating DLT solutions with these domains is the limited storage capacity of the ledgers themselves.

To improve storage scalability, several solutions have been proposed, such as sharding or the integration of well-known file systems with existing DLTs. They aim to store the data outside the DLT and keep only a digital fingerprint of the data on the Protocol and Network Tier. Because the data kept on the Protocol and Network Tier benefits from all the guarantees the system provides (consensus, immutability, security, etc.), it is treated as the source of truth when validating the data stored on the Scalability Tier.

Sharding is a solution implemented to improve storage scalability [103]. Different nodes are assigned to process and store only a corresponding subset of the transactions [104]. A simple sharding technique is to split the network into shards corresponding to the transaction's prefix: the 0x01 shard, the 0x02 shard, etc.
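The prefix-based assignment described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the prefix is the first byte of the SHA-256 hash of the serialized transaction; the function names `shard_for` and `shard_label` are hypothetical and not taken from any cited system.

```python
import hashlib


def shard_for(tx_bytes: bytes, num_shards: int = 256) -> int:
    """Map a transaction to a shard using the first byte of its hash.

    Assumption: the "prefix" is the leading byte of SHA-256(tx).
    """
    if not 1 <= num_shards <= 256:
        raise ValueError("num_shards must be between 1 and 256")
    digest = hashlib.sha256(tx_bytes).digest()
    # With 256 shards the prefix byte is the shard index itself;
    # with fewer shards the byte is folded with a modulo.
    return digest[0] % num_shards


def shard_label(index: int) -> str:
    """Format a shard index as a two-digit hex prefix, e.g. 0x01."""
    return f"0x{index:02x}"


if __name__ == "__main__":
    tx = b"alice->bob:10"
    print(shard_label(shard_for(tx)))
```

Because the hash output is close to uniformly distributed, this kind of assignment spreads transactions roughly evenly across shards, at the price of making cross-shard transactions common.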

For example, in the sharding mechanism proposed by Ethereum Sharding [105], the system defines objects at three different levels: transactions (level 0), collations (level 1), and blocks (level 2). Collations are the data structures that package the transactions belonging to a shard. They are created and sealed by Collators, which are network nodes registered on the main chain in the Validator Manager Contract. Each Collator deposits a sum of coins on the main chain, and this deposit is the basis on which it is chosen, in a Proof of Stake manner, to validate the next collation. The header of the proposed collation is then verified on-chain and added to the next block on the main chain. Cross-shard communication is also possible by providing Merkle proofs of existing transactions from the main chain. Similar approaches are investigated by Elrond Network [106], Hyperledger [34,60], Elastico [107], OmniLedger [108] and RapidChain [109].
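The text states only that Collators are chosen "in a Proof of Stake manner" based on their deposit; the exact selection rule is not given here. The following is a minimal sketch under the assumption that the selection probability is proportional to the deposit and that all nodes share a common random seed. The names `choose_collator`, `stakes`, and `seed` are hypothetical.

```python
import random
from typing import Dict


def choose_collator(stakes: Dict[str, int], seed: int) -> str:
    """Pick one collator with probability proportional to its deposit.

    Assumption: `seed` stands in for a source of randomness that all
    nodes agree on, so every node computes the same choice.
    """
    if not stakes or any(s < 0 for s in stakes.values()):
        raise ValueError("stakes must be non-empty and non-negative")
    if sum(stakes.values()) == 0:
        raise ValueError("at least one collator must hold a deposit")
    names = sorted(stakes)  # fixed order so the result is reproducible
    weights = [stakes[name] for name in names]
    rng = random.Random(seed)
    return rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    deposits = {"collator_a": 32, "collator_b": 64, "collator_c": 0}
    print(choose_collator(deposits, seed=2024))
```

A collator with a zero deposit is never selected, and doubling a deposit doubles the chance of being selected, which is the incentive property a stake-based scheme relies on.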

The Scalability Tier solutions that use file systems as storage mechanisms allow large files to be stored by fragmenting the original file, encrypting it, and distributing the resulting chunks among the nodes, while the hash of the original file is stored in the Protocol and Network Tier. The nodes storing the data must respond to periodic checks of the integrity of the stored data, and they are rewarded for their services through a reward scheme.
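The flow described above (fragment, distribute, keep a fingerprint on-chain, audit periodically) can be sketched as follows. This is a simplified illustration, not the protocol of any cited system: encryption is omitted to keep the example dependency-free, the auditor is assumed to hold a reference copy of the challenged chunk, and all names (`split_chunks`, `distribute`, `audit_response`, and so on) are hypothetical.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; unrealistically small, for illustration only


def split_chunks(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Fragment a file into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def distribute(chunks: List[bytes], node_ids: List[str]) -> Dict[str, Dict[int, bytes]]:
    """Assign chunks to storage nodes in round-robin order."""
    placement: Dict[str, Dict[int, bytes]] = {n: {} for n in node_ids}
    for idx, chunk in enumerate(chunks):
        placement[node_ids[idx % len(node_ids)]][idx] = chunk
    return placement


def file_fingerprint(data: bytes) -> str:
    """Hash of the original file: the value that would be stored on-chain."""
    return hashlib.sha256(data).hexdigest()


def audit_response(chunk: bytes, nonce: bytes) -> str:
    """What a storage node returns when challenged with a fresh nonce."""
    return hashlib.sha256(nonce + chunk).hexdigest()


def audit_passes(reference_chunk: bytes, nonce: bytes, response: str) -> bool:
    """Auditor side: recompute the expected answer and compare."""
    return audit_response(reference_chunk, nonce) == response


if __name__ == "__main__":
    original = b"hello world!"
    on_chain = file_fingerprint(original)
    chunks = split_chunks(original)
    placement = distribute(chunks, ["node1", "node2"])
    # Periodic check: node2 proves it still holds chunk 1.
    answer = audit_response(placement["node2"][1], b"nonce-42")
    print(audit_passes(chunks[1], b"nonce-42", answer))
    # Retrieval: reassemble and validate against the on-chain fingerprint.
    print(file_fingerprint(b"".join(chunks)) == on_chain)
```

The fresh nonce prevents a node from caching an old answer and discarding the data. Practical systems avoid the need for a reference copy, for example by precomputing challenges or by using Merkle proofs.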

Figure 4 shows an example of such a storage mechanism and its integration with the Protocol and Network Tier in the case of blockchain ledgers. Several successful implementations of such distributed file systems exist, notably Storj [110], IPFS [111], Filecoin [112], MediaChain [113], Decent [114], Sia [115], MaidSafe [116], Swarm [117] and Arweave [118].

IPFS (InterPlanetary File System) is one of the most widely used Scalability Tier solutions for file storage. When a user publishes a large file using their own IPFS node, the node first fragments the file into smaller chunks. The hash of each chunk becomes a node in a Merkle DAG whose root is the hash of the initial file, so hash pointers are used to ensure tamper-evidence. For security reasons, the stored chunks are of standardized sizes, so that an attacker cannot extract any useful information by analyzing the size of a chunk. The owner of the data is responsible for holding the private key used to encrypt the chunks of data that are scattered across the network. This makes the system highly secure: even though the data is stored across multiple nodes, no node holding the data can use it, since it is both encrypted and fragmented. Moreover, because the file is shared across multiple users, the system avoids downtime. The system offers the possibility to transfer data, check the availability and the integrity of the stored data, retrieve the data, and pay for the service provided.
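The tamper-evidence property of hash pointers can be illustrated with a simplified binary Merkle tree over the chunks. This is a sketch under stated assumptions and not the actual IPFS data model, which uses a Merkle DAG with content identifiers and its own chunking rules; the helper names `sha256` and `merkle_root` are hypothetical.

```python
import hashlib
from typing import List


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(chunks: List[bytes]) -> bytes:
    """Compute the root of a binary Merkle tree built over the chunks.

    Assumption: when a level has an odd number of nodes, the last node
    is duplicated. Other padding conventions exist.
    """
    if not chunks:
        raise ValueError("at least one chunk is required")
    level = [sha256(c) for c in chunks]  # leaves are the chunk hashes
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Each parent is the hash of its two children concatenated.
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


if __name__ == "__main__":
    chunks = [b"chunk-0", b"chunk-1", b"chunk-2"]
    root = merkle_root(chunks)
    tampered = [b"chunk-0", b"chunk-X", b"chunk-2"]
    # Any change to any chunk changes the root.
    print(root.hex() != merkle_root(tampered).hex())
```

Since the root depends on every chunk, storing only the root on the Protocol and Network Tier is enough to detect a modification to any part of the file.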

**Figure 4.** Scalability Tier—File System Storage Mechanism.

Similar implementations, such as Storj [110] and Filecoin [112], propose rewarding the decentralized nodes to motivate them to act honestly in providing their storage services. Ethereum Swarm [119] is a peer-to-peer system that aims to store data in a decentralized way and relies on immutable content-addressable data. While IPFS needs Filecoin to validate storage proofs, Ethereum Swarm proofs are validated at the contract level and rely on incentive schemes based on the native coin, Ether.

Table 6 presents a comparison of the identified storage scalability solutions. Sharding is a promising alternative to classic DLTs because it provides increased storage capacity. With sharding, the DLT storage capacity is multiplied by the number of shards, and the block sealing process is parallelized across the shards. However, as the number of shards increases, fewer nodes are assigned to each shard for validation. This can easily make the network susceptible to attacks, since by attacking one shard the entire system can be compromised. The file system solution, even though it provides great scalability in terms of storage, also requires a degree of trust between the storing nodes. For example, since IPFS is not fault-tolerant on its own, the DLT storing the hash ensures only tamper-evidence in the system but does not make the system tamper-resistant. Each time an update is applied to one of the files, the hash pointer changes as well, requiring a transaction that updates the entry on-chain. While this is desirable for keeping the system tamper-evident, a high frequency of updates will also lead to higher costs.
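The sharding trade-off noted above can be made concrete with a small calculation. It is a sketch under simplifying assumptions that are not from the source: nodes are split evenly among shards, a shard is compromised once an attacker controls more than a fixed fraction of its nodes (a simple majority by default), and the attacker can concentrate all of its nodes in a single shard. The function name `takeover_cost` is hypothetical.

```python
import math
from typing import Tuple


def takeover_cost(total_nodes: int, num_shards: int,
                  threshold: float = 0.5) -> Tuple[int, float]:
    """Return (nodes needed to control one shard, share of the network).

    Assumptions: even split of nodes across shards; control requires
    strictly more than `threshold` of the shard's nodes.
    """
    if num_shards < 1 or total_nodes < num_shards:
        raise ValueError("need at least one node per shard")
    per_shard = total_nodes // num_shards
    needed = math.floor(per_shard * threshold) + 1
    return needed, needed / total_nodes


if __name__ == "__main__":
    for shards in (1, 10, 100):
        needed, share = takeover_cost(1000, shards)
        print(f"{shards:>3} shards: {needed} nodes ({share:.1%} of the network)")
```

Under these assumptions, a network of 1000 nodes requires 501 attacker nodes when unsharded, but only 51 nodes (about 5% of the network) to control one of 10 shards. Sharded designs typically counter this by assigning nodes to shards at random, so that an attacker cannot choose where its nodes are placed.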


