3.2.3. Similarity

The ultimate aim of a similarity tool is to serve as a drop-in substitute for the cryptographic hashes used for file filtering in forensic practice [28]. Approximate matching may be accomplished using two distinct abstractions: byte-wise matching and semantic matching. (1) Byte-wise matching operates at the byte level and accepts only byte sequences as input. Byte-wise algorithms rely on two primary functions. A feature extraction function detects and extracts properties from objects in order to compress them for comparison. A similarity function then compares these compressed versions to produce a normalized match score. Typically, this comparison uses string measures such as the Hamming and Levenshtein distances [25]. Byte-wise matching has a number of limitations [25]: (a) it is unable to detect similarities at a higher level of abstraction, for example, semantically; (b) it is unable to properly match two image files that contain the same semantic image but are stored in different file types and formats, because their binary encodings differ; and (c) because there is no universally accepted definition of similarity, not all types of byte-level similarity are equally useful, since certain artifacts (e.g., headers and footers) are trivial and result in false positives.
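The two string measures named above can be sketched as follows. This is a minimal illustration, not the implementation used in [25]; the function names and the scaling of the distance to a 0 to 100 score are assumptions made for the example.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Number of positions at which two equal-length byte strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def levenshtein_distance(a: bytes, b: bytes) -> int:
    """Minimum number of single-byte insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x from a
                current[j - 1] + 1,          # insert y into a
                previous[j - 1] + (x != y),  # substitute (free if bytes match)
            ))
        previous = current
    return previous[-1]


def normalized_score(distance: int, max_len: int) -> float:
    """Map a distance to a match score in [0, 100]; 100 means identical.

    Assumed scaling for illustration only.
    """
    if max_len == 0:
        return 100.0
    return 100.0 * (1 - distance / max_len)
```

In a byte-wise matcher these measures would typically be applied to the compressed feature representations rather than to the raw files, which keeps the comparison cheap.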

This research focuses on the second type, (2) semantic matching, which operates on the visual content layer (i.e., digital evidence images) and thus closely resembles human perception; for example, it captures the similarity between the content of a JPG and a PNG image even though the file types differ. In other words, two images are semantically similar if they convey the same information. For instance, a JPG file is semantically equivalent to an exported PNG file containing the same image: their cryptographic hashes will not be the same, but the images will be identical [25]. To compare two hash values, a comparison function is required. The comparison function takes two hash values as input and returns a number between 0 and X, where X is the maximum match score. A score of X indicates that the hash values are identical or nearly identical, implying that the input files are also identical or nearly identical. Ideally, the similarity score should lie between 0 and 100 and be expressed as a percentage.
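To make the contrast between cryptographic and semantic comparison concrete, the sketch below uses an average hash, a simple perceptual hashing technique chosen here purely for illustration; it is not the method of this framework. Images are assumed to be small grayscale pixel grids, and the resizing step that practical average-hash implementations apply first is omitted. The names `average_hash` and `compare` are hypothetical.

```python
import hashlib
from typing import List

Pixels = List[List[int]]  # grayscale values 0..255, row-major (assumed input form)


def average_hash(pixels: Pixels) -> int:
    """One bit per pixel: 1 if the pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def compare(hash_a: int, hash_b: int, n_bits: int) -> float:
    """Comparison function: percentage of matching bits, from 0 to 100."""
    differing = bin(hash_a ^ hash_b).count("1")
    return 100.0 * (n_bits - differing) / n_bits


def sha256_of(pixels: Pixels) -> str:
    """Cryptographic hash of the raw pixel bytes, for contrast."""
    return hashlib.sha256(bytes(p for row in pixels for p in row)).hexdigest()


original = [[10, 200], [220, 30]]
brighter = [[15, 205], [225, 35]]  # same picture, slightly different encoding

crypto_match = sha256_of(original) == sha256_of(brighter)
semantic_score = compare(average_hash(original), average_hash(brighter), n_bits=4)
```

Here `crypto_match` is false because a single changed byte alters the SHA-256 digest, while `semantic_score` stays at the maximum because the bright and dark regions of the two grids coincide.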

The proposed framework uses MRSH-v2 to create the digital grey hash for each block within the blockchain network that utilizes the Hierarchical Bloom Filter Tree (HBFT) approach. As stated in [27], HBFT is effective at detecting files that share at least 40% of their content, and it has excellent recall when dealing with identical sets of data. This means that the HBFT data structure is an effective alternative to all-against-all comparisons while also delivering significant speed benefits. The HBFT approach yielded a recall level of 95% for similar files when using MRSH-v2 as ground truth. Therefore, the proposed framework adopts 95% as the appropriate metric for resemblance [28]. See [29–34] for more information regarding fuzzy hashing techniques.
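The role of a Bloom filter in this kind of matching can be illustrated with a single filter. The sketch below is not MRSH-v2 or HBFT: it uses fixed-size chunks where MRSH-v2 derives chunk boundaries from the content, and it builds one filter rather than a tree. The class, the function names, and all parameter values are assumptions chosen for readability.

```python
import hashlib
from typing import Iterator, List


class BloomFilter:
    """Compact set membership with no false negatives and rare false positives."""

    def __init__(self, size_bits: int = 2048, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item: bytes) -> Iterator[int]:
        digest = hashlib.sha256(item).digest()
        for k in range(self.num_hashes):
            # Use a different 4-byte slice of the digest for each hash function.
            yield int.from_bytes(digest[4 * k:4 * k + 4], "big") % self.size_bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item: bytes) -> bool:
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def chunks(data: bytes, size: int = 8) -> List[bytes]:
    """Fixed-size chunking (a simplification of content-defined chunking)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def shared_fraction(reference: bytes, query: bytes) -> float:
    """Fraction of the query's chunks that the reference's filter reports as present."""
    bloom = BloomFilter()
    for chunk in chunks(reference):
        bloom.add(chunk)
    query_chunks = chunks(query)
    if not query_chunks:
        return 0.0
    return sum(chunk in bloom for chunk in query_chunks) / len(query_chunks)
```

Because a Bloom filter can report false positives, `shared_fraction` is an upper-biased estimate of the true overlap. A tree of such filters lets a query be routed toward the most likely matches instead of being compared against every stored file.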

#### *3.3. Peer to Peer (P2P) Network*

A P2P network is used to create the network architecture and to facilitate communication between the blockchain layer (responsible for constructing a blockchain for each node in the underlying network) and the rest of the network. The majority of blockchain schemes use a peer-to-peer network as the blockchain network. This work uses a peer-to-peer network to organize nodes, provide peer-to-peer routing, secure the transfer of proof information, and maintain the blockchain's consensus. Existing peer-to-peer network methods may be used directly or modified to build the blockchain's network [10].

#### *3.4. Consensus Mechanism*

The blockchain consensus process selects a node to generate and broadcast the next block of the blockchain and ensures that each node's blockchain is consistent [10]. A blockchain transaction is verified through the application of a consensus mechanism. Consensus ensures that each transaction has its own independent witness mechanism. There are many forms of blockchain consensus, including Proof of Work (PoW), Proof of Stake (PoS), and Proof of Authority (PoA). Consensus types vary according to how the blockchain interacts with data storage [15].

With PoW, nodes compete against one another by solving a mathematical problem to confirm transactions and create new blocks. While solving a block is a computationally demanding job, validating it is straightforward. To further incentivize such a system, solving a block also results in the mining of a certain number of bitcoins, which serves as the reward for block makers (often referred to as miners) [21]. PoW is suitable for permissionless networks, that is, networks in which nodes may join without prior authorization. The primary disadvantage of PoW is its high energy consumption, which also precludes its use in some situations [21]. This has led to the study of other types of blockchain consensus, such as PoA. This study focuses on PoA, which is usually used in permissioned networks, i.e., networks in which nodes cannot freely join and become validators. With PoA, validators must be pre-authorized and their identities must be known. As a consequence, behaving maliciously results in a loss of personal reputation and, eventually, expulsion from the validator set [21].
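The asymmetry between solving and validating a PoW block, and the contrast with PoA, can be sketched as follows. This is a toy model under stated assumptions: the puzzle is a SHA-256 hash with a required number of leading zero hex digits, the validator names are hypothetical, and the PoA check uses a round-robin schedule, which is only one possible scheme. Real PoA networks also verify a cryptographic signature from the validator, which is omitted here.

```python
import hashlib


def pow_hash(block_data: bytes, nonce: int) -> str:
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).hexdigest()


def mine(block_data: bytes, difficulty: int) -> int:
    """Costly: try nonces until the hash starts with `difficulty` hex zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not pow_hash(block_data, nonce).startswith(prefix):
        nonce += 1
    return nonce


def verify_pow(block_data: bytes, nonce: int, difficulty: int) -> bool:
    """Cheap: a single hash evaluation confirms the work."""
    return pow_hash(block_data, nonce).startswith("0" * difficulty)


# Hypothetical pre-authorized validator set with known identities.
AUTHORIZED_VALIDATORS = ("validator-a", "validator-b", "validator-c")


def expected_validator(block_height: int) -> str:
    """Round-robin schedule: each validator takes turns producing a block."""
    return AUTHORIZED_VALIDATORS[block_height % len(AUTHORIZED_VALIDATORS)]


def verify_poa(block_height: int, signer: str) -> bool:
    """No puzzle to solve: the block is valid if the scheduled validator made it."""
    return signer == expected_validator(block_height)
```

With `difficulty=3`, `mine` needs a few thousand hash evaluations on average, whereas `verify_pow` needs one; each additional hex digit multiplies the expected mining cost by 16. The PoA check performs no search at all, which is why it avoids the energy cost described above.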

#### *3.5. Hyperledger Blockchain Platform*

Hyperledger Fabric (HLF) is a blockchain-based system for electronic digital record exchange across several organizations. Recently, several blockchain systems have been created by different businesses, including Ethereum, Corda, and Ripple [35]. The Hyperledger Composer (HLC) is a framework for building blockchain applications that significantly speeds up and simplifies the process of designing blockchain use cases. One of the many benefits of HLC is that it is completely open-source, with an open governance architecture that allows for contributions by anybody [6]. By design, HLC satisfies all of the criteria for developing an automated system that is both robust and secure in its recording of all the information related to the evidence collection process for a specific cyber forensic case. HLC is compatible with and runs on top of the current HLF blockchain architecture and runtime, enabling pluggable blockchain consensus protocols to guarantee that transactions are verified according to the policy established by the designated business network members [6].

The proposed framework in this article is based on HLF and HLC and offers the following major benefits [6,36]: (1) it is distinguished from the others by its use of the permissioned blockchain concept, in which transaction processing is delegated to a select group of trustworthy network members; (2) as a consequence, the resulting environment is more regulated and predictable than public permissionless blockchains; (3) block generation does not require the resource-intensive computations associated with PoW techniques; (4) due to its modular nature, it enables the use of a variety of methods to achieve agreement among business process participants; and (5) Ethereum is probably not the ideal cryptocurrency to use for crime-scene investigation, since digital forensic investigations require confidentiality and are conducted by genuine and trustworthy parties.

From a functional standpoint, the HLF network's nodes are classified as follows [36]: (1) clients initiate transactions, participate in their processing, and broadcast transactions to ordering services; (2) peers execute the transaction processing workflow, verify transactions, and maintain the blockchain registry; the blockchain registry is an append-only data structure that contains a hash chain of all transactions, as well as a concise representation of the latest ledger state; (3) Ordering Service Nodes (OSN) or, simply, orderers establish the total order of all transactions in the blockchain using the distributed consensus algorithm; each transaction contains updates to the system's state, the history of which is stored in the blockchain, and cryptographic signatures of endorsing peers. The separation of processing nodes (peers) from transaction ordering keeps HLF's consensus as modular as feasible and facilitates protocol replacement.
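The append-only hash chain maintained by peers can be illustrated with a minimal sketch. It is not HLF's ledger format: the `Block` fields, the JSON serialization, and the all-zero genesis hash are assumptions for the example, and endorsement signatures and the state database are left out.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List

GENESIS_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


@dataclass
class Block:
    index: int
    transactions: List[dict]
    previous_hash: str

    def digest(self) -> str:
        """Hash over the block's contents, including the link to its predecessor."""
        payload = json.dumps(
            {
                "index": self.index,
                "transactions": self.transactions,
                "previous_hash": self.previous_hash,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Append-only chain: each block commits to the hash of the one before it."""

    def __init__(self) -> None:
        self.blocks: List[Block] = []

    def append(self, transactions: List[dict]) -> Block:
        previous = self.blocks[-1].digest() if self.blocks else GENESIS_HASH
        block = Block(len(self.blocks), transactions, previous)
        self.blocks.append(block)
        return block

    def is_consistent(self) -> bool:
        """Recompute every link; a modified block breaks the link after it."""
        expected = GENESIS_HASH
        for block in self.blocks:
            if block.previous_hash != expected:
                return False
            expected = block.digest()
        return True
```

Changing a transaction in an earlier block alters that block's digest, so the `previous_hash` stored in its successor no longer matches and `is_consistent` returns false. The most recent block has no successor in this sketch, so protecting it would need an additional mechanism such as the endorsing signatures mentioned above.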

To define business processes within the framework of the (HLF and HLC) platform, a variety of concepts are employed, the most important of which are assets, participants, and network-stored transactions. (1) Assets: anything of value that can be traded or shared via a network is considered an asset. The proposed approach treats digital evidence and the comprehensive information associated with it as an asset that is kept in HLC's asset registry. (2) Participants: the participants in the forensic chain model are forensic investigators. In HLC, the participant's structure is represented using a file. New instances of the modeled participant can be generated and added to the participant registry.

Additionally, HLC requires blockchain IDs as a form of identification, and an identity registry stores a collection of mappings between identities and participants. At any point in time, admin peers controlled by companies in the Hyperledger Composer blockchain consortium may add new participants with suitable identity responsibilities to address a specific scenario. Participants may exchange information securely using the channels available on the (HLF and HLC) platform. (3) Transactions: transactions describe the actions that participants may take on assets as they travel through the network. Transactions in the proposed framework record either information about the evidence or the evidence transfer event on the network. See [37–40] for more information regarding the Hyperledger blockchain platform.
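The relationship between assets, participants, identities, and transactions can be sketched in plain Python. This is a conceptual model only and does not use the HLC modeling language or any HLC API; every class, field, and method name below is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Investigator:  # participant
    participant_id: str
    name: str


@dataclass
class Evidence:  # asset
    evidence_id: str
    description: str
    custodian_id: str


@dataclass
class TransferEvidence:  # transaction
    evidence_id: str
    new_custodian_id: str


class BusinessNetwork:
    """Toy registries: participants, assets, identities, and transaction history."""

    def __init__(self) -> None:
        self.participants: Dict[str, Investigator] = {}
        self.assets: Dict[str, Evidence] = {}
        self.identities: Dict[str, str] = {}  # identity -> participant id
        self.history: List[TransferEvidence] = []

    def add_participant(self, participant: Investigator, identity: str) -> None:
        self.participants[participant.participant_id] = participant
        self.identities[identity] = participant.participant_id

    def add_asset(self, evidence: Evidence) -> None:
        if evidence.custodian_id not in self.participants:
            raise KeyError("custodian is not a registered participant")
        self.assets[evidence.evidence_id] = evidence

    def submit(self, identity: str, tx: TransferEvidence) -> None:
        """Apply a transfer if the submitting identity maps to the current custodian."""
        submitter = self.identities.get(identity)
        evidence = self.assets[tx.evidence_id]
        if submitter is None or submitter != evidence.custodian_id:
            raise PermissionError("only the current custodian may transfer evidence")
        if tx.new_custodian_id not in self.participants:
            raise KeyError("new custodian is not a registered participant")
        evidence.custodian_id = tx.new_custodian_id
        self.history.append(tx)  # the transfer event is recorded
```

The rule that only the current custodian may transfer evidence is an assumption added to show how an identity-to-participant mapping can gate a transaction; the actual access rules of the framework are not specified in this section.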
