Technologies for Improving Storage Efficiency in Blockchain-Based IIoT: Comparison
Please note this is a comparison between Version 1 by Andrew Selasi Agbemenu and Version 2 by Lindsay Dong.

The Internet of Things (IoT) and blockchain have contributed to massive advancements in the fields to which they have been applied [25,26]. The benefits of the blockchain, which include enhanced security, transparency, and greater traceability [27], make it a promising technology for integration with the Industrial Internet of Things (IIoT), which has long had issues with security [28,29,30,31,32]. However, there are several issues that limit the integration of blockchain into IIoT systems [25,33]. One of these issues is the huge storage requirement of the blockchain. There are several solutions to address these concerns. These solutions, which include summarization-based, compression-based, and storage scheme optimization methods, are necessary to enable the further development of blockchain–IIoT integration. However, these solutions have shortcomings that reduce their effectiveness. Compression-based schemes produce compressed blocks or data that accumulate over time and may not ensure enough storage savings on peers. This can be alleviated by designing compression techniques that provide an efficient representation of data for IIoT systems to yield better compression ratios. Summarization-based schemes reduce redundancy in block data by using the net change in transferring entities between parties and, thus, are better suited for financial systems than for IIoT systems.

  • blockchain
  • IIoT
  • scalability
  • storage efficiency
  • storage optimization
  • compression

1. Storage and Scalability Concerns of Blockchain–IIoT Integration

The immutable nature of the blockchain and its reliance on consensus between participating nodes give rise to several issues around the storage of the blockchain ledger. The number of blocks that can be appended to the blockchain in a given period of time is limited by the consensus mechanism and by data broadcast between nodes [41]; thus, the transaction throughput is much lower than that of traditional database-based systems [42,43,44].
The Industrial Internet of Things (IIoT) connects many devices, all of which generate data and require management, storage, and retrieval; the throughput of typical blockchain systems would be inadequate to deal with all of these connected devices. Full nodes on a blockchain network are required to store the entire blockchain ledger. Since the ledger is append-only, the capacity of these nodes to store the ledger will eventually be exceeded, and their storage capacity would have to be expanded to adapt [45,46,47,48,49].
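To make the append-only growth concrete, the following back-of-the-envelope sketch estimates how much ledger data a full node would accumulate. All figures (device count, transaction rate, transaction size) are hypothetical assumptions chosen for illustration and do not come from the surveyed works; block headers and metadata are ignored.

```python
def ledger_growth_bytes(devices: int, tx_per_device_per_day: float,
                        bytes_per_tx: int, days: int) -> int:
    """Bytes of transaction data a full node accumulates over `days`.

    Simplifying assumption: every transaction is stored by every full node
    and block headers/metadata are ignored.
    """
    return int(devices * tx_per_device_per_day * bytes_per_tx * days)


# Hypothetical plant: 10,000 devices, each submitting one 250-byte
# transaction per minute (1,440 per day), observed for one year.
yearly = ledger_growth_bytes(devices=10_000,
                             tx_per_device_per_day=24 * 60,
                             bytes_per_tx=250,
                             days=365)
print(f"{yearly / 1e12:.2f} TB per year per full node")  # prints "1.31 TB per year per full node"
```

Even under these modest assumptions, each full node would need more than a terabyte of additional storage per year, which illustrates why the number of full nodes tends to shrink as the ledger grows.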
The growth of the blockchain ledger greatly affects the scalability of the blockchain system. The number of full nodes on the blockchain is also restricted due to the high storage requirements [50]. This increases centralization in the blockchain, which, in turn, affects the security of the system. These three blockchain characteristics (decentralization, scalability, and security) are considered crucial and are at the heart of the blockchain trilemma, a concept first described by Vitalik Buterin, the co-founder of Ethereum, as shown in Figure 2 [44].
Figure 2. The blockchain trilemma.
The blockchain trilemma proposes that tradeoffs among the decentralization, scalability, and security of a blockchain system are inevitable [44,51]. The blockchain is, by nature, decentralized, and security is an essential property in its operation. However, this affects its scalability. A classic example is in the Bitcoin network, where reducing latency to improve transaction throughput may result in weakened security due to a higher probability of creating forks in the blockchain [44].

2. Approaches to Storage Efficiency in Blockchain–IIoT

The storage problem of the blockchain has been approached in different ways by works that propose solutions for mitigating it. These storage optimization schemes or storage models are usually motivated by specific use cases and may be designed for either permissionless or permissioned blockchains. While the same principles underlie both blockchain architectures, their designs differ in many ways. Some storage optimization schemes capitalize on certain aspects of these architectures to achieve storage efficiency. The requirements of the use case influence the blockchain architecture and, particularly in IIoT, permissioned blockchains are used, since industrial participants are known and access to data can be controlled. Some of the schemes discussed in this section can be implemented on either permissioned or permissionless blockchains. Schemes of this nature generally do not change the operation of the underlying blockchain and may involve processing of data before submission to the blockchain or changing the storage system of the peers.
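As a rough illustration of "processing of data before submission to the blockchain", the sketch below compresses a batch of sensor readings with Python's standard `zlib` module (a Deflate implementation) before the bytes would be wrapped in a transaction. The record layout, the function names `pack_payload` and `unpack_payload`, and the sample readings are assumptions made for this example; it is not the method of any particular surveyed work.

```python
import json
import zlib


def pack_payload(readings: list[dict]) -> bytes:
    """Serialize readings compactly and Deflate-compress them."""
    raw = json.dumps(readings, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=9)


def unpack_payload(blob: bytes) -> list[dict]:
    """Inverse of pack_payload: decompress and parse the readings."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical batch of 200 readings from a single temperature sensor.
readings = [{"sensor": "temp-01", "t": 1_700_000_000 + i, "value": 21.5}
            for i in range(200)]
raw_size = len(json.dumps(readings, separators=(",", ":")).encode("utf-8"))
blob = pack_payload(readings)

assert unpack_payload(blob) == readings  # compression is lossless
print(f"raw={raw_size} B, compressed={len(blob)} B")
```

Because the underlying blockchain only sees an opaque byte string, a pre-processing step of this kind can be layered on either a permissioned or a permissionless platform without changing its core operation.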

2.1. Compression-Based Schemes

Compression-based schemes utilize a compression algorithm to reduce the amounts of data that are submitted as transactions to the blockchain or to reduce the size of the blocks in the blockchain. They can be divided into block compression techniques and data compression techniques. Table 2 shows a comparison of these schemes.
Table 2. Comparison of compression-based schemes.
Proposed Work | Approach | Algorithm | Compression Ratio/Storage Reduction | Limitations
Qi et al. [28] | Data Compression | Tree-based key-value compression | 4–9× | May have a low compression ratio for large product record data
Kim et al. [54] | Block Compression | Block Merkle Tree | 76.02% reduction | Sidechain requires synchronization between nodes
Spataru et al. [55] | Block Compression | Huffman coding and LZW compression | 48.5% reduction | Only suited for Ethereum and Ethereum-like blockchains; only focused on smart contract code size
Chen et al. [56] | Block Compression | Replacement of hash pointers with index pointers | 12.71% reduction | Low storage overhead reduction; not suited for large-scale systems such as IIoT
Marsalek et al. [57] | Block Compression | Snapshot block | 93% reduction | Accumulation of compression results over time; suitable for UTXO-based blockchains
Yu et al. [58] | Block Compression | Deflate algorithm | 30.53%–42.16% of original block | Increased mining difficulty
Ding et al. [59] | Block Compression | Txilm Protocol | 8 | Increased latency

2.1.1. Block Compression

Block compression schemes aim at reducing the storage overhead of the blockchain by compressing the block after it is generated and committed to the blockchain. Kim et al. [54] proposed SELCOM, a selective compression scheme using a Block Merkle Tree (BMT), for lightweight nodes in blockchain systems. As shown in Figure 4, SELCOM allows nodes to maintain blocks selectively through a second chain called a checkpoint chain. It uses the BMT to compress several blocks into a checkpoint. The compressed blocks can then be selectively removed or maintained depending on each node. Their results indicated an average storage reduction of 76.02%. The maintenance of a second chain introduces more complexity, as synchronization between peers for this chain is required. Unlike other works, the authors proposed an update mechanism to reduce the accumulation of compression results over time. While SELCOM can be used to verify numerous blocks with fewer compression results, the security of such an approach was not explored. Since IIoT systems have long been plagued with security concerns, the ability of lightweight nodes to selectively maintain blocks raises concerns, since it may also be easier to have malicious nodes on the network. To improve the security of such sidechains, research should be undertaken to explore the use of further cryptographic proofs [60].
Figure 4. SELCOM [54].

2.1.2. Data Compression

Some works have proposed the compression of product data before they are encapsulated in blockchain transactions. Qi et al. [28] proposed Cpds, a framework for efficient and private data sharing for product traceability using IIoT over the blockchain. As shown in Figure 5, the authors employ an off-chain procedure that compresses and encrypts product data before their eventual submission to the blockchain. Cpds uses a tree-based data compression mechanism that leverages the tree structure of traditional industrial systems for the amortization of data compression overhead. Participants along the path in an industrial process submit point transactions with the latest off-chain storage address of product data to the blockchain when they transfer product records to the next participant. Terminal participants compress the final product data and submit them to the blockchain as a data transaction. The authors implemented their prototype of Cpds using Java and Python, and they used Hyperledger Fabric as the blockchain. Their results showed that Cpds reduces storage overhead by 4–9 times compared to the baseline design and offers access times between 4.8 and 20 times faster than those of the baseline design. Cpds is designed for permissioned blockchains for IIoT. It has a low impact on the blockchain’s core operations, since it is only an overlay framework that sits atop the blockchain platform. The complexity of this approach is still relatively high, since it involves building a unified data-sharing service encompassing compression and encryption techniques that handle product record transfer between industrial participants, compression of product data, data access control, and authentication. Further research could be undertaken to determine how well Cpds performs with large product data, since their tests were performed on small product data from 100 bytes to 10 Kb.
Figure 5. Compressed and private data sharing (Cpds) [28].

2.2. Summarization-Based Schemes

Works based on summarization propose the use of summary blocks to reduce storage overhead. These summary blocks contain details from original blocks that can then be replaced by the summary blocks. A comparison of these works can be found in Table 3.

Table 3. Comparison of summarization-based schemes.
Proposed Work | Approach | Algorithm | Storage Reduction | Limitations
Palai et al. [61] | Summarization | Recursive summarization tree | 54% | Huge block summary size
Nadiya et al. [62] | Summarization | Recursive summarization tree and deflate compression algorithm | 78.1% | Designed for Bitcoin blockchain; lack of standard summary block for other blockchains

2.3. Storage Scheme Optimization

Another approach to improving the storage efficiency of blockchain systems is to improve or change the storage schemes of these systems. Generally, there are two ways in which blockchain data are stored: on-chain, where all blockchain data are either fully or partially stored by the blockchain peers, and off-chain, which introduces technologies such as cloud computing and secure distributed file storage to alleviate the storage burden on the blockchain peers.

2.3.1. Off-Chain Storage

An intuitive approach to reducing the storage burden on blockchain peers is to leverage the storage capabilities of other systems outside the blockchain network. There are two main ways in which this can be achieved: cloud storage and distributed file storage. Table 4 shows a comparison of these works.
Table 4. Comparison of off-chain storage scheme optimization works.
Proposed Work | Approach | Algorithm | Storage Reduction | Runtime | Limitations
Xu et al. [52] | Cloud storage | NSGA-C | 30% | 872.4 s | Long runtime
Nartey et al. [53] | Cloud storage | AT-MOPSO | - | 384.2 s | Relatively poor solution for local space occupancy compared to NSGA-C
Zheng et al. [63] | Distributed data storage | IPFS-based storage | 91.83% | - | Increased latency due to queries to IPFS network

2.3.2. On-Chain Storage

The immutability of the blockchain ledger has a great appeal for organizations that intend to integrate this technology into their operations. However, this feature of the blockchain is a factor contributing to its storage inefficiency for systems such as IIoT. One of the interesting ideas that arose to combat this is providing flexibility when it comes to the generation of transactions. Table 5 shows a comparison of these works.
Table 5. Comparison of on-chain storage scheme optimization works.
Proposed Work | Approach | Algorithm | Storage Reduction | Query Efficiency | Latency | Limitations
Dorri et al. [66] | Transaction flexibility | MOF-BC | 25% | - | max 6.5 min | High transaction processing time
Pyoung et al. [36] | Transaction flexibility | LitiChain | Average storage of 100%–142% of baseline storage | - | - | Undermines traceability and integrity of blockchain through unrecorded hashes of deleted transactions and blocks; high retention cost; complexity in determining expiry time of blocks
Qi et al. [67] | Partial storage | BFT-Store | 86.8% | - | - | Long repair time for decoding, leading to longer processing time
Yu et al. [68] | Partial storage | VBG | - | 0.19 s | - | Increased query cost on remote block data
Xu et al. [69] | Partial storage | Consensus Unit | 75%–95% | Increased query cost | 3% higher than benchmark | High latency on off-node queries
Matzutt et al. [70] | Block pruning | CoinPrune | 86.98% | - | - | Limited by UTXO-based design
Wang et al. [71] | Block pruning | ESS | 82.14% | - | 9.21 s | Limited by UTXO-based design
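The off-chain option described in Section 2.3 can be sketched as follows: the bulky record is kept outside the ledger, and only a fixed-size digest is recorded on-chain so that the record can still be verified on retrieval. This is a minimal sketch under stated assumptions, not the design of any surveyed scheme; the in-memory `off_chain_store` and `ledger` objects and the `put`/`get` helpers are hypothetical stand-ins for a cloud or distributed file store and for a real blockchain client.

```python
import hashlib

# Hypothetical stand-ins: a real system would use cloud or distributed
# file storage and a blockchain client instead of these in-memory objects.
off_chain_store: dict[str, bytes] = {}
ledger: list[dict] = []


def put(record: bytes) -> str:
    """Store the record off-chain and append only its digest on-chain."""
    digest = hashlib.sha256(record).hexdigest()
    off_chain_store[digest] = record
    ledger.append({"payload_sha256": digest, "size": len(record)})
    return digest


def get(digest: str) -> bytes:
    """Fetch the record off-chain and check it against the on-chain digest."""
    record = off_chain_store[digest]
    if hashlib.sha256(record).hexdigest() != digest:
        raise ValueError("off-chain record does not match on-chain digest")
    return record


record = b"x" * 10_000           # hypothetical 10 kB product record
key = put(record)
assert get(key) == record        # intact record passes verification
```

The ledger entry stays the same small size regardless of the record size, which is where the storage saving on peers comes from; the cost is an extra lookup outside the blockchain network on every read, consistent with the latency limitations noted for off-chain schemes.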