Data Modifications in Blockchain Architecture

Subjects: Computer Science, Software Engineering

Contributor: Khikmatullo Tulkinbekov

Because blockchain is immutable, its integration with big-data systems is limited in terms of redundancy, scalability, cost, and latency. Additionally, storing large amounts of low-value data wastes energy and storage resources. As a result, the demand for data deletion possibilities in blockchain has risen.

blockchain
IoT
big data
data modifications
selective deletion
edge computing

1. Introduction

In recent years, blockchain technology has emerged in several fields. Distributed and secure peer-to-peer (P2P) networks were first employed in the financial world, leading to the emergence of new cryptocurrencies, such as Bitcoin [1] and Ethereum [2], and of nonfungible token (NFT) exchanges. These dramatic advancements in modern financial systems have attracted academic interest in the integration of blockchain into numerous other fields [3,4,5], including edge computing [6,7,8]. Edge computing enables a decentralized approach to Internet of Things (IoT) data processing to address the centralization limitations of cloud-based data centers. The edge nodes are located geographically close to the user plane, which allows localized data handling. Because edge computing already relies on distributed edge nodes, it is a natural candidate for implementing blockchain protocols. Recent architectures, such as Recordchain [9] and Groupchain [10], have demonstrated the benefits of blockchain integration in edge-computing environments.

However, the data processing requirements of these two systems present challenges owing to a technological mismatch. Edge-computing nodes typically handle IoT devices that generate big data through frequent data updates and deletion operations. A traditional blockchain, by contrast, requires all nodes to hold the same database for reliability and immutability, thereby preventing data alteration after insertion into the blockchain. Because every node holds an identical copy of the blockchain data, the network can easily reject any malicious modification. This simple rule has been implemented in cryptocurrencies since 2009, enabling a secure money-exchange protocol without government interference. In 2014, a network called Ethereum [2] expanded blockchain capabilities by introducing the smart contract, executable code deployed in a blockchain and run on the Ethereum virtual machine (EVM). Because a smart contract can include logic, it allows data modification. However, limitations in size and cost render smart contracts inapplicable to big-data processing.
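The rejection of modified data described above can be illustrated with a minimal hash-chain sketch. This is not the block format of Bitcoin or Ethereum; the block layout and the names `block_hash`, `build_chain`, and `is_valid` are assumptions made for illustration only. Each block stores the hash of its predecessor, so an in-place update or a removed block becomes detectable by any node that re-verifies the chain.

```python
import hashlib
import json


def block_hash(index, prev_hash, data):
    """Hash a block's contents (hypothetical, simplified block layout)."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "data": data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to its predecessor through the predecessor's hash."""
    chain = []
    prev_hash = "0" * 64  # placeholder parent for the first block
    for index, data in enumerate(records):
        digest = block_hash(index, prev_hash, data)
        chain.append(
            {"index": index, "prev_hash": prev_hash, "data": data, "hash": digest}
        )
        prev_hash = digest
    return chain


def is_valid(chain):
    """Recompute every hash and check every link, as a verifying node would."""
    prev_hash = "0" * 64
    for position, block in enumerate(chain):
        if block["index"] != position or block["prev_hash"] != prev_hash:
            return False
        recomputed = block_hash(block["index"], block["prev_hash"], block["data"])
        if recomputed != block["hash"]:
            return False
        prev_hash = block["hash"]
    return True


honest = build_chain(["sensor=21.5", "sensor=21.7", "sensor=22.0"])

# An in-place update changes the block's contents but not its stored hash.
tampered = [dict(block) for block in honest]
tampered[1]["data"] = "sensor=99.9"

# Removing a block leaves its successor pointing at a hash that is gone.
pruned = honest[:1] + honest[2:]

print(is_valid(honest), is_valid(tampered), is_valid(pruned))  # True False False
```

In this sketch both the update and the deletion fail verification, which is the property that makes frequent IoT-style updates and deletions hard to reconcile with a conventional chain.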

Recently, academic attempts have been made to enable data modification in blockchain-based IoT environments. Although many state-of-the-art methods have been developed [11,12,13,14], no complete approach handles both big data and instant modification operations. Furthermore, most existing approaches fail to retain the security advantages of blockchains because the longest-chain rule is broken by deleting existing blocks. Most of these approaches require data with predefined lifetimes to enable deletions in the blockchain architecture.
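One way to read the conflict between block deletion and the longest-chain rule is sketched below. The sketch assumes a deliberately simplified fork-choice rule that compares only the number of blocks (Bitcoin, for instance, compares accumulated work rather than raw length), and the function name `choose_chain` is hypothetical.

```python
def choose_chain(candidates):
    """Simplified fork choice: adopt the candidate with the most blocks.

    On a tie, the first candidate in the list is kept.
    """
    return max(candidates, key=len)


# Two copies of the same history; one node has deleted block "b2" locally.
full_copy = ["b0", "b1", "b2", "b3"]
pruned_copy = ["b0", "b1", "b3"]

adopted = choose_chain([pruned_copy, full_copy])
print(adopted)
```

Under this assumption, a copy from which blocks were deleted is always shorter than the untouched copy, so peers following the rule would keep the original history. A deletion scheme therefore has to either change the rule or avoid shortening the chain.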

2. Data Modification in the Blockchain

Because the original purpose of blockchain was to create a distributed and immutable database with a strong emphasis on data security, data-deletion operations were initially considered unnecessary. As the number of application fields has increased, many researchers have begun to consider the use of blockchain in big-data-oriented systems. Although most related studies do not explicitly discuss data-deletion techniques, they have opened new research directions. This section discusses these state-of-the-art projects, most of which are related to data modification in the blockchain.

This line of research originated from Ethereum [2], which was introduced as a breakthrough in blockchain technology with the ability to store executable code. This has motivated developers to digitalize physical and virtual assets and store them in blockchain networks for secure ownership. In addition to storing and executing logic, smart contracts have been extended to enable data modification. By calling the appropriate contract logic, developers can update and delete existing data within a smart contract. Motivated by smart contracts, projects such as Binance Smart Chain [15], Polygon [16], and Solana [17] have been developed. However, the underlying blockchains continue to inherit most of their consensus design from Bitcoin, in which all data must be broadcast on the network. This incurs limitations in terms of latency and storage costs. To avoid these issues, Ethereum limited the size of each smart contract to 24 KB. Furthermore, publishing a smart contract incurs a fee that is paid to the nodes as a reward for securing the network. In other words, these requirements push developers to avoid excessively large smart contracts, owing to increased costs. Although this is sufficient for digitalizing valuable assets and enabling businesses to use new types of cryptocurrencies, it is not affordable in edge-computing environments. Hyperledger Fabric [18], on the other hand, introduces methods called “data pruning” and “private data collections”, which allow data to carry an expiration time and be deleted later. Even though this seems like a solution to data modification issues, Hyperledger Fabric is only available as a private or permissioned network, whose deployment and maintenance costs are affordable only for big companies and enterprises. Additionally, due to unpredictable network constraints, the same data pruning solutions cannot be applied in public networks. As an alternative, Sayeed et al. [19] proposed a trustworthy and privacy-preserving framework named TRUSTEE, integrating Hyperledger Fabric, IPFS, and the latest encryption techniques for all data operations. Nevertheless, its reliance on permissioned networks makes it impractical in public environments. IPFS is also widely used as data storage integrated with different blockchains [20]. For example, de Brito Gonçalves et al. [21] proposed IoT data storage on IPFS integrated with Ethereum smart contracts. This approach can be promising for managing precious data, but the integration of smart contracts makes it unaffordable in typical big-data systems.
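The expiring-data idea mentioned above for Hyperledger Fabric can be sketched in a few lines. This is not Fabric's API: the class `ExpiringStore`, its methods, and the `blocks_to_live` parameter are hypothetical names chosen for illustration. The sketch assumes that payloads are kept in a private store with a lifetime counted in blocks, that only a hash of each payload is appended to the ledger, and that expired payloads are removed by a separate pruning pass.

```python
import hashlib


class ExpiringStore:
    """Hypothetical store: payloads live off-ledger, only hashes go on the ledger."""

    def __init__(self):
        self.ledger = []        # append-only entries: (key, payload_hash, expires_at)
        self.private_data = {}  # key -> (payload, expires_at)
        self.height = 0         # number of blocks committed so far

    def put(self, key, payload, blocks_to_live):
        """Store a payload privately and record only its hash on the ledger."""
        expires_at = self.height + blocks_to_live
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        self.ledger.append((key, digest, expires_at))
        self.private_data[key] = (payload, expires_at)

    def commit_block(self):
        """Advance the chain by one block; nothing is deleted here."""
        self.height += 1

    def prune(self):
        """Remove expired payloads from the private store; the ledger is untouched."""
        expired = [
            key
            for key, (_, expires_at) in self.private_data.items()
            if expires_at <= self.height
        ]
        for key in expired:
            del self.private_data[key]
        return expired


store = ExpiringStore()
store.put("reading-1", "temp=21.5", blocks_to_live=2)
store.commit_block()
store.commit_block()

still_present = "reading-1" in store.private_data  # expired, but not yet pruned
removed = store.prune()
print(still_present, removed, len(store.ledger))
```

Two properties of this design are visible in the usage lines: an expired payload stays readable until the next pruning pass, and its hash remains on the ledger after the payload is gone.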

Another deletion-oriented method was introduced by Yang et al. [22], who developed a blockchain-based deletion technique for cloud storage. The authors enhanced cloud server honesty by employing blockchain network verification for data modification. However, this technique does not focus on deleting blockchain data, but on securing deletion operations using blockchain. Zhu and Kouhizadeh [23] employed blockchain for its traceability features in supply-chain systems, where redundant data deletions are common. Here, a blockchain is implemented to avoid unintended product deletions and to recover deleted data using traceability features. Bosona et al. [24] and Li et al. [25] also worked on traceability and access management solutions for supply chains using blockchain technology. El Khanboubi et al. [26] employed a blockchain protocol to enable the smart deletion of duplicated data. In this method, deduplication is easily verified using blockchain features, and automatic deletions are enabled for duplicate data. Li et al. [27] introduced another state-of-the-art approach for data auditing in cloud computing using a blockchain protocol that automates data management. Ra et al. [28] proposed a blockchain-based XOR global-state injection method for content modification. Moreover, Kim et al. [29] proposed an evaluation model to measure immutability in different blockchain technologies. However, all the aforementioned methods focus on employing blockchain as an additional feature to enable or monitor data modification in cloud-based systems. Another state-of-the-art approach for secure data management was introduced by Xu et al. [30], who implemented a distributed redactable blockchain free of third-party participation. Guo et al. [31] also proposed transaction redaction features in a policy-hidden manner using blockchain. Lu [32] and Valadares et al. [33] compiled surveys discussing current issues and research gaps pertaining to the use of blockchain in big-data systems and their privacy features.

Although data deletion from blockchain has not been a central topic in most previous studies, the adaptation of blockchain to big-data systems has been a research subject for a long time. Huang et al. [34] proposed the BlockSense architecture, which is a fully distributed approach to mobile crowd-sensing techniques using a proof-of-data consensus. The authors achieved promising results in terms of data privacy and performance compared with the Ethereum network. Taloba et al. [35] proposed a hybrid platform for multimedia data processing designed for IoT–healthcare systems to manage patient-related data. Heo et al. [36] proposed a storage optimization technique that employs blockchain for distributed caching. Zhaofeng et al. [37] introduced a trusted data management system for edge computing. Umoren et al. [38] proposed decentralized storage for user authentication in fog computing. Kwak et al. [39] proposed a blockchain-based solar energy trading platform mainly applicable to smart city environments. On the other hand, Lian et al. [40] took a different approach to the meaning of big data by designing a secure and trusted system for storing large transactions generated by international trading. IOTA [41] employed tangle and coordinator nodes to maintain consensus in microtransactions. Xu et al. [42] introduced a trustless crowd-sensing technique for mobile edge using blockchain. Li et al. [43], MEVerse PTE Ltd. [44], and Chia Network [45] represent group-based consensus approaches for blockchain protocols to handle the higher throughput inherent in IoT data systems. Although these techniques do not focus solely on data deletion, the applied environments provide new research directions for general data modifications. In parallel with blockchain integration into big-data-related systems, security remains an active topic. In their survey paper, Yassine et al. [46] discuss the possible challenges and practical applications of blockchain in cybersecurity and data privacy. Ali et al. [47] also listed the cutting-edge secrets of cyberphysical systems in consortium blockchain. Hameed et al. [48] narrowed the discussion further to blockchain-based industrial applications, their perspectives, and possible security threats.

In summary, Table 1 compares the related literature with respect to data modification requirements in big-data systems. Since not all related literature describes data modification, only the closely related works have been selected for comparison. As the table shows, Bitcoin, as the basic blockchain architecture, meets only the security-related requirements for big-data handling. Ethereum, on the other hand, offers more complex membership rules with light and full nodes, and its smart contracts make data updates possible. LiTichain uses a permissioned network for faster block verification. Although it offers a state-of-the-art solution for block deletions, direct updates are not provided, and the longest-chain rule is broken by the deleted blocks. Hillman et al. employ data deletions and updates on public networks; however, the longest-chain rule is still not preserved. Hyperledger Fabric is a well-known blockchain solution that already offers data pruning. Because of its permissioned nature, Hyperledger Fabric does not follow the longest-chain rule, and this does not affect its security. However, it still has limitations in terms of instant deletions. Even if the data expires on Hyperledger Fabric, its removal is delayed until data pruning occurs. Also, Hyperledger Fabric keeps expirable data in a private database and stores only the hash in the blockchain, which means the corresponding hash is never deleted. Kuperberg et al. and Sayeed et al. base their solutions on Hyperledger Fabric, and the two share many similarities, except that Kuperberg et al. do not employ off-chain data storage. El Khanboubi et al., on the other hand, take a different approach by using blockchain to control data deletions on the central cloud. For this purpose, the authors employ a private blockchain with authorized entity participation. This gives them an advantage in instant deletions, but the idea does not perform deletion inside the blockchain structure itself. Guo et al. also use a permissioned blockchain and achieve results similar to the others. IOTA is also included in the comparison despite not employing data deletions, because it is one of the popular public blockchains and provides the fastest transaction confirmation at low cost. The table shows that most successful solutions are based on permissioned or private blockchains. The reason is that consensus is easier to reach in a consortium blockchain when the block structure and rules are changed. However, the same does not apply to public networks. For this reason, Unlichain stands as the complete solution that offers data modifications in a public blockchain network.

Table 1. Comparison of related literature.

Ref	Type	D ¹	U ²	OC ³	LCR ⁴	NC ⁵	IC ⁶	SD ⁷	MO ⁸
Bitcoin [1]	Public	X	X	X	✓	X	X	X	X
Ethereum [2]	Public	X	✓	X	✓	✓	X	X	X
LiTichain [11]	Permissioned	✓	X	✓	X	X	✓	X	X
Hillman et al. [12]	Public	✓	✓	✓	X	X	X	✓	X
Kuperberg et al. [13]	Permissioned	✓	✓	✓	X	✓	X	✓	X
Hyperledger Fabric [18]	Permissioned	✓	✓	X	X	✓	X	✓	X
Sayeed et al. [19]	Permissioned	✓	✓	X	X	✓	X	✓	X
El Khanboubi et al. [26]	Private	✓	✓	X	✓	X	✓	✓	X
Guo et al. [31]	Permissioned	✓	✓	X	X	✓	✓	✓	X
IOTA [41]	Public	X	X	✓	✓	✓	✓	X	X
Unlichain (Proposed)	Public	✓	✓	✓	✓	✓	✓	✓	✓

¹ Delete, ² Update, ³ On-Chain data handling, ⁴ Longest-Chain Rule preservation, ⁵ Node Classification, ⁶ Instant Confirmation, ⁷ Selective Deletion, ⁸ Membership Optimization. ✓: feature is available; X: feature is not available.
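For readers who want to query the comparison programmatically, Table 1 can be transcribed into a small data structure. The sketch below copies the table's rows as strings of 1 (✓) and 0 (X) in the column order D, U, OC, LCR, NC, IC, SD, MO; the names `ROWS`, `supports`, and `select` are hypothetical helpers and not part of any cited system.

```python
FEATURES = ("D", "U", "OC", "LCR", "NC", "IC", "SD", "MO")

# name -> (network type, feature flags in FEATURES order; "1" = available)
ROWS = {
    "Bitcoin [1]": ("Public", "00010000"),
    "Ethereum [2]": ("Public", "01011000"),
    "LiTichain [11]": ("Permissioned", "10100100"),
    "Hillman et al. [12]": ("Public", "11100010"),
    "Kuperberg et al. [13]": ("Permissioned", "11101010"),
    "Hyperledger Fabric [18]": ("Permissioned", "11001010"),
    "Sayeed et al. [19]": ("Permissioned", "11001010"),
    "El Khanboubi et al. [26]": ("Private", "11010110"),
    "Guo et al. [31]": ("Permissioned", "11001110"),
    "IOTA [41]": ("Public", "00111100"),
    "Unlichain (Proposed)": ("Public", "11111111"),
}


def supports(name, feature):
    """Return True if the named system has the given feature in Table 1."""
    flags = ROWS[name][1]
    return flags[FEATURES.index(feature)] == "1"


def select(required, network_type=None):
    """List systems that have every required feature, optionally by network type."""
    return [
        name
        for name, (kind, _) in ROWS.items()
        if (network_type is None or kind == network_type)
        and all(supports(name, feature) for feature in required)
    ]


# Systems that support deletion while preserving the longest-chain rule.
print(select(["D", "LCR"]))

# Public-network systems that support both deletion and update.
print(select(["D", "U"], network_type="Public"))
```

Queries of this kind make the table's main pattern easy to check: among public networks, deletion combined with longest-chain-rule preservation appears only in the proposed system's row.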

This entry is adapted from the peer-reviewed paper 10.3390/s23218762

© Text is available under the terms and conditions of the Creative Commons Attribution (CC BY) license; additional terms may apply.