Storage Systems in Edge Computing Infrastructures


Please note this is a comparison between Version 1 by Antonios Makris and Version 2 by Sirius Huang.

Edge computing constitutes a promising paradigm of managing and processing the massive amounts of data generated by Internet of Things (IoT) devices. Data and computation are moved closer to the client, thus enabling latency- and bandwidth-sensitive applications. However, the distributed and heterogeneous nature of the edge as well as its limited resource capabilities pose several challenges in implementing or choosing an efficient edge-enabled storage system. Therefore, it is imperative for the research community to contribute to the clarification of the purposes and highlight the advantages and disadvantages of various edge-enabled storage systems.

blockchain
object storage
file storage
databases
edge computing
secure storage

1. Introduction

The amount of data generated by Internet of Things (IoT) devices is expected to grow dramatically in the coming years. According to Cisco [1], almost 30 billion devices will be connected to the network by the end of 2023. Existing infrastructures are therefore unlikely to be able to support, manage, and process such massive amounts of data. In fact, the cloud infrastructure alone already cannot adequately serve many current IoT applications: end devices are usually distant from the cloud servers, which adds processing and network overhead and results in high latency, limited bandwidth, and overall performance degradation.

Edge computing is a conceptual approach that combines the benefits of the cloud with the decentralized processing of services on edge devices. It is a promising paradigm for avoiding network bottlenecks, overcoming communication overheads, and reducing data transfer delays [2,3,4,5,6,7,8,9], because the computational load is moved to the edge of the network, where it can leverage the computational capabilities of the edge nodes. Resource-rich servers are placed closer to mobile or IoT devices [10], and therefore edge computing offers higher scalability and availability than traditional cloud platforms [11,12]. Over the years, several edge architectures have been proposed to improve throughput, latency, and network coverage [13,14]. Realizing cloud/edge integration requires combining technologies from different domains, including computing, networking, and application-oriented components [15].

One of the main challenges in developing applications at the edge is efficient data sharing between edge nodes, which can be accomplished either within individual application frameworks or through an external storage service. Despite significant progress toward efficient edge storage solutions, several issues related to the functional and non-functional requirements of cloud/edge-based applications remain open: low data retrieval latency; high availability and integrity; handling a potential shortage of storage resources at an edge node; supporting rapid deployment of application components and automatic restart or replacement of unresponsive components; and coping with the high heterogeneity of edge environments. These requirements can be met by optimizing resource usage, resource allocation, and data management plans on edge devices. Hence, edge storage needs to provide a reliable, fast, stable, and secure shared storage engine, and, because it is designed for edge devices with limited resource capabilities, it must also be extremely lightweight.

2. Storage Systems in Edge Computing Infrastructures

The Internet of Things and Web 4.0 are rapidly becoming dominant in a growing number of domains, in both daily life and industrial applications. This gives rise to a series of new challenges that researchers are actively trying to tackle, both in the cloud [17] and at the edge [18,19]. One of the major problems in this category is minimizing data latency and network overload in fog or edge networks [20]. One of the most common solutions is the development of edge storage methodologies that move all or part of the necessary data, and their processing, to the edge, near the edge devices that use them. Because of the nature of edge networks and the devices that take part in them, edge storage services focus on decentralization and resource efficiency, and these two goals drive current research in the field. A plethora of traditional storage technologies, such as blockchain and block storage, are being adapted to fit these two requirements. Blockchain [21] is the well-known technology that was created to support Bitcoin, but it has since developed a “life” of its own and is used in many other use cases and domains. Blockchain works by maintaining a common ledger of transactions in the form of chained blocks. Each transaction must be validated by a number of peers before it is registered in this ledger and considered valid. When applied to edge storage, blockchain has two major flaws: it needs heavy computational power to perform transaction validation, and it requires a centralized database to store the chain of transactions [22]. These two characteristics conflict directly with the decentralization and low-resource requirements of edge storage services.
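To make the "chained blocks" and the cost of validation more concrete, the following is a minimal Python sketch, assuming a toy proof-of-work with a fixed difficulty. The names `Block`, `mine`, and `is_valid_chain`, the difficulty value, and the sample transaction are hypothetical and do not correspond to Bitcoin or to any particular blockchain platform; the sketch only shows why tampering with an earlier block is detectable and why validation consumes computation.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    transactions: list
    prev_hash: str
    nonce: int = 0

    def digest(self) -> str:
        # The hash covers the previous block's hash, so altering any earlier
        # block breaks the link stored in every later block.
        payload = json.dumps(
            [self.index, self.transactions, self.prev_hash, self.nonce],
            sort_keys=True,
        ).encode()
        return hashlib.sha256(payload).hexdigest()


def mine(block: Block, difficulty: int = 3) -> Block:
    # Toy proof-of-work (an assumption for illustration): increment the nonce
    # until the hash starts with `difficulty` zero hex digits. This brute-force
    # search is the kind of computation that is hard to afford at the edge.
    while not block.digest().startswith("0" * difficulty):
        block.nonce += 1
    return block


def is_valid_chain(chain: list, difficulty: int = 3) -> bool:
    # What each validating peer re-checks: every link and every proof-of-work.
    for prev, cur in zip(chain, chain[1:]):
        if cur.prev_hash != prev.digest():
            return False
        if not cur.digest().startswith("0" * difficulty):
            return False
    return True


genesis = mine(Block(0, [], "0" * 64))
block1 = mine(Block(1, [{"put": "sensor-42/temp", "value": 21.5}], genesis.digest()))
print(is_valid_chain([genesis, block1]))  # True

genesis.transactions.append({"put": "forged"})  # tamper with history
print(is_valid_chain([genesis, block1]))  # False
```

Raising `difficulty` by one hex digit multiplies the expected mining work by 16, which illustrates why validation-heavy consensus sits poorly with resource-constrained edge nodes.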
That is why many researchers are trying to combine blockchain with other technologies, such as peer-to-peer networks, to limit or even completely counter these flaws. Peer-to-peer (P2P) networks are a fully decentralized form of file storage and file sharing technology. These networks use a set of protocols that ensure safe and secure communication between the interconnected devices, called “peers” [23]. These protocols are usually lightweight, adding only minimal overhead to the actual data that peers exchange [24]. Modern P2P networks use distributed hash tables (DHTs) to enhance their functionality and security, and some even integrate encryption algorithms to protect their data from a wider set of possible attacks [25]. The weakness of these networks lies in integrity, immutability, and reliability, because they provide no adequate security controls over these properties [26]. This limitation is driving researchers to combine them with other, more secure technologies, such as blockchain, which provide the missing controls. The literature is actively trying to find a balance between the available frameworks by comparing their throughput, resource efficiency, and limitations, either on their own or in combination. Blockchain and P2P networks are widely used for this purpose, because P2P networks seem to be the ideal candidate for edge storage solutions if the drawbacks mentioned above can be tackled. In relevant experiments, combining these two frameworks appears to provide an efficient solution to the edge storage problem, because blockchain can cover almost all the weaknesses of P2P networks without adding much overhead in terms of read/write operations, throughput, or network traffic [22,27,28,29].
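The role of a DHT in a P2P store can be sketched with consistent hashing: every peer and every key is hashed onto the same identifier ring, and a key is stored on the next few peers clockwise from its position. The following minimal Python sketch assumes SHA-256 ring positions, content-addressed keys (the key is the hash of the value), and two replicas per key; `ToyDHT` and its methods are hypothetical names, not the API of any real DHT such as Kademlia. The re-hash on read detects a peer returning altered data, but only if the key itself comes from a trusted source, which is one way a blockchain could complement a P2P store (for example by recording the keys).

```python
import bisect
import hashlib


def ring_position(key: str) -> int:
    # Map a key or peer id onto a 2**32 identifier ring using SHA-256.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class ToyDHT:
    """Consistent-hashing ring: each key is owned by the first `replicas`
    peers found clockwise from the key's position."""

    def __init__(self, peers: list, replicas: int = 2):
        self.replicas = replicas
        self.ring = sorted((ring_position(p), p) for p in peers)
        self.store = {p: {} for p in peers}  # in-memory stand-in for each peer

    def owners(self, key: str) -> list:
        pos = ring_position(key)
        # First peer at or after the key's position, wrapping around the ring.
        start = bisect.bisect_left(self.ring, (pos, ""))
        n = len(self.ring)
        return [self.ring[(start + i) % n][1] for i in range(min(self.replicas, n))]

    def put(self, value: bytes) -> str:
        # Content addressing: the key is the SHA-256 hash of the value itself.
        key = hashlib.sha256(value).hexdigest()
        for peer in self.owners(key):
            self.store[peer][key] = value
        return key

    def get(self, key: str) -> bytes:
        for peer in self.owners(key):
            value = self.store[peer].get(key)
            # Re-hash on read: a replica whose content no longer matches is skipped.
            if value is not None and hashlib.sha256(value).hexdigest() == key:
                return value
        raise KeyError(key)


dht = ToyDHT(["edge-a", "edge-b", "edge-c", "edge-d"], replicas=2)
key = dht.put(b"temp=21.5")
print(dht.owners(key))  # the two peers responsible for this key
dht.store[dht.owners(key)[0]][key] = b"temp=99.9"  # one peer corrupts its copy
print(dht.get(key))  # b'temp=21.5', served by the second replica
```

Consistent hashing is used here because adding or removing a peer only moves the keys adjacent to it on the ring, which suits edge clusters whose membership changes frequently.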
The only drawback is that blockchain mechanisms require more redundancy than P2P networks, which demands more available disk space in the edge clusters that host these solutions and limits the network architecture options for IoT and fog networks. Depending on researchers' priorities, two of the most important areas of interest in the literature on edge storage architectures are security and resource efficiency [30]. In most cases, these two priorities are in direct conflict: improving resource efficiency requires relaxing some security rules, and improving security requires committing more resources. For example, systems based on blockchain and cryptographic security controllers need substantial middleware and network orchestration so that the framework can perform the necessary encryption, decryption, and security checks on each data transaction [31,32]. Some work on secure edge storage architectures prioritizes a different set of data security goals, such as availability and integrity. These approaches require high redundancy, which again leads to resource-demanding platforms [33,34,35]. Both erasure coding and data replication, the most common methods for ensuring availability and integrity, require additional nodes that hold the redundant data and coordinate data reads and recovery. On the other hand, systems that focus on high resource efficiency usually bypass data security altogether, concentrating only on data transfer and storage between nodes and ignoring the resources needed to secure the data packets transferred over the internet or the communication links between the nodes of the edge network [36,37,38].
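The trade-off between erasure coding and replication can be illustrated with the simplest erasure code, a single XOR parity fragment. The following Python sketch is an assumption-based illustration, not the scheme of any cited system: `encode` splits the data into `k` fragments plus one parity fragment (each meant for a different node), and `decode` rebuilds any one lost fragment. Production systems typically use Reed-Solomon codes, which tolerate several simultaneous losses.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list:
    # Split data into k equal fragments (zero-padded) plus one XOR parity fragment.
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    parity = reduce(xor_bytes, fragments)
    return fragments + [parity]


def decode(pieces: list, k: int, length: int) -> bytes:
    # A single missing piece (marked None) equals the XOR of all the others.
    missing = [i for i, p in enumerate(pieces) if p is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one lost piece")
    if missing:
        pieces = list(pieces)
        pieces[missing[0]] = reduce(xor_bytes, (p for p in pieces if p is not None))
    # Reassemble the data fragments and strip the padding.
    return b"".join(pieces[:k])[:length]


data = b"edge-sensor-batch-0001"
pieces = encode(data, k=4)  # 4 data fragments + 1 parity fragment
pieces[2] = None  # one storage node fails
print(decode(pieces, 4, len(data)) == data)  # True
```

With `k = 4`, the five pieces occupy (k + 1)/k = 1.25 times the padded data size and survive one node failure, whereas two-way replication survives the same single failure at 2 times the size. The price is that five nodes must hold pieces and a recovery must read four of them, which is the coordination cost mentioned above.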
These networks are often designed and evaluated under the assumption that data and network security are handled at another level of the data transfer and storage stack, outside their scope. Although security is a major issue in every IoT system, cyber-risk regulation and assessment are still in their infancy. For that reason, the authors in [39] presented an analysis of cyber-risk assessment approaches in complex IoT systems and developed an epistemological analysis that enables the assessment of uncontrollable risk states in such systems. The performance of active IoT devices can be improved by sharing their communication and computation resources; however, most works in the literature focus on either communication cooperation or computation cooperation alone. In [40], the authors proposed an energy-efficient resource allocation scheme for a wireless-powered mobile edge computing (MEC) system that leverages joint communication and computation cooperation among users. This joint strategy was shown to reduce overall energy consumption compared with other state-of-the-art works. Regarding quality of service (QoS), it is difficult for users to select the services with the highest quality, and many studies have addressed QoS prediction in edge computing environments. In [41], the authors proposed a QoS prediction approach that employs and extends the autoregressive integrated moving average (ARIMA) model. Finally, in [42], Vehicular Edge Computing (VEC) is presented as a mechanism for improving QoS, using a volunteer-assisted model for computation offloading.
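The idea behind ARIMA-based QoS prediction can be illustrated with one of the simplest members of the family. The following Python sketch is a deliberate simplification and not the extended model of the work cited above: it fits an ARIMA(1,1,0) model without a drift term, differencing the series once, estimating the AR(1) coefficient by least squares on the differences, and integrating the forecasts back. The function name `forecast_arima_110` and the response-time series are hypothetical.

```python
def forecast_arima_110(series: list, steps: int = 1) -> list:
    """Forecast with ARIMA(1,1,0) and no drift: difference once, fit an AR(1)
    coefficient to the differences by least squares, then integrate back."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    # I(1): work on first differences to remove a trend.
    diffs = [b - a for a, b in zip(series, series[1:])]
    # AR(1) without intercept: minimize sum((d_t - phi * d_{t-1})**2).
    x, y = diffs[:-1], diffs[1:]
    denom = sum(v * v for v in x)
    phi = sum(a * b for a, b in zip(x, y)) / denom if denom else 0.0
    forecasts, last_value, last_diff = [], series[-1], diffs[-1]
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        last_value += last_diff  # integrate back to the original scale
        forecasts.append(last_value)
    return forecasts


response_ms = [120, 118, 125, 130, 128, 135]  # hypothetical observed response times
print(forecast_arima_110(response_ms, steps=3))
```

A user or broker could rank candidate services by their forecast response time; a full ARIMA implementation would also estimate moving-average terms and select the model orders from the data.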