Graph Databases: Comparison
Please note this is a comparison between Version 2 by Catherine Yang and Version 1 by Jorge Bernardino.

NoSQL databases were created with the primary goal of addressing the efficiency shortcomings of relational databases, and can be of four types: document, column, key-value, and graph databases. Graph databases represent and store data differently, defining it as a collection of nodes and edges. They can store data and relationships efficiently, and have a flexible and easy-to-understand data schema.

  • graph databases
  • NoSQL databases
  • open-source tools

1. Introduction

The development of the Internet and advances in technology have led to an increase in the volume and interconnectedness of data, making it a priority to rethink the storage, connectivity, availability, security, and response time of data queries. These advances have made it possible to intensify communication between people, with the most significant technological revolution undoubtedly taking place in social networks and online platforms. The high volume of activity on social networks also means a significant increase in data and the time required to manage and store it in a database.
The most common databases are relational databases, which are based on the relational data model. Despite their popularity, they become harder to use when large amounts of data must be stored. In relational databases, it is essential to avoid data loss and inconsistency to ensure data integrity [1]. To overcome some of these shortcomings, a new class of databases called NoSQL has emerged. NoSQL databases can be of four types: document, column, key-value, and graph.
Graph databases represent and store their data differently: the data is defined as a graph, a collection of nodes (vertices) and edges. Several graph databases are available, and graphs are becoming the preferred data model for representing and storing highly interconnected data.
The DB-Engines ranking is an independent data analysis initiative that provides information on database management systems. Its main product is a monthly ranking of database popularity based on several factors, including enterprise and developer adoption, online popularity, features offered, performance, scalability, community support, and feedback from expert database users. A weighted combination of these criteria is used to determine the overall DB-Engines ranking. Based on this ranking, the researchers selected the following four graph databases: JanusGraph, Nebula Graph, Neo4j, and TigerGraph.
Graph databases have been increasingly studied, mainly because they are a common NoSQL database used in various domains. However, few studies have evaluated the performance and compared the features of popular graph databases.
 

2. Characteristics of Graph Databases

Graph databases are a type of NoSQL database in which data is represented as graphs with nodes and edges. Nodes represent entities in a database and can have associated properties and labels. A graph database is typically based on a labeled property graph data model, which gives nodes and edges an internal structure [2]. This model makes the graph easier to understand: nodes can have one or more labels, and relationships between nodes can carry properties. The edges represent relationships between nodes, and each relationship links a source node to a destination node. Graph databases are characterized by user interactivity, ease of understanding, and flexibility of the data schema. This type of database has gained significant prominence in several fields, namely biology, chemistry, fraud detection, and military research, but especially in social networks. The three main characteristics of graph databases are:
  • Performance: In relational databases, queries become slower as the number and depth of relationships increase. In graph databases, however, performance does not change drastically as data grows.
  • Flexibility: The structure and schema of a graph model adapt to the needs of the application, and it is possible to add new data without compromising its functionality.
  • Agility: The continuous evolution of graph databases is consistent with today’s agile development practices, allowing the data to evolve as user needs change.
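The model described above (nodes with labels and properties, edges that link a source node to a destination node and can carry properties) can be illustrated with a minimal in-memory sketch. This is not how any of the databases discussed here is implemented; the class names, the `reachable` helper, and the sample data are all hypothetical. It only shows why following relationships amounts to walking adjacency lists, and how a node can gain a new property without a schema change.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """An entity with labels and free-form properties."""
    node_id: str
    labels: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed relationship from a source node to a destination node."""
    source: str
    destination: str
    rel_type: str
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy property graph (illustrative only, not a real storage engine)."""

    def __init__(self):
        self.nodes = {}
        self.outgoing = {}  # node_id -> list of edges leaving that node

    def add_node(self, node_id, labels=(), **properties):
        self.nodes[node_id] = Node(node_id, set(labels), dict(properties))
        self.outgoing.setdefault(node_id, [])

    def add_edge(self, source, destination, rel_type, **properties):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or destination not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.outgoing[source].append(
            Edge(source, destination, rel_type, properties)
        )

    def reachable(self, start, rel_type, max_depth):
        """Breadth-first search: node ids within max_depth hops of start."""
        seen = {start}
        queue = deque([(start, 0)])
        found = []
        while queue:
            current, depth = queue.popleft()
            if depth == max_depth:
                continue  # do not expand beyond the requested depth
            for edge in self.outgoing[current]:
                if edge.rel_type == rel_type and edge.destination not in seen:
                    seen.add(edge.destination)
                    found.append(edge.destination)
                    queue.append((edge.destination, depth + 1))
        return found


# Hypothetical sample data.
graph = PropertyGraph()
graph.add_node("alice", labels=["Person"], name="Alice")
graph.add_node("bob", labels=["Person"], name="Bob")
# Carol carries an extra property; no schema change is needed.
graph.add_node("carol", labels=["Person"], name="Carol", city="Coimbra")
graph.add_edge("alice", "bob", "KNOWS", since=2019)
graph.add_edge("bob", "carol", "KNOWS", since=2021)

friends_within_two_hops = graph.reachable("alice", "KNOWS", max_depth=2)
print(friends_within_two_hops)
```

Each hop only inspects the edges stored with the current node, which is the intuition behind the performance claim above; a relational design would typically express the same two-hop lookup as repeated self-joins on a relationship table.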

 

3. Popular Graph Databases

The following sections describe the main features of four popular graph databases, based on the DB-Engines ranking, and present their main strengths and weaknesses.

3.1. JanusGraph

JanusGraph is an open-source graph database released in 2017. According to Qiao [3], JanusGraph is a distributed graph database based on another graph database called Titan. It is also built on Apache TinkerPop, an open-source framework for graph databases and graph analytics systems. This framework allows programmers to add graph computing capabilities to their applications without having to develop their own APIs, graph processing engines, or algorithms. Scalability is a critical feature of graph databases: JanusGraph can store and query large graphs distributed over a cluster of multiple machines [4]. According to the documentation on the JanusGraph website, the database relies on third-party systems for data storage and indexing. For data storage, it supports Apache Cassandra, Apache HBase, and Oracle Berkeley DB Java Edition. For indexing, which enables more complex queries, it supports Elasticsearch, Apache Solr, and Apache Lucene. It supports the Java programming language and uses the Gremlin query language. JanusGraph has the following characteristics [5]:
  • Elastic and linear scalability as the data volume and number of users grow;
  • Support for a variety of storage and indexing back-ends.
The main advantages of JanusGraph are [5]:
  • High availability from multiple data centers;
  • Dynamic backups.
There are some limitations [5]:
  • It has a variety of storage and indexing back-ends, which makes JanusGraph dependent on third parties;
  • It is difficult to predict its future development.
Figure 1 shows a visualization of the graph database in Gremlin-Visualizer, an interactive visualization tool for graph databases that uses the Gremlin query language.
Figure 1.
Data visualization in Gremlin-Visualizer.

3.2. Nebula Graph

Nebula Graph was released by Vesoft Inc. as a native graph database specialized in storing graph connections and retrieving information from them. It stores nodes and edges with properties and labels. Its query language is nGQL (Nebula Graph Query Language), which is similar to SQL and designed for both programmers and ordinary users. Nebula Graph supports the C++, Java, Python, and Go programming languages. The main features of Nebula Graph are as follows [6]:
  • Ability to host graphs with hundreds of billions of nodes and trillions of edges;
  • Fast queries with a millisecond latency.
The main advantages can be highlighted as follows:
  • Due to its three-part architecture, it brings benefits such as high availability and cost effectiveness through higher resource utilization;
  • As an SSD-based product, it is better suited to future hardware trends and can achieve balanced read/write performance more easily than HDD-based and large-memory products.
However, it also has some limitations [6]:
  • If we back up a part of a particular graph in cluster A, the backup files cannot be restored in another cluster B;
  • It takes up a relatively large amount of disk space.
Nebula Graph Studio is used for data representation. This is a browser-based visualization tool that provides an interactive user experience. It also allows manipulation of the data schema, import of data, and execution of nGQL statements to retrieve data. Figure 2 shows the data representation in Nebula Graph Studio.
Figure 2.
Data visualization in Nebula Graph Studio.

3.3. Neo4j

Neo4j is an open-source Java graph database released in 2007. It uses a persistent Java engine that allows graph structures to be stored instead of tables. It is also one of the most widely used graph databases in several domains, such as healthcare, government, automated manufacturing, military, and others [7]. The query language is Cypher, which is inspired by SQL, and allows us to focus on the data we want from the graph [8]. In the 2023 DB-Engines ranking, Neo4j is the top-ranked graph database, and it stands out for the following features [8]:
  • Scalable database optimized for storing and querying large graphs distributed in a cluster on multiple machines;
  • Flexible and suitable for handling data with unstructured formats;
  • It has an easy-to-understand query language.
Its main advantages are [8]:
  • It has a high-performance distributed cluster architecture;
  • Graph fragmentation minimizes query latency;
  • It offers a cloud service, AuraDB, that is fully managed with automatic updates and backups.
Neo4j also has some limitations, namely [8]:
  • It does not directly accept RDF-formatted data;
  • It consumes a large amount of memory.
The Neo4j interface is interactive, intuitive, and easy to use, as shown in Figure 3.
Figure 3.
The Neo4j interface.

3.4. TigerGraph

TigerGraph is a commercially licensed graph database that was released in 2017. It is a system designed to perform multiple computations simultaneously based on parallelism [9]. It is also a distributed database capable of analyzing web-scale data in real time. Its query language is GSQL, which, as the name suggests, is modeled on SQL and intended for graph databases; it enforces strict declarations. TigerGraph is implemented in C++. TigerGraph has the following features [10]:
  • It can handle large graphs and real production workloads in which tens of terabytes of connected data are constantly updated;
  • It has a high availability cluster that uses replication to provide continuous service when one or more servers are unavailable, or some service components fail.
The main advantages of TigerGraph are as follows [10]:
  • Elasticity: because users often do not know in advance what hardware or computing power they will need, elastic scaling removes the need for capacity planning;
  • It includes a flexible, high-performance data loader that can transfer tabular or semi-structured data while online.
On the other hand, it has some limitations [10]:
  • It only runs on Linux servers;
  • It has an expensive cloud service;
  • It has an expensive annual subscription price for the storage it provides.
TigerGraph’s interface is also very intuitive, as shown in Figure 4.
Figure 4.
TigerGraph Studio.
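
Sections 3.1 to 3.4 name four query languages: Gremlin (JanusGraph), nGQL (Nebula Graph), Cypher (Neo4j), and GSQL (TigerGraph). As a rough illustration of how they differ in style, the sketch below writes the same hypothetical lookup ("whom does Alice know?") in each of them. None of these queries comes from the source. The schema (a person vertex type, a knows edge type, a graph named Social, the vertex id "alice") is assumed, the queries are only held as Python strings and are not executed, and exact syntax may vary between product versions.

```python
# The same hypothetical lookup in the four query languages named above.
# Assumed schema: "person" vertices with a "name" property, "knows" edges.
# These strings are illustrative and are not sent to any database.
QUERIES = {
    "Gremlin (JanusGraph)": (
        "g.V().has('person', 'name', 'Alice').out('knows').values('name')"
    ),
    "nGQL (Nebula Graph)": (
        'GO FROM "alice" OVER knows YIELD dst(edge) AS friend;'
    ),
    "Cypher (Neo4j)": (
        "MATCH (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person) RETURN b.name"
    ),
    "GSQL (TigerGraph)": (
        "CREATE QUERY friends_of(VERTEX<Person> p) FOR GRAPH Social {\n"
        "  Start = {p};\n"
        "  Friends = SELECT t FROM Start:s -(KNOWS:e)-> Person:t;\n"
        "  PRINT Friends;\n"
        "}"
    ),
}

for language, query in QUERIES.items():
    print(f"--- {language} ---")
    print(query)
```

Gremlin expresses the lookup as a chain of traversal steps, Cypher and nGQL as a single declarative statement, and GSQL as a named, parameterized query that is declared before it is run, which matches the strict declaration style mentioned in Section 3.4.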

References

  1. Altin, R.; Kinaci, A.C. Analyzing the Encountered Problems and Possible Solutions of Converting Relational Databases to Graph Databases. J. Adv. Res. Nat. Appl. Sci. 2022, 8, 281–292.
  2. Sharma, C. Design of Formal Query Languages and Schemas for Graph Databases. Ph.D. Thesis, University of Auckland, Auckland, New Zealand, 2021.
  3. Qiao, J. Intelligent Big Data Framework for the Technical Design of Public Management Applications in Sports. Math. Probl. Eng. 2022, 2022, 1900548.
  4. Falcão, T.A.; Furtado, P.M.; Queiroz, J.S.; Matos, P.J.; Antunes, T.F.; Carvalho, F.S.; Fonseca, P.C.; Giuntini, F.T. Comparative Analysis of Graph Databases for Git Data. J. Phys. Conf. Ser. 2021, 1944, 012004.
  5. JanusGraph. 2022. Available online: https://janusgraph.org/ (accessed on 6 March 2022).
  6. Nebula Graph. 2022. Available online: https://nebula-graph.io/ (accessed on 6 March 2022).
  7. Guia, J.; Soares, V.G.; Bernardino, J. Graph databases: Neo4j Analysis. In Proceedings of the 19th International Conference on Enterprise Information Systems, Porto, Portugal, 26–29 April 2017; Volume 1, pp. 351–356.
  8. Neo4j. 2022. Available online: https://neo4j.com/ (accessed on 6 March 2022).
  9. Deutsch, A.; Xu, Y.; Wu, M.; Lee, V. Tigergraph: A native MPP graph database. arXiv 2019, arXiv:1901.08248.
  10. TigerGraph. 2022. Available online: https://www.tigergraph.com/ (accessed on 6 March 2022).