
Review: The NoSQL Matrix
The world of NoSQL databases is a diverse one, and graph databases are only one part of it. Last week, we toured the three blue quadrants of the matrix below, which are collectively known as aggregate stores: key-value, column family and document stores. This week, we’ll take a closer look at the equally diverse world of graph database technologies, which occupy the green quadrant of the matrix.
The matrix of NoSQL databases. Quadrants in blue are collectively known as aggregate stores.
The Spectrum of Graph Database Technologies
We already walked through a formal definition of a graph database in our first post, but let’s do a quick review. A graph database is an online, operational database management system capable of Create, Read, Update, and Delete (tech lingo: CRUD) processes that operate on a graph data model. There are two important properties of graph database technologies:
- Graph storage: Some graph databases use “native” graph storage that is specifically designed to store and manage graphs, while others use relational or object-oriented databases, which are often slower.
- Graph processing engine: Native graph processing (tech lingo: index-free adjacency) is the most efficient means of processing data in a graph because connected nodes physically “point” to each other in the database. Non-native graph processing engines use other means to process CRUD operations.
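To make the difference between the two processing styles concrete, here is a toy Python sketch. It is only an illustration under our own assumptions, not how any real graph database lays out data: the `Node` class, the `edge_table` list and both helper functions are hypothetical names, and the table scan stands in for just one of the “other means” a non-native engine might use.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """Toy node that keeps direct references to its neighbours."""
    name: str
    neighbours: list[Node] = field(default_factory=list)


def native_neighbours(node: Node) -> list[str]:
    # Native style: follow the references the node already holds.
    # No global lookup structure is consulted.
    return [n.name for n in node.neighbours]


def non_native_neighbours(name: str, edge_table: list[tuple[str, str]]) -> list[str]:
    # Non-native style (one possible variant): search a global edge table
    # for every row that starts at the given node.
    return [dst for src, dst in edge_table if src == name]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.neighbours.extend([bob, carol])

# The same connections, stored as independent rows.
edge_table = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]
```

Both functions return the same neighbours for Alice. The difference is in the work done: the first touches only Alice’s own references, while the second has to search a structure whose size depends on the whole dataset.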
Property Graphs
Property graphs are the type of graph database we’ve already talked about most. In fact, our original definition of a graph database was more precisely about a property graph. Here’s a quick recap of what makes a graph database a property graph:
- Property graphs contain nodes (data entities) and relationships (data connections).
- Nodes can contain properties (tech lingo: key-value pairs).
- Nodes can be labeled with one or more labels.
- Relationships have both names and directions.
- Relationships always have a start node and an end node.
- Like nodes, relationships can also contain properties.
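The recap above maps fairly directly onto two small data structures. The following Python sketch is one possible way to model it in memory; the class names, field names and the sample data (Alice, a car, the `since` property) are our own assumptions for illustration and are not taken from any particular graph database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    # Zero or more labels, plus key-value properties.
    labels: set[str] = field(default_factory=set)
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship has a name, and its direction runs from start to end.
    name: str
    start: Node
    end: Node
    # Like nodes, relationships can carry key-value properties.
    properties: dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data.
alice = Node(labels={"Person"}, properties={"name": "Alice"})
car = Node(labels={"Vehicle", "Car"}, properties={"make": "Mini"})
owns = Relationship("OWNS", start=alice, end=car, properties={"since": 2014})
```

Note that `Relationship` cannot be built without both a start node and an end node, which mirrors the rule that a relationship always has exactly one of each.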
Hypergraphs
A hypergraph is a graph model in which a relationship (called a hyperedge) can connect any number of given nodes. While a property graph permits a relationship to have only one start node and one end node, the hypergraph model allows any number of nodes at either end of a relationship. Hypergraphs can be useful when your data includes a large number of many-to-many relationships.
Let’s look at the example below. In this simple (directed) hypergraph, we see that Alice and Bob are the owners of three vehicles, and we can express this relationship using a single hyperedge. In a property graph, we would have to use six relationships to express the concept.
In theory, hypergraphs should produce accurate, information-rich data models. However, in practice, it’s very easy for us to miss some detail while modeling. For example, let’s look at the figure below, which is the property graph equivalent of the hypergraph shown above. This property graph model requires several OWNS relationships to express what the hypergraph captured with just one hyperedge. Yet, by using six relationships instead of one, we have two distinct advantages:
- First, we’re using a more familiar and explicit data modeling technique (resulting in less confusion for a development team).
- Second, we can also fine-tune the model with properties such as “primary driver” (for insurance purposes), which is something we can’t do with a single hyperedge.
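The arithmetic behind “six relationships instead of one” can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions: the dictionary layout, the `expand` helper, the vehicle names and the `primary_driver` property key are all hypothetical and only serve to show the idea.

```python
from itertools import product

# One directed hyperedge: any number of nodes at either end.
hyperedge = {
    "name": "OWNS",
    "start": ["Alice", "Bob"],
    "end": ["Mini", "Range Rover", "Prius"],  # hypothetical vehicle names
}


def expand(edge):
    """Turn one hyperedge into property-graph style relationships.

    Every (start, end) pair becomes its own relationship with its own,
    independent property dictionary.
    """
    return [
        {"name": edge["name"], "start": s, "end": e, "properties": {}}
        for s, e in product(edge["start"], edge["end"])
    ]


relationships = expand(hyperedge)

# Two owners times three vehicles gives six OWNS relationships.
# Each one can now be fine-tuned individually.
relationships[0]["properties"]["primary_driver"] = True
```

Because each expanded relationship has its own properties, marking Alice as the primary driver of one vehicle leaves the other five relationships untouched, which is exactly the kind of detail the single hyperedge cannot hold.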
Triple Stores
Triple stores come from the Semantic Web movement and store data in a format known as a triple. Triples consist of a subject-predicate-object data structure. Using triples, we can capture facts such as “Ginger dances with Fred” and “Fred likes ice cream.” Individually, single triples aren’t very useful semantically, but en masse, they provide a rich dataset from which to harvest knowledge and infer connections. Triple stores are modeled around the Resource Description Framework (RDF) specifications laid out by the W3C, using SPARQL as their query language.
Data processed by triple stores tends to be logically linked, so triple stores are included in the category of graph databases. However, triple stores are not “native” graph databases because they don’t support index-free adjacency, nor are their storage engines optimized for storing property graphs. Triple stores store triples as independent elements, which allows them to scale horizontally but prevents them from rapidly traversing relationships. In order to perform graph queries, triple stores must create connections from individual, independent facts, which adds latency to every query. Because of these trade-offs in scale and latency, the most common use case for triple stores is offline analytics rather than online transactions.
Conclusion
Just as with other NoSQL databases, every type of graph database is best suited for a different function. Hypergraphs are a good fit for capturing meta-intent, and RDF triple stores are proficient at offline analytics. But for online, transactional processing, nothing beats a property graph for rapid traversal of data connections.
Learn more about the diverse world of graph database technologies: Click below to get your free copy of the O’Reilly Graph Databases ebook and discover how to apply graph technologies to mission-critical problems at your enterprise.
Catch up with the rest of the Graph Databases for Beginners series:
- Graph Databases for Beginners: Why Graphs Are the Future
- Graph Databases for Beginners: Why Data Relationships Matter
- Graph Databases for Beginners: The Basics of Data Modeling
- Graph Databases for Beginners: Data Modeling Pitfalls to Avoid
- Graph Databases for Beginners: Why a Database Query Language Matters
- Graph Databases for Beginners: Why We Need NoSQL Databases
- Graph Databases for Beginners: ACID vs. BASE Explained
- Graph Databases for Beginners: A Tour of Aggregate Stores
The post Graph Databases for Beginners: Other Graph Data Technologies appeared first on Neo4j Graph Database.