
Graph Databases for Beginners: Other Graph Data Technologies

Whether you’re new to the world of graph databases or an old pro, it’s easy to assume there are only a few types of graph database technologies. In reality, it’s one of the most diverse sectors of the NoSQL ecosystem.

In this “Graph Databases for Beginners” blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graphs are the future, why data relationships matter, the basics (and pitfalls) of data modeling, why a query language matters, why we need NoSQL, the ACID vs BASE consistency models and the trade-offs of aggregate stores.

This week, we’ll discuss the spectrum of graph database technologies and where they belong in the world of NoSQL.

Review: The NoSQL Matrix

The macrocosm of NoSQL databases is a diverse one, of which graph databases are only a part. Last week, we toured the three blue quadrants of the matrix below, which are collectively known as aggregate stores and include key-value, column family and document stores. This week, we’ll take a closer look at the equally diverse world of graph database technologies, which occupy the green quadrant in the matrix below.
Figure: The World of NoSQL Databases. The matrix of NoSQL databases. Quadrants in blue are collectively known as aggregate stores.

The Spectrum of Graph Database Technologies

We already walked through a formal definition of a graph database in our first post, but let’s do a quick review. A graph database is an online, operational database management system capable of Create, Read, Update, and Delete (tech lingo: CRUD) processes that operate on a graph data model. There are two important properties of graph database technologies:
    • Graph storage
        • Some graph databases use “native” graph storage that is specifically designed to store and manage graphs, while others use relational or object-oriented databases, which are often slower.
    • Graph processing engine
        • Native graph processing (tech lingo: index-free adjacency) is the most efficient means of processing data in a graph because connected nodes physically “point” to each other in the database. Non-native graph processing engines use other means to process CRUD operations.
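To make the idea of index-free adjacency a little more concrete, here is a minimal in-memory sketch in Python. It is only an analogy: the `Node` class, the `edge_table` list and both helper functions are hypothetical names invented for this illustration, and no real graph database stores its data this way on disk. The point is the contrast between following a reference held by the node itself and searching a separate structure for matching rows.

```python
class Node:
    """A node that keeps direct references to its connected nodes."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []  # direct references; no separate lookup structure

    def connect(self, other):
        self.neighbours.append(other)


def native_neighbours(node):
    # "Native" style: follow the references stored on the node itself.
    return [n.name for n in node.neighbours]


def non_native_neighbours(edge_table, name):
    # "Non-native" style: scan a table of (source, target) rows,
    # roughly what a lookup over a relational edge table has to do.
    return [target for source, target in edge_table if source == name]


alice, bob, carol = Node("Alice"), Node("Bob"), Node("Carol")
alice.connect(bob)
alice.connect(carol)

# The same connections, stored as rows instead of references (illustrative data).
edge_table = [("Alice", "Bob"), ("Alice", "Carol"), ("Bob", "Carol")]

print(native_neighbours(alice))
print(non_native_neighbours(edge_table, "Alice"))
```

Both calls return the same neighbours. The difference is the work involved: the first touches only Alice's own connections, while the second has to examine every row in the table (or consult an index) to find them.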
Besides specifics around storage and processing, graph databases also adopt distinct data models. The most common graph data models include property graphs, hypergraphs and triples. Let’s dive into each of these below.

Property Graphs

Property graphs are the type of graph database we’ve already talked about most. In fact, our original definition of a graph database was more precisely about a property graph. Here’s a quick recap of what makes a graph database a property graph:
    • Property graphs contain nodes (data entities) and relationships (data connections).
    • Nodes can contain properties (tech lingo: key-value pairs).
    • Nodes can be labeled with one or more labels.
    • Relationships have both names and directions.
    • Relationships always have a start node and an end node.
    • Like nodes, relationships can also contain properties.
(It’s worth noting that Neo4j is a property graph database.)
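The recap above maps fairly directly onto two small data structures. The following Python sketch is one possible way to write them down, not how Neo4j or any other database implements them; the class names, field names and sample values (`Person`, `Vehicle`, `since`) are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # one or more labels
    properties: dict = field(default_factory=dict)   # key-value pairs


@dataclass
class Relationship:
    name: str        # every relationship has a name
    start: Node      # and a direction: from a start node
    end: Node        # to an end node
    properties: dict = field(default_factory=dict)   # relationships carry properties too


# Hypothetical sample data, for illustration only.
alice = Node(labels={"Person"}, properties={"name": "Alice"})
car = Node(labels={"Vehicle"}, properties={"kind": "car"})
owns = Relationship("OWNS", start=alice, end=car, properties={"since": 2015})

print(owns.name, owns.start.properties["name"], "->", owns.end.properties["kind"])
```

Because `start` and `end` are required fields, a `Relationship` cannot be created without both nodes, which mirrors the rule that relationships always have a start node and an end node.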

Hypergraphs

A hypergraph is a graph model in which a relationship (called a hyperedge) can connect any number of given nodes. While a property graph permits a relationship to have only one start node and one end node, the hypergraph model allows any number of nodes at either end of a relationship. Hypergraphs can be useful when your data includes a large number of many-to-many relationships. Let’s look at the example below.
A Hypergraph Data Model
In this simple (directed) hypergraph, we see that Alice and Bob are the owners of three vehicles, and we can express this relationship using a single hyperedge. In a property graph, we would have to use six relationships to express the concept.

In theory, hypergraphs should produce accurate, information-rich data models. However, in practice, it’s very easy for us to miss some detail while modeling. For example, let’s look at the figure below, which is the property graph equivalent of the hypergraph shown above.
A Property Graph Data Model
This property graph model requires several OWNS relationships to express what the hypergraph captured with just one hyperedge. Yet, by using six relationships instead of one, we have two distinct advantages:
    1. First, we’re using a more familiar and explicit data modeling technique (resulting in less confusion for a development team).
    2. Second, we can also fine-tune the model with properties such as “primary driver” (for insurance purposes), which is something we can’t do with a single hyperedge.
Because hyperedges are multidimensional, hypergraph models are more generalized than property graphs. Yet, the two are isomorphic, so you can always represent a hypergraph as a property graph (albeit with more relationships and nodes). While property graphs are widely considered to have the best balance of pragmatism and modeling efficiency, hypergraphs show their particular strength in capturing meta-intent. For example, if you need to qualify one relationship with another (e.g., I like the fact that you liked that car), then hypergraphs typically require fewer primitives than property graphs. Whether a hypergraph or a property graph is best for you depends on your modeling mindset and the kinds of applications you’re building.
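The Alice-and-Bob example can be sketched in a few lines of Python to show where the six relationships come from. This is an illustration under stated assumptions: `expand_hyperedge` is a hypothetical helper, the vehicle names are invented, and the dictionary layout is just one convenient way to hold a relationship.

```python
from itertools import product


def expand_hyperedge(name, sources, targets):
    """Turn one hyperedge into one property-graph relationship per (source, target) pair."""
    return [
        {"name": name, "start": s, "end": t, "properties": {}}
        for s, t in product(sources, targets)
    ]


owners = ["Alice", "Bob"]
vehicles = ["car", "truck", "motorbike"]  # hypothetical vehicle names

relationships = expand_hyperedge("OWNS", owners, vehicles)
print(len(relationships))  # 2 owners x 3 vehicles = 6 relationships

# Each relationship is now its own object, so it can carry its own properties.
for rel in relationships:
    if rel["start"] == "Alice" and rel["end"] == "car":
        rel["properties"]["primary_driver"] = True
```

The single hyperedge has no place to record that only Alice is the primary driver of the car, whereas each of the six expanded relationships can be qualified individually.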

Triple Stores

Triple stores come from the Semantic Web movement and store data in a format known as a triple. Triples consist of a subject-predicate-object data structure. Using triples, we can capture facts such as “Ginger dances with Fred” and “Fred likes ice cream.” Individually, single triples aren’t very useful semantically, but en masse, they provide a rich dataset from which to harvest knowledge and infer connections.

Triple stores are modeled around the Resource Description Framework (RDF) specifications laid out by the W3C, using SPARQL as their query language. Data processed by triple stores tends to be logically linked, thus triple stores are included in the category of graph databases. However, triple stores are not “native” graph databases because they don’t support index-free adjacency, nor are their storage engines optimized for storing property graphs.

Triple stores store triples as independent elements, which allows them to scale horizontally but prevents them from rapidly traversing relationships. In order to perform graph queries, triple stores must create connections from individual, independent facts, which adds latency to every query. Because of these trade-offs in scale and latency, the most common use case for triple stores is offline analytics rather than online transactions.
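A toy version of this idea fits in a few lines of Python. The sketch below is not RDF or SPARQL and does not reflect any real triple store's internals; the `match` function and the predicate spellings (`dances_with`, `likes`) are assumptions chosen for the example. It stores the two facts mentioned above as independent tuples and then answers a two-step question by matching one pattern, taking its results, and matching again.

```python
# Each fact is an independent (subject, predicate, object) tuple.
triples = [
    ("Ginger", "dances_with", "Fred"),
    ("Fred", "likes", "ice cream"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


# Question: what does Ginger's dance partner like?
# Step 1: find the partners. Step 2: search again, once per partner.
partners = [o for _, _, o in match(triples, subject="Ginger", predicate="dances_with")]
liked = [o for p in partners for _, _, o in match(triples, subject=p, predicate="likes")]

print(liked)
```

Nothing in the first fact points at the second one. The connection between them is rebuilt at query time by searching the whole collection again, which is the extra step the paragraph above describes.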

Conclusion

Just as with other NoSQL databases, every type of graph database is best suited for a different function. Hypergraphs are a good fit for capturing meta-intent, and RDF triple stores are proficient at offline analytics. But for online, transactional processing, nothing beats a property graph for rapid traversal of data connections.

Learn more about the diverse world of graph database technologies: the O’Reilly Graph Databases ebook covers how to apply graph technologies to mission-critical problems at your enterprise.


The post Graph Databases for Beginners: Other Graph Data Technologies appeared first on Neo4j Graph Database.

