As the problems with SQL-based relational databases have become all too clear, there’s been a meteoric rise in the popularity of a new family of data storage technologies known as NoSQL.
NoSQL is a cheeky acronym for Not Only SQL – or more confrontationally – No to SQL. But the term “NoSQL” only defines what these data stores are not, rather than what they are.
In this “Graph Databases for Beginners” blog series, I’ll take you through the basics of graph technology assuming you have little (or no) background in the space. In past weeks, we’ve tackled why graphs are the future, why data relationships matter, the basics (and pitfalls) of data modeling and why a query language matters.
This week, we’ll discuss the many and motley world of NoSQL databases – and why they’ve become so popular.
The Diverse World of NoSQL Databases
NoSQL databases are a spectrum of data storage technologies that are more varied than they are similar, so it’s difficult to make sweeping generalizations about their characteristics.
In the following weeks, we’ll explore a few types of NoSQL databases. Our tour will encompass the group collectively known as aggregate stores (highlighted in blue below), including key-value stores, column family stores and document stores, as well as the various types of graph databases (in green), which include property graphs, hypergraphs and RDF triple stores.
An overview of the NoSQL database space. Quadrants in blue are collectively known as aggregate stores.
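To make the difference between these families a little more concrete, here is a rough sketch of how the same fact ("Alice follows Bob") might be shaped in four of them. These are plain in-memory Python analogies, not the API of any real database, and every key, field name and the `followed_by` helper are made up for illustration.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"user:1": json.dumps({"name": "Alice", "follows": [2]})}

# Document store: the value is a structured document with nested fields.
document = {"_id": 1, "name": "Alice", "follows": [{"user_id": 2, "since": 2015}]}

# Column family store: row key -> column family -> column -> value.
column_family = {
    "user:1": {
        "profile": {"name": "Alice"},
        "follows": {"user:2": "2015"},
    }
}

# Property graph: nodes and relationships are both first-class and carry properties.
nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
}
relationships = [{"start": 1, "type": "FOLLOWS", "end": 2, "since": 2015}]


def followed_by(node_id):
    """Names of the people that `node_id` follows, read from the relationship list."""
    return [
        nodes[rel["end"]]["name"]
        for rel in relationships
        if rel["start"] == node_id and rel["type"] == "FOLLOWS"
    ]


print(followed_by(1))
```

In the first three shapes, the "follows" fact lives inside Alice's own record (her aggregate). In the graph shape it is a separate relationship that connects two independent nodes.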
Historically, most enterprise-level web applications ran on top of a relational database (RDBMS). But in the past decade alone, the data landscape has changed significantly and in a way that traditional RDBMS deployments simply can’t manage.
The NoSQL database movement has emerged particularly in response to three of these data challenges:
- Data Volume
- Data Velocity
- Data Variety
We’ll explore each of these challenges in further detail below.
Data Volume
It’s no surprise that as data storage has increased dramatically, data volume (i.e., the size of stored data) has become the principal driver behind the enterprise adoption of NoSQL databases.
Large datasets simply become too unwieldy when stored in relational databases. In particular, query execution times increase as the size of tables and the number of joins grow (so-called join pain).
This isn’t always the fault of the relational databases themselves, though. Rather, it has to do with the underlying data model.
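As a small illustration of where joins come from, the sketch below uses Python's built-in `sqlite3` module with a made-up `person` / `follows` schema. It is only a toy, not a benchmark, but it shows how each extra "hop" through connected data adds another self-join to the query. The table names, the sample rows and the `reachable_in` helper are all assumptions for the example.

```python
import sqlite3

# Hypothetical schema: one table of people, one table of "follows" links.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE follows (src INTEGER NOT NULL, dst INTEGER NOT NULL);
    INSERT INTO person VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol'), (4, 'Dave');
    INSERT INTO follows VALUES (1, 2), (2, 3), (3, 4);
""")


def reachable_in(hops: int, start: str) -> list[str]:
    """Names reachable from `start` in exactly `hops` steps (hops >= 1).

    Every additional hop appends one more self-join on `follows`.
    """
    joins = ["JOIN follows f1 ON f1.src = p0.id"]
    for i in range(2, hops + 1):
        joins.append(f"JOIN follows f{i} ON f{i}.src = f{i - 1}.dst")
    sql = (
        "SELECT DISTINCT p.name FROM person p0 "
        + " ".join(joins)
        + f" JOIN person p ON p.id = f{hops}.dst"
        + " WHERE p0.name = ? ORDER BY p.name"
    )
    return [row[0] for row in conn.execute(sql, (start,))]


print(reachable_in(1, "Alice"))  # query uses one join on follows
print(reachable_in(3, "Alice"))  # query uses three joins on follows
```

With four rows this is instant. The point is the shape of the query: the number of joins is dictated by how the data is modeled, not by the database engine that runs it.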
In order to avoid joins and join pain, the NoSQL world has several alternatives to the relational model. While these NoSQL data models are better at handling today’s larger datasets, most of them are simply not as expressive as the relational model. The only exception is the graph model, which is actually more expressive. (More on that in the weeks to come.)
Data Velocity
But volume isn’t the only problem modern web-facing systems have to deal with. Besides being big, today’s data often changes very rapidly.
Thus, data velocity (i.e., the rate at which data changes over time) is the next major challenge that NoSQL databases are designed to overcome.
Velocity is rarely a static metric. A lot of velocity measurements depend on the context of both internal and external changes to an application, some of which have considerable system-wide impact.
Coupled with high volume, variations in data velocity require a database to not only handle high levels of edits (tech lingo: write loads), but also deal with surging peaks of database activity. Relational databases simply aren’t prepared to handle sustained high write loads and can crash during peak activity if not properly tuned.
But there’s also another aspect of data velocity NoSQL technology helps us overcome: the rate at which the data structure changes. In other words, it’s not just about the rapid change of specific data points but also the rapid change of the data model.
Data structures commonly shift for two major reasons. First is the fast-moving nature of business. As your enterprise changes, so do your data needs.
Second is that data acquisition is often experimental. Sometimes your application captures certain data points just in case you might need them later on. The data that proves valuable to your business usually sticks around, but if it isn’t worthwhile, then those data points often fall by the wayside. Consequently, these experimental additions and eliminations affect your data model on a regular basis.
Both forms of data velocity are problematic for relational databases to handle. Frequent high write loads come with expensive processing costs, and regular data structure changes come with high operational costs.
NoSQL databases address both data velocity challenges by optimizing for high write loads and by having flexible data models.
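To give a feel for what a flexible data model means in practice, here is a minimal sketch that uses plain Python dictionaries as a stand-in for a schema-less, document-style collection. It does not use any real database API. The `customers` records, the `referral_code` field and both helper functions are hypothetical.

```python
# Hypothetical records in a schema-less, document-style collection.
customers = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace"},  # no email was captured for this record
]


def add_field(collection, record_id, field, value):
    """Attach a new field to one record; no other record is touched."""
    for record in collection:
        if record["id"] == record_id:
            record[field] = value
            return record
    raise KeyError(record_id)


def drop_field(collection, field):
    """Remove an experimental field wherever it exists; return how many records changed."""
    changed = 0
    for record in collection:
        if field in record:
            del record[field]
            changed += 1
    return changed


# Start capturing an experimental data point for a single customer.
add_field(customers, 2, "referral_code", "SPRING")
print(customers[1])
```

If the experiment doesn’t pay off, `drop_field(customers, "referral_code")` removes the field again. Neither step needs a table-wide schema change, which is the operational cost the relational approach would typically incur.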
Data Variety
The final challenge in today’s data landscape is data variety – that is, data can be dense or sparse, connected or disconnected, regularly or irregularly structured.
Today’s data is far more varied than what relational databases were originally designed for. In fact, that’s why many of today’s RDBMS deployments have so many nulls in their tables and null checks in their code – it’s all to adjust to today’s data variety.
On the other hand, NoSQL databases are designed from the bottom up to adjust for a wide diversity of data and flexibly address future data needs.
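The null problem is easy to see in miniature. The sketch below is plain Python, not a real database, and its column names and sample rows are invented for illustration. It compares a wide, sparse table, where every row carries every column, with document-style records that only store the fields they actually have.

```python
# Hypothetical wide table: every row carries every column, used or not.
COLUMNS = ("id", "name", "phone", "fax", "twitter")
rows = [
    (1, "Ada", "555-0100", None, None),
    (2, "Grace", None, None, "@grace"),
]


def count_nulls(table):
    """Count the empty cells the fixed schema forces us to store."""
    return sum(value is None for row in table for value in row)


def to_document(row, columns=COLUMNS):
    """Keep only the fields that actually have a value."""
    return {col: value for col, value in zip(columns, row) if value is not None}


def contact_channels(doc):
    """Contact fields present on a document (everything except id and name)."""
    return sorted(key for key in doc if key not in ("id", "name"))


documents = [to_document(row) for row in rows]

print(count_nulls(rows))
print(documents)
```

Application code still has to cope with a field being absent, but the storage model no longer pads every record out to the widest possible shape.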
Conclusion
Relational databases can no longer handle today’s data volume, velocity and variety. Yet, understanding how NoSQL databases overcome these challenges is only the prelude to finding the right database for your enterprise.
In the coming weeks, we’ll explore the strengths and weaknesses of various NoSQL technologies so you can make the most informed decision possible.