3/5/2016 - NoSQL and RDBMS conjecture

Disclaimer: This post is really about the relational and document database paradigms. It does not cover every type of database that exists (graph databases, for example).

The Tangent

I always had nagging issues with MongoDB that I couldn't fully articulate or even recognize as problems. Something always felt off-kilter about how I wanted to use Mongo. The post "Why you should never use MongoDB" by Sarah Mei describes a few of those problems perfectly. The core problem is that the use cases appropriate for Mongo are very limited, to the point that Mongo itself won't admit it. They boast about how freeing it is not to be constrained by a rigid table schema (which is partially true), and downplay the fact that our brains naturally perceive information through connections. My conjecture is that the point of entry to the information we want is hardly ever a linear, hierarchical structure. We naturally gravitate towards making new connections between things we already know.

More to the point, a schema doesn't change a whole lot once it's put into practice, and having disciplines and patterns to follow actually helps you in the design phase. It's a trivial task for me to basically waterfall-design a table. Conceptually, I list out the columns and types I'm going to need, and if I need to add more later, it's fairly trivial. How we connect those pieces of data, on the other hand, is hard to fully understand and is likely to change over the lifespan of the application. A relational database (RDBMS) excels at solving these types of problems. A document store flips the burden of development on its head. A document store assumes that the data you're storing is ever-changing and very hard to conceptualize, and that the connection you make to get to it is very linear and easy to describe.
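As a rough illustration of the "adding a column later is trivial" point, here is a minimal sketch using Python's built-in sqlite3 module. The `users` table and its columns are made up for the example, and other relational databases have their own rules and costs for `ALTER TABLE`.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Initial design: just the columns we knew we needed (hypothetical table).
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO users (name) VALUES ('Ann')")

# Later we discover we need one more column; existing rows simply get NULL.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
existing = conn.execute("SELECT name, email FROM users").fetchall()
```

The existing row survives the change untouched, which is the cheap part. Changing how tables relate to each other is where the real design work tends to be.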

To simplify: a relational database is a kitchen appliance with a thousand uses, and a document store is one with a single use. Sadly, MongoDB pretends that it has more than one use, and is less than truthful about what its limitations are. Really, NoSQL is a buzzword that does a terrible job of describing what those databases are trying to solve. It's wonderful in a way, because NoSQL is saying that if you have a problem thinking about data in its individual parts, then use this as your tool. In reality, however, if you can accept having a rigid playground for your data to sit in, then maybe SQL is OK after all.

MongoDB is Webscale


Mongo specific problems

Updating parts of a document is a bug

What I mean is that you can write a query to find a specific subset of information in a collection of documents, for example by matching the value of a field inside an object inside an array that's in a document. But to update that same information you have to overwrite those documents in their entirety; you can't modify just the specific part you cared about. This is explicitly filed as a bug for Mongo (link), and the ticket was created in March of 2010. When I hit this issue it was an unwelcome surprise, and knowing about it upfront would have saved me a lot of wasted time. It also seems like an arbitrary limitation, because you can update parts of documents as long as they don't meet certain conditions, nesting being one of them. I realize that nothing is ever simple, but if I can write a query and get back the expected results, I should also be able to update those results.
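To make the shape of the workaround concrete, here is a small sketch in plain Python. It deliberately uses ordinary dicts and lists as a stand-in for a collection rather than a real MongoDB driver, and the document layout (`comments`, `replies`) and function names are invented for the example. It shows the read-modify-write pattern: finding the nested value is easy, but the fix ends up replacing the whole document.

```python
import copy

# Hypothetical collection: each document has an array of objects,
# and each of those objects has its own nested array.
collection = [
    {
        "_id": 1,
        "title": "post",
        "comments": [
            {"author": "ann", "replies": [{"id": 10, "text": "frist"}, {"id": 11, "text": "ok"}]},
            {"author": "bob", "replies": [{"id": 12, "text": "hi"}]},
        ],
    }
]

def find_docs_with_reply(docs, reply_id):
    """The 'query' side: return documents that contain a reply with this id."""
    return [
        doc for doc in docs
        if any(r["id"] == reply_id for c in doc["comments"] for r in c["replies"])
    ]

def replace_whole_document(docs, reply_id, new_text):
    """The 'update' side: copy the full document, patch it in application
    code, then swap the entire document back into the collection."""
    for i, doc in enumerate(docs):
        updated = copy.deepcopy(doc)
        changed = False
        for comment in updated["comments"]:
            for reply in comment["replies"]:
                if reply["id"] == reply_id:
                    reply["text"] = new_text
                    changed = True
        if changed:
            docs[i] = updated  # the whole document is rewritten, not one field
    return docs

original = collection[0]
matches = find_docs_with_reply(collection, 10)
replace_whole_document(collection, 10, "first")
```

The asymmetry is the complaint: the lookup is one expression, while the update means pulling the document into application code and writing all of it back.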

Data Duplication

If you're familiar with relational databases, the motivation is to have one source of truth. Duplication of data is bad because if the source of truth changes, all of the copies have to be updated as well. This creates a syncing problem: if two copies are updated to different values, which version wins during the sync operation? With a document store, data duplication is something you have to be comfortable with. Performing a join between collections wasn't possible before Mongo 3.2 (link), and the join support they do have is limited to one type, the left outer equi-join. For comparison, the overview linked here lists 8 types of joins for relational databases (link).
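Here is a minimal sketch of the difference, assuming a made-up `authors`/`posts` schema. The relational half uses Python's built-in sqlite3 module; the document half uses plain dicts as a stand-in for a document collection, not a real MongoDB driver.

```python
import sqlite3

# Relational side: the author's name lives in exactly one row.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    author_id INTEGER REFERENCES authors(id)
);
INSERT INTO authors VALUES (1, 'Ann');
INSERT INTO posts VALUES (1, 'First', 1), (2, 'Second', 1), (3, 'Orphan', NULL);
""")

def titles_with_authors(connection):
    """Left outer join: every post is returned, with NULL where no author matches."""
    return connection.execute(
        "SELECT p.title, a.name FROM posts p "
        "LEFT OUTER JOIN authors a ON a.id = p.author_id ORDER BY p.id"
    ).fetchall()

# One UPDATE on the single source of truth; every joined row reflects it.
conn.execute("UPDATE authors SET name = 'Anne' WHERE id = 1")
rows = titles_with_authors(conn)

# Document side: the author's name is copied into every document that needs it.
documents = [
    {"title": "First", "author": {"id": 1, "name": "Ann"}},
    {"title": "Second", "author": {"id": 1, "name": "Ann"}},
]

def rename_author_everywhere(docs, author_id, new_name):
    """Every embedded copy has to be found and rewritten; returns how many were touched."""
    touched = 0
    for doc in docs:
        if doc["author"]["id"] == author_id:
            doc["author"]["name"] = new_name
            touched += 1
    return touched

touched = rename_author_everywhere(documents, 1, "Anne")
```

In the relational version the rename is one row. In the document version the cost grows with the number of copies, and any copy that gets missed is now out of sync.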

Hype

To be frank, a lot of what MongoDB has going for it amounts to hype. I'm really not sold that it's the best JSON document store, even though it seems to be one of the most popular. It may get better, but the feature set and what you can do with it have to become more robust before I'll be sold on it.


What is NoSQL?

Introduction to NoSQL - Martin Fowler

It's telling that the first 40 minutes are a disclaimer about NoSQL and its trade-offs.


Crafting the Black Arts

What if we could take a document store and put it into a relational database? And what if we could keep the same query API that MongoDB has? Well, that's exactly what ToroDB does.

That is a scary proposition. It's basically an object-relational mapping layer that maps JSON objects to tables that aren't human-readable. My gripe is that if your project becomes dependent on ToroDB, you'll have a Frankenstein beast of a database that relies on a lot of moving parts. It depends on keeping up with MongoDB's APIs, with PostgreSQL's feature set, and on ToroDB itself as the glue layer. The advantage is that you get to have your cake and eat it too. My argument would be that if PostgreSQL wants a subset of the features Mongo offers, they will most likely add it themselves at some point, and maybe then is a better time to revisit using PostgreSQL as a JSON document store.
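To give a feel for what "mapping JSON objects to tables" involves, here is a toy sketch. It is not ToroDB's actual mapping; the `shred` function, the table naming scheme, and the `id`/`parent_id` columns are all invented for illustration. Each nesting level becomes its own table-like list of flat rows, linked back to its parent.

```python
from itertools import count

def shred(doc, table="doc", parent=None, rows=None, ids=None):
    """Split a nested dict into flat rows grouped by a hypothetical table name.

    Scalars stay on the current row; nested dicts and list items become rows
    in a child table that point back at the parent row via parent_id.
    Note: a document key literally named "id" or "parent_id" would collide
    with the bookkeeping columns in this toy version.
    """
    rows = {} if rows is None else rows
    ids = count(1) if ids is None else ids
    row_id = next(ids)
    row = {"id": row_id, "parent_id": parent}
    rows.setdefault(table, []).append(row)
    for key, value in doc.items():
        if isinstance(value, dict):
            shred(value, f"{table}_{key}", row_id, rows, ids)
        elif isinstance(value, list):
            for item in value:
                # Wrap bare scalars so every list item becomes a row.
                child = item if isinstance(item, dict) else {"value": item}
                shred(child, f"{table}_{key}", row_id, rows, ids)
        else:
            row[key] = value
    return rows

tables = shred({"name": "Ann", "address": {"city": "Oslo"}, "tags": ["a", "b"]})
```

Even this tiny document turns into three tables (`doc`, `doc_address`, `doc_tags`), which hints at why the resulting schema is hard to read by hand.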


In summary

  • Document stores are suited for scale
  • Document stores are suited for read operations
  • Linking sub-parts of a document is a pain
  • Consistency is hard, for anything non-trivial
  • Document stores fulfill domain specific use cases

Using a document store comes with many trade-offs, and I don't think it's a great starting point for an application if you are not aware of them. My takeaway is that before I use a document store database, I need to learn a lot more.