
Sunday, December 4, 2011

NoSQL - Right tool for the right job!

First of all, the name: NoSQL is not "Never SQL" or "No To SQL". It is generally expanded to "Not only SQL".

A large number of software products belong to the 'NoSQL' movement. Some of the popular ones are listed below:

Tokyo Cabinet/Tyrant, Redis, Voldemort, Oracle BDB
CouchDB, MongoDB
Neo4j, InfoGrid, InfiniteGraph
Cassandra, HBase, Riak
Oracle Coherence, db4o, ObjectStore, GemStone, Polar

NoSQL software falls broadly into four categories:
  1. Key-value stores
    • Based on Amazon's Dynamo paper
    • Collection of key-value pairs
    • Example: Voldemort,  Tokyo Cabinet
  2. BigTable clones
    • Based on Google's BigTable paper.
    • Big table, column families
    • Example: HBase, Hypertable,  Cassandra
  3. Document database
    • Inspired by Lotus Notes
    • Collection of documents, each of which is itself a set of key-value pairs
    • Example: MongoDB, CouchDB
  4. Graph Databases
    • Inspired by Euler and graph theory
    • Nodes, relationships, and key-value properties on both
    • Example: Neo4j, Sones, AllegroGraph
NoSQL is primarily about scalability. When we talk about scalability, we generally think about data size - from gigabytes to petabytes and maybe more. There is another dimension to scalability: the complexity of the data. So there are two axes to scalability: size and complexity.

Scaling size: Dealing with more and more information that is roughly similar in nature.
Scaling complexity: Dealing with data that is messier and more semi-structured. The categories above are mapped in the graph below.


As you can see, the focus of each category is very different.
Key-value stores: Simple data model that can scale massively.
Graph database: Capable of handling complex connected data; difficult to achieve horizontal scalability.

Typical applications:

Key-value store: Storing a customer's shopping cart in an e-commerce application. All lookups are by key (the customer id) and, in most cases, ad-hoc queries are not needed, which makes a key-value store an ideal home for shopping cart data.
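
The shopping cart case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KeyValueStore` class is a hypothetical in-memory stand-in for a real store such as Voldemort or Redis, and the `cart:<customer id>` key layout and JSON encoding are choices made for the example, not something any particular product prescribes.

```python
import json


class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)


def load_cart(store, customer_id):
    """Fetch the whole cart with a single lookup by key."""
    raw = store.get(f"cart:{customer_id}")
    return json.loads(raw) if raw is not None else {}


def add_to_cart(store, customer_id, sku, quantity):
    """Read the cart, update it, and write the whole value back."""
    cart = load_cart(store, customer_id)
    cart[sku] = cart.get(sku, 0) + quantity
    store.put(f"cart:{customer_id}", json.dumps(cart))


store = KeyValueStore()
add_to_cart(store, "customer-42", "sku-123", 1)
add_to_cart(store, "customer-42", "sku-123", 2)
add_to_cart(store, "customer-42", "sku-999", 1)
cart = load_cart(store, "customer-42")
```

Every operation touches exactly one key, and the store never needs to look inside the value, which is why this kind of data partitions so easily across many machines.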

BigTable clones: You have a news site where any piece of content (articles, comments, author profiles) can be voted on, with an optional comment supplied on the vote. You create one store per user and one store per piece of content, using a UUID as the key (generating one for each piece of content and each user). The user's store holds every vote that user has ever made, while the content's store holds a copy of every vote that has been made on that piece of content. Overnight you run a batch job that identifies the content users have voted on and generates, for each user, a list of content that has high votes but that the user has not yet voted on. You then push this list of recommended articles into the user's store.
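
A rough sketch of that nightly job, using plain Python dictionaries in place of the per-user and per-content stores. All names here (`record_vote`, `nightly_batch`, `min_score`, `limit`) are made up for the example, readable string ids stand in for the UUIDs, and "high votes" is assumed to mean a summed vote value at or above a threshold.

```python
from collections import defaultdict

# One "row" per user and one per piece of content; each vote is stored twice.
votes_by_user = defaultdict(dict)     # user id    -> {content id: vote}
votes_by_content = defaultdict(dict)  # content id -> {user id: vote}
recommendations = {}                  # user id    -> [content id, ...]


def record_vote(user_id, content_id, value, comment=None):
    """Write the same vote into the user's store and the content's store."""
    vote = {"value": value, "comment": comment}
    votes_by_user[user_id][content_id] = vote
    votes_by_content[content_id][user_id] = vote


def nightly_batch(min_score=2, limit=10):
    """Recommend highly voted content each user has not voted on yet."""
    scores = {
        content_id: sum(vote["value"] for vote in votes.values())
        for content_id, votes in votes_by_content.items()
    }
    # Highest score first; ties are broken by id so the order is stable.
    popular = sorted(
        (cid for cid, score in scores.items() if score >= min_score),
        key=lambda cid: (-scores[cid], cid),
    )
    for user_id, own_votes in votes_by_user.items():
        unseen = [cid for cid in popular if cid not in own_votes]
        recommendations[user_id] = unseen[:limit]


for user in ("alice", "bob", "carol"):
    record_vote(user, "article-1", 1)
record_vote("alice", "article-2", 1)
record_vote("bob", "article-2", 1, comment="nice read")
record_vote("carol", "article-3", 1)
nightly_batch()
```

Storing each vote twice trades extra writes for cheap reads: both the batch job and the user-facing page can get what they need from a single key.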

Document database: Documents are semi-structured data. Modeling product attributes in an RDBMS has always been a challenge: all products share a common set of attributes (e.g. title, sku, price) and also carry a category-specific set of attributes. Document databases handle this naturally, because each document can carry its own set of fields. MongoDB is only a short step away from MySQL, providing ad-hoc queries, secondary indexes and atomic single-document updates; multi-document transactions were added only in later releases (4.0).
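
To make the product example concrete, here is a minimal sketch in which ordinary Python dictionaries play the role of documents. The `find` helper is a hypothetical equality-only query written for this example; it is not MongoDB's or CouchDB's API, and the product data is invented.

```python
# Each product is one document: shared fields plus category-specific ones.
products = [
    {
        "sku": "B-100",
        "title": "Graph Theory Basics",
        "price": 30.0,
        "category": "book",
        "author": "A. Writer",
        "pages": 320,
    },
    {
        "sku": "T-200",
        "title": "42 inch LCD TV",
        "price": 499.0,
        "category": "tv",
        "screen_inches": 42,
        "resolution": "1080p",
    },
]


def find(collection, **criteria):
    """Return documents whose fields equal every given criterion.

    A document that lacks a queried field simply does not match.
    """
    return [
        doc
        for doc in collection
        if all(field in doc and doc[field] == value
               for field, value in criteria.items())
    ]


tvs = find(products, category="tv")
long_books = find(products, pages=320)
nothing = find(products, category="tv", pages=320)
```

No schema change was needed to give books a `pages` field and televisions a `resolution` field, which is exactly the part that is awkward to model with fixed relational columns.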

Graph database: Great for cases where the data is highly connected, e.g. social applications like Twitter, Facebook and blogs.
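
The typical question asked of connected data is "who is within N hops of this person?". The sketch below answers it with a breadth-first traversal over a small adjacency map. The `follows` data and the `within_hops` function are invented for illustration; a real graph database such as Neo4j would expose this through its own traversal or query API.

```python
from collections import deque

# Directed "follows" relationships between users (made-up sample data).
follows = {
    "alice": {"bob"},
    "bob": {"carol", "dave"},
    "carol": {"alice"},
    "dave": set(),
}


def within_hops(graph, start, max_hops):
    """Return {node: hop count} for nodes reachable in at most max_hops."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    del distance[start]  # the starting node is not part of the answer
    return distance


two_hops = within_hops(follows, "alice", 2)
one_hop = within_hops(follows, "alice", 1)
```

In a relational schema each extra hop usually means another self-join on the relationship table, whereas a traversal like this only follows the edges of the nodes it actually visits.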
