News from the non-relational multiverse

So far, when dealing with hu-Hu-HUGE networks, the data cannot be processed in the memory of a single machine. Usually, we store the network in database tables (or similar, but worse: Excel spreadsheets) describing the nodes and edges. Then you have to implement the graph algorithm of your choice in this framework, which usually leads to sub-optimal performance (putting it mildly). A straightforward optimization would be, for example, that when addressing a single node, the database already loads the adjacent edges into memory (cache), so the immediate next steps do not require additional access to the disk drive. Also, you might want to distribute parts of the network across several machines. Of course, a carefully handcrafted and optimized object-relational mapping with tuned indices can do little wonders when you get it right, but the nagging thought remains that this can (and has to!) be dealt with in a better way. By now, not only bioinformaticians and Google employees feel the occasional need to crunch BIG GRAPHS.
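To make the nodes-and-edges-in-tables setup a bit more tangible, here is a minimal sketch in Python using the standard library's sqlite3 module. Everything in it is made up for illustration: the table layout (`nodes`, `edges`), the toy graph, and the helper names `build_demo_db`, `NeighbourCache` and `bfs_distances` are my assumptions, not taken from any particular system. It runs a breadth-first search on top of plain SQL queries and keeps each node's adjacent edges in memory after the first lookup, which is the caching idea mentioned above in its most naive form.

```python
import sqlite3
from collections import deque


def build_demo_db():
    """Create an in-memory database holding a tiny undirected toy graph."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT)")
    conn.execute("CREATE TABLE edges (src INTEGER, dst INTEGER)")
    # Without an index on src, every neighbour lookup scans the whole edge table.
    conn.execute("CREATE INDEX idx_edges_src ON edges (src)")
    conn.executemany(
        "INSERT INTO nodes VALUES (?, ?)",
        [(i, f"n{i}") for i in range(1, 7)],
    )
    undirected = [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]  # node 6 stays isolated
    # Store each undirected edge in both directions.
    conn.executemany(
        "INSERT INTO edges VALUES (?, ?)",
        undirected + [(b, a) for a, b in undirected],
    )
    return conn


class NeighbourCache:
    """Fetch adjacency lists from the database and remember them in memory."""

    def __init__(self, conn):
        self.conn = conn
        self.cache = {}
        self.queries = 0  # counts how often we really hit the database

    def neighbours(self, node):
        if node not in self.cache:
            self.queries += 1
            rows = self.conn.execute(
                "SELECT dst FROM edges WHERE src = ? ORDER BY dst", (node,)
            )
            self.cache[node] = [row[0] for row in rows]
        return self.cache[node]


def bfs_distances(graph, start):
    """Breadth-first search: hop distance from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph.neighbours(node):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist


if __name__ == "__main__":
    graph = NeighbourCache(build_demo_db())
    print(bfs_distances(graph, 1))
    print("database queries:", graph.queries)
```

Note that even in this toy version the algorithm issues one query per visited node on the first run. That round trip is exactly what starts to hurt once the graph no longer fits on one machine, and it is the part a dedicated graph database promises to handle for you.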

So the idea of Not-Only-SQL databases has definitely been gaining appeal as well as momentum in the last two years, also outside academia. While we have known SQL in its various incarnations for decades now, the (self-proclaimed) “Ultimate Guide to the Non-Relational Universe” already has over a dozen different entries in the category “Graph Databases” alone.

Even when leaving interesting options from other categories (like Apache's Hadoop or MongoDB) out of the picture for the moment, getting an overview is daunting. It’s a jungle out there, and you don’t want to get bitten (at least not by samfink dat’s gonna kill ya) … But don’t look to me for definite advice! I’m just starting my tour and getting my bearings, cautiously dabbling around, and probably proceeding in the direction where the fewest beasts and dead ends seem to be lurking. You know that (like in the classic tale of downgrading from wife1.0 to girlfriend2.0) once you commit to any particular system, it might be a very painful process to change course later. So here are some of the useful pointers I found:

First I came across, which has a nice media page with a few videos, among them “Understanding Graph Databases – with Darren Wood”.

Alex Popescu has a great blog on all related topics, for example on “Graphs & Neo4J” and a “Quick Review of Existing Graph Databases”. I just checked Pere Urbón’s comparison of Graph Databases.

From what I see at the moment, Neo4j supports most of my favourite languages (like Java and Python), and it’s an open source project with “a high-performance graph engine with all the features of a mature and robust database” (they say). It also looks like they have the manpower and commercial userbase to see this through in the long run. They quote Werner Vogels, CTO of Amazon, with “For anything with multiple relationships, multiple connections, Neo4j absolutely ROCKS!”. Seems that’s the beast I am going to check out first, but if you read this and got bitten before (and survived 😉), let us know!

select fun, profit from real_world where relational=false;



  1. #1 by cistronic on 2011/12/22 - 10:15

    see also
    “Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase vs Membase vs Neo4j comparison”
    by Kristóf Kovács at

    found via Pablo Pareja Tobes post on linkedIn

  2. #2 by cistronic on 2012/03/21 - 16:18

    not entirely serious, I suppose, and quite to the point while fun to read:
    NoSQL No More: Let’s double down with MoreSQL

