Posts Tagged Data
As mentioned before, I am getting deeper into graph databases, specifically “neo4J”. The pace of development is breathtaking; it’s hard to keep up with the new versions and amazing features. In preparation for attending a “Cypher Hands On” (Meetup-Graph), I finally got round to updating to the latest 1.8M03 milestone. By now, there are a couple of nice introductory videos available:
You might want to check out the videoGraphy @ neo4J. I also recommend the following “Intro to Graph Databases” (on Vimeo), which nicely explains what the buzz is all about and includes some real-world examples and history:
Looking to deepen my understanding of the graph-theoretic foundations, I came across these books via blog.postmaster.gr:
– Maarten van Steen: “Graph Theory and Complex Networks: An Introduction”. Interestingly, this book is also available electronically as a personalised PDF. As the author notes: “When you write a book containing mathematical symbols, thinking big and acting commercially doesn’t seem the right combination. I merely hope to see the material to be used by many students and instructors everywhere and to receive a lot of constructive feedback that will lead to improvements. Acting commercially has never been one of my strong points anyway”.
– Reinhard Diestel: “Graph Theory”.
It is fun, indeed. Enjoy!
Google has started to roll out the Knowledge Graph, which is intended to be about things rather than just strings. Delivering and disambiguating related content based on semantic-network associations sounds great; whether this really is a step out of the filter bubble remains to be seen. Overall, it seems related to the idea of a conceptual graph, and Wikipedia forms a big chunk of the underlying knowledge base.
techcrunch.com “Google Just Got A Whole Lot Smarter, Launches Its Knowledge Graph”
Google’s official blog “Introducing the Knowledge Graph: things, not strings”
lifehacker.com “Google Knowledge Graph Brings Smarter Semantic Results to Your Google Searches”
webpronews.com “Knowledge Graph: Google Gets Tight With Wikipedia”
Recently, I found this REAL bug sitting on the edge of my screen while coding – the (admittedly quite nerdy) irony of it is hard to miss. Rest assured, I ‘guided’ it away from ‘the system’ to the outside as gently as possible, resisting any impulse to squash it with the keyboard on the spot. You know the rule, “Never touch a running system”, and unfortunately double-clicking and pressing <DEL> didn’t seem to work here.
A funnier (and nerdier) take on debugging is this video by Atlassian called “Software Bugs”, which made my morning:
“All bugs welcome! … create some buzz, … and when the spider gets here, I guess we can start talking web development”
A more in-depth look at the issues involved is provided by this talk by Prof. Stephen Freund on “Stopping the Software Bug Epidemic”; he also touches on the halting problem, memory leaks and parallel code execution.
The talk is informative throughout and presents the basic issues in an entertaining way, but I wonder why he didn’t mention the “Dining Philosophers Problem”. Perhaps deadlocks are hard for automated checkers to trace? In addition, he only refers to the (ancient) waterfall model of software engineering. Some comments on how more modern development philosophies (eXtreme Programming, agile, etc.) fit into the picture would have been nice. Anyway, happy deBugging!
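For readers who haven’t met it, the Dining Philosophers Problem is the textbook deadlock: philosophers sit around a table with one fork between each pair, and each needs both neighbouring forks to eat. If everyone grabs the left fork first, all of them can end up holding one fork and waiting forever for the other. Below is a minimal Python sketch of one classic remedy (not something from the talk), assuming the standard setup: forks are locks, and every philosopher picks up the lower-numbered fork first. A single global lock order makes a circular wait impossible. The function name `dine` and its parameters are made up for illustration.

```python
import threading


def dine(num_philosophers: int = 5, meals: int = 100) -> list[int]:
    """Run the dining philosophers with ordered fork acquisition.

    Each philosopher locks the lower-numbered of its two forks first.
    Because all threads acquire locks in the same global order, a cycle
    of threads each waiting on its neighbour cannot form, so there is
    no deadlock. Returns how many meals each philosopher ate.
    """
    if num_philosophers < 2:
        raise ValueError("need at least two philosophers (and two forks)")
    forks = [threading.Lock() for _ in range(num_philosophers)]
    eaten = [0] * num_philosophers  # slot i is only written by philosopher i

    def philosopher(i: int) -> None:
        left, right = i, (i + 1) % num_philosophers
        # Global ordering: always lock the lower-numbered fork first.
        first, second = min(left, right), max(left, right)
        for _ in range(meals):
            with forks[first]:
                with forks[second]:
                    eaten[i] += 1  # "eat" while holding both forks

    threads = [threading.Thread(target=philosopher, args=(i,))
               for i in range(num_philosophers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return eaten


if __name__ == "__main__":
    print(dine())  # prints [100, 100, 100, 100, 100]
```

Replacing `min`/`max` with plain `left`/`right` gives the naive version, which can deadlock. Whether a given run actually hangs depends on thread scheduling, which is exactly what makes such bugs hard to reproduce and to catch with testing alone.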
A couple of days ago, the PDB passed 80,000 structures. That’s a lot of structural information at the molecular level, especially since the 40k mark was surpassed just 5 years ago. It also means that every year we now get as many new entries as were available in total around 1998.
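As a back-of-the-envelope check, a doubling from about 40,000 to about 80,000 entries in five years implies an annual growth factor of 2^(1/5). The Python sketch below assumes smooth exponential growth, which real deposition rates need not follow, and the function names are made up for illustration. Under that assumption the PDB grows by roughly 15% per year, which corresponds to on the order of 10,000 new entries in the most recent year.

```python
import math


def implied_growth(count_then: float, count_now: float, years: float) -> tuple[float, float]:
    """Return (annual growth factor, doubling time in years), assuming smooth exponential growth."""
    ratio = count_now / count_then
    factor = ratio ** (1 / years)
    doubling_time = years * math.log(2) / math.log(ratio)
    return factor, doubling_time


def entries_last_year(count_now: float, factor: float) -> float:
    """Entries added during the most recent year under the same exponential assumption."""
    # A year ago the total was count_now / factor; the difference is last year's intake.
    return count_now * (1 - 1 / factor)


if __name__ == "__main__":
    factor, doubling = implied_growth(40_000, 80_000, 5)
    print(f"growth per year: {factor - 1:.1%}")
    print(f"doubling time:   {doubling:.1f} years")
    print(f"added in the last year: {entries_last_year(80_000, factor):,.0f}")
```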
And finally, on the topic of drug design, there is “the saga of Molly”. Although there is commercial interest behind the blog (no problem there for the critical yet open-minded reader), I like the tale because it is written from an entirely different perspective, and, as you know, I like looking at things from a different angle.
This is the tale of one molecule’s long sojourn from the organic lab through Phase III clinical testing. Be forewarned – it’s written from the understandably limited and skewed perspective of the molecule.
Ben Goldacre, the physician and biostatistician behind the always-excellent Bad Science column in the Guardian, gave a barnburner of a talk at Strata 2012 yesterday, “The Information Architecture of Medicine is Broken”. For anyone not aware of the problems caused by publication bias in clinical trials (for example, ineffective drugs with a wide variety of side-effects coming to market), his talk is a must-watch.
(Shared by İbrahim Mutlay via LinkedIn; see also this blog entry on the topic.)
The line “Everybody should have a cousin who is a better Python programmer than oneself” made my day. Enjoy!
… a consortium of leading IT providers and three of Europe’s biggest research centres (CERN, EMBL and ESA) announced a partnership to launch a European cloud computing platform. ‘Helix Nebula ‐ the Science Cloud’, will support the massive IT requirements of European scientists, and become available to governmental organisations and industry after an initial pilot phase.
The partnership is working to establish a sustainable European cloud computing infrastructure, supported by industrial partners, which will provide stable computing capacities and services that elastically meet demand.
Building an efficient scientific cloud infrastructure in Europe is a good thing, considering the onslaught of data from genomics, high-energy physics and satellites. But I can’t quite shake off the uneasy feeling that big-science flagship projects no longer leave any room for grassroots developments, such as the WWW when it took off in the mid-nineties. Along these lines, I’d rather (or at least equally) see Linked Open Data (as advocated by Tim Berners-Lee for several years) aggressively pushed forward and appropriately funded; the pay-offs are hard to (over-)estimate. But anyway, here are some links so you can make up your own mind:
The International Molecular Exchange Consortium (IMEx) is the latest effort by data providers to integrate protein-interaction data, offering:
- A non-redundant set of protein-protein interaction data from a broad taxonomic range of organisms
- The data in standards-compliant download formats (MITAB or PSI-MI XML 2.5)
- Expertly curated from direct submissions or peer-reviewed journals to a consistently high standard. [ … aiming to … ]
- Develop and work to a single set of curation rules when capturing data both from directly deposited interaction data and from publications in peer-reviewed journals
- Make these interactions available in a single search interface on a common website
- Make all IMEx records freely accessible under the Creative Commons Attribution License
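MITAB (PSI-MI TAB 2.5) is a tab-separated format with one interaction per line, so turning a download into a non-redundant set of protein pairs takes only a few lines. The following minimal Python sketch assumes the usual MITAB 2.5 conventions: 15 columns, interactor identifiers in the first two, "-" for empty fields and "|" between multiple values. The sample rows, accession numbers and helper names are hypothetical. Real files also carry detection methods, publications, taxonomy and confidence scores, which this sketch ignores.

```python
import csv
import io

ID_A, ID_B = 0, 1  # 0-based columns holding the unique identifiers of interactors A and B
MITAB25_COLUMNS = 15


def first_value(field: str) -> str:
    """Return the first of possibly several '|'-separated values ('-' means empty)."""
    return "" if field == "-" else field.split("|")[0]


def unique_pairs(mitab_text: str) -> set[frozenset[str]]:
    """Collapse MITAB 2.5 rows into a non-redundant set of undirected interactor pairs.

    A-B and B-A count as the same interaction; a self-interaction becomes a
    one-element frozenset. Header lines ('#...'), short rows and rows missing
    either identifier are skipped.
    """
    pairs: set[frozenset[str]] = set()
    reader = csv.reader(io.StringIO(mitab_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < MITAB25_COLUMNS or row[0].startswith("#"):
            continue
        a, b = first_value(row[ID_A]), first_value(row[ID_B])
        if a and b:
            pairs.add(frozenset((a, b)))
    return pairs


def _row(a: str, b: str) -> str:
    # Hypothetical row: identifiers in columns 1-2, all other 13 columns left empty ("-").
    return "\t".join([a, b] + ["-"] * (MITAB25_COLUMNS - 2))


sample = "\n".join([
    _row("uniprotkb:P11111", "uniprotkb:P22222"),
    _row("uniprotkb:P22222", "uniprotkb:P11111"),  # same pair, reversed
    _row("uniprotkb:P33333", "uniprotkb:P33333"),  # self-interaction
])

if __name__ == "__main__":
    print(len(unique_pairs(sample)))  # prints 2
```

Using `frozenset` as the key is a simple way to treat interactions as undirected; if directionality (e.g. bait versus prey) matters for an analysis, a tuple would be the better choice.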
If you’ve been looking for a one-stop shop for a representative dataset of protein-protein interactions, this looks like it. There is an overview available on YouTube (see below).
… and a training course on “Networks and Pathways Bioinformatics for Biologists” will take place at EMBL-EBI in May.