Archive for June, 2011
Current sequencing efforts churn out massive numbers of (missense) mutations for a given reference genome – so cancer cell lines are an obvious and worthy target for these shiny new big guns. Addressing the question of what these mutations actually do (wrong), and how they affect the structure and function of proteins, is not quite straightforward. The problem is that each tumor has a slightly different set of mutations; even parts of the same tumor might not be genetically identical. Are there detectable trends that would tell us more about the molecular mechanisms involved?
One could speculate about the dominant mode of action in tumor initiation / progression at the molecular scale: For example, mutating a small, hydrophobic amino acid in the core into a big, charged one might screw up the entire structure, rendering the protein inactive (loss-of-function). Alternatively, one could imagine how mutations at the surface alter binding affinity / specificity and cause havoc in downstream signalling and regulatory pathways (gain-of-function).
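The two speculated modes can be caricatured as a crude rule of thumb. Purely as an illustrative toy – the residue sets and the classification itself are my own simplification, not a validated stability predictor – a sketch in Python might look like this:

```python
# Toy heuristic for the speculation above: does a missense mutation
# swap a small hydrophobic residue for a large charged one in the core?
SMALL_HYDROPHOBIC = {"G", "A", "V", "L", "I", "M"}   # crude choice
LARGE_CHARGED = {"R", "K", "D", "E"}                 # crude choice

def crude_impact_guess(wt, mut, buried):
    """wt / mut: one-letter amino-acid codes of the wild-type and mutant
    residue; buried: True if the site sits in the hydrophobic core
    (e.g. judged from a solved structure)."""
    if buried and wt in SMALL_HYDROPHOBIC and mut in LARGE_CHARGED:
        return "likely destabilizing (loss-of-function candidate)"
    if not buried:
        return "surface site: could alter binding (gain-of-function candidate)"
    return "unclear without structural modelling"

# e.g. a buried valine mutated to arginine:
print(crude_impact_guess("V", "R", buried=True))
```

Real predictors of course weigh in much more (packing, conservation, free-energy estimates), but the toy makes the dichotomy in the paragraph concrete.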
During my studies, I made a habit of visiting the Cambridge University Press bookstore at least once a month. As a kind of wishlist, I noted potentially interesting new books – something I might continue in an open format here. Whenever the occasion and funding presented themselves, I could draw from that list and get whatever seemed most relevant and helpful. So far, reading on a tablet or screen does not have the same sensual appeal as a book for me – call me hopelessly old-fashioned. But of course, this being the 21st century, the advantage of carrying an entire electronic library with you in a tin box that weighs less than your average textbook is hard to argue with, especially in combination with advanced search and analysis tools. Nevertheless, here are some recent publications available in classical dead-tree format:
Systems Biology: Simulation of Dynamic Network States
Bernhard Ø. Palsson, University of California, San Diego
EMBOSS User’s Guide
by Peter M. Rice, European Bioinformatics Institute, Hinxton
Alan J. Bleasby, European Bioinformatics Institute, Hinxton
Jon C. Ison, European Bioinformatics Institute, Hinxton
The European Molecular Biology Open Software Suite (EMBOSS) is a well-established, high-quality package of open-source software tools for molecular biology. It includes over 200 applications for molecular sequence analysis and general bioinformatics, including sequence alignment, rapid database searching and sequence retrieval, motif identification, pattern analysis, and much more.
The entire list of CUP titles in the section “Genomics, bioinformatics and systems biology” is here.
The International Supercomputing Conference (ISC’11) in Hamburg ended just yesterday, and there’s plenty to check out in terms of a video blog, social media feeds, live-streams etc. Quite a lot is happening in terms of hardware developments in High-Performance Computing (HPC), also with respect to applications in the life sciences. Probably the main headline is that there is a new No. 1: the new Japanese supercomputer K (at RIKEN) now packs more of a punch than the next five systems on the TOP500 list combined, displacing the Tianhe-1A, which took pole position last October.
However, some things have not changed:
* Linux is still the dominant OS
* Big Blue (IBM) still dominates the market, followed by HP and Cray
* The trend towards GPU acceleration continues (although the “K” doesn’t use them)
* Massively parallel processing (MPP) systems continue to increase their share
For more in-depth info, see http://www.hpcwire.com/
So far, when dealing with hu-Hu-HUGE networks, the data cannot be processed in the memory of a single machine. Usually, we store the network in database tables (or something similar but worse: Excel spreadsheets) describing the nodes and edges. Then you have to implement the graph algorithm of your choice on top of this framework, which usually leads to sub-optimal performance (putting it mildly). A straightforward optimization would be, for example: when a single node is addressed, the database could already load the adjacent edges into memory (cache), so the immediate next steps do not require additional access to the disk drive. Also, you might want to distribute parts of the network across several machines. Of course, a carefully handcrafted and optimized object-relational mapping with tuned indices can work small wonders when you get it right, but the nagging thought remains that this can – and has to! – be dealt with in a better way. By now, not only bioinformaticians and Google employees feel the occasional need to crunch BIG GRAPHS.
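The caching idea above – touching a node should bring all its edges along, instead of one table lookup per edge – is exactly what an adjacency-list layout gives you. A minimal in-memory sketch (my own illustration, not any particular graph database's API):

```python
from collections import deque

class AdjacencyGraph:
    """Each node's neighbourhood is stored as one unit, so a traversal
    that touches a node gets all adjacent edges in a single access –
    the access pattern a graph store wants to serve from cache."""

    def __init__(self):
        self.adj = {}  # node -> set of neighbours, loaded as one block

    def add_edge(self, u, v):
        # undirected edge: register both directions
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def bfs_distance(self, source, target):
        """Breadth-first search; note that the inner loop iterates over a
        node's entire neighbourhood without further lookups."""
        seen, queue = {source}, deque([(source, 0)])
        while queue:
            node, dist = queue.popleft()
            if node == target:
                return dist
            for nb in self.adj.get(node, ()):
                if nb not in seen:
                    seen.add(nb)
                    queue.append((nb, dist + 1))
        return None  # target unreachable

g = AdjacencyGraph()
for u, v in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]:
    g.add_edge(u, v)
print(g.bfs_distance("a", "d"))  # shortest path a-b-c-d: 3
```

A relational edge table would pay one indexed query per expanded node (or worse, per edge) for the same traversal; distributed graph systems push the same locality idea across machines by partitioning nodes together with their adjacency lists.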
Under selective (funding) pressure, outsourcing the annotation of biological data seems to be a necessity, and not only for economic reasons. The existing pipelines for high-quality annotation just cannot keep up with the number of papers, let alone the raw data, being produced. Something that started with Rfam has now spread to other databases (well described in “A Warning Sign for Biomedical Databases” by Manuel Corpas). Similarly, crowd-sourced “secondary” databases have started to add value to established primary databases like the Protein Data Bank (PDB) – PDBWiki and Proteopedia, for example. So how is this concept going to work in the future, and what are the pitfalls?