Posts Tagged Sequence
Utopia is a collection of interactive tools for analysing protein sequence and structure. Up front are user-friendly and responsive visualisation applications, behind the scenes a sophisticated model that allows these to work together and hides much of the tedious work of dealing with file formats and web services.
The installation package (provided by the Advanced Interfaces Group, AIG) includes:
- CINEMA – multiple sequence alignment editor
- Ambrosia – molecular structure viewer
- UTOPIA – support libraries and plugins
After a quick & painless installation, it seems to work out of the box. More in-depth info when I get to grips with more of the functionality.
If anything, it’s the multiple sequence alignment (MSA) problem (in combination with the folding problem) that defines the core of bioinformatics. At least from my perspective, since that’s where I started out on my adventures in the field. fold.it has already demonstrated for protein folding that it is possible to tackle hard problems by crowd-sourcing, a.k.a. citizen science. After all, the pattern recognition software installed on the wetware between your ears is highly evolved and can complement pure in-silico calculations. With Phylo, researchers from McGill University have taken this approach to the sequence level:
Phylo is a challenging flash game in which every puzzle completed contributes to mapping diseases within human DNA.
Although the call for citizen science is not entirely new, it is boosted significantly by such developments on the internet. Who said that science and fun do not go together, and that science can only be done while wearing a lab coat and operating extremely expensive machinery? Quite the opposite!
Biochemist Erwin Chargaff advocated a return to science by nature-loving amateurs in the tradition of Descartes, Newton, Leibniz, Buffon, and Darwin — science dominated by “amateurship instead of money-biased technical bureaucrats”.
Now that’s some company to be proud of. And I can’t say I completely disagree, albeit I’d like to think the two are not necessarily mutually exclusive (for-the-love-of-it vs. for-profit). If you’d like to get started, check out the tutorial video below and have fun aligning!
Reference: “Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment” by Alexander Kawrykow, Gary Roumanis, Alfred Kam, Daniel Kwak, Clarence Leung, Chu Wu, Eleyine Zarour, Phylo players, Luis Sarmenta, Mathieu Blanchette and Jérôme Waldispühl (2012) PLoS ONE 7(3): e31362. doi:10.1371/journal.pone.0031362
“HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment” by Michael Remmert, Andreas Biegert, Andreas Hauser & Johannes Söding (Nature Methods 9, 173–175 (2012) doi:10.1038/nmeth.1818)
“HHblits is the first iterative method based on the pairwise comparison of profile Hidden Markov Models. In benchmarks it achieves better runtimes than other iterative sequence search methods such as PSI-BLAST or HMMER3 by using a fast prefilter based on profile-profile comparison. Furthermore, HHblits greatly improves upon PSI-BLAST and HMMER3 in terms of sensitivity/selectivity and alignment quality.”
The entire suite of programs is available for all major OSs.
Indeed, in the very long run, it should only be necessary to determine the amino acid sequence of a protein, and its three-dimensional structure could then be predicted; in my view this day will not come soon, but when it does come the X-ray crystallographers can go out of business, perhaps with a certain sense of relief, and it will also be possible to discuss the structures of many important proteins which cannot be crystallized and therefore lie outside the crystallographer’s purview.
If you are into (structural) molecular biology, you will probably have seen this before. Honestly, I don’t get tired of reading this statement. That was 49 years (and 11 days, to be precise) ago – where are we now, almost half a century later? Are we there yet? (sounds like the little ones nagging on a long-distance journey – daddy told you it would take a while!) Seems we might be there soon, since we have made quite some headway recently.
First of all, the above statement displays some amazing farsightedness combined with a humble self-perception. He is not overstating it, indicating that not all proteins will be crystallized. If you read on in his speech, he was already talking about larger assemblies and complexes, and that’s where we are now, and that’s where things get REALLY interesting. Besides, the picture of him modeling a 3D structure (on the sticks for the z axis) is by no means old-fashioned; to me it means he just took what was available at the time to get the 3D model constructed. Today we have sophisticated computer graphics, yet nothing beats the experience of building a physical model – an art that should not be forgotten but developed further (thinking of 3D printing here). I am convinced that even in the age of high-throughput techniques, interaction data etc., we ultimately need a structural view to truly understand the molecular mechanisms.
But the main point – or prediction – is that ultimately, we should be able to compute structure and function from sequence alone.
If you think about it, that’s a very bold statement indeed, with wide ramifications. By now our sequencing capabilities are growing at a pace beyond Moore’s law (see here). I probably don’t have to remind you that experimental structure determination is difficult and time-consuming, to say the least. And computer predictions made in the absence of a related solved structure in the PDB are usually no match for the real thing (a.k.a. an experimental 3D structure).
But there is a fresh breeze in the field: recently a number of groups have reported that the ancient dream (from the mid-nineties and even before; “ancient” in bioinformatics = over 15 yrs) of using patterns of correlated mutations to derive useful spatial constraints for structure prediction does indeed work. Properly. Finally!
Given enough information content, there seem to be no limits to the size of the proteins, and even notoriously difficult ones like transmembrane structures seem to work. All you need is sequences. And lots of them. Properly aligned, of course. (That’s what a lot of bioinformatics was all about, wasn’t it?) But massive amounts of sequences are what we get anyway these days, more than you ever wanted (to analyze), from next-gen sequencing projects. But that’s off-topic; delving deeper into that mania is something for a different post to explore.
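To make the idea of "correlated mutations" a bit more tangible, here is a minimal sketch that scores every pair of alignment columns by plain mutual information. This is only the naive covariation score, not what the methods referenced in this post actually do: they go further and try to disentangle direct from indirect correlations. The toy alignment and the function names are made up for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def column(alignment, i):
    """Return column i of the alignment as a list of characters."""
    return [seq[i] for seq in alignment]


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(alignment):
    """Score all column pairs and return (score, i, j), highest score first."""
    length = len(alignment[0])
    scores = [
        (mutual_information(column(alignment, i), column(alignment, j)), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment (not real data): columns 1 and 3 always
# change together (K pairs with E, R pairs with D), column 0 is fully
# conserved, and column 2 varies independently of the others.
toy_alignment = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "AKIE",
    "ARID",
]

best_score, i, j = rank_column_pairs(toy_alignment)[0]
print(best_score, i, j)  # 1.0 1 3
```

The co-varying pair of columns comes out on top, while the conserved column carries no signal at all. That is also why lots of properly aligned and sufficiently diverse sequences are needed: without variation in the columns there is nothing to correlate.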
If you are interested in checking it out in depth: one of the methods is called EVfold, see http://EVfold.org.
Of course, I think there is still some room for optimization, cross-fertilization and improvement in the methods. Simply by looking at some of the predicted contact maps, it’s fairly obvious to me that these methods are not only better than what was available so far, but also not identical. Seeing their performance and following the competition in this field heating up at next year’s CASP will be jolly exciting.
I’m sure I’ll keep you posted on further developments and deeper analysis – for the moment I’ll leave you with a few references to get started. As a final word, I am so glad most of them (at least the ones I list below) are not hidden behind a paywall but are open access, free to check out by anyone who cares.
- Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C. (2011) Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS ONE 6(12): e28766. doi:10.1371/journal.pone.0028766
- Taylor WR, Sadowski MI (2011) Structural Constraints on the Covariance Matrix Derived from Multiple Aligned Protein Sequences. PLoS ONE 6(12): e28265. doi:10.1371/journal.pone.0028265
- Burger L, van Nimwegen E (2010) Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments. PLoS Comput Biol 6(1): e1000633. doi:10.1371/journal.pcbi.1000633
Together with Toby Gibson and Niall Haslam, Aidan Budd is organizing an EMBO-funded course in September. It’s mostly aimed at wet-lab scientists – see below. He asked me to circulate this to other scientists who I think might be interested in the course – I thought this might be a suitable spot – the full poster can be found here.
EMBO Practical Course on Protein Bioinformatics Tools – Focus on Regulatory Proteins: Sequences, Structures, Interactions, Networks