Indeed, in the very long run, it should only be necessary to
determine the amino acid sequence of a protein, and its three-dimensional
structure could then be predicted; in my view this day will not come soon,
but when it does come the X-ray crystallographers can go out of business,
perhaps with a certain sense of relief, and it will also be possible to discuss
the structures of many important proteins which cannot be crystallized and
therefore lie outside the crystallographer’s purview.
If you are into (structural) molecular biology, you will probably have seen this before. Honestly, I don’t get tired of reading this statement. That was 49 years (and 11 days, to be precise) ago – where are we now, almost half a century later? Are we there yet? (Sounds like the little ones nagging on a long-distance journey – daddy told you it would take a while!) It seems we might be there soon, since we have made quite some headway recently.
First of all, the above statement displays some amazing farsightedness combined with a humble self-perception. He does not overstate the case, and he acknowledges that not every protein can be crystallized. If you read on in his speech, he was already talking about larger assemblies and complexes, and that’s where we are now, and that’s where things get REALLY interesting. Besides, the picture of him modeling a 3D structure (on sticks for the z axis) is by no means old-fashioned; to me it means he simply took what was available at the time to get the 3D model constructed. Today we have sophisticated computer graphics, yet nothing beats the experience of building a physical model – an art that should not be forgotten but developed further (thinking of 3D printing here). I am convinced that even in the age of high-throughput techniques, interaction data etc., we ultimately need a structural view to truly understand the molecular mechanisms.
But the main point – or prediction – is that ultimately, we should be able to compute structure and function from sequence alone.
If you think about it, that’s a very bold statement indeed, with wide ramifications. By now our sequencing capabilities are growing at a pace beyond Moore’s law (see here). I probably don’t have to remind you that experimental structure determination is difficult and time-consuming, to say the least. And computer predictions made without a related solved structure in the PDB are usually no match for the real thing (a.k.a. an experimental 3D structure).
But there is a fresh breeze in the field: recently a number of groups have reported that the ancient dream (from the mid-nineties and even before; “ancient” in bioinformatics = over 15 yrs) of using patterns of correlated mutations to derive useful spatial constraints for structure prediction does indeed work. Properly. Finally!
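To make "patterns of correlated mutations" a bit more concrete, here is a minimal sketch of the naive version of the idea: score every pair of alignment columns by mutual information and rank them. Please take it as an illustration only. The function names and the toy alignment are my own invention, and plain mutual information is the simple baseline, not what the methods referenced at the end of this post do (they go further and try to separate direct from indirect correlations).

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    counts_a = Counter(col_a)
    counts_b = Counter(col_b)
    counts_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in counts_ab.items():
        p_ab = n_ab / n
        p_a = counts_a[a] / n
        p_b = counts_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def rank_column_pairs(alignment):
    """Return ((i, j), score) for all column pairs, highest score first.

    `alignment` is a list of equal-length strings (hypothetical input
    format); column indices are 0-based.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    columns = [[seq[i] for seq in alignment] for i in range(length)]
    scores = [
        ((i, j), mutual_information(columns[i], columns[j]))
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy alignment (made up): columns 1 and 3 always change together,
# and so do columns 0 and 2.
toy_alignment = [
    "AKLE",
    "AKLE",
    "ARLD",
    "ARLD",
    "GKIE",
    "GRID",
]

ranked = rank_column_pairs(toy_alignment)
for (i, j), score in ranked:
    print(f"columns {i} and {j}: MI = {score:.3f} bits")
```

In this toy case the two co-varying pairs come out on top and the unrelated pairs score close to zero. With six sequences that proves nothing, of course; on real data the signal only emerges from large, well-built alignments.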
Given enough information content, there seem to be no limits to the size of the proteins, and even notoriously difficult ones like transmembrane structures seem to work. All you need is sequences. And lots of them. Properly aligned, of course. (That’s what a lot of bioinformatics was all about, wasn’t it?) But massive amounts of sequences are what we get anyway these days, more than you ever wanted (to analyze) from next-gen sequencing projects. That’s off-topic, though; delving deeper into that mania is a topic for a different post.
If you are interested in checking it out in depth: one of the methods is called EVfold, see http://EVfold.org.
Of course, I think there is still some room for optimization, cross-fertilization and improvement in the methods. Simply by looking at some of the predicted contact maps, it’s fairly obvious to me that these methods are not only better than what was available so far, but also that they are not identical. Seeing how they perform and following the competition in this field heating up at next year’s CASP will be jolly exciting.
I’m sure I’ll keep you posted on further developments and deeper analysis – for the moment I’ll leave you with a few references to get started. As a final word, I am so glad most of them (at least the ones I list below) are not hidden behind a paywall but open access, free to check out by anyone who cares.
- Marks DS, Colwell LJ, Sheridan R, Hopf TA, Pagnani A, Zecchina R, Sander C (2011) Protein 3D Structure Computed from Evolutionary Sequence Variation. PLoS ONE 6(12): e28766. doi:10.1371/journal.pone.0028766
- Taylor WR, Sadowski MI (2011) Structural Constraints on the Covariance Matrix Derived from Multiple Aligned Protein Sequences. PLoS ONE 6(12): e28265. doi:10.1371/journal.pone.0028265
- Burger L, van Nimwegen E (2010) Disentangling Direct from Indirect Co-Evolution of Residues in Protein Alignments. PLoS Comput Biol 6(1): e1000633. doi:10.1371/journal.pcbi.1000633