"A few" does appear quite dismissive of the enormous amounts of effort in struct...

derefr · on Nov 30, 2020

170k is "a few" compared to 180 million (i.e. the size of the PDB as soon as someone runs AlphaFold over everything in the UniProt.)

> In most cases the proteins were determined to be interesting by other experiments, and then people decided to try and solve their structure.

Yes, that's what we're doing right now, because structure is not a useful predictor, because we don't have structure available in advance of studies on the protein itself. There was no point to a "functional taxonomy" of proteins, because we were never trying to predict with protein-structure as the only data available.

In a world where protein structure is "on tap" in a data warehouse, part of the game of bioinformatics will become "structural analysis" of classes of known-function proteins, to find functional sub-units that do similar things among all studied proteins, allowing searches to be conducted for other proteins that express similar functional sub-units.

fabian2k · on Nov 30, 2020

Determining what a protein structure does might be even harder than folding. Right now we can't really do that ab initio; you have to determine the activity in the lab and then look at the structure. And that allows you to potentially identify this motif in other proteins.

If someone produces an AI that you give a sequence and it tells you what the protein does exactly, I'd be extremely impressed. I don't see that happening soon.

The specifics matter a lot here. We can often determine rough functions for subdomains by homology alone. But that really doesn't tell you the full story; it only gives you some hints about what that protein actually does.

jeffxtreme · on Nov 30, 2020

Five years ago, I would have said the following:

"If someone produces an AI that you give a sequence and it tells you the protein conformation, I'd be extremely impressed".

Sure there are many more things to solve in this space; but that doesn't take away that this is an impressive achievement and does unlock quite a few things (including making more tractable the problem you just brought up). I'm excited to see what DeepMind works on now and what the new state of the world will be just five years from now.

fabian2k · on Nov 30, 2020

I think I have to clarify that my response was in large part aimed at the "this will change all our lives" part, and might look too negative on its own. I'm very, very impressed by these results, but that still doesn't mean that we just solved biology. If this works that well on folding, this could mean that a lot of other stuff that simply didn't work well in silico might come into reach.

I'm maybe overcompensating for the tech-centric population here, with some comments speculating about very near-term and drastic impacts from discoveries like this. Biology and life sciences are much slower, and there's always more complexity below every breakthrough. That does tend to push me towards commenting with the more skeptical and sober view here.

whatshisface · on Nov 30, 2020

My understanding of this is not perfect, but wouldn't answering the "actually does" question require a full biomolecular model of the cell, or even the whole organism? If so I see what you mean. I suppose that it might be possible to get around this by improving the theory of catalysts so that you could look at a site and say, "oh, this will act in such a way..." Dynamic quantum simulation of a few atoms at the active site is hardly easy but a far sight easier than the other.

Rochus · on Nov 30, 2020

It's a step forward for sure, but structures change over time to perform their function. The method described here only returns a static structure. Much more research and development is needed to be able to predict the dynamic behavior and interplay with other proteins or RNA.

AlexCoventry · on Nov 30, 2020

> as soon as someone runs AlphaFold over everything in the UniProt

It'll take a while before those results can be trusted, though, right? There's probably a selection bias in the training data for proteins which are easy to crystallize, so many proteins probably aren't well represented by the training examples.

entropicdrifter · on Nov 30, 2020

170,000 is three orders of magnitude less than the number of recorded protein sequences. I don't think it's dismissive to describe that as comparatively few.
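
For anyone who wants to check the arithmetic, here is a quick sketch. The 170,000 figure is the one above; the 180 million figure is the one quoted earlier in the thread for UniProt, and both are round numbers rather than exact database counts.

```python
import math

# Round figures taken from this thread, not exact database counts.
pdb_structures = 170_000          # experimentally solved structures
uniprot_sequences = 180_000_000   # recorded protein sequences

# How many times more sequences than structures?
ratio = uniprot_sequences / pdb_structures

# Orders of magnitude = base-10 logarithm of that ratio.
orders_of_magnitude = math.log10(ratio)

print(f"ratio: about {ratio:,.0f}x")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

The ratio comes out at a little over a thousand, which is where "three orders of magnitude" comes from.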

flobosg · on Nov 30, 2020

Structure is much, much more conserved than sequence. In other words, protein sequences with low sequence identity can fold similarly due to the physical constraints that guide protein folding.
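
To make "sequence identity" concrete, here is a minimal sketch of one common way to compute it from two already-aligned sequences. The function name `percent_identity` and the toy sequences are made up for illustration, and this version divides by the number of gap-free columns; other definitions use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of gap-free alignment columns where both residues match.

    Assumes the inputs are already aligned (same length, '-' for gaps).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != "-" and b != "-"]
    if not columns:
        return 0.0

    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy sequences, not real proteins.
print(percent_identity("MKT-AYIAK", "MRTLAYLGK"))
```

Two sequences can score low on a measure like this and still adopt a similar fold, which is the sense in which structure is more conserved than sequence.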

ClumsyPilot · on Nov 30, 2020

I don't know the field, and I understood 'a few' as like a dozen, certainly not in the thousands.

Anyone uninitiated will think the same, and those already informed, well, they are already informed.

ALittleLight · on Nov 30, 2020

I also don't know the field and the opposite concern is that 170,000 sounds like a lot, but, apparently, it's a relatively small amount compared to the number of proteins there are. It makes sense to me to refer to it as a small number - e.g. "That hard drive is tiny." "No, it stores several million bytes..."