"A few" does appear quite dismissive of the enormous amounts of effort in structural biology so far. There are more than 170,000 structures in the PDB right now.
To determine potential targets for drugs we have to understand what the proteins do. Having the structure is not really enough for that, it doesn't tell you the purpose of the protein (though it certainly can give you some hints).
In most cases the proteins were determined to be interesting by other experiments, and then people decided to try and solve their structure. So the structures we already solved are also biased towards the more biologically relevant proteins.
170k is "a few" compared to 180 million (i.e. the size of the PDB as soon as someone runs AlphaFold over everything in the UniProt.)
> In most cases the proteins were determined to be interesting by other experiments, and then people decided to try and solve their structure.
Yes, that's what we're doing right now, because structure is not a useful predictor when we don't have it available in advance of studies on the protein itself. There was no point to a "functional taxonomy" of proteins, because we were never trying to predict with protein-structure as the only data available.
In a world where protein structure is "on tap" in a data warehouse, part of the game of bioinformatics will become "structural analysis" of classes of known-function proteins, to find functional sub-units that do similar things among all studied proteins, allowing searches to be conducted for other proteins that express similar functional sub-units.
Determining what a protein structure does might be even harder than folding. Right now we can't really do that ab initio; you have to determine the activity in the lab and then look at the structure. That, in turn, allows you to potentially identify this motif in other proteins.
If someone produces an AI that you give a sequence and it tells you what the protein does exactly, I'd be extremely impressed. I don't see that happening soon.
The specifics matter a lot here. We can often determine rough functions for subdomains by homology alone. But that really doesn't tell you the full story, it only gives you some hints on what that protein actually does.
"If someone produces an AI that you give a sequence and it tells you the protein conformation, I'd be extremely impressed".
Sure, there are many more things to solve in this space; but that doesn't take away from the fact that this is an impressive achievement and does unlock quite a few things (including making the problem you just brought up more tractable). I'm excited to see what DeepMind works on now and what the new state of the world will be just five years from now.
I think I have to clarify that my response was in large part aimed at the "this will change all our lives" part, and might look too negative on its own. I'm very, very impressed by these results, but that still doesn't mean that we just solved biology. If this works that well on folding, it could mean that a lot of other stuff that simply didn't work well in silico might come into reach.
I'm maybe overcompensating for the tech-centric population here, with some comments speculating about very near-term and drastic impacts from discoveries like this. Biology and the life sciences move much more slowly, and there's always more complexity below every breakthrough. That does tend to push me towards commenting with the more skeptical and sober view here.
My understanding of this is not perfect, but wouldn't answering the "actually does" question require a full biomolecular model of the cell, or even the whole organism? If so I see what you mean. I suppose that it might be possible to get around this by improving the theory of catalysts so that you could look at a site and say, "oh, this will act in such a way..." Dynamic quantum simulation of a few atoms at the active site is hardly easy but a far sight easier than the other.
It's a step forward for sure, but structures change over time to perform their function. The method described here only returns a static structure. Much more research and development is needed to be able to predict the dynamic behavior and interplay with other proteins or RNA.
> as soon as someone runs AlphaFold over everything in the UniProt
It'll take a while before those results can be trusted, though, right? There's probably a selection bias in the training data for proteins which are easy to crystallize, so many proteins probably aren't well represented by the training examples.
170,000 is three orders of magnitude less than the number of recorded protein sequences. I don't think it's dismissive to describe that as comparatively few.
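For anyone who wants to check the arithmetic, here is a quick sketch using only the two round figures quoted in this thread (roughly 170,000 PDB structures and roughly 180 million UniProt sequences). The variable names are mine and the counts are approximations, not live database numbers.

```python
import math

# Round figures quoted in this thread; both are approximations.
pdb_structures = 170_000
uniprot_sequences = 180_000_000

# How many recorded sequences there are per solved structure.
ratio = uniprot_sequences / pdb_structures

# Orders of magnitude = base-10 logarithm of that ratio.
orders = math.log10(ratio)

print(f"about {ratio:.0f} sequences per structure, "
      f"roughly {orders:.1f} orders of magnitude")
```

So the gap is a factor of about a thousand, which is where "three orders of magnitude" comes from.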
Structure is much, much more conserved than sequence. In other words, protein sequences with low sequence identity can fold similarly due to the physical constraints that guide protein folding.
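To make "sequence identity" concrete, here is a minimal sketch of one common way to compute it from two already-aligned sequences. The function name, the toy sequences, and the choice to skip gapped columns are my own assumptions for illustration; real tools differ in how they pick the denominator, and this says nothing about whether the two proteins actually fold alike.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap (one convention among several)
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, not real protein sequences.
seq_a = "MKT-AYIAKQR"
seq_b = "MRTLAYLGKQK"
print(percent_identity(seq_a, seq_b))
```

The point above is that two proteins can score low on a measure like this and still share a fold, which is why sequence identity alone understates how much structural similarity is out there.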
I also don't know the field and the opposite concern is that 170,000 sounds like a lot, but, apparently, it's a relatively small amount compared to the number of proteins there are. It makes sense to me to refer to it as a small number - e.g. "That hard drive is tiny." "No, it stores several million bytes..."