I worked for a while on extremophile archaeal viruses - the type that infect or...

Centigonal · 2025-04-28T04:05:00 1745813100

> Presumably, if AlphaFold is finding the relationship, there's some information preserved at the sequence level

This is not my area of expertise, and maybe I'm misunderstanding this, but I thought that what AlphaFold does is extrapolate a structure from the sequence. The actual relationship with the other existing proteins would have been found by the investigators through other, more traditional means (like the 3D search you mentioned).

IX-103 · 2025-04-28T12:19:44 1745842784

I'm not sure about that. The way AlphaFold works involves transforming the protein from a vector space representing the sequence to a different vector space representing the folded structure and back again as it performs iterative refinement. Presumably you could perform a comparison in the structure space to find homologs that have completely different sequences - they would just have a high cosine similarity.

Checking sub-regions of the structure would be more difficult, but depending on how the structural representation works it could just be computationally intensive.
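To make the "high cosine similarity" idea concrete, here is a minimal sketch of the comparison itself. The three vectors are made-up toy numbers, not real AlphaFold internals, and whether AlphaFold exposes a representation that can be compared this way is an assumption of the comment above, not something this snippet demonstrates.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical "structure space" embeddings (invented for illustration).
fold_x = [0.9, 0.1, 0.4]
fold_x_homolog = [0.8, 0.2, 0.5]   # similar shape, sequence could differ
unrelated_fold = [-0.3, 0.9, -0.2]

similar = cosine_similarity(fold_x, fold_x_homolog)
dissimilar = cosine_similarity(fold_x, unrelated_fold)
print(f"homolog: {similar:.3f}, unrelated: {dissimilar:.3f}")
```

Vectors pointing in nearly the same direction score close to 1, regardless of what sequences produced them; that is the whole of the proposed test.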

colingauvin · 2025-04-28T15:22:45 1745853765

This is a very big misconception about AlphaFold. It's not generating a structure totally de novo from sequence. Instead it's primarily finding relationships on the sequence level to other solved structures. If those structure/sequence relationships didn't exist somewhere, AF wouldn't work because it doesn't really have much information about protein folding from first principles. There are some small de novo elements, but nothing really groundbreaking. Where AF's true strength lies is in its ability to detect relationships we have been unable to detect with any other method.

Centigonal · 2025-04-28T15:41:14 1745854874

Wow, that makes sense. Thank you for explaining this -- it makes AlphaFold a little less inexplicable magic and a little more science/engineering in my mind.

im3w1l · 2025-04-28T02:17:25 1745806645

What about convergent evolution? Are you ruling that out because you reason that there are many possible structures that could do the same job so it's too much of a coincidence how close it matches?

Teever · 2025-04-28T05:26:45 1745818005

Can you explain to a layman how wildly different genes can produce identical proteins?

mtlmtlmtlmtl · 2025-04-28T08:22:32 1745828552

IANAB, but from what I do understand, it depends on what you mean by different genes. Information-wise, DNA is a string of base-4 digits (nucleotides) read in groups of 3 digits; these groups are called codons. Each codon corresponds to a specific amino acid*. A protein is made up of a bunch of different amino acids chained together. The gene determines which amino acids are chained together and in what order. This long chain of amino acids tends to fold up into a complex 3-dimensional structure, and this 3-dimensional structure determines the protein's function.

Now, there are a couple ways a gene could be different without altering the protein's function. It turns out multiple codons can code for the same amino acid. So if you switch out one codon for another which codes for the same amino acid, obviously you get a chemically identical sequence and therefore the exact same protein. The other way is you switch an amino acid, but this doesn't meaningfully affect the folded 3D structure of the finished protein, at least not in a way that alters its function. Both these types of mutations are quite common; because they don't affect function, they're not "weeded out" by evolution and tend to accumulate over evolutionary time.

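* except for the stop codons, which code for no amino acid and mark the end of the protein-coding sequence. The start codon (usually ATG) marks the beginning, but it also codes for an amino acid (methionine).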
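The first mechanism (swapping a codon for a synonymous one) can be sketched in a few lines. The table below is only a small excerpt of the standard genetic code, and the two "genes" are invented toy sequences, so treat this as an illustration rather than a real translation tool.

```python
# Small excerpt of the standard genetic code (DNA codons -> one-letter amino acids).
# A real table has 64 entries; only the codons used below are listed.
CODON_TABLE = {
    "ATG": "M",                                       # methionine, usual start codon
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",   # alanine
    "AAA": "K", "AAG": "K",                           # lysine
    "GAA": "E", "GAG": "E",                           # glutamate
    "CTG": "L", "TTA": "L",                           # leucine
    "TAA": "*", "TAG": "*", "TGA": "*",               # stop codons
}


def translate(dna):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Two toy genes that differ at several nucleotide positions (hypothetical sequences).
gene_a = "ATGGCTAAAGAACTGTAA"
gene_b = "ATGGCGAAGGAGTTATGA"

print(translate(gene_a))
print(translate(gene_b))
print(translate(gene_a) == translate(gene_b))
```

Both strings translate to the same amino acid chain even though the DNA differs, which is why a search on the nucleotide level can miss a relationship that is obvious on the protein level.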

clort · 2025-04-28T06:32:07 1745821927

also a layman, but:

You could build houses from bricks, timber or poured concrete that all looked the same in the end. Their internal structures and methods of construction would be different, but they would have the same form.

I'm reading the GP's comment similarly.

SideburnsOfDoom · 2025-04-28T08:15:21 1745828121

also a layman, but:

genes are instructions for building proteins.

For a given output, you could write a program in wildly different programming languages, or even use the same language but structure it in wildly different ways.

If there's no match for the source code (genes), then find a match for the output (protein).

DrAwdeOccarim · 2025-04-28T10:26:29 1745835989

This is a perfect analogy.

Source: Am structural biochemist

colingauvin · 2025-04-28T15:26:32 1745853992

There are basically 4 classes of amino acids:

Non-polar

Polar

Acidic

Basic

In terms of 3D fold (i.e. the general abstract shape of the protein in 3D), you can make loads of substitutions without changing it, generally as long as you stay within the same class.

It's not until you compare the 3D shape that you see the relationship.
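A rough sketch of why this hides the relationship at the sequence level: two toy sequences below share no identical residues, yet every position stays within the same class. The class assignments follow one common textbook grouping (others place a few residues such as Cys, Tyr, or His differently), and the sequences are invented, equal-length, and ungapped for simplicity.

```python
# One common four-way grouping of the 20 standard amino acids (an assumption;
# textbooks differ on borderline residues).
CLASSES = {
    "nonpolar": "GAVLIMFWP",
    "polar": "STCYNQ",
    "acidic": "DE",
    "basic": "KRH",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}


def sequence_identity(a, b):
    """Fraction of positions with the exact same amino acid."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def class_identity(a, b):
    """Fraction of positions whose amino acids fall in the same class."""
    return sum(CLASS_OF[x] == CLASS_OF[y] for x, y in zip(a, b)) / len(a)


# Hypothetical toy sequences: every residue swapped for a same-class neighbor.
seq_a = "MKVLDEAS"
seq_b = "LRIAEDGT"

print(sequence_identity(seq_a, seq_b))
print(class_identity(seq_a, seq_b))
```

A plain identity score sees two unrelated strings, while the class-level view sees the same pattern of chemistry, which is closer to what determines the fold.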