The key to why the Oklo reactor was possible is not that a natural process separated U-235 from U-238 in a uranium ore, but that the reactor existed 1.7 billion years ago. U-235 has a half-life of about 700 million years, much shorter than that of U-238, so the ratio of U-235 to U-238 in naturally occurring uranium decreases over time.
U-235 was about 3.1% of uranium in ores 1.7 billion years ago, which is comparable to the enrichment used in some reactors today. Uranium ore today has only about 0.72%, so an Oklo-type natural reactor could not form on today's Earth. See [1] for a good description; the atlasobscura article is poor and factually incorrect in places.
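For anyone who wants to check the decay arithmetic, here is a rough Python sketch. It assumes half-lives of about 0.704 billion years for U-235 and 4.468 billion years for U-238 (the U-238 figure is not from the comment above), ignores trace U-234, and the function name is made up for illustration. It should land at roughly 3% for 1.7 billion years ago, in the same ballpark as the figure quoted above.

```python
# Back-calculate the U-235 fraction of natural uranium at a time in the past.
# Assumed inputs: half-lives of ~0.704 Gyr (U-235) and ~4.468 Gyr (U-238).
# Today's U-235 abundance of 0.72% is taken from the comment above.
# Trace U-234 is ignored in this sketch.

HALF_LIFE_U235_GYR = 0.704
HALF_LIFE_U238_GYR = 4.468
U235_FRACTION_TODAY = 0.0072


def u235_fraction(gyr_ago: float) -> float:
    """Return the U-235 atom fraction of uranium `gyr_ago` billion years ago."""
    # Undo the decay: each isotope was more abundant by 2 ** (t / half_life).
    u235 = U235_FRACTION_TODAY * 2 ** (gyr_ago / HALF_LIFE_U235_GYR)
    u238 = (1 - U235_FRACTION_TODAY) * 2 ** (gyr_ago / HALF_LIFE_U238_GYR)
    return u235 / (u235 + u238)


if __name__ == "__main__":
    for t in (0.0, 1.0, 1.7, 2.0):
        print(f"{t:.1f} Gyr ago: {100 * u235_fraction(t):.2f}% U-235")
```

Because U-235 decays several times faster than U-238, the fraction climbs quickly as you go back in time, which is the whole point of the comment.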
But in a molten core, won't hydrogen sulfide and methane bubble out from the center, leaving only iron and heavier elements and compounds in the super high pressure depths?
Very likely. But more studies would be needed. The best phase diagram for H2S and methane that I could find only went up to ~140 bar. There were some interesting papers on clathrates and methane in the kilobar range, but I don't think those are relevant to deep-earth conditions. The deep earth is ~3.3 megabar, for comparison.
The universe is pretty big, and there are lots of places where unlikely things didn't happen. When the next unlikely thing happens, it most likely won't happen here.
If you ask yourself this question, then 100%. You need to exist to even consider the unlikelihood of your existence.
But as far as I know the existence of naturally occurring superconductors is a completely independent probability, so it doesn't really make sense to use one to justify the other. Are there naturally occurring superconductors somewhere in the universe? I mean, without doing any calculations I'm tempted to say almost certainly yes. Do they exist somewhere in the Earth? As far as we know, extremely unlikely.
It's pretty likely. There are at least 1,000,000,000,000,000,000,000,000 stars in the universe, and most of them have planets. Billions of years passed since the Big Bang. If some chemical process can create life it's very likely that somewhere it did.
Scientists who have done the math disagree. It's a major current issue.
It's extremely difficult to produce useful proteins from random DNA chains. As in, if I took all of the atoms in our galaxy, paired each with random DNA, and allowed you to pick just one (blindfolded), only one of those DNA strands would contain the DNA necessary to produce a valid/useful protein. Literally every other atom has garbage DNA/proteins. The human body contains between 80,000 and 400,000 proteins.
DNA looks far more like "information" than it does like random bits written to disk. It's analogous to trying to find an x86 program of at least 160 instructions that computes a valid mathematical function by randomly splatting 1s and 0s to disk and then "running" the "code". Eh, maybe it'll eventually happen, but you can see how hard it actually is in practice. Heat-death-of-the-Universe hard.
Given a long enough timeframe, even unlikely things will happen. However, the Universe isn't very old, which is why it's a major current issue.
You do realize that natural selection iterates through these same sorts of garbage proteins at a rate of trillions and trillions of bacteria per year for millions of years, and with a bias toward previously existing functional structures? It's a pretty potent optimization process, given the timeframes and scale, and the uncertainty is enough. It also isn't equivalent to the random-bits analogy, because incremental progress can be made toward a functional protein, unlike with code.
There is a question of where does the first self-replicating molecule come from, and how do we get to DNA from that, and how do we get the diversity of proteins that we see today.
Creating random DNA sequences and showing they don't produce 'useful' proteins has nothing to do with any of those questions (and how do we even know they are not useful?)
I think the main stumbling block for me is the perception of time scales. It is impossible for me to say anything with certainty about time scales of a billion years and the randomness which permeates the evolutionary process. What we see and know are the winners of the race, not the mountains of failures.
> It's extremely difficult to produce useful proteins from random DNA chains.
I think we all agree about that.
> Scientists who have done the math disagree. It's a major current issue.
The problem is that you can do the correct calculation or a wrong calculation. For example, the number 160 is probably too high. Some of the current proteins have 160 or more amino acids, but there are shorter proteins, and there are some useful short amino acid chains with only 20 amino acids.
There's between 100 and 200 billion galaxies in the observable universe. There were billions of years to do the choosing - how many times per minute am I allowed to do it?
> The human body contains between 80,000 and 400,000 proteins.
Good thing first life wasn't homo sapiens then (and probably wasn't using DNA).
The framing you have here is an attractive one, but I don't think it makes much sense in the context of reproducing molecules.
There is no reason to posit random DNA chains.
The statement that "the number of DNA chains that produce valid/useful protein in the space of all possible DNA chains is vanishingly small" seems reasonable (however I'm not sure how we would know these chains are the only ones that produce valid/useful proteins).
The idea that we need to choose randomly from the space of all possible DNA chains is not reasonable.
----
Once we have a reproducing molecule, we expect to see a multitude of valid reproducing molecules as descendants of that first molecule. We expect (at least some of) these descendants to eventually be extremely different from the original molecule, and by their nature valid reproducing molecules.
Once we have a reproducing molecule (like DNA) that creates other molecules (like RNA and proteins) we can expect the same of its descendants, and the descendants' by-products.
If these molecules form an ecosystem, where the reproduction of one relies on the validity of the other, the only successful variations within the ecosystem will be valid variations of the ecosystem.
----
The space that we are choosing from is not the space of all possible DNA chains, it is the space of all DNA chains adjacent to existing valid chains (or chains in a valid ecosystem).
It's analogous to taking a valid x86 program that can reproduce, randomly adding/removing/mutating some bits on reproduction (with low frequency, very quickly, and in a ginormous space - think on the scale of molecules in the Earth's oceans), and asking if that new program is also valid. And then, after millions of years of this, asking if one of the programs is a valid mathematical function.
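To make the "adjacent to existing valid chains" point a bit more concrete, here is a toy Python sketch comparing two search strategies on a 64-bit string: drawing fresh random strings every time versus copying one string with a single-bit mutation and keeping the copy if it is no worse. Everything here is an assumption for illustration: the 64-bit length, the "count the 1 bits" fitness function, and the function names. Real fitness landscapes are nothing like this smooth, so treat it as a picture of the difference in search space, not a model of biology.

```python
from __future__ import annotations

import random

GENOME_BITS = 64  # hypothetical "program" length, chosen only for illustration


def fitness(genome: list[int]) -> int:
    """Toy fitness: number of 1 bits. Real fitness landscapes are far messier."""
    return sum(genome)


def random_search(tries: int, rng: random.Random) -> int:
    """Draw independent random genomes; return the best fitness seen."""
    best = 0
    for _ in range(tries):
        genome = [rng.randint(0, 1) for _ in range(GENOME_BITS)]
        best = max(best, fitness(genome))
    return best


def mutate_and_select(tries: int, rng: random.Random) -> int:
    """Copy one genome with a single-bit mutation; keep the copy if not worse."""
    genome = [rng.randint(0, 1) for _ in range(GENOME_BITS)]
    for _ in range(tries):
        child = genome.copy()
        child[rng.randrange(GENOME_BITS)] ^= 1  # flip one randomly chosen bit
        if fitness(child) >= fitness(genome):
            genome = child
    return fitness(genome)


if __name__ == "__main__":
    rng = random.Random(0)
    print("random search best:", random_search(20_000, rng))
    print("mutate + select   :", mutate_and_select(20_000, rng))
```

With the same number of tries, the copy-and-mutate loop reaches the maximum score of 64 while pure random sampling essentially never does, since it would need to hit one string out of 2^64.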
----
There are still big questions here. Questions like "how do we get the first reproducing molecule?" and "is DNA likely to arise once you have reproducing molecules, or just one out of many options?"
None of those questions give reason to invoke the number of all possible variations of DNA as evidence that the variation we see in proteins is somehow unlikely.
Once we know that there exists one valid DNA/protein system (which we do, as it exists), and we know that variations of DNA/protein ecosystems can be functional (which we do, as we've observed it), it is reasonable to expect a multitude of valid, functional DNA chains, and the proteins produced by them.
Like you, I can imagine hundreds—maybe thousands—of ways to resolve these issues.
That's hardly relevant though, what matters are resolutions that actually work.
I agree that we are (very) likely to find the mechanisms involved, but so far, we haven't. In fact, we don't even have a theory on how DNA was originally developed, or how non-functional DNA/proteins self-replicate, or really anything at all. We only have the end product (which does—as you point out—work). The question is how did it get there, and previous hand waving about a huge, old Universe and random chance isn't sufficient.
It's going to have to be something similar to what you (and other commenters) describe: mechanisms that preferentially and relatively quickly produce valid, self-replicating DNA/protein chains. To date, no one has found anything even close to that.
You see the difference between this argument and what you wrote above though?
Perhaps I'm reading your original post too strongly, so please correct me if so.
In the first post you compare the number of valid DNA chains to the space of all possible chains, you mention the number of different proteins in the human body, and you draw an analogy to a random sequence of bits forming a valid program.
None of these speak to the probability of a reproducing molecule arising through physical processes, nor do they speak to the probability of DNA as a descendant of that original reproducing molecule (or potentially multiple original molecules).
I get that you understand the gaps in our knowledge of how these systems came to be; my point is that your original argument is misleading in the exact same way you claim the argument
"Billions of years passed since the Big Bang. If some chemical process can create life it's very likely that somewhere it did."
is a
"kind of hand-wavey statement [that] seems to convince most people. Universe is hella-old, and really big. Ergo, incredibly rare stuff has happened basically infinitely many times. Life everywhere, etc."
(this was a reply to a different post, but I think it holds to the comment you originally replied to).
In fact, I find the argument that "things reproduce, and have been reproducing for a long time in a large environment, so we expect to see complexity in those things" much more reasonable than "most random arrangements of this molecule are useless, and we can see lots of useful arrangements, therefore time and randomness can't explain them".
We're discussing how to get those "things that reproduce" in the first place. I agree that once you have useful things that reproduce, it's easy to keep it going. Similarly, if I have a running copy of Linux, I can use the tools (and source code) to produce another copy of Linux.
But how do we get the first copy, the "original reproducing molecule" as you put it?
The usual explanation is that the "first copy" arose randomly, and then kept going. Do you believe that? I suspect not—but most people do.
We know that it can't have been random (which is the argument I gave, and I suspect you agree with). We should tell people "it wasn't random, something about the fundamental nature of these molecules caused better and more complex molecules to emerge." But we have no mechanism for that, just a (valid) belief that it has to be true.
I think we should find those mechanisms, and simultaneously, stop telling people that random chance + vast universe + long timespan is sufficient.
> The usual explanation is that the "first copy" arose randomly, and then kept going. Do you believe that?
I believe a variation of that.
I believe that the first copy arose through physical processes.
Invoking 'randomness' is unnecessary and misleading.
Do you not believe this?
To my knowledge, we don't yet have a mechanism for how such a molecule came into being (though there are ideas).
We also don't have any reason to think that it must be some random single choice from a large possibility space, and we don't have any evidence at all that it could have arisen from non-physical processes (what would that even look like?).
This is what I mean by random: no DNA sequence is privileged over any other, and no (known) physical process produces anything but random DNA sequences (excluding, of course, copying already useful DNA sequences).
DNA has about as much structure as bits on a disk (with coding for one of 20 amino acids as the "bits"). No DNA sequence is more likely than any other to exist.
I think that means we need to identify strong physical processes that produce useful DNA strands; you, apparently, aren't as concerned about it. Maybe you're right, but from where I'm sitting, it's hard to imagine what those physical processes might be since the strands they must produce are extremely, unimaginably rare in practice.
DNA is basically information[0], and we literally have no example of a chemical process producing valid DNA information, nor is it at all obvious how such a process might work in practice. In the past, large amounts of time + equal likelihood of producing random DNA was considered sufficient to think "well, useful DNA strands could appear randomly." We now know that's extremely unlikely to the point of being effectively impossible, statistically speaking.
But some DNA sequences are privileged over others!
The mechanisms for producing new DNA sequences involve copying existing DNA sequences. Thus, the ones that exist are privileged over the ones that do not exist (yet), and the adjacent sequences are privileged over a random sequence.
> No DNA sequence is more likely than any other to exist.
It is far more likely for a DNA sequence very similar to my own to exist than a random sequence.
> we need to identify strong physical processes that produce useful DNA strands
We have already identified those processes! We know quite well how the machinery of DNA replication works.
If we care about the first DNA molecule to ever exist it's a very different question. We don't need to find a physical process that produces a modern DNA molecule from 'raw parts', rather one that takes not-quite-DNA and converts it into DNA.
> it's hard to imagine what those physical processes might be since the strands they must produce are extremely, unimaginably rare in practice.
Can you imagine slightly simpler DNA? Say just a bit shorter? What's the simplest molecule we might still call DNA, that is reproducing? Can we imagine machinery that would produce that?
I think it's very reasonable to think such machinery could exist, even if we don't know the exact mechanisms involved. We know that RNA can self-reproduce, and also produce proteins, so it's reasonable to think that machinery to produce RNA strands could evolve to produce DNA strands (for example).
The only involvement randomness has in this whole process are (relatively) rare and infrequent changes to self-replicating molecules, and (potentially) the initial formation of a self-replicating molecule.
It is irrelevant how many possible DNA sequences there are, or how much information is stored within them, as we know new sequences are derived from previous ones.
> If we care about the first DNA molecule to ever exist it's a very different question. We don't need to find a physical process that produces a modern DNA molecule from 'raw parts', rather one that takes not-quite-DNA and converts it into DNA.
We haven't found that, and apparently aren't even close. We don't even have any idea what something like that might look like, or even more critically: given all the incredibly, insanely, unbelievably rare DNA sequences that exist in the world today, why is such a fundamental process capable of producing them not abundant as fuck already? Where'd it go? Why is this process even a mystery in 2020? It should be ubiquitous; in fact, all of the primordial soup mechanisms should be. Certainly that's what we expected when the theory was developed, and it hasn't panned out.
Anyway, I think we've exhausted this topic. Thanks for commenting.
> We haven't found that, and apparently aren't even close.
We do have ideas! Specifically, within the RNA world hypothesis, the transition period is called the virus world [0].
> given all the incredibly, insanely, unbelievably rare DNA sequences that exist in the world today,
We have a good understanding of where diversity comes from, I'm not sure what point you're making here.
> why is such a fundamental process not abundant as fuck already. Where'd it go? Why is this even a mystery?
I don't think anyone thinks this process need be 'fundamental', though it definitely is pivotal. It only really needed to happen once, and then DNA was off reproducing and spreading by itself. That said, it looks like viruses converting RNA to DNA could still be happening today.
In general, we don't expect novel self-reproducing molecules to arise today, because they are out-competed by existing self-replicating molecules. In a world where nothing is replicating the first replicator is king. In today's world a brand new replicator is food for something else.
> Maybe it's possible that your romantic view of how this all happens (pseudo-Darwinian circa 2020) isn't telling the whole story?
I don't think I, or anyone else really, is claiming to tell the whole story - just that we have good reason to believe this came about through physical processes, and no evidence to believe... well I'm not sure what else there could be.
Interesting discussion. I’d like to ask a sincere question:
Wouldn’t a system A that is capable of encoding another complex system B, need to be as complex in order to encode all the information in the result?
It’s like a compression algorithm, you can encode the information, but the complexity level of that information is still there (also the difficulty in compressing the information increases very fast - exponentially or maybe even factorially).
So if the most basic protein sequence requires so many bits of information, wouldn’t anything capable of producing that (in a non-random manner) also require at least that level of information (if not more)?
It doesn’t matter what process we call systems A and B.
So it seems if randomness doesn’t solve the problem (because math), then the only conclusion is that there is a fundamental requirement for intentionality.
It's possible for a simple thing to encode something more complex, deterministically.
The prime example is The Game of Life - simple rules from which complex behaviour emerges.
This idea of information is one we're putting onto the system, not some inherent attribute. Yes, the encoding of a protein needs to have enough information to produce that protein (or a family of proteins), but that says nothing about the process that created the encoding.
For example, a strand of RNA can be spliced in many different ways to create many different proteins [0] and this process can go weird in many ways. New sequences will arise from this process, even though they weren't 'intended' to.
The Game of Life doesn't produce complex behavior from simple rules.
The complex behavior comes from a large enough random starting state combined with a very low minimal required complexity to see something interesting. Also, even for a short interesting run of local behavior, the game never produces a stable behavior that grows in complexity beyond the initial information encoded in the random state. (i.e. if there is a bubble of cool stuff happening somewhere on the 2d plane, something usually interferes with it and destroys that pattern - like waves in the ocean, even when the energy curves combine to form a wave once in a while, they are limited and temporary).
So the Game of Life is actually an example that the system is limited to the information encoded in the initial starting state.
In the starting state there is either:
- a large enough random search space (i.e. a million random attempts with a 100x100 board might get something cool looking)
- intentionality (a person can design a starting state that can produce any possible stable system)
Yes, and useful proteins are basically the equivalent of "oscillators" or "spaceships" in the game of life. But most runs of the game of life are not oscillators or spaceships, just like most proteins are useless.
That's why the "initial condition" is so important, and why DNA is so important: without a good "start state", you get useless results—just like in the game of life.
What we are trying to find is not Conway's rules for the game of life, but this: how do we produce useful starting states (DNA) with a physical system? And more importantly, how do we create those starting states preferentially (i.e. non-randomly)?
We still need a model for how useful DNA (which corresponds to the "initial state" in the game of life) gets created. And we have no model for that right now, other than assuming unique random initial states are continually occurring and letting the law of large numbers eventually "find" winners.
For DNA, at least, it could have come from RNA (as per the link in my last post).
While I don't think the pre-biotic problem is solved at all, we have a lot more models of how it could have happened than you seem to credit - this is after all a huge research area.
For example, here is one [0], and here is a whole journal issue on the subject [1].
I found these by searching for 'evolution of DNA' and 'evolution of RNA'.
Now, these models all include some randomness, but in no way does anyone assume "unique random initial states are continually occurring... letting the law of large numbers eventually "find" winners".
The models show plausible environments where pre-biotic synthesis of RNA (or RNA precursors) can occur, and stabilise.
This model you keep bringing up - randomly selecting a molecule from all possible combinations of atoms and saying 'enough time will get you one that works' - is not mentioned anywhere that I have seen. Perhaps some lay-people (of which I am definitely one!) believe it, but as you point out it is so obviously implausible it falls down on first inspection.
There are other models (lots of them!) and they don't rely on this pure randomness.
Minor side note, but most runs of the game of life actually will produce spaceships and/or oscillators, even starting from a random configuration. (Initialize a 100 x 100 box of cells randomly, and you're virtually guaranteed to get several gliders flying off of the resulting mess.)
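For readers who haven't played with it, here is a minimal Python sketch of the Game of Life rules on an unbounded grid, along with the two kinds of patterns mentioned in this subthread: a glider (the smallest spaceship) and a blinker (a period-2 oscillator). The `step` and `run` helper names and the coordinate layout are my own choices for illustration, not anything standard.

```python
from __future__ import annotations

from collections import Counter


def step(live: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Advance Conway's Game of Life one generation on an unbounded grid."""
    # Count, for every cell next to a live cell, how many live neighbours it has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


def run(live: set[tuple[int, int]], generations: int) -> set[tuple[int, int]]:
    """Apply `step` repeatedly and return the final set of live cells."""
    for _ in range(generations):
        live = step(live)
    return live


GLIDER = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
BLINKER = {(0, 1), (1, 1), (2, 1)}

if __name__ == "__main__":
    shifted = {(x + 1, y + 1) for x, y in GLIDER}
    print("glider moved one cell diagonally:", run(GLIDER, 4) == shifted)
    print("blinker repeats every 2 steps  :", run(BLINKER, 2) == BLINKER)
```

After four generations the glider is the same shape, shifted one cell diagonally, which is what makes it a "spaceship"; the blinker just flips between horizontal and vertical.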
This assumes a perfect random distribution though.
What if amino acids and proteins are in fact likely to arise naturally and in favorable circumstances?
There was an article recently (~1-2 months) on HN about a supercomputer/AI discovering new chemical pathways for part of this process, but I can't seem to find it anymore. I think it was about forming amino acids.
I'm no expert on this subject (the opposite really, I've slept through chemistry), but my experience with large-scale simulations has been that a surprising number of them converge to the same final result given the same starting parameters even if most processes within them are perfectly random. The bigger the simulation, the more likely they are to give you stable results. And the universe is pretty damn huge.
So I like to believe the creation of the foundations of life is in fact more-or-less inevitable in our universe, in turn increasing the chance of useful proteins etc. forming.
And that kind of hand-wavey statement seems to convince most people. Universe is hella-old, and really big. Ergo, incredibly rare stuff has happened basically infinitely many times. Life everywhere, etc.
Only…it's actually not that old, we have some idea how big it is (not that big, just lots of space between atoms), and thanks to computer science, we're pretty good at analyzing issues surrounding computational complexity.
And as it turns out, the DNA-to-protein pathway is much much much less likely than our initial hand waving made it seem.
I'm not saying it didn't happen, I'm saying with our current level of knowledge we have no idea how. The math based around being old and big doesn't work. So we need better math, more studies, etc. and less hand waving.
>Ergo, incredibly rare stuff has happened basically infinitely many times.
This wasn't my argument though. In fact it was the complete opposite.
I was proposing that it was in fact likely and thus pretty much guaranteed to happen in a large universe, as opposed to being unlikely but still likely given a large enough universe.
So we're working with different assumptions here.
In fairness I put my assumption way at the beginning of my post, so it probably got forgotten about by the end of it. Quoting myself:
> What if amino acids and proteins are in fact likely to arise naturally and in favorable circumstances?
We haven't yet conclusively found all of the pathways these can arise, and we continue to discover more. People just tend to assume it's pretty unlikely. I'm not so sure.
Comparing amino acids to proteins is a category error, almost akin to comparing individual x86 instructions to a full x86 Linux kernel binary. The level of complexity increase is not just in size, but it's a fundamentally different thing altogether.
The amount of information (via DNA) needed to create a useful protein from the 20 amino acids is absolutely incredible.
So…finding more potential (note: not demonstrated) pathways to create amino acids ex nihilo does literally nothing for producing viable DNA strands and proteins. DNA and proteins are a totally different problem, and we've made basically no progress at all, and the more we look at it, the less likely it seems.
And then people (not you per se) hand wave about the size of the Universe to explain the problem away. I think we should instead accept the problem exists and work to solve it.
----
Separately, we have no known examples of any natural process producing what we, as humans, would call "information." DNA is much closer to information than any other concept, to the point where if we were sent something similar to DNA from space in, say, a radio transmission, we would absolutely assume intelligent life had made that transmission.
That is, with our current knowledge, it takes something vaguely "intelligent" to produce the kind of information we have in DNA. Maybe such processes exist, but this is an absolute far cry from producing amino acids from chemical precursors, which are not information-like at all (and thus, it is unsurprising that we can do it).
I have found your comments on this thread very intriguing. The computational analogy applied to DNA and proteins is apt for me. Also, this strikes me as a potential resolution to the Fermi Paradox. What do you think?
Well, they're obviously related in that we really need to discover/determine how useful DNA came to be, starting with just the primordial soup. If we can get more accurate numbers for the Drake equation, that would certainly go a long way towards explaining the paradox.
I brought it up on HN because relatively few people seem to know this is still a problem, and progress on resolving it has been slow.
The many worlds interpretation of quantum mechanics increases the combinatorial space to play in for some otherwise unlikely seed event by tremendous amounts.
Also, we know stuff like the smallest observed polymerase, but we don't know what the smallest functional one would be that could have evolved into it.
We also have self-replicating pure RNA systems, though the components aren't abundant. But this is just what scientists came up with in one effort trying to make one to prove it is feasible:
But why assume a leap directly to proteins, by definition a long chain of amino acids? Couldn't we have started with self-replicating peptides and incremental improvements?
Peptides are just short proteins, and no, we have no idea how to get them either (though it's obviously easier).
Also, it's not that what I've called bad/garbage DNA doesn't produce proteins, it's that the proteins produced are useless: they don't "do" anything. There's no obvious reason why DNA "extension" should produce useful proteins over un-useful ones, at least, no mechanism that we have discovered so far.
Instead of accepting a theory of incremental improvement that "sounds nice", waving our arms about random chance and an old, vast Universe and going "yup, that's how it happened!", let's try to develop testable mechanisms and validate them.
I'm asking for more rigor while simultaneously shooting down "random chance", "plenty of time", and hand waving about the Law of Large Numbers. We've done the math and we need far more effective, directed mechanisms than random chance to produce useful DNA sequences.
Fascinating. And of all life on Earth, how did human consciousness arise?
We're basically virtual machines/entities/minds stuck inside biological bodies, and the majority of us are at odds with nature and every other living organism on Earth.
I guess I'm more interested in how that happened than how life started, both seem equally incomprehensible to me, though.
Large number comparisons are difficult for humans to comprehend.
If you simplify life to a DNA strand 256 nucleotides long (for the sake of math comparison) - then the search space is 4^256. To comprehend how large a search space this is watch 3Blue1Brown's explanation https://youtu.be/S9JGmA5_unY?t=38
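A quick Python sketch of how big 4^256 actually is. Each nucleotide is one of four options, so it carries 2 bits, which makes 4^256 the same number as 2^512. As far as I recall, the linked video walks through 2^256, so this search space is that number squared. The variable names are just for illustration.

```python
import math

NUCLEOTIDES = 4      # A, C, G, T
STRAND_LENGTH = 256  # the simplified strand length used in the comment above

# Python integers are arbitrary precision, so this is computed exactly.
search_space = NUCLEOTIDES ** STRAND_LENGTH

# Each nucleotide carries log2(4) = 2 bits, so 4**256 is the same as 2**512.
bits = STRAND_LENGTH * math.log2(NUCLEOTIDES)
decimal_digits = len(str(search_space))

if __name__ == "__main__":
    print(f"bits of information : {bits:.0f}")
    print(f"decimal digits      : {decimal_digits}")
    print(f"equals 2**512       : {search_space == 2 ** 512}")
```

That is a 155-digit number, which is the scale the "pick randomly from all possible strands" framing has to contend with.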
Maybe humanity will be able to check the entire universe for life. I love imagining that scenario and wondering what the reaction would be when we find none.
This comical notion has always struck me as scientism at its finest. There is nothing "pretty likely" about it considering a) we have no idea how abiogenesis occurs, and b) we have literally zero evidence of any form of life outside of our rock.
The second sentence does not support the last one. The last one is independent and known as the anthropic principle, i.e. even if it was extremely unlikely by some measure, it still happened. Whereas there's no indication whether 10^24 stars is a number far past the goal post, or relevant at all on your back-of-the-envelope calculation.
It rather seems that you (and the 4500-6000°C comment) were committing a fallacy of large numbers. You might as well write a friggin' fantastillion, unimaginable, zomg!, and you would still convince roughly the same gazillion number of people. But it's good to hear the details.
4000-6000 doesn't sound much at all in years for me for example, but it used to.
For what it's worth, the classic Drake Equation is that old "back of the envelope" calculation they put together to try to ask this very question - what's the likelihood life evolved?
The problem with the Drake equation is that it had several variables for which we had no flippin idea what the values were. For example, supposing there are a bajillion stars (we know that much), then we have to multiply against how likely it is for those stars to have a planet - and at the time, the likelihood of planets was completely unknown.
That at least, is something that's changed in the last decade, thanks to new telescopes. We've addressed one of the Drake Equation's big unknowns: we now can hazard a guess that planets are extremely likely.
Sadly there are enough other unknowns that we still can't make any sort of conclusions, but at least the betting odds are going up.
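To show why the remaining unknowns matter so much, here is a small Python sketch of the classic form of the Drake equation, N = R* · f_p · n_e · f_l · f_i · f_c · L. The class and field names are my own, and every number in the example run is an arbitrary placeholder, not a measurement or an estimate from anywhere. The only point is that the answer scales linearly with each factor, so an unknown that could be off by nine orders of magnitude moves the result by nine orders of magnitude.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DrakeInputs:
    star_formation_rate: float    # R*: stars formed per year in the galaxy
    frac_with_planets: float      # f_p: fraction of stars with planets
    habitable_per_system: float   # n_e: habitable planets per such system
    frac_life: float              # f_l: fraction of those where life arises
    frac_intelligence: float      # f_i: fraction of those developing intelligence
    frac_communicating: float     # f_c: fraction of those that emit signals
    signal_lifetime_years: float  # L: years such a civilization keeps signalling


def drake(p: DrakeInputs) -> float:
    """Classic Drake equation: N = R* * f_p * n_e * f_l * f_i * f_c * L."""
    return (
        p.star_formation_rate
        * p.frac_with_planets
        * p.habitable_per_system
        * p.frac_life
        * p.frac_intelligence
        * p.frac_communicating
        * p.signal_lifetime_years
    )


if __name__ == "__main__":
    # Placeholder values only, NOT measurements: chosen to show sensitivity.
    base = DrakeInputs(1.0, 1.0, 0.1, 1.0, 0.01, 0.01, 10_000.0)
    for f_l in (1.0, 1e-3, 1e-9):
        n = drake(replace(base, frac_life=f_l))
        print(f"f_l = {f_l:g} -> N = {n:g}")
```

Pinning down f_p with new telescopes helps, but as long as f_l (the abiogenesis term this whole thread is about) is unconstrained, so is N.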
Well, it is, but it's an accurate response to somebody already engaged in it. It is the GP who finds it "probably unlikely" without any indication of actual probabilities, chiefly rounding down from a haphazard guess, after all.