
> Proteins are long chains of amino acids. Your DNA encodes these sequences, and RNA helps manufacture proteins according to this genetic blueprint.

What does it mean when it says “Your DNA encodes these sequences”?




Maybe I can explain this in computer terms.

Think of DNA as read-only. It's a series of molecules that represent data. Cells read the topology and molecular interactions of those molecules and determine information from them.

Now, because the DNA is read-only, the permissions are super restrictive. That means if you want to access the data in the DNA, you need to go through an intermediate, which can safely be destroyed once it leaves the restricted area, because the original copy is still intact. That's where RNA comes in. RNA is designed to be mutable and temporary. I think of it like system memory: reboot the system, and you lose it all. Want to protect it? Write it to the disk (DNA). Similarly, you don't mess with files right on the disk; first you read them into memory.

So, RNA is basically DNA that's been read into memory and can now be messed with. You can do something like execute it, which would be analogous to translating it into proteins. The process is similar. While 0s and 1s might translate to microcode calls (i.e. physical action on the part of the computer, sending electrons around), RNA translates to amino acids, the building blocks of proteins, which are the physical components of cells and do things like...well...move electrons around, among other things.
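To make the "read it into memory" step concrete, here is a minimal Python sketch, assuming we are given the coding (sense) strand of a gene written 5' to 3'. Its RNA copy has the same letters except that uracil (U) replaces thymine (T). The function name `transcribe` is just illustrative; real transcription reads the complementary template strand, which gives the same result.

```python
def transcribe(coding_strand: str) -> str:
    """Return the RNA copy of a DNA coding strand (5'->3').

    The DNA string is never modified (Python strings are immutable),
    mirroring the "read-only disk" idea: we only produce a new copy.
    """
    dna = coding_strand.upper()
    if set(dna) - set("ATGC"):
        raise ValueError("DNA may only contain A, T, G, C")
    return dna.replace("T", "U")


genome = "ATGTTTAAGA"           # the "disk": left untouched
rna = list(transcribe(genome))  # the "memory": a mutable working copy
rna[3] = "C"                    # editing the copy...
print("".join(rna), genome)     # ...does not change the original
```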

The way this works is that while 1010010101110 (I have no idea) might be some microcode call like OR or NOR, RNA bases (which are derived from the transcribed DNA) are read in groups of three, e.g. AUG, which tells the cell "OK, go get the methionine amino acid", or UUU, which means "OK, go get the phenylalanine amino acid". Chain enough amino acids together, and you get a protein, which is essentially a function in the cell. For instance, a protein with one specific sequence of amino acids might go fight off viruses. A protein with a different, but still specific, sequence might go produce energy.
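The lookup described here is essentially a table from three-letter codons to amino acids. A minimal Python sketch of that table, using the standard genetic code (most organisms use it; some, like mitochondria, use small variants) with one-letter amino acid abbreviations:

```python
from itertools import product

# Standard genetic code: codons enumerated in U, C, A, G order with the
# third base varying fastest; "*" marks the three stop codons.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Full names for the codons mentioned in the comment above.
NAMES = {"M": "methionine", "F": "phenylalanine", "*": "stop"}

for codon in ("AUG", "UUU", "UAA"):
    aa = CODON_TABLE[codon]
    print(codon, "->", aa, NAMES.get(aa, "?"))
```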


When RNA is used to create amino acids, is only a specific segment read (if so, by what mechanism?), or does a cell create all the amino acids defined by RNA in one go?


The RNA is only a recipe for how to string amino acids together; the amino acids already float around in the cell (they are small, simple molecules, like bricks for a house, to confusingly change the metaphor). Only a piece of DNA is copied into RNA at a time, like you would load only one program into memory and not the entire hard drive. The mechanism for where to start and stop is fairly simple in simpler organisms: you look for start and stop signals (like ASCII control characters). Promoter and terminator sequences mark where copying DNA into RNA starts and stops, and start and stop "codons" mark where translating that RNA into protein starts and stops. In more complex organisms like multicellulars, this is more complicated. There are also other types of RNA that do other things.
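For the translation step (RNA to protein), the start/stop idea can be sketched as: find an AUG, then read triplets until an in-frame stop codon (UAA, UAG, or UGA). This is a simplified Python sketch; `find_orf` is a hypothetical name, and real cells use additional signals to pick which AUG to start from.

```python
from typing import Optional

STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_orf(rna: str) -> Optional[str]:
    """Return the first open reading frame: an AUG read in triplets up to
    and including the first in-frame stop codon, or None if there is none."""
    start = rna.find("AUG")
    while start != -1:
        # Step through codons in this reading frame only.
        for i in range(start, len(rna) - 2, 3):
            if rna[i:i + 3] in STOP_CODONS:
                return rna[start:i + 3]
        # No stop codon in frame: try the next AUG.
        start = rna.find("AUG", start + 1)
    return None


print(find_orf("GGAUGUUUCAAUAGCC"))  # AUGUUUCAAUAG
```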


So RNA doesn't create amino acids - they already need to exist (and there's a lot of metabolism that explains where they come from). What the RNA does is tell a piece of cellular machinery to combine the amino acids in a specific order to make a protein.

Depending on the organism, there are a few ways to do this. If you have a simple organism, like an archaeal cell, here's how it goes:

The DNA is all one giant circle. That circle has maybe 3 million bases, or 3 megabits. Those three megabits encode maybe 4,000 or so distinct protein-coding genes. There's also some space between genes. Even though it's a circle, it is directional, because the molecules connect in a specific way (each nucleotide's 5' phosphate links to the previous nucleotide's 3' OH, but that's just details).

That DNA is a sequence of four bases, which we represent by the first letter of their human names: A for adenine, T for thymine, G for guanine, and C for cytosine.

If we wanted to denote a stretch of DNA that goes Adenine-adenine-guanine-adenine, we'd write AAGA. The cell obviously has no idea that's what it is, but it can read the topology that stretch of DNA would make. i.e., the cell recognizes the shape of that specific sequence of DNA.

OK, so we have one giant "file" with 3 million As, Ts, Gs, and Cs in there. Inside that file, there are roughly 4 thousand functions (proteins) that we can call. And just like humans, cells have developed syntax.

To get just the RNA we need, i.e. to call a function, the cells look for the following DNA sequences (they all vary a little depending on what organism):

The BRE: CCCTCC. A specific protein called "Transcription Factor B" recognizes this and grabs onto it.

The TATA box: TTAAAATTA. A specific protein called TATA-binding protein will bind this.

The BRE and TATA box sit a little bit in front of each gene, so they appear some ~4,000 times in the genome, once before each protein-coding gene (this is simplified, but you get the idea). The job of those spots is to be bound by their partner proteins, and those partner proteins then recruit the enzyme that copies the corresponding gene (usually right next to them) into RNA, called RNA polymerase. At the end of the gene there's a terminator sequence[0] that blocks the RNA polymerase from going any further. So that's how you get just the RNA for your gene.
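Finding those landmarks is a pattern-matching problem. A toy Python sketch, treating the consensus sequences quoted above (BRE `CCCTCC`, TATA box `TTAAAATTA`) as exact strings; real binding sites vary and the proteins tolerate mismatches. The function `find_promoters` and its `max_gap` window are hypothetical, chosen only for illustration.

```python
import re

BRE = "CCCTCC"
TATA = "TTAAAATTA"


def find_promoters(dna: str, max_gap: int = 10) -> list[int]:
    """Return start positions of BRE motifs followed by a TATA box
    within max_gap bases. Exact matching only: a simplification of how
    transcription factors actually recognise these sites."""
    hits = []
    # The zero-width lookahead lets overlapping BRE matches be found too.
    for m in re.finditer(f"(?={BRE})", dna):
        after_bre = m.start() + len(BRE)
        window = dna[after_bre:after_bre + max_gap + len(TATA)]
        if TATA in window:
            hits.append(m.start())
    return hits


print(find_promoters("GGG" + BRE + "AC" + TATA + "GCATG"))  # [3]
```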

Now - the RNA can go to the ribosome to be "translated" into the protein. This is where the triplets come in. The first triplet is usually AUG, which codes for the amino acid Methionine. That's called the "start" codon. However, imagine our RNA sequence looks like this:

AUG CAA ACC AUA CAA GUA UCC AAA ACG GAG CUG AAG UCC CUC GCU

That should code for the following protein, where each letter indicates an amino acid:

MQTIQVSKTELKSLA
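The translation above can be checked with a short Python sketch: split the RNA into triplets and look each one up in the standard genetic code (one-letter amino acid codes, `*` for stop). `translate` is an illustrative name.

```python
from itertools import product

# Standard genetic code in U, C, A, G order, third base varying fastest.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna: str) -> str:
    """Translate RNA codon by codon, ignoring spaces and any trailing partial codon."""
    rna = rna.replace(" ", "")
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


print(translate("AUG CAA ACC AUA CAA GUA UCC AAA ACG GAG CUG AAG UCC CUC GCU"))
# MQTIQVSKTELKSLA
```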

But we have a problem. How on earth do we make sure that the ribosome starts at the AUG? What if it instead started one base later, at the U, and read UGC first, essentially "slipping" by one base? Then our protein would be totally different and would look like this (the "." is UGA, a stop codon, where translation would actually end):

CKPYKYPKRS.SPS
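The slip is easy to reproduce by translating the same RNA from offset 0, 1 and 2 (the three reading frames). A minimal Python sketch using the standard genetic code, with stop codons written as "." to match the comment above; `translate_frame` is an illustrative name.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code; "." marks the stop codons UAA, UAG and UGA.
AMINO_ACIDS = "FFLLSSSSYY..CC.WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

RNA = "AUGCAAACCAUACAAGUAUCCAAAACGGAGCUGAAGUCCCUCGCU"


def translate_frame(rna: str, offset: int) -> str:
    """Translate from a given offset (0, 1 or 2), dropping any leftover bases."""
    return "".join(
        CODON_TABLE[rna[i:i + 3]] for i in range(offset, len(rna) - 2, 3)
    )


for offset in range(3):
    print(offset, translate_frame(RNA, offset))
```

Offset 0 gives MQTIQVSKTELKSLA and offset 1 gives CKPYKYPKRS.SPS, the two proteins above.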

So organisms also transcribe, at the beginning of the gene, a "ribosome binding site" (called the Shine-Dalgarno sequence in bacteria and archaea; the Kozak sequence plays a similar role in eukaryotes). The Shine-Dalgarno version looks like this:

AGGAGG

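So, simplified, your whole gene now looks like this in RNA form (in real transcripts a short spacer of a few bases usually separates AGGAGG from the AUG):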

AGG AGG AUG CAA ACC AUA CAA GUA UCC AAA ACG GAG CUG AAG UCC CUC GCU

The AGG AGG (which base-pairs with RNA inside the ribosome) makes sure the ribosome binds the RNA in the right spot and starts reading from the "start" codon (AUG), continuing until it reaches a stop codon (UAA, UAG, or UGA).
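A toy Python version of "bind at AGGAGG, then start at the next AUG and read until a stop codon". The function `coding_region` and its `max_spacer` window are hypothetical simplifications; a real ribosome does not do an exact string search.

```python
from typing import Optional

RBS = "AGGAGG"                      # Shine-Dalgarno consensus
STOP_CODONS = {"UAA", "UAG", "UGA"}


def coding_region(mrna: str, max_spacer: int = 12) -> Optional[str]:
    """Find the RBS, take the first AUG starting within max_spacer bases
    after it, and return the codons from there up to (not including) a
    stop codon or the end of the transcript."""
    rbs = mrna.find(RBS)
    if rbs == -1:
        return None
    after = rbs + len(RBS)
    start = mrna.find("AUG", after, after + max_spacer + 3)
    if start == -1:
        return None
    codons = []
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break
        codons.append(codon)
    return " ".join(codons)


print(coding_region("AGGAGGUCAAAUGUUUUAAGG"))  # AUG UUU
```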

[0] The terminator sequence is tough to conceptualize, and not every gene uses it, but essentially it's a stretch of self-complementary bases. Once that stretch has been copied into RNA, the new RNA folds back on itself into a hairpin, and the proteins that are converting DNA into RNA can't go any further. See the image on this page:

https://parts.igem.org/Terminators
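To make "self-complementary" concrete: a stretch can fold into a hairpin when one segment (one arm of the stem) is the reverse complement of a segment a few bases downstream, with a short loop in between. A brute-force Python sketch; the function names and the stem and loop lengths are illustrative assumptions, and real terminator prediction also weighs GC content and the U-rich tail that follows the hairpin.

```python
from typing import Optional, Tuple

PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """The sequence that would base-pair with rna, read in the same direction."""
    return "".join(PAIR[b] for b in reversed(rna))


def find_hairpin(rna: str, stem: int = 6, min_loop: int = 3,
                 max_loop: int = 8) -> Optional[Tuple[int, int]]:
    """Return (start of first stem arm, start of second stem arm) for the
    first hairpin found, or None. Only perfectly paired stems count."""
    for i in range(len(rna) - 2 * stem - min_loop + 1):
        arm = rna[i:i + stem]
        for loop in range(min_loop, max_loop + 1):
            j = i + stem + loop
            if j + stem > len(rna):
                break  # longer loops would run off the end too
            if rna[j:j + stem] == reverse_complement(arm):
                return i, j
    return None


# GCCGCC + loop GAAA + GGCGGC (pairs with the first arm) + U-rich tail
print(find_hairpin("GCCGCCGAAAGGCGGCUUUUUU"))  # (0, 10)
```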


> That circle has maybe 3 million bases, or 3 megabits.

Nitpick, but isn't 3 million bases equivalent to 6 megabits? Each base has four possibilities and thus represents 2 bits of data.


Yes, I'd agree with that
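The arithmetic in this exchange: with four possible bases, each base carries log2(4) = 2 bits, so 3 million bases is 6 million bits (750,000 bytes). A small Python sketch that packs a sequence at 2 bits per base to show it; the A/C/G/T to 00/01/10/11 mapping and the `pack`/`unpack` names are arbitrary choices for illustration.

```python
import math

CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}  # arbitrary 2-bit mapping
BITS_PER_BASE = int(math.log2(len(CODE)))            # 4 symbols -> 2 bits


def pack(dna: str) -> int:
    """Pack a DNA string into one integer, 2 bits per base."""
    value = 0
    for base in dna:
        value = (value << BITS_PER_BASE) | CODE[base]
    return value


def unpack(value: int, length: int) -> str:
    """Inverse of pack(); length is needed because leading A's (00)
    leave no trace in the integer's value."""
    bases = "ACGT"
    out = []
    for _ in range(length):
        out.append(bases[value & 0b11])
        value >>= BITS_PER_BASE
    return "".join(reversed(out))


genome_bases = 3_000_000
print(genome_bases * BITS_PER_BASE)        # 6000000 bits
print(genome_bases * BITS_PER_BASE // 8)   # 750000 bytes
print(unpack(pack("AAGA"), 4))             # AAGA
```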


Wow!!! Really good explanation. Thank you so much


So I can look at them like DNA (compiled source), RNA (IR/AST), protein (executable program)?


RNA is definitely not like an "abstract syntax tree" or "intermediate representation". It is the final machine code that is executed by ribosomes to build proteins. Proteins aren't really "executable programs"; they are either passive building blocks or simple tools that do the same thing over and over again.

There are "programs" implemented by proteins and other components at a higher level (even things like clocks), but trying to include everything in one metaphor will stretch it to the breaking point.


Nice one. Thanks!


DNA is like the source code for proteins. That's its function. Specific triplets of bases along the chain correspond to specific amino acids, so a given stretch of DNA (a gene) ultimately results in production of a specific protein molecule. In turn, protein molecules do most of the specialized work in a cell, including controlling the intake and expulsion of small non-protein molecules from the surrounding medium.

In turn, it's mainly the shape of protein molecules that makes them behave differently. And some proteins change shape under changing conditions within the cell, thus being able to function in a regulatory fashion.


Based on the animation above - basically a punch-card system.


Cool video that explains how RNA is used to make protein.

https://dnalc.cshl.edu/view/15501-Translation-RNA-to-protein...


Animated explanation - https://youtu.be/5MfSYnItYvg



