
I've been wondering for some time what the next step is for gene sequencing. Shouldn't someone be working on a 50-year study where people are brought in at age 20, their genes sequenced and DNA stored, and an extremely thorough battery of parameters (height, length of each long bone, eye color, etc.) measured? Tie everything in with medical history. Do the same thing every 10 years for each subject until death.

With a large enough sample pool, we'd be able to correlate features with obscure genes, wouldn't we? Am I missing something fundamental?

It seems like now is the time to get started.




Such a program is a major part of President Obama's Precision Medicine Initiative. The NIH is funding 5 research pipelines throughout the country (4 at universities, 1 at the NIH) with the intent of collecting the full genomes and accurate phenotypes of one million individuals, with the ability to recontact them for further information if needed. The NIH announced the grant winners just two weeks ago, so the recipients are only now starting to work on the full protocols/pipelines.

As someone who works in the field, I see the lack of phenotypic data accompanying genome sequences as one of the two major roadblocks to advancing genomic science (money is the other problem, of course).


Do you have a link?




I think the fundamental thing you're missing is that a gene's behavior throughout your life depends on more than just its sequence. Everything else that goes on in your body, and in particular everything that goes in and comes out of your body, can impact when genes are triggered to form proteins, how those proteins fold, and what effect the proteins have.

At best, gene sequence data like you describe could provide rough probabilities for some health conditions. That's as likely to lead to over-treatment for a non-problem as catching a real problem early.

Sure, there would be some cases where everyone who has a certain sequence winds up with the same disease, and it'd be good to discover those early in life. (Ex: my wife was born with Wilson's Disease, which causes liver failure by early adulthood 100% of the time without treatment.) But those cases would probably be very few compared to the number of people who get freaked out by things that might become an issue, but probably won't, and the cases where the correlation is too weak to provide a warning, but a health problem occurs.


I don't think the value is in direct diagnosis but rather in an increased understanding of the genes' contributions.


Yeah, I've always thought of genes as being like dynamic code: it's hard to tell what's going on until you run it.


I used to work as an embedded software developer for a company that makes DNA sequencing machines, about 20 years ago.

I think genes are more like static code. One can clone it easily. RNA is like the stack, RSS, and register states: the expression of the genes.

DNA is like a class; RNA is an object that is an instantiation of the class.

BTW, the double helix structure is very much like full unit test coverage for bio-programming.


I think a great example of gene expression based on environment is that of the Himalayan rabbit. Cold temperatures trigger black fur growth, whereas warm temperatures allow the fur to be white.


Domestic cats are also good examples of two genetic phenomena:

Siamese cats can have white fur but dark paws and face because of a temperature-sensitive mutant enzyme.

Tortoiseshell cats are usually female, and the patterns in their fur are a result of X-chromosome inactivation happening in random clumps of cells.


While it may seem esoteric to some, I believe significant effort should be put into the sequencing and analysis of dog genetics. Their selective breeding is truly the greatest genetic experiment ever performed: from Chihuahuas and Great Danes to scent hounds to sighthounds to retrievers to herding dogs. We have crazy-detailed knowledge of them, including their behavior, differences in morphology (e.g., size), and diseases (cancers, neurological disorders, musculoskeletal disorders, metabolic diversity). Genetically, an amazing range of diversity exists, with subsets on a spectrum from extremely inbred breeds to totally outbred mutts scattered throughout the world. This diversity dwarfs that of humans, with nearly none of the informed consent and data privacy challenges that will, and should, go with a human study. There are legions of fanatical dog owners who are easy to find and who would likely be enthusiastic about having their pup participate. While I sit here with a mutt at my feet, my primary motivation is the correlation of genetic variations with diversity in behavior, stature, health, and disease. I know good work is already being done in this area, but a lot more can be done with relatively modest resources compared to what is required for human studies, and the results would provide an amazing benchmark by which to help interpret the human data as we collect it.


Dogs look different, but in terms of DNA they are very similar. Skin/fur color and size are things DNA is optimized to change as quickly as possible, which is why Chihuahuas and Great Danes can still interbreed.


The more similar they are genetically, the easier it should be to define and characterize the differences that are responsible for morphology, behavior, and diseases that run among breeds, no?


Kaiser Permanente has just such a project: http://digitalrepository.aurorahealthcare.org/jpcrr/vol2/iss...


How is that different than sequencing genes from individuals and correlating that to their (fairly detailed) recorded medical history? If I'm not mistaken, that's what's currently being done.


There's a lot of value in doing this, but the GP's idea has its own value. For starters, there's some survivorship bias, as you cannot map people who have already died.

You may also want to gather more detailed data than the medical history usually conveys, such as sub-clinical conditions and other non-pathological differences.


> With a large enough sample pool, we'd be able to correlate features with obscure genes, wouldn't we? Am I missing something fundamental?

Nature vs Nurture.

You'd find a huge sampling error because of epigenetics. The same genes in different people don't always kick in and do something.

However, finding out which gene + which life-style == bad stuff might still work out as a possibility.

Even those might not be repeatable in the future: I grew up running around in leaded gasoline land, and the same genes will probably never have to deal with so much lead in the air in my kids. Does it matter that I might have special tolerance genes which protect my brain from lead fumes?

The study would throw up interesting results, but not "feature -> genes", but maybe "genes -?-> features" (necessary but not sufficient).


>You'd find a huge sampling error because of epigenetics.

I'm skeptical that, just because we now know of epigenetic effects, they will invalidate all previously existing roughly Mendelian genetics. It's not like people's eyes change from brown to blue if they're stressed in childhood.

So, you wouldn't necessarily find huge sampling error, you'd find sampling error proportionate to the epigenetic effect on that particular gene. Is that large? I suspect not for many genes we'd be interested in.

>Does it matter that I might have special tolerance genes which protect my brain from lead fumes?

I don't think this is how epigenetic effects work.


Yeah, you just invented GWAS (genome-wide association studies), something already done on a huge scale, though obviously you always need more data and better features.

It's so widespread as to be considered a belief of its own: http://www.theallium.com/biology/us-government-to-officially... ;)
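For anyone wondering what a single GWAS test looks like, here is a minimal sketch, assuming the simplest possible design: a case/control comparison of allele counts at one SNP using a Pearson chi-square test with one degree of freedom. The function name and the counts are made up for illustration, and real pipelines typically use regression with covariates rather than this bare test.

```python
import math

def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, p-value).
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected under "no association" between allele and status.
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: the alternate allele is more common among cases.
chi2, p = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
print(f"chi2={chi2:.2f}, p={p:.4f}")
```

A GWAS repeats a test like this at every genotyped variant, which is why the significance threshold has to be far stricter than the usual 0.05.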


The military has databases like this that could lead to a lot of interesting research, in theory. They do DNA testing for body identification purposes as well as tracking long term health through the VA.


Yes and no.

What you describe would be useful, but the scope of its usefulness is likely far less than you might think. Basically, you'd only be capturing some long term effects that have very high effect sizes / strong correlations. The sheer complexity, variability, and size of the dataset means that you won't be able to pull out subtle interactions, even with a large number of individuals. Barring fundamental advances in GWAS methods, you're looking at huge expense for very little benefit.
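To put rough numbers on why subtle interactions are hard to pull out, here is a small sketch, assuming a plain Bonferroni correction and a hypothetical one million independent variants. Both are simplifying assumptions; real studies deal with correlated variants and use other corrections too.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests

def pairwise_tests(n_variants):
    """Number of distinct variant pairs, i.e. tests needed to scan all two-way interactions."""
    return math.comb(n_variants, 2)

n_variants = 1_000_000  # hypothetical number of independent variants tested

single_threshold = bonferroni_threshold(0.05, n_variants)
n_pairs = pairwise_tests(n_variants)
pair_threshold = bonferroni_threshold(0.05, n_pairs)

print(f"single-variant tests: {n_variants:,} -> p < {single_threshold:.1e}")
print(f"pairwise interaction tests: {n_pairs:,} -> p < {pair_threshold:.1e}")
```

Going from single variants to all pairs multiplies the number of tests by roughly half a million, so only very strong interaction effects would survive the correction.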

Now, collecting and organizing disparate datasets that are already being generated is worthwhile. Comparison is difficult due to the different methods used, but a good investigator will be able to do a lot of in silico work to, at least, do some preliminary work on hypotheses. On a personal note, this is actually a lot of fun because you can often come across very exciting hints to follow up on, all with a quick feedback cycle conducive to 'flow'.

Ultimately, and this gets back to science fundamentals, this would amount to a big fishing expedition. Good science will always be strongly hypothesis driven. Good bioinformatics is built upon good datasets, and that only comes from very careful hypothesis consideration, solid methods, and a good understanding of the biology involved.


Apart from the actual sequencing, this is what LifeGene (http://lifegene.ki.se/) does, i.e. a long study with check-ups every 10 years, lots of tests, and lots of data about each person's environment, living conditions, etc.



There are many rare variants in each individual. Establishing the significance of these variants in a particular environmental context, and dealing with the myriad of potential interactions with other variants is probably not tractable with the experimental design you describe.

High-throughput phenotypic screens which test the cellular effects of inserting particular variants could help, as they would prune the number of potential variants, but the false negative rate is high.

Genomic determinism may not exist in a useful sense in the current era.


Associations between genetics and traits (such as diseases, height, etc.) are already very common in science. One of the main problems is that you need extremely large sample sizes to see any effect. These kinds of studies are called genome-wide association studies. Many common traits need hundreds of thousands of individuals if the genetic link is suspected to be weak (which is the case for most common diseases), making such studies extremely expensive, not to mention time-consuming.
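A back-of-the-envelope sketch of where those sample sizes come from, assuming a quantitative trait, a single variant that explains a given fraction of trait variance, a normal approximation to the test statistic, and the conventional genome-wide significance threshold of 5e-8. The function name, the 80% power target, and the example effect sizes are illustrative assumptions, not figures from any particular study.

```python
import math
from statistics import NormalDist

def required_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Approximate number of individuals needed to detect one variant.

    Normal approximation: n = (z_alpha + z_power)^2 * (1 - r^2) / r^2,
    where r^2 is the fraction of trait variance explained by the variant.
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided threshold
    z_power = standard_normal.inv_cdf(power)
    r2 = variance_explained
    return math.ceil((z_alpha + z_power) ** 2 * (1 - r2) / r2)

# Hypothetical effect sizes: 1%, 0.1%, and 0.01% of trait variance.
for r2 in (0.01, 0.001, 0.0001):
    print(f"variance explained {r2:.2%}: about {required_sample_size(r2):,} individuals")
```

Under these assumptions the required sample size scales roughly as one over the variance explained, so a variant that is ten times weaker needs about ten times as many people.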


Various national level projects have begun. The Scandinavian countries have begun taking biological samples for later sequencing, which can then be cross-referenced with their fantastic population-level databases. The UK Biobank has been doing that for hundreds of thousands of UK people, and has been responsible for many important recent genetics papers. The Obama administration has been funding a similar project which is planning to collect 1 million genomes and has gotten press recently.



Well, there is the Dunedin Study [0], which is similar to what you're talking about.

[0] https://en.wikipedia.org/wiki/Dunedin_Multidisciplinary_Heal...


The US military is doing this with its personnel.

The US military got stung very badly in Desert Storm I with people coming home sick with no way to see what went wrong. They decided not to repeat that mistake.


I feel like problem intractability would be a holdup. On the other hand, I just remembered that 23andMe is sort of attempting this.


23andMe is not currently sequencing individuals.


No, but their genotyping covers a significant number of SNPs, and they do their own in-house statistical research.

Disclaimer: I interviewed at 23andMe a few years ago, and have purchased their services for my entire family.


Only the exome, which is ~1.5% of the whole genome. Source: http://www.yuzuki.org/the-whole-exome-vs-whole-genome-sequen...


Results from that data alone: https://www.23andme.com/for/scientists/


True, though they do let users participate in genetic surveys that are aggregated and correlated for whatever end.




