
That's pretty much how modern genome assemblies work. We currently do not have a reliable or accurate way of sequencing DNA longer than ~200 bases. So we fragment the DNA into small pieces, sequence them and then put it all back together.
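To make the "fragment, sequence, put it back together" idea concrete, here is a toy sketch of greedy overlap assembly. It is not how any production assembler works; the function names, the `min_len` threshold, and the example sequence are all made up for illustration, and it assumes error-free reads from a sequence with no repeats longer than the threshold.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    reads = [r for r in reads if not any(r != s and r in s for s in reads)]
    while len(reads) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; remaining pieces stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Hypothetical 18-base "genome" cut into overlapping 8-base reads.
genome = "ATGGCGTACGTTAGCCTA"
reads = [genome[i:i + 8] for i in (0, 3, 6, 9, 10)]
contigs = greedy_assemble(reads, min_len=4)
```

Real data has sequencing errors, both strands, and billions of reads, so actual assemblers use overlap graphs or de Bruijn graphs rather than an all-pairs loop like this.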

It's not a completely solved problem, as some features of genomes make this process difficult or complicated (repetitive regions, highly heterozygous organisms, etc.), especially with short sequences.
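A small illustration of why repeats hurt with short reads: below, two different made-up sequences share a 3-base repeat ("CAT") and differ only in the order of the pieces between the copies. If every read is 4 bases long, no read spans a repeat copy with context on both sides, and the two sequences produce exactly the same set of reads. The sequences and the helper name `kmers` are hypothetical, chosen only to show the effect.

```python
def kmers(seq, k):
    """All length-k substrings of seq, in order (an idealized error-free read set)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


repeat = "CAT"
# Same pieces, but the segments between the repeat copies are swapped.
genome_a = "TT" + repeat + "GG" + repeat + "AA" + repeat + "TC"
genome_b = "TT" + repeat + "AA" + repeat + "GG" + repeat + "TC"

# Reads of length 4 cannot tell the two genomes apart...
same_with_short_reads = sorted(kmers(genome_a, 4)) == sorted(kmers(genome_b, 4))

# ...but reads of length 5 span the repeat plus a base on each side and can.
same_with_longer_reads = sorted(kmers(genome_a, 5)) == sorted(kmers(genome_b, 5))
```

Here `same_with_short_reads` comes out true and `same_with_longer_reads` false, which is the short-read ambiguity in miniature.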

We have technology right now that can sequence long fragments, but at lower quality and accuracy. There are a lot of tools out there that use the longer, lower-quality sequences to scaffold the shorter, higher-quality sequencing data.
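As a rough sketch of the scaffolding idea, the snippet below places accurate short contigs onto a single noisy long read and uses the positions to order them. This is a heavy simplification and not the method of any particular tool: it assumes the long read only has substitution errors (real long reads mostly have insertions and deletions, which need gapped alignment), and the names `best_hit`, `order_contigs`, and `max_mismatch_frac` are invented for the example.

```python
def best_hit(contig, long_read):
    """Best ungapped placement of contig on long_read as (mismatches, offset).

    Returns None if the contig is longer than the long read.
    """
    best = None
    for off in range(len(long_read) - len(contig) + 1):
        window = long_read[off:off + len(contig)]
        mismatches = sum(1 for x, y in zip(contig, window) if x != y)
        if best is None or mismatches < best[0]:
            best = (mismatches, off)
    return best


def order_contigs(contigs, long_read, max_mismatch_frac=0.2):
    """Order contigs by where they land on the long read, skipping poor matches."""
    placed = []
    for contig in contigs:
        hit = best_hit(contig, long_read)
        if hit is not None and hit[0] <= max_mismatch_frac * len(contig):
            placed.append((hit[1], contig))
    return [contig for _, contig in sorted(placed)]


# Hypothetical data: two accurate contigs and a long read with one wrong base.
contigs = ["GTTAGCCTA", "ATGGCGT"]
noisy_long_read = "ATGGAGTACGTTAGCCTA"  # true sequence has C, not A, at index 4
scaffold_order = order_contigs(contigs, noisy_long_read)
```

The long read tells you the order and spacing of the contigs, while the bases themselves still come from the higher-quality short-read data.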



Illumina sequencers (highest throughput per $) have a read length of 300 bases (paired-end total), but we can sequence much longer with other tech.

E.g. PacBio: 10-15 kb

Sanger sequencing, used for the Human Genome Project 20 years ago, gives reads of over 500 bases.



