
Rather disappointingly, neither sequence includes the string 'GATTACA'

A given sequence of 7 bases has a probability of 1/16,384 (that is, 1/4^7) of occurring at any particular position. Since the COVID genome is about 30k bases long, I guess there's a pretty good chance of it appearing in there somewhere. This assumes the bases are uniformly distributed and independent, which of course is not true. COVID’s genome is under crazy intense selection pressure!
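
For anyone who wants to put a number on "pretty good chance", here is a rough back-of-the-envelope sketch. It assumes every base is uniform and independent (which, as noted, real genomes are not) and uses a Poisson approximation for the number of hits. The function names are made up for illustration, and the 29,903 figure is the length of the NC_045512 reference record; plug in whatever length you prefer.

```python
import math


def expected_hits(motif_len: int, genome_len: int) -> float:
    """Expected number of occurrences of one fixed motif in a random genome.

    Assumes the four bases are equally likely and independent at every position.
    """
    positions = genome_len - motif_len + 1  # possible start positions
    return positions / 4 ** motif_len


def prob_at_least_one(motif_len: int, genome_len: int) -> float:
    """Poisson approximation to the chance of seeing the motif at least once."""
    return 1 - math.exp(-expected_hits(motif_len, genome_len))


if __name__ == "__main__":
    # 29_903 is the length of the NC_045512 reference sequence.
    print(expected_hits(7, 29_903))
    print(prob_at_least_one(7, 29_903))
```

Under those assumptions you get about 1.8 expected hits and roughly an 84% chance of at least one, so a handful of real occurrences would not be surprising.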

The sequence GATTACA appears 4 times in the reference version of the COVID genome :) (Go to https://www.ncbi.nlm.nih.gov/nuccore/NC_045512 and pick "Find in this Sequence" on the right.)

AWESOME! I just did a text search for it on GitHub. Maybe it missed any occurrences that had a line break in the middle.
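
The line-break problem is easy to work around if you have the sequence as a file. A minimal sketch, assuming the file is in FASTA format (a header line starting with ">" followed by wrapped sequence lines); `count_motif` is just a name picked for this example, not part of any bioinformatics library:

```python
def count_motif(fasta_text: str, motif: str) -> int:
    """Count occurrences of motif in a FASTA record, ignoring line breaks.

    Overlapping occurrences are counted too.
    """
    # Drop header lines and join the wrapped sequence lines into one string.
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    ).upper()

    motif = motif.upper()
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)  # step by 1 to allow overlaps
    return count


if __name__ == "__main__":
    # Toy record: one GATTACA is split across the first two sequence lines.
    toy = ">toy\nACGGAT\nTACAGG\nGATTACA\n"
    print(count_motif(toy, "GATTACA"))
```

A plain line-by-line text search over the toy record only sees the occurrence on the last line, while joining the lines first finds both.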

I'm much happier now.

Yep, the usual coding tools aren't ideal for bioinformatics. We have our own set of tools that work well with the various "standard" formats for sequence data.

That would have been a killer easter egg (possibly literally).

What's special about this string?

It's the title of a cult film: https://www.imdb.com/title/tt0119177/

Very clever title. It was only much later I understood what it meant.

