Hacker News: gilleain's comments

Just in case it wasn't a typo, and you happen not to know: that word is probably "eke" (to gain, increase or enlarge, according to Wiktionary) rather than "eek", which is what mice do :)

Hah, you're right on the spelling but wrong on my meaning. That's probably the first time I've typed it. I don't think LLMs are quite at the level of mouse reasoning yet!

https://dictionary.cambridge.org/us/dictionary/english/eke-o...
"to obtain or win something only with difficulty or great effort"


Ick, OK, ACK.

Oh, it's completely expected that bacteria create antibiotics - it's part of the low-level chemical warfare that bacteria carry out against each other. Plants are also chemical factories ('secondary metabolism') that produce all sorts of crazy compounds to kill off competing plants as well as insects.


Since it is down, an alternative MOTM (Molecule of the Month) is the one from the PDB:

https://pdb101.rcsb.org/motm/motm-by-date

More protein-oriented, however.


For quick reference, BLAST stands for 'Basic Local Alignment Search Tool', a commonly used part of the bioinformatics toolkit. You 'BLAST' a sequence by sending a query sequence of interest against a database of other sequences to find similar ones (hits).
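BLAST itself is a heuristic: it first looks for short exact "words" shared between the query and the database sequences (seeds), then extends and scores those seeds into local alignments. The toy sketch below covers only the seeding step and ranks database sequences by how many words they share with the query. The sequences, the word length `k` and the function names are made up for illustration; real BLAST adds scoring matrices, ungapped and gapped extension, and E-values.

```python
from collections import defaultdict

def build_word_index(database, k=3):
    """Map every length-k word to the (sequence id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].append((seq_id, i))
    return index

def seed_hits(query, index, k=3):
    """Count exact k-word hits between the query and each database sequence."""
    counts = defaultdict(int)
    for i in range(len(query) - k + 1):
        for seq_id, _offset in index.get(query[i:i + k], []):
            counts[seq_id] += 1
    # Rank database sequences by number of shared words, most hits first.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical toy "database"; real ones hold millions of sequences.
database = {
    "seqA": "ATGGCGTACGCTAGCTAGGCT",
    "seqB": "TTTTCCCCGGGGAAAATTTT",
    "seqC": "ATGGCGTACGTTTTTTTTTT",
}
index = build_word_index(database, k=4)
hits = seed_hits("GGCGTACGCT", index, k=4)
print(hits)
```

Sequences with no shared words (here `seqB`) never appear in the ranking, which is also roughly why BLAST is fast: most of the database is never aligned at all.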


I have been out of the field for some time, so I am not sure how much BLAST is used these days.

There was a time when BLAST-ing a DNA or protein sequence you had was like doing a Google search on it: it simply told you where the sequence might have come from. This is especially useful when your research is about figuring out what that specific sequence does. It won't give you the answer immediately (otherwise why bother doing the research at all), but it certainly gives context: sequence similarity often hints at similar or related functions.

As an analogy: imagine StackOverflow suddenly went down and you didn't know *if* it was ever coming back up.


My sibling is a molecular biologist working in industry, and she does use BLAST data. She's been telling her company for months that they need to secure access to an alternative source or an offline backup; hopefully their software team started on it in time.


Everyone can set up their own BLAST database. Usually, if you specialize in a certain species, you have your own DB cached in memory somewhere locally for efficiency. There are also alternatives: NCBI BLAST is just one of many. And all the sequences are kept globally and in sync across different regions of the world, so if one data center goes down you still have the option to use the exact same data from Europe or Japan and so on.
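Given mirrors and local copies, client software does not have to hard-wire a single service. A minimal sketch of a fallback chain, assuming each source (NCBI, a mirror in another region, or a local database) is wrapped as a plain function that takes a query and returns hit IDs. The provider names and the substring "search" are hypothetical placeholders, not a real BLAST client.

```python
from typing import Callable, Dict, List, Tuple

# A provider takes a query sequence and returns matching sequence IDs.
Provider = Callable[[str], List[str]]

def search_with_fallback(query: str,
                         providers: List[Tuple[str, Provider]]) -> Tuple[str, List[str]]:
    """Try each provider in order; return (provider name, hits) from the first that answers."""
    errors: Dict[str, str] = {}
    for name, provider in providers:
        try:
            return name, provider(query)
        except Exception as exc:  # e.g. connection refused, timeout, server error
            errors[name] = repr(exc)
    raise RuntimeError(f"all providers failed: {errors}")

def remote_down(query: str) -> List[str]:
    # Hypothetical stand-in for a remote service that is currently unreachable.
    raise ConnectionError("service unavailable")

def local_db(query: str) -> List[str]:
    # Hypothetical stand-in for a locally cached database; a real one would run BLAST.
    cached = {"seq1": "ATGGCGTACG", "seq2": "TTTTGGGG"}
    return [seq_id for seq_id, seq in cached.items() if query in seq]

source, hits = search_with_fallback("GCGTA", [("remote", remote_down), ("local", local_db)])
print(source, hits)
```

Keeping the provider list as configuration means adding another mirror is a one-line change rather than a rewrite.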


Yup, her company's software was set up to only use NCBI and she's been warning that that was a risk :)


Fair, and to be totally clear, even when I was in the field (an age ago), sequence stuff was never really my thing. However, sequence comparison is a fairly fundamental tool.

Of course, yes, you can run these things locally, other providers (such as the EBI in Europe, and others in Japan) have them, etc. It's still a bad sign on top of a pile of other bad signs, IMO.


Not a professional, but I still use it like that. They also have a new SmartBLAST thing, which works much faster (really, really like Google!) but only on highly similar proteins.


One neat thing I know about HCN and early chemical evolution is that some DNA bases can be created from HCN polymerisation.

https://pubmed.ncbi.nlm.nih.gov/31491/

Like if you look at adenine, it is 'just' N-C-N-C all the way round.
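The atom bookkeeping works out neatly: adenine's molecular formula is C5H5N5, exactly five HCN units, so on paper it is an HCN pentamer. This is overall stoichiometry only, not the reaction mechanism:

```latex
5\,\mathrm{HCN} \longrightarrow \mathrm{C_5H_5N_5}\ \text{(adenine)}
```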


That was one of the first ones I thought of as well. Not sure why, but the line "I can't be worrying about every little thing!" stuck in my mind.


So might it be useful to have some mechanism to check whether the 'maintainer' (owner, principal committer, whatever; what Peter Murray-Rust used to call the 'Dr Who') has changed?

Like, when bumping the version on a dependency, the security system could check if the maintainer has changed, then you could go and double-check any changes.
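A minimal sketch of that check, assuming you can get each dependency's current maintainer list from somewhere (registry metadata, say) and that the previously seen list is recorded on the consumer's side rather than in the dependency's own repo. The package and maintainer names are hypothetical.

```python
from typing import Dict, Set

def maintainer_changes(recorded: Dict[str, Set[str]],
                       current: Dict[str, Set[str]]) -> Dict[str, Dict[str, Set[str]]]:
    """Report, per package, which maintainers were added or removed since the recorded snapshot."""
    report: Dict[str, Dict[str, Set[str]]] = {}
    for package, now in current.items():
        before = recorded.get(package)
        if before is None:
            continue  # new dependency: nothing recorded to compare against yet
        added, removed = now - before, before - now
        if added or removed:
            report[package] = {"added": added, "removed": removed}
    return report

# Hypothetical snapshot saved at the last upgrade vs. metadata fetched now.
recorded = {"pkg-a": {"alice"}, "pkg-b": {"bob", "carol"}}
current = {"pkg-a": {"alice"}, "pkg-b": {"carol", "mallory"}}
changes = maintainer_changes(recorded, current)
print(changes)
```

Anything that shows up in the report is a prompt to go and read the diff more carefully before accepting the bump.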


Fifteen years ago we used to meet in person to exchange PGP keys, building a verifiable chain of trust.

It's depressing to see these efforts ignored nowadays, with the consequence that we still can't trust anyone online.
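The chain-of-trust idea can be sketched as a graph search: each key-signing exchange adds "A signed B" edges, and you accept a stranger's key if there is a path of signatures from your own key to theirs. This is a simplified model with hypothetical key names; real OpenPGP web-of-trust validation also takes into account how much you trust each signer and caps the path length, which this ignores.

```python
from collections import deque
from typing import Dict, List, Optional, Set

def trust_path(signatures: Dict[str, Set[str]], start: str, target: str) -> Optional[List[str]]:
    """Breadth-first search for the shortest signature chain from start to target.

    signatures[a] is the set of keys that key `a` has signed (vouched for).
    Returns the chain as a list of keys, or None if no chain exists.
    """
    parents: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        key = queue.popleft()
        if key == target:
            # Walk back through the parents to rebuild the chain.
            path: List[str] = []
            node: Optional[str] = key
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for signed in signatures.get(key, set()):
            if signed not in parents:
                parents[signed] = key
                queue.append(signed)
    return None

# Hypothetical signatures collected at key-signing parties.
signatures = {"me": {"alice"}, "alice": {"bob"}, "bob": {"maintainer"}, "eve": {"maintainer"}}
print(trust_path(signatures, "me", "maintainer"))
```

Note that a signature from "eve" does not help, because nothing connects your own key to hers.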


I assume there is also a black market for mature GitHub accounts. So you won't necessarily know if the maintainer is now a different person.


Good point.

Also, where would the information be stored? If it was in the repo itself (as metadata) then the malicious maintainer could just not update it ...


Apart from the obvious privacy issues, I'm not sure what putting DNA in an LLM would actually achieve.

It seems optimistic to imagine that you could query this DNA data in combination with the other information to get meaningful answers.


Your DNA is just a rich mine of new data points about you that they can exploit and use against you. Maybe AI thinks you're more likely than most people to have medical issues and as a result companies now refuse to hire you. Maybe AI decides that your DNA makes you more susceptible to certain addictions or alcoholism and companies selling addictive products target you relentlessly as a result. Maybe your DNA makes AI think you're more prone to criminal activity and the police harass you endlessly until you either end up in prison or move away.

The conclusions AI reaches don't even have to be scientifically valid; they could be nothing more than hallucinations, but that doesn't mean they won't have an impact on your life or place limitations on your opportunities.


My problem with this is that people like Larry Ellison are more likely to want to use this against other people but would excuse themselves from any consequences.


Oh, I totally agree that would be the real-world outcome of using the data.

I was just thinking out loud about what the theoretical benefits could be, if it were done 'properly', let's say.

Alternatively we could just avoid doing it at all! It's a horrible idea, absolutely.


An LLM trained on just DNA seems to be useful: https://www.nature.com/articles/s42256-024-00872-0

These models have proven to develop incredible abilities through pattern matching on massive text data, so I wouldn't be too quick to set limits on what they could do.

Having them use specialized tools would probably be more effective (e.g. have the reasoning LLM use the DNA LLM), but in the long term with scale… who knows? The bitter lesson keeps biting us every time we think we know better.


The DNA idea and the LLM idea are separate things. The DNA idea is about controlling people. The LLM idea is about promoting an insanely computationally intensive technology that has tons of hype and generalized excitement but few (no?) successful commercial applications. The man sells computers. If he can convince the government it's a strategic imperative that we do the AI real good, he just sold a whole lot of them. It's Web 3.0 2.0.


Hah, I did something similar around the same time, using random white/gray/black pixels (apparently; I don't remember any details any more!)

https://gilleain.blogspot.com/2008/11/chalky.html
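For anyone wanting to play with the idea, here is a guess at the general approach, not the code from the linked post (whose details are lost): fill a grid with pixels chosen at random from white, gray and black, and write it out as a plain-text (P2) PGM image, a format many image tools can open. The image size, gray level, seed and function name are arbitrary choices for illustration.

```python
import random

def chalky_pgm(width: int, height: int, seed: int = 0) -> str:
    """Return a plain-text (P2) PGM image whose pixels are randomly white, gray or black."""
    rng = random.Random(seed)  # seeded so the same texture can be regenerated
    levels = (255, 128, 0)     # white, mid-gray, black on a 0..255 scale
    rows = [" ".join(str(rng.choice(levels)) for _ in range(width)) for _ in range(height)]
    # P2 header: magic number, then width and height, then the maximum gray value.
    return f"P2\n{width} {height}\n255\n" + "\n".join(rows) + "\n"

image = chalky_pgm(64, 32)
# To view it, save the string to a file, e.g. open("chalky.pgm", "w").write(image)
```

Weighting the choice (more white than black, say) or blurring the result afterwards would be the obvious knobs for making it look more or less like chalk.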


That looks awesome!

