Hacker News new | past | comments | ask | show | jobs | submit | amrb's comments login

It's a red flag that the 1.2bil model has to fit in gpu memory, happy to be provided wrong when the code drops


That's not something that DisTrO solves, but there's plenty of research in that area! See https://arxiv.org/abs/2301.11913 , https://arxiv.org/abs/2206.01288 , https://arxiv.org/abs/2304.11277 etc :)


An alternative approache to BPE tokenization https://arxiv.org/abs/2406.19223


T-FREE is interesting, at least, I find it interesting in that I don’t really understand it. They take successive character triples of all words, and then hash them, and then use the hash table slots landed in as destinations to feed into an embedding space? Can I possibly be understanding that chart properly?

Can you explain this any better than the first few pages of the paper? I’d like some intuition about why T-FREE works; there are lots of reasons to prefer different tokenization schemes, but I can’t really get this one into my head from the paper, unfortunately.


Can't say I mastered the concept either, I'm waiting for the code [0] to be release so I can run some head-to-head tests.

[0] https://github.com/Aleph-Alpha/trigrams


Can check out their project at https://github.com/bigscience-workshop/petals


Speaking of quantized vectors https://huggingface.co/papers/2309.14717


This paper had nothing to do with product quantization.


And they worry about LLMs hallucinating


Great project and I'm happy to see it expand to more models!


What's the new reddit to try?


Anything can end up in logs, then it depends on getting access to hosted splunk via employee creds, for a hypothetical breach.


There is a salary requirement, as not to under cut local works. Of course if you working over 40 hours a week maybe the company gets it's pound of flesh!


You get paid higher if you rent your apartment on a month to month lease vs signing a 2 year lease.



This is a great in depth and sober analysis.


So we just recreated all of the previous SQL injection security issues in LLM's, fun times


SQL injection is due to sloppy programming practices and easily avoided. Using something called query parameters.

This is another beast!


It's much worse actually because its extremely hard to even figure out if you have a security issue because it involves NLP.


This is worse because "prompt injection" is a feature, not a bug.

If you want a generic AI to talk to, then whatever you talk it into - such as rules of behavior, or who to trust - someone else will be able to talk it out of. Just like with humans.

Others mention the problem is lack of separation between control/code and data - technically yes, but the reason isn't carelessness. The reason is that code/data separation is an abstraction we use to make computers easier to deal with. In the real world, within the runtime of physics, there is no such separation. Code/data distinction is a fake reality you can only try and enforce, with technical means, and it holds only if the thing inside the box can't reach out.

For an LLM - much like for human mind - the distinction between "code" and "data" is a matter of how LLM/brain feels like interpreting it at any given moment. The distinction between "prompt injection attack" and a useful override is a matter of intent.


And what happens if your application does not handle the LLM response correctly (buffer overflow anyone)? Yep your own LLM will attack you.

Get your popcorn ready, remember the silly silly exploits of the early 2000s? We are about to experience them all over again! :D


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: