
Yes!

https://github.com/ggerganov/llama.cpp/blob/master/grammars/...

It's actually better than a specialized model: during token generation it constrains the possible output tokens to an arbitrary grammar (like, say, JSON syntax). So it will work "perfectly" with any model that has a basic understanding of the format.
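
For instance, a minimal sketch using the third-party llama-cpp-python bindings (the grammar and model path here are just illustrative, not from the linked repo):

    from llama_cpp import Llama, LlamaGrammar

    # Tiny GBNF grammar: force a JSON object with a single "answer"
    # key whose value is "yes" or "no".
    GRAMMAR = r'''
    root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
    answer ::= "\"yes\"" | "\"no\""
    ws     ::= [ \t\n]*
    '''

    llm = Llama(model_path="model.gguf")  # any llama.cpp-compatible model
    grammar = LlamaGrammar.from_string(GRAMMAR)

    out = llm(
        "Is Python dynamically typed? Answer in JSON.",
        grammar=grammar,  # constrains sampling at every step
        max_tokens=32,
    )
    print(out["choices"][0]["text"])  # e.g. {"answer": "yes"}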

Kobold.cpp and text-generation-webui already support this, and both will run on your Mac.




Does that mean you could make any model at least always produce syntactically correct code as output?


Yes. The model has no choice: the grammar is applied while the model is "picking" the next probable output token, rather than by parsing complete output like some other methods do.
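
Conceptually, each sampling step masks out whatever the grammar can't accept next. A toy sketch (not llama.cpp's actual code; the "grammar" here just allows digits):

    import math, random

    VOCAB = list("0123456789abcdef")   # toy vocabulary
    ALLOWED = set("0123456789")        # toy "grammar": digits only

    def fake_logits():
        # Stand-in for a real model's forward pass.
        return [random.gauss(0, 1) for _ in VOCAB]

    def sample_constrained():
        logits = fake_logits()
        # Mask every token the grammar cannot accept right now,
        # so it can never be sampled.
        for i, tok in enumerate(VOCAB):
            if tok not in ALLOWED:
                logits[i] = -math.inf
        # Greedy pick over whatever survives the mask.
        return VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]

    print("".join(sample_constrained() for _ in range(8)))  # digits only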

Though it's possible it would output syntactically correct nonsense.


That's very awesome. I feel like it would be fun to make a feedback loop: the LLM outputs syntactically valid programs that also log the status of various invariants and pre/post-conditions as the program runs, to validate correctness and maybe even train an LLM that way.

It would be interesting to see how far it would get writing programs. As for the problem of stopping too early on a token, it could resume from where it stopped with some extra context.

Maybe construct the program out of building blocks that each fit into the context window, where each block/function carries pre/post-conditions and internal invariants that the LLM tests against every time the program runs, something like the sketch below. I think I just found my next side project. I know there are many similar tools, but I haven't seen anything yet that couples the compiler, running the program, and pre/post-condition and invariant checking against the emitted code.
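
Roughly what I have in mind for the per-block contracts (a hypothetical sketch; contract and mean_abs are invented for illustration):

    import functools

    def contract(pre, post):
        """Wrap a function in pre/post-condition checks whose failures
        could be logged and fed back to the LLM as a correctness signal."""
        def decorate(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                assert pre(*args, **kwargs), f"pre-condition failed: {fn.__name__}"
                result = fn(*args, **kwargs)
                assert post(result), f"post-condition failed: {fn.__name__}"
                return result
            return wrapper
        return decorate

    # An LLM-emitted building block, annotated with its contract:
    @contract(pre=lambda xs: len(xs) > 0, post=lambda r: r >= 0)
    def mean_abs(xs):
        return sum(abs(x) for x in xs) / len(xs)

    print(mean_abs([-3, 1, 2]))  # 2.0; an AssertionError would be fed back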

It would be interesting to test this hypothesis and see whether the LLM can actually build a program this way. I think of it like human short-term memory: for each sub-structure of a program, the LLM would have to work within a similar limit. Then reserve a bit of context as long-term memory for general goals, or create something like an AST of thoughts that it recurses through while reasoning.


How does this actually work, though? The model could, e.g., abruptly end the generation, giving something syntactically invalid. Doesn't it need to look at the whole output at some stage?


All generated output depends on all the previous output. The model "looks" at (mostly) everything every time.

Generation never actually stops; the model just emits a special stop token when stopping is the most likely next token. Hence the grammar implementation can prevent this stop token from being emitted prematurely.

There was some discussion of models getting "stuck" where there is no syntactically correct token to emit. Some proposals included a "backspace token" IIRC, but I dunno what they actually did. You can look through the discussion in the PR.


Oh yeah that's true! Just block the stop token. But yes, my thought is that there are scenarios where it can get "stuck" as you said. I'll look at the PR, thanks!


At every generation step you mask the tokens so that only syntactically valid outputs are possible. This includes the <end> token.
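
In sketch form (toy numbers; eos_id and the accepting-state flag are stand-ins for the real parser state):

    import math

    def mask_logits(logits, allowed_ids, eos_id, grammar_complete):
        # The <end> token is treated like any other token: it stays
        # legal only when the grammar is in an accepting state.
        masked = list(logits)
        for i in range(len(masked)):
            legal = i in allowed_ids or (i == eos_id and grammar_complete)
            if not legal:
                masked[i] = -math.inf
        return masked

    # Toy demo: 5-token vocab, token 4 is <end>, grammar not yet complete.
    print(mask_logits([0.1, 0.5, 0.2, 0.9, 2.0], {1, 3},
                      eos_id=4, grammar_complete=False))
    # [-inf, 0.5, -inf, 0.9, -inf]  ->  <end> cannot be sampled yet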



