
Yes!

https://github.com/ggerganov/llama.cpp/blob/master/grammars/...

It's actually better than a specialized model: during token generation it constrains the possible output tokens to an arbitrary grammar (like, say, JSON syntax). So it will work "perfectly" with any model that has a basic understanding of the format.
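
For instance, a minimal sketch using the third-party llama-cpp-python bindings (the grammar and model path here are just illustrative, not from the linked repo):

    from llama_cpp import Llama, LlamaGrammar

    # Tiny GBNF grammar: force a JSON object with a single "answer"
    # key whose value is "yes" or "no".
    GRAMMAR = r'''
    root   ::= "{" ws "\"answer\"" ws ":" ws answer ws "}"
    answer ::= "\"yes\"" | "\"no\""
    ws     ::= [ \t\n]*
    '''

    llm = Llama(model_path="model.gguf")  # any llama.cpp-compatible model
    grammar = LlamaGrammar.from_string(GRAMMAR)

    out = llm(
        "Is Python dynamically typed? Answer in JSON.",
        grammar=grammar,  # constrains sampling at every step
        max_tokens=32,
    )
    print(out["choices"][0]["text"])  # e.g. {"answer": "yes"}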

Kobold.cpp and text-generation-webui already support this, and both will run on your Mac.




Does that mean you could make any model at least always produce syntactically correct code as output?


Yes. The model has no choice: the grammar is applied while the model is "picking" the next probable output token, rather than by parsing complete output like some other methods do.
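
Conceptually, each sampling step masks out whatever the grammar can't accept next. A toy sketch (not llama.cpp's actual code; the "grammar" here just allows digits):

    import math, random

    VOCAB = list("0123456789abcdef")   # toy vocabulary
    ALLOWED = set("0123456789")        # toy "grammar": digits only

    def fake_logits():
        # Stand-in for a real model's forward pass.
        return [random.gauss(0, 1) for _ in VOCAB]

    def sample_constrained():
        logits = fake_logits()
        # Mask every token the grammar cannot accept right now,
        # so it can never be sampled.
        for i, tok in enumerate(VOCAB):
            if tok not in ALLOWED:
                logits[i] = -math.inf
        # Greedy pick over whatever survives the mask.
        return VOCAB[max(range(len(VOCAB)), key=lambda i: logits[i])]

    print("".join(sample_constrained() for _ in range(8)))  # digits only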

Though it's possible it would output syntactically correct nonsense.


That's very awesome. I feel like it would be fun to make a feedback loop: the LLM outputs syntactically valid programs that also log the status of various invariants and pre/post-conditions as the program runs, to validate correctness and maybe even train an LLM that way.

It would be interesting to see how far it would get writing programs. As for the problem of stopping too early on a token, it could resume from where it stopped with some extra context.

Maybe construct the program out of building blocks that each fit into the context window, where each block/function carries pre/post-conditions and internal invariants that the LLM tests against every time the program runs, something like the sketch below. I think I just found my next side project. I know there are many similar tools, but I haven't seen anything yet that couples the compiler, running the program, and pre/post-condition and invariant checking against the emitted code.
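
Roughly what I have in mind for the per-block contracts (a hypothetical sketch; contract and mean_abs are invented for illustration):

    import functools

    def contract(pre, post):
        """Wrap a function in pre/post-condition checks whose failures
        could be logged and fed back to the LLM as a correctness signal."""
        def decorate(fn):
            @functools.wraps(fn)
            def wrapper(*args, **kwargs):
                assert pre(*args, **kwargs), f"pre-condition failed: {fn.__name__}"
                result = fn(*args, **kwargs)
                assert post(result), f"post-condition failed: {fn.__name__}"
                return result
            return wrapper
        return decorate

    # An LLM-emitted building block, annotated with its contract:
    @contract(pre=lambda xs: len(xs) > 0, post=lambda r: r >= 0)
    def mean_abs(xs):
        return sum(abs(x) for x in xs) / len(xs)

    print(mean_abs([-3, 1, 2]))  # 2.0; an AssertionError would be fed back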

It would be interesting to test this hypothesis and see whether the LLM can actually build a program this way. I think of it like human short-term memory: for each sub-structure of a program, the LLM would have to work within a similar limit. Then reserve a bit of context as long-term memory for general goals, or create something like an AST of thoughts that it recurses through while reasoning.


How does this actually work, though? The model could, e.g., abruptly end the generation, giving something syntactically invalid. Doesn't it need to look at the whole output at some stage?


All generated output depends on all the previous output. The model "looks" at (mostly) everything every time.

Generation never actually stops; the model just emits a special stop token when stopping is the most likely next token. Hence the grammar implementation can prevent this stop token from being emitted prematurely.

There was some discussion of models getting "stuck" where there is no syntactically correct token to emit. Some proposals included a "backspace token" IIRC, but I dunno what they actually did. You can look through the discussion in the PR.


Oh yeah that's true! Just block the stop token. But yes, my thought is that there are scenarios where it can get "stuck" as you said. I'll look at the PR, thanks!


At every generation step you mask the tokens so that only syntactically valid outputs are possible. This includes the <end> token.
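
In sketch form (toy numbers; eos_id and the accepting-state flag are stand-ins for the real parser state):

    import math

    def mask_logits(logits, allowed_ids, eos_id, grammar_complete):
        # The <end> token is treated like any other token: it stays
        # legal only when the grammar is in an accepting state.
        masked = list(logits)
        for i in range(len(masked)):
            legal = i in allowed_ids or (i == eos_id and grammar_complete)
            if not legal:
                masked[i] = -math.inf
        return masked

    # Toy demo: 5-token vocab, token 4 is <end>, grammar not yet complete.
    print(mask_logits([0.1, 0.5, 0.2, 0.9, 2.0], {1, 3},
                      eos_id=4, grammar_complete=False))
    # [-inf, 0.5, -inf, 0.9, -inf]  ->  <end> cannot be sampled yet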



