The FP16 7B version runs on my Ubuntu XPS with 32GB of memory, at ~300ms/token. 13B also works, but the results aren't great (the model starts looping after a few sentences), so the sampling parameters probably need tuning.
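The looping is a common symptom of sampling without any repeat suppression. One mitigation worth trying is a CTRL-style repetition penalty applied to the logits before picking the next token; this is a minimal sketch (the function name and values are illustrative, not from any specific codebase):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """CTRL-style repetition penalty: demote tokens already generated.

    logits:        list of floats, one score per vocabulary id
    generated_ids: token ids produced so far in this generation
    penalty:       > 1.0 discourages repeats (1.0 = no effect)
    """
    out = list(logits)
    for tid in set(generated_ids):
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive scores toward zero
        else:
            out[tid] *= penalty   # push negative scores further down
    return out
```

With a strong enough penalty, a token that would otherwise win argmax every step gets demoted once it has appeared, which breaks the repeat loop at the cost of some fluency.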
So far I'm unable to reliably generate output in a language other than English: the model very quickly starts translating (even when not asked to) or simply switches back to English.