Václav from Kyutai here. Thanks for the bug report! A workaround for now is to chunk the text into smaller parts where the model is more reliable. We already do some chunking in the Python package. There's also a fancier way to do this chunking that ensures the stitched-together parts continue well (teacher-forcing), but we haven't implemented that yet.
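
If you want to do the chunking yourself, something like this is the basic idea (a rough sketch only; the chunking in the Python package is more careful than this, and `max_chars` / `synthesize` are just placeholder names, not real API):

```python
import re

def chunk_text(text: str, max_chars: int = 200) -> list[str]:
    """Split text at sentence boundaries into chunks the model handles reliably."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Start a new chunk if adding this sentence would exceed the limit.
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks

# Hypothetical usage: synthesize each chunk separately and concatenate the audio.
# `synthesize` stands in for whatever TTS call you're using.
# audio = np.concatenate([synthesize(chunk) for chunk in chunk_text(long_text)])
```
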
> Is this just sort of expected for these models? Should users of this expect only truncation or can hallucinated bits happen too?
Basically, yes, sort of expected: we don't have detailed enough control to prevent it fully. We can measure how much it happens and train better models, but there's no 100% guarantee. The bigger the model, the less this happens, but this one is tiny, so it's not the sharpest tool in the shed. Hallucinated bits can theoretically happen, but I haven't observed that with this model yet.