
Oh hey! :) TLDR: naive gradient accumulation was over-weighting short sequences and under-weighting long sequences in LLM finetuning and training runs.

For example, a batch with sequence lengths [1, 100] would scale each token's loss by 1/(100+1) in full-batch training, but grad accum of 2 would weight the length-1 sequence's token as 1/1 * 1/2 = 1/2, whilst each token of the length-100 sequence gets 1/100 * 1/2 = 1/200 (the 1/2 because grad accum divides by the number of grad accum steps).
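A minimal sketch of that arithmetic (not the actual trainer code; the per-token loss tensors here are just placeholders) showing how naive per-micro-batch averaging diverges from the full-batch objective, and how normalising by the total token count fixes it:

    import torch

    # Per-token losses for two sequences of length 1 and 100 (values arbitrary).
    losses = [torch.rand(1), torch.rand(100)]

    # Full-batch objective: one mean over all 101 tokens, so every token
    # is weighted 1/(100+1).
    full_batch = torch.cat(losses).mean()

    # Naive grad accum over 2 steps: mean within each micro-batch, then
    # divide by the number of accumulation steps. The length-1 sequence's
    # token gets weight 1/1 * 1/2 = 1/2; each token of the length-100
    # sequence gets 1/100 * 1/2 = 1/200.
    naive_accum = sum(l.mean() for l in losses) / len(losses)

    # Fixed grad accum: sum the losses per micro-batch and divide by the
    # total token count across all accumulation steps, which matches
    # full-batch training exactly.
    total_tokens = sum(l.numel() for l in losses)
    fixed_accum = sum(l.sum() for l in losses) / total_tokens

    print(full_batch.item(), naive_accum.item(), fixed_accum.item())
    # full_batch == fixed_accum, but naive_accum differs whenever
    # sequence lengths in the micro-batches differ.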



Is this a general issue rather than something Unsloth-specific? How widespread is this problem? Sounds wild if it has been affecting everyone's training.


Unfortunately it's not an Unsloth issue but a general one affecting nearly all trainers that use grad accum. We worked with Huggingface, so their trainers should now be fixed in the main branch.



