It is interesting that Japanese, Russian and Thai benefit more from brotli (30%) than Latin-script languages (25%). This is because of the UTF-8 context modeling in brotli.
I think the feature in question is declared in the source code here [1]. The RFC goes into some detail about what this means [2] and how it's used [3]. I'd love a whitepaper but the RFC is fairly descriptive and is the best source I can find.
The first draft of the article actually gave that reason, but there is also a strong correlation between the size of the dict (these dicts are almost 1 MB, while those for other languages are closer to 500 KB) and the compression ratio improvement. So I played it safe and attributed it to the window size.
Though for languages like Korean and Chinese (whose size is more in line with Latin languages) we see a 27.5% improvement, which is most likely due to context modeling.
Therefore I assume the ratio improvement is split ~50/50 between these two. It would have been easy to verify that by compressing the data with `brotli --window 15` and comparing the ratios, but I was lazy. I'm sorry.
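For anyone who wants to see the window-size effect in isolation, here is a rough sketch. It does not use brotli at all: it swaps in raw deflate from Python's standard `zlib` module, whose window can be set between 2^9 and 2^15 bytes via `wbits`. The input is synthetic (a pseudo-random block repeated at a fixed distance), and the helper name `deflate_ratio` is made up for illustration, so treat the numbers as a demonstration of the mechanism rather than a measurement of the article's data.

```python
import random
import zlib

def deflate_ratio(data: bytes, wbits: int) -> float:
    """Return compressed/original size for raw deflate with a 2**wbits-byte window."""
    # Negative wbits selects a raw deflate stream; valid magnitudes are 9..15.
    comp = zlib.compressobj(level=9, wbits=-wbits)
    out = comp.compress(data) + comp.flush()
    return len(out) / len(data)

# Synthetic input (an assumption, not real text): one 4 KiB pseudo-random
# block repeated 16 times, so every repeat sits 4096 bytes back.
rng = random.Random(0)
block = bytes(rng.getrandbits(8) for _ in range(4096))
data = block * 16

small = deflate_ratio(data, 9)   # 512-byte window: the repeats are out of reach
large = deflate_ratio(data, 15)  # 32 KiB window: the repeats are reachable
print(f"window 2^9: {small:.3f}, window 2^15: {large:.3f}")
```

The same comparison on the real per-language files, with brotli at two window sizes, is what would actually separate the window contribution from the context-modeling one.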
PS. I also skipped the NFC/NFD part of the post, which is very interesting for Korean, where NFC-normalized text occupies 30% less space. It also gives an additional ratio improvement of 5% for brotli and 15% for gzip.
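To make the NFC/NFD point concrete, here is a minimal sketch using Python's standard `unicodedata` module. The sample sentence and the helper name `utf8_sizes` are my own for illustration, not taken from the post. In NFC each Hangul syllable is one code point (3 bytes in UTF-8), while NFD splits it into two or three jamo of 3 bytes each, so the exact saving depends on how much of the text is Hangul versus spaces and punctuation.

```python
import unicodedata

def utf8_sizes(text: str) -> tuple[int, int]:
    """Return (NFC byte length, NFD byte length) of text encoded as UTF-8."""
    nfc = unicodedata.normalize("NFC", text).encode("utf-8")
    nfd = unicodedata.normalize("NFD", text).encode("utf-8")
    return len(nfc), len(nfd)

# Illustrative sample sentence (an assumption, not data from the post).
sample = "한국어 텍스트는 정규화 형식에 따라 크기가 달라집니다."
nfc_len, nfd_len = utf8_sizes(sample)
print(f"NFC: {nfc_len} bytes, NFD: {nfd_len} bytes")

# A single syllable with a final consonant decomposes into three jamo.
print(utf8_sizes("한"))  # (3, 9)
```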