Plus, this specific use case is also "to detect legally relevant text like license declarations in code and documentation" so I guess they really bought into that regex adage about "and now you have two problems" and thought they'd introduce some floating point math instead
And yet so many people on HN are adamant that the more tokens, the better, and it's all just a matter of throwing more money at it, and it's inevitable it will somehow "get better", because there's "so much money riding on it".
They might as well call all models 4B and smaller after psychedelics because they be hallucinating.