Close to nobody works on Forth compilers nowadays, and the number of compilers that are optimising, or even just fast, is very small.
People say that Forth isn't very optimisable for our register machines, but I reckon you can get pretty good results with some clever stack analysis. It's actually possible to determine the arity of each word (how many stack items it consumes and how many it leaves) statically, as long as you don't have words whose arity varies at run time, and those are very rare. Once you know the arity, you can pass arguments in registers instead of through the in-memory stack.
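To make the stack analysis idea a bit more concrete, here's a rough sketch in Python of inferring a word's arity from its body. It's my own illustration, not how any particular compiler does it: the `PRIMITIVES` table and the `infer_effect` helper are made-up names, and it only handles straight-line code (no `IF`, loops, or words with variable arity).

```python
# Stack effects as (items consumed, items produced) for a few primitives.
# This table is an assumption for the sketch, not a complete Forth wordset.
PRIMITIVES = {
    "dup": (1, 2),
    "drop": (1, 0),
    "swap": (2, 2),
    "over": (2, 3),
    "+": (2, 1),
    "*": (2, 1),
}


def infer_effect(body, known):
    """Infer (inputs, outputs) for a straight-line word body.

    `body` is a list of tokens and `known` maps word names to stack effects.
    Integer literals push one item. Control flow is not handled.
    """
    depth = 0   # stack depth relative to the depth on entry
    lowest = 0  # deepest point reached below the entry depth
    for token in body:
        if token.lstrip("-").isdigit():
            consumed, produced = 0, 1  # a literal pushes one value
        else:
            consumed, produced = known[token]
        depth -= consumed
        lowest = min(lowest, depth)
        depth += produced
    inputs = -lowest          # how far below the entry depth the word reached
    outputs = depth + inputs  # items left on top of the untouched part
    return inputs, outputs


# : square  dup * ;
words = dict(PRIMITIVES)
words["square"] = infer_effect(["dup", "*"], words)

# : sum-of-squares  square swap square + ;
words["sum-of-squares"] = infer_effect(["square", "swap", "square", "+"], words)
```

Here `square` comes out as one input and one output, and `sum-of-squares` as two inputs and one output, so a compiler could in principle hand those values over in registers. For branches you'd additionally need both arms to have the same effect, which this sketch doesn't check.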
Anyway, I'm not even close to an expert, so don't take what I said as fact.