It's the default in most places. Especially with larger training corpora or vocabularies, the negative-sampling mode tends to perform better, and I don't recall notable situations where the HS mode is better.
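For reference, a minimal sketch of toggling between the two modes, assuming gensim's `Word2Vec` class with its 4.x parameter names (the toy corpus is just a stand-in):

```python
from gensim.models import Word2Vec

sentences = [["the", "quick", "brown", "fox"],
             ["jumps", "over", "the", "lazy", "dog"]]

# Negative sampling (the usual default): hs=0 with negative > 0
model_ns = Word2Vec(sentences, vector_size=100, hs=0, negative=5, min_count=1)

# Hierarchical softmax: hs=1, and negative=0 to disable negative sampling
model_hs = Word2Vec(sentences, vector_size=100, hs=1, negative=0, min_count=1)
```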
(I recall seeing hints in the early word2vec code of an HS-based vocabulary encoding that wasn't based on mere word frequency, but on some earlier, or perhaps iterated, semantic-clustering step that I think managed to give similar words shared codes. But I haven't seen more on that recently.)