When he talks about performance, he mentions that memoization keeps performance from suffering very much, but what he doesn't mention is that almost the entire performance gain comes from memoizing frequencies... which stores the hashtables anyway; it just does it behind the scenes. That's fine for CPU usage, but it carries a significant storage penalty depending on the characteristics of the input.
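To make that concrete, here's a minimal sketch of the hidden cost, assuming the post memoizes `frequencies` directly (the exact names in the original may differ). The memo cache maps argument to result, so every distinct word list gets retained as a cache key right next to the frequency hashtable built from it:

```clojure
;; Assumed setup: frequencies is memoized directly, as the post seems to do.
;; The memo cache keeps argument -> result pairs, so each distinct word
;; list stays alive as a key alongside its frequency map.
(def freqs (memoize frequencies))

(freqs ["the" "cat" "sat" "on" "the" "mat"])
;;=> {"the" 2, "cat" 1, "sat" 1, "on" 1, "mat" 1}
;; The input vector now lives in the cache too, not just the map.
```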
There's some low-hanging fruit to cut down on the required storage, namely wrapping to-words and frequencies together, and memoizing that. This doesn't reduce the number of hashtables, but at least there's no need to store the entire word lists. As it stands, this will simply fail if your corpus is too large to fit in your RAM all at once.
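Something like this rough sketch, where the `to-words` stand-in and the `doc->freqs` name are mine rather than from the post:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for the post's to-words: split a document
;; into lowercase words. The real definition may differ.
(defn to-words [doc]
  (re-seq #"\w+" (str/lower-case doc)))

;; Wrap the two steps and memoize the composition. The cache now holds
;; one frequency map per document, and the intermediate word list can be
;; garbage-collected as soon as the map is built instead of sitting in
;; the cache as a key.
(def doc->freqs
  (memoize (comp frequencies to-words)))
```

Note the cache is still keyed by whatever you pass in (here the raw document), so if even the documents are too big to keep around you'd want to key on a document id instead.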
Right, I realized after writing that comment that I was forgetting about that one. Still, memoizing frequencies takes it from O(n^2) to O(n) as it stands, so it's easily the second-largest improvement.