The really simple explanation is that, for inference, feed-forward networks can be viewed as threshold circuits: each unit effectively makes a binary decision, outputting true or false, which is exactly what a threshold circuit does.
So if you train your models with that in mind, your weights can be reduced to {-1, 0, 1}, reducing the space complexity.
I don't think the costs in expressiveness are fully captured yet, but since perplexity doesn't care about correctness, if that's the metric that matters to you, it will probably reduce memory requirements for inference.
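To make the {-1, 0, 1} reduction concrete, here's a minimal sketch of one common way to ternarize a trained weight matrix: scale by the mean absolute weight, then round and clip to {-1, 0, 1}. The function name `ternarize` and the absmean scaling choice are my assumptions for illustration, not something from the comment above.

```python
import numpy as np

def ternarize(w, eps=1e-8):
    # Absmean scaling (one common choice): normalize by the mean
    # absolute weight so most values land near the {-1, 0, 1} grid.
    scale = np.mean(np.abs(w)) + eps
    # Round to the nearest integer, then clip into {-1, 0, 1}.
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = ternarize(w)
# q holds only -1, 0, or 1; q * scale is the low-memory approximation of w
```

Each ternary weight needs under 2 bits instead of 16 or 32, which is where the space savings for inference come from; the open question the comment raises is how much expressiveness this costs.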