Your hardware already supports addition and subtraction, and the tensor cores of NVIDIA GPUs are already fast enough to keep up. The only benefit is lower memory capacity and bandwidth requirements.
(Okay, those are probably not ready to be used as NN weights unless the activations are binary too, but... the gap to what CPUs can already do is getting smaller.)
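
To make the addition/subtraction point concrete, here is a rough Python sketch. It assumes the weights being discussed are ternary (each one is -1, 0, or +1), which is my reading of the thread and not something stated above. `ternary_matvec` and `weight_bytes` are hypothetical helper names, and the 2-bit packing is an illustrative assumption too.

```python
def ternary_matvec(weights, x):
    """Compute W @ x for a matrix whose entries are all -1, 0, or +1.

    Assumption: ternary weights. No multiplication is needed, only
    additions and subtractions of the activations.
    """
    out = []
    for row in weights:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the activation
            elif w == -1:
                acc -= xi      # -1 weight: subtract the activation
            # 0 weight: contributes nothing
        out.append(acc)
    return out


def weight_bytes(n_weights, bits_per_weight):
    """Bytes needed to store n_weights at a given bit width, rounded up."""
    return (n_weights * bits_per_weight + 7) // 8


W = [[1, 0, -1],
     [-1, 1, 1]]
x = [0.5, 2.0, -1.5]

print(ternary_matvec(W, x))
print(weight_bytes(1_000_000, 16))  # 16-bit weights
print(weight_bytes(1_000_000, 2))   # hypothetical 2-bit packed ternary weights
```

The arithmetic is nothing the hardware cannot already do. What changes is the storage: in this sketch the 2-bit packing takes one eighth of the bytes of 16-bit weights, which is where the memory capacity and bandwidth saving would come from.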