Any nonlinearities in the Fourier transform process (of which there are presumably many) will manifest as components at the beat frequency and its harmonics. I'm on my phone right now, but if you take two simple-ratio tones, add them, apply a nonlinearity (like raising the signal to a power other than 1), and then Fourier transform, you will see the beat frequency and its effects. I imagine (but do not know) that this is the physiological mechanism for beat frequencies between high-frequency tones.
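Here is a rough sketch of that experiment in Python with NumPy. The sample rate, the 400 Hz and 600 Hz tones (a 2:3 ratio), and squaring as the nonlinearity are all my own illustrative choices, and `magnitude_at` is just a helper I made up for reading one FFT bin. It says nothing about what the ear actually does; it only shows that a nonlinearity puts energy at the difference frequency where the plain sum has none.

```python
import numpy as np

fs = 8000                # sample rate in Hz (assumed for illustration)
duration = 1.0           # one second, so FFT bins are 1 Hz apart
f1, f2 = 400.0, 600.0    # two tones in a 2:3 ratio (assumed values)

t = np.arange(int(fs * duration)) / fs
x = np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t)

def magnitude_at(signal, freq):
    """Normalized FFT magnitude at the bin closest to freq (hypothetical helper)."""
    spectrum = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]

beat = f2 - f1  # difference frequency, 200 Hz

# The plain sum has no component at the difference frequency.
linear_beat = magnitude_at(x, beat)

# Squaring (one possible nonlinearity) creates a component there.
squared_beat = magnitude_at(x ** 2, beat)

print(f"linear sum at {beat:.0f} Hz:  {linear_beat:.6f}")
print(f"squared sum at {beat:.0f} Hz: {squared_beat:.6f}")
```

Swapping `x ** 2` for another nonlinearity (say `np.abs(x) ** 1.5`) should also produce a line at 200 Hz plus multiples of it, though with different amplitudes.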
For binaural beats with low-frequency tones, I can only guess, but you can actually reconstruct phase information from the frequency-domain signal depending on your sampling period T. As a trivial example, imagine a tone at frequency f << 1/T; now you can treat the DC component of each successive Fourier transform as a time-domain signal that contains f. I imagine that's how binaural beats work, as they occur at sufficiently low frequencies for this to happen. It could also be that the ear transmits low-frequency time-domain information as well as frequency-domain information.
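To make the f << 1/T example concrete, here is a small sketch under my own assumptions: I read T as the length of each analysis window, pick a 2 Hz tone and 10 ms windows, and take the DC bin of every window's FFT. The specific numbers and variable names are made up for illustration, and this is not a model of the ear.

```python
import numpy as np

fs = 8000            # sample rate in Hz (assumed)
n_win = 80           # samples per analysis window (assumed)
T = n_win / fs       # window length: 10 ms, so 1/T = 100 Hz
f = 2.0              # slow tone in Hz, chosen so that f << 1/T

t = np.arange(2 * fs) / fs           # two seconds of signal
x = np.sin(2 * np.pi * f * t)

# Chop the signal into consecutive windows and Fourier transform each one.
frames = x.reshape(-1, n_win)
dc = np.fft.rfft(frames, axis=1)[:, 0].real / n_win   # DC bin = mean of each window

# The sequence of DC values is itself a time series sampled every T seconds.
# Its spectrum should peak at the original slow frequency f.
dc_spectrum = np.abs(np.fft.rfft(dc))
dc_freqs = np.fft.rfftfreq(len(dc), d=T)
recovered_f = dc_freqs[np.argmax(dc_spectrum)]

print(f"recovered frequency from DC components: {recovered_f} Hz")
```

The DC bin of each window is just that window's average, so when the tone barely changes within one window the averages trace out the tone itself.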