For starters, you can't run this kind of emulator on a per-frame basis (i.e. run all the game code for one frame, then wait for vsync on the host system, then repeat) if you want sound. Games rely on running at a certain clock speed and they generate sound by writing to the registers of the sound chip. They sometimes do it more frequently than the refresh rate of the screen. You have to sync with real time much more often than once per frame for everything to work as intended.
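That doesn't make sense. You're confusing emulated time with host (wall-clock) time. It's entirely possible to run an emulator one frame at a time and get the sound right; the emulator just needs to process events in the right order and keep its components synchronized against the emulated clock. If every sound-register write is applied at the emulated cycle where it happened, the audio comes out the same no matter how often the host syncs.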
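To make that concrete, here is a minimal Python sketch of the idea, not code from any real emulator. Everything in it is an assumption for illustration: the `SquareChip` class, the single "half period" register, the clock constants, and the list of timestamped writes standing in for game code. It runs the same writes once in whole-frame batches and once in tiny slices and compares the resulting samples.

```python
# Toy model only: all names and constants are hypothetical.
CPU_HZ = 1_000_000                          # emulated CPU clock
SAMPLE_RATE = 50_000                        # chosen so cycles-per-sample is an integer
FPS = 50
CYCLES_PER_FRAME = CPU_HZ // FPS            # 20000 emulated cycles per video frame
CYCLES_PER_SAMPLE = CPU_HZ // SAMPLE_RATE   # 20 emulated cycles per audio sample


class SquareChip:
    """One square-wave channel with a single register: half period in CPU cycles (0 = silent)."""

    def __init__(self):
        self.half_period = 0
        self.level = 1
        self.counter = 0     # cycles since the last level toggle
        self.clock = 0       # emulated cycle the chip has been rendered up to
        self.samples = []

    def run_until(self, cycle):
        # Generate samples until the chip has caught up with emulated time `cycle`.
        while self.clock + CYCLES_PER_SAMPLE <= cycle:
            self.clock += CYCLES_PER_SAMPLE
            if self.half_period > 0:
                self.counter += CYCLES_PER_SAMPLE
                while self.counter >= self.half_period:
                    self.counter -= self.half_period
                    self.level = -self.level
                self.samples.append(self.level)
            else:
                self.samples.append(0)

    def write(self, cycle, value):
        # Catch up to the write's emulated timestamp first, then change the register.
        self.run_until(cycle)
        self.half_period = value


def emulate(writes, total_cycles, slice_cycles):
    """Run the toy machine in slices of `slice_cycles` emulated cycles. No host timing involved."""
    chip = SquareChip()
    pending = sorted(writes)
    i = 0
    now = 0
    while now < total_cycles:
        end = min(now + slice_cycles, total_cycles)
        # Stand-in for "run the CPU for this slice": replay the writes that fall inside it.
        while i < len(pending) and pending[i][0] < end:
            chip.write(*pending[i])
            i += 1
        chip.run_until(end)
        now = end
        # A real emulator would queue the new samples for the host and wait for vsync here.
    return chip.samples


# Several register writes per frame, timestamped in emulated cycles.
writes = [(0, 500), (5_000, 250), (12_345, 125), (20_000, 0), (31_000, 400), (47_777, 200)]
total = 3 * CYCLES_PER_FRAME

per_frame = emulate(writes, total, CYCLES_PER_FRAME)   # sync once per frame
fine = emulate(writes, total, 64)                      # sync every 64 cycles
print(per_frame == fine)
```

The slice size only decides when the host gets a chance to wait; the samples depend solely on the emulated timestamps of the writes.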
It's also entirely possible to run an emulator at many times real-time speed and still get correct sound output, whatever you then do with that output. You can use this to quickly render WAV/MP3 copies of game music, for example.
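As a rough illustration of the faster-than-real-time case, the Python sketch below renders a toy "song" straight to WAV data using the standard-library `wave` module, with no waiting on the host at all. The chip model, the constants, and the `render` and `write_wav` helpers are all hypothetical; a real emulator would run actual game code instead of replaying a list of timestamped register writes.

```python
import io
import sys
import wave
from array import array

# Toy model only: all names and constants are hypothetical.
CPU_HZ = 1_000_000
SAMPLE_RATE = 50_000                        # chosen so cycles-per-sample is an integer
CYCLES_PER_SAMPLE = CPU_HZ // SAMPLE_RATE


def render(writes, total_cycles):
    """Return signed 16-bit samples for a one-channel square-wave chip.

    writes: list of (emulated_cycle, half_period_in_cycles); 0 silences the channel.
    """
    samples = array("h")
    pending = sorted(writes)
    i = 0
    half_period, counter, level = 0, 0, 1
    for t in range(CYCLES_PER_SAMPLE, total_cycles + 1, CYCLES_PER_SAMPLE):
        # Apply every register write that happened before this sample's emulated time.
        while i < len(pending) and pending[i][0] < t:
            half_period = pending[i][1]
            i += 1
        if half_period > 0:
            counter += CYCLES_PER_SAMPLE
            while counter >= half_period:
                counter -= half_period
                level = -level
            samples.append(level * 8000)
        else:
            samples.append(0)
    return samples


def write_wav(dest, samples):
    """Write mono 16-bit PCM to `dest`, which may be a filename or a binary file object."""
    data = array("h", samples)
    if sys.byteorder == "big":
        data.byteswap()          # WAV stores PCM samples little-endian
    with wave.open(dest, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        w.writeframes(data.tobytes())


song = [(0, 500), (250_000, 250), (500_000, 125), (750_000, 0)]
buf = io.BytesIO()                           # a real tool would pass a filename instead
write_wav(buf, render(song, 2 * CPU_HZ))     # two emulated seconds of audio
```

The loop takes however long the host needs, which has no bearing on the result: the file always contains two emulated seconds of audio.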