I am not moxie, but I'll reply. The key thing is that a short authentication string is shown to both parties. It is then the parties' job to verify, by talking to each other and comparing that string, that they are talking to the right person (the authentication part). That is the key part: using some other channel, not just the algorithms and the protocol, to authenticate the counterparty.
Then the boring part kicks in: the way the algorithms work, if you are convinced that you are talking to the right person, then the secret key that gets generated can't be inferred by an eavesdropping party.
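To make that concrete, here is a rough sketch of the idea, not the actual derivation any real app uses. It assumes a toy Diffie-Hellman group (the prime `P` and generator `G` are my own illustrative picks and are far too weak for real use), and the helper names `keypair`, `shared_secret` and `sas` are made up for this example. Both sides hash the shared secret into a short string. With a man in the middle, each side ends up with a different secret, so the strings they read out to each other should not match.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only. NOT secure.
P = 2**127 - 1  # a Mersenne prime
G = 3


def keypair():
    """Return (private, public) for the toy Diffie-Hellman group."""
    priv = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return priv, pow(G, priv, P)


def shared_secret(priv, peer_pub):
    """Combine our private key with the public key we received."""
    return pow(peer_pub, priv, P)


def sas(secret, pub_a, pub_b):
    """Hash the secret and both public keys down to six decimal digits."""
    # Sort the public keys so both parties hash the same bytes.
    parts = (secret, *sorted((pub_a, pub_b)))
    digest = hashlib.sha256(b"|".join(str(x).encode() for x in parts)).digest()
    return f"{int.from_bytes(digest[:4], 'big') % 10**6:06d}"


if __name__ == "__main__":
    a_priv, a_pub = keypair()
    b_priv, b_pub = keypair()

    # Honest run: each side sees the other's real public key.
    print("alice:", sas(shared_secret(a_priv, b_pub), a_pub, b_pub))
    print("bob:  ", sas(shared_secret(b_priv, a_pub), b_pub, a_pub))

    # Man in the middle: Mallory hands her own public key to both sides.
    m_priv, m_pub = keypair()
    print("alice (mitm):", sas(shared_secret(a_priv, m_pub), a_pub, m_pub))
    print("bob (mitm):  ", sas(shared_secret(b_priv, m_pub), b_pub, m_pub))
```

In the honest run the two strings are identical. In the intercepted run they are derived from different secrets, so they will almost certainly differ, and the only thing left for the attacker is to fool the humans doing the comparison.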
Now, if I remember correctly, the NSA has published a paper where they allegedly fooled this system by using voice disguise and a voice actor to mount a man-in-the-middle attack. This is highly subjective and depends on the context, so take it with a grain of salt.