You have to use a decent acoustic model, not the one in the demo. If you do, I think it works 'pretty well' as a proof of concept. That said, I'm not recommending Sphinx as a recognition framework; it is way behind the times in 2016. But this is the only 'in the wild' demo of this I've seen on the web, so I felt it was worth mentioning.
Maybe it just wasn't trained well enough to reject non-number inputs, but... yeah, that doesn't exactly change my impression, from experience, that Sphinx is awful.