I have a lot of questions myself that I hope to get clarified on August 5th, but I think the goals of Unicorn Engine and QEMU are quite different.
From the sample snippet, it looks like Unicorn Engine provides an interface that helps with reverse engineering tasks and with building tools that need to translate and execute individual functions or blocks of instructions from another architecture (it reminds me of the disassembly framework 'Capstone Engine' in that regard). QEMU's goal, on the other hand, is to emulate an entire system, including various audio/video hardware (basically, a virtual machine).
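To make that comparison concrete, here is roughly what "CPU emulation as a library" usage would look like. Since nothing has been released yet, the function names below (uc_open, uc_mem_map, uc_emu_start, ...) are my guess at a C binding based on the sample snippet, not a confirmed API:

```c
/* Hypothetical sketch: map a buffer, write two x86 instructions into it,
   set registers, run, and read the registers back. Names are assumed. */
#include <unicorn/unicorn.h>
#include <stdio.h>

#define BASE 0x1000000

int main(void) {
    const uint8_t code[] = { 0x41, 0x4a };   /* INC ecx; DEC edx (32-bit) */
    uc_engine *uc;
    int ecx = 0x1234, edx = 0x7890;

    if (uc_open(UC_ARCH_X86, UC_MODE_32, &uc) != UC_ERR_OK)
        return 1;

    uc_mem_map(uc, BASE, 2 * 1024 * 1024, UC_PROT_ALL);   /* 2 MB guest RAM */
    uc_mem_write(uc, BASE, code, sizeof(code));

    uc_reg_write(uc, UC_X86_REG_ECX, &ecx);
    uc_reg_write(uc, UC_X86_REG_EDX, &edx);

    uc_emu_start(uc, BASE, BASE + sizeof(code), 0, 0);     /* run both insns */

    uc_reg_read(uc, UC_X86_REG_ECX, &ecx);
    uc_reg_read(uc, UC_X86_REG_EDX, &edx);
    printf("ecx=0x%x edx=0x%x\n", ecx, edx);

    uc_close(uc);
    return 0;
}
```

The point is that, unlike with QEMU, nothing here involves devices, firmware or a boot process: you hand the engine raw instructions and guest memory and get modified register/memory state back.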
Could it be used as the CPU emulation component of a compatibility layer? For instance, if someone were to create a WINE-like Classic Mac OS compatibility layer, could Unicorn serve as the PowerPC emulation portion? Or is the intended use case something entirely different?
Regarding performance: an important question is how the target machine code is being emulated. From the scarce information we have so far, I think they are using an interpreter, which would be too slow for most applications. With a JIT (or even AOT) compiler this scenario would be more realistic.
Then there is the question of what Unicorn Engine does at user level with instructions like syscall on x86_64 or sc on ppc64. I see no references to any of this on the site. If your target is "high-level emulating" the underlying operating system (like WINE does), being able to register native handlers for such instructions is a must.
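To illustrate what I mean by native handlers: ideally you could register a callback that fires whenever the guest executes syscall and dispatch to host code yourself. The hook interface below (UC_HOOK_INSN on UC_X86_INS_SYSCALL) is again my guess at what such an API could look like; nothing on the site documents it:

```c
/* Hypothetical sketch: intercept x86_64 `syscall` and handle it natively.
   The hook API shown here is assumed, not documented anywhere yet. */
#include <unicorn/unicorn.h>

static void handle_syscall(uc_engine *uc, void *user_data) {
    uint64_t rax, rdi;
    uc_reg_read(uc, UC_X86_REG_RAX, &rax);   /* syscall number */
    uc_reg_read(uc, UC_X86_REG_RDI, &rdi);   /* first argument */

    if (rax == 60) {                         /* exit(2) on Linux x86_64 */
        uc_emu_stop(uc);
        return;
    }
    /* ...translate other syscalls into equivalent host behaviour... */
}

static void install_syscall_hook(uc_engine *uc) {
    uc_hook hh;
    /* begin > end: hook the instruction everywhere in guest memory */
    uc_hook_add(uc, &hh, UC_HOOK_INSN, handle_syscall, NULL,
                1, 0, UC_X86_INS_SYSCALL);
}
```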
If both conditions are satisfied, then it should be doable for most applications: load the binary, replace calls to dynamically linked system libraries with native implementations that match the specifications (on undocumented systems this means lots of reverse engineering), and ensure that any syscall/sc/etc. instruction results in the behaviour the application expects. From there you could take care of the rest yourself: multiple threads could be handled by multiple host threads, each running its own Unicorn Engine instance with separate CPU register state but sharing the entire virtual address space (sketched below). The target application's stdin/stdout could be redirected to the host emulator's stdin/stdout. And so on... :-)
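For the shared-address-space idea, the engine would have to let the caller map host-owned memory into the guest, so that two instances (= two emulated threads) alias the same bytes while keeping their own register files. The uc_mem_map_ptr-style call below is purely hypothetical on my part:

```c
/* Hypothetical sketch: two engine instances sharing one host-allocated
   guest image, each with independent register state. */
#include <unicorn/unicorn.h>
#include <stdlib.h>

#define IMAGE_BASE 0x400000
#define IMAGE_SIZE (4 * 1024 * 1024)

int main(void) {
    /* Page-aligned host buffer standing in for the guest address space. */
    void *image = aligned_alloc(4096, IMAGE_SIZE);
    uc_engine *thread_a, *thread_b;

    uc_open(UC_ARCH_X86, UC_MODE_64, &thread_a);
    uc_open(UC_ARCH_X86, UC_MODE_64, &thread_b);

    /* Both instances alias the same buffer: a store performed while
       emulating one "thread" is immediately visible to the other. */
    uc_mem_map_ptr(thread_a, IMAGE_BASE, IMAGE_SIZE, UC_PROT_ALL, image);
    uc_mem_map_ptr(thread_b, IMAGE_BASE, IMAGE_SIZE, UC_PROT_ALL, image);

    /* RIP/RSP/etc. live per instance, so each emulated thread gets its
       own stack and program counter while code and heap stay shared. */

    uc_close(thread_a);
    uc_close(thread_b);
    free(image);
    return 0;
}
```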
Disclaimer: I'm just guessing at what could be done if they provide a decent binary translator and the API cares about high-level emulation. Looking at their site, I'm not sure that's even a goal for them.
I'm very interested to see how well it goes. If it is as flexible as it seems, perhaps it could be used to emulate the Mill? It'd be great to mess around with all these architectures!
With no code release yet, it definitely seems early to be discussing this on HN. The author of the excellent Capstone disassembly engine (http://www.capstone-engine.org/) is involved, though, so it has my interest.
If the emulator detects a memory write to one of the already-recompiled blocks, it can invalidate the result of the previous translation. If a branch then occurs to that block, or to a new area of memory where new instructions have been emitted, it can translate that part and cache the result. With a JIT binary translator this is no big deal; AOT translation of self-modifying code, however, would not work.
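The bookkeeping behind that is essentially a translation-block cache that gets flushed whenever a guest store overlaps a cached block. A deliberately simplified sketch of the idea (not how QEMU or Unicorn actually implement it):

```c
/* Simplified sketch of invalidate-on-write for a JIT block cache. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t guest_start;   /* first guest address covered by the block */
    uint64_t guest_end;     /* one past the last covered guest address */
    void    *host_code;     /* translated host code for this block */
    bool     valid;
} tb_entry;

#define MAX_BLOCKS 1024
static tb_entry cache[MAX_BLOCKS];

/* Record a freshly translated block so later branches can reuse it. */
static void cache_block(uint64_t start, uint64_t end, void *host_code) {
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (!cache[i].valid) {
            cache[i] = (tb_entry){ start, end, host_code, true };
            return;
        }
    }
}

/* Called from the emulated store path: any cached translation that
   overlaps the written range is dropped, so the next branch into it
   forces a retranslation of the (possibly modified) instructions. */
static void invalidate_range(uint64_t addr, size_t len) {
    for (int i = 0; i < MAX_BLOCKS; i++) {
        if (cache[i].valid &&
            addr < cache[i].guest_end &&
            addr + len > cache[i].guest_start) {
            cache[i].valid = false;
        }
    }
}
```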