There's a good chance the LLVM backend will emit PTX, not machine code. PTX is well documented [1]. Under such a system, the generated PTX would be JITed at runtime by the driver.
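If you want to see what that PTX looks like, here's a rough sketch that asks nvcc to stop after the PTX stage. It says nothing about what an LLVM backend would actually emit; it only shows the intermediate text the driver would later JIT. The `scale` kernel and the helper names `ptx_command` and `compile_to_ptx` are made up for illustration, and it assumes `nvcc` from the CUDA toolkit is on your PATH (it returns None otherwise).

```python
import shutil
import subprocess
import tempfile
from pathlib import Path

# Hypothetical example kernel; any valid CUDA C source would do.
KERNEL_SRC = """
extern "C" __global__ void scale(float *x, float a, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}
"""


def ptx_command(src, out):
    """Build the nvcc invocation that stops after emitting PTX."""
    return ["nvcc", "-ptx", str(src), "-o", str(out)]


def compile_to_ptx(source=KERNEL_SRC):
    """Return the PTX text for `source`, or None if nvcc is not installed."""
    if shutil.which("nvcc") is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "kernel.cu"
        out = Path(tmp) / "kernel.ptx"
        src.write_text(source)
        # Raises CalledProcessError if nvcc rejects the source.
        subprocess.run(ptx_command(src, out), check=True)
        return out.read_text()


if __name__ == "__main__":
    ptx = compile_to_ptx()
    print(ptx if ptx is not None else "nvcc not found; nothing compiled")
```

The output is plain text, so you can read it or diff it between compiler versions before the driver ever sees it.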
Note that LLVM already has a (very experimental and incomplete) PTX backend [2].
I'm pretty sure this is the case from playing with the OpenCL side of CUDA. If you pass the '--version' flag to the OpenCL compiler (at least the one shipped with CUDA 3.0), it dumps version info from an LLVM build that is about a year old. The '-cl-nv-verbose' flag is also documented to pass '--verbose' to the ptxas assembler.
It is undocumented, but you can get a fairly decent idea of what is going on if you have a good understanding of such architectures in general, read the sparse documentation they provide, run microbenchmarks, and use tools such as decuda (https://github.com/laanwj/decuda/wiki).
Also, people working with those devices are often scientists who are eager to share what they found out (if only to say "You're doing it wrong!"). See, for example, Vasily Volkov's work here: http://www.cs.berkeley.edu/~volkov/
It's slightly better documented these days, now that cuobjdump is bundled with the compiler tools. It can dump SASS, which is supposed to be the native machine code of the Fermi GPUs.
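For anyone who wants to try this, a minimal sketch of wrapping cuobjdump from a script is below. The helper names `sass_command` and `dump_sass` are hypothetical, and it assumes `cuobjdump` is on your PATH and that the binary you point it at actually contains device code; treat it as a starting point, not a polished tool.

```python
import shutil
import subprocess


def sass_command(binary):
    """Build the cuobjdump invocation that disassembles embedded SASS."""
    return ["cuobjdump", "-sass", str(binary)]


def dump_sass(binary):
    """Return the SASS listing for `binary`, or None if cuobjdump is missing."""
    if shutil.which("cuobjdump") is None:
        return None
    result = subprocess.run(
        sass_command(binary),
        capture_output=True,  # collect stdout instead of printing it
        text=True,
        check=True,  # raise if cuobjdump reports an error
    )
    return result.stdout
```

Calling `dump_sass("a.out")` on a CUDA executable should give you the listing as one string, which is handy for grepping for specific instructions when you compare against microbenchmark results.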