It is undocumented, but you can get a fairly decent idea of what is going on if you have a good understanding of such architectures in general, read the sparse documentation the vendor does provide, run microbenchmarks, and use tools such as decuda (https://github.com/laanwj/decuda/wiki).
Also, people working with those devices are often scientists who are eager to share what they have found out (if only to say "You're doing it wrong!"). See, for example, Vasily Volkov's work: http://www.cs.berkeley.edu/~volkov/