I'm not a Python expert, but this feels very odd to me (both the `__init__` construction and the `return tgemm.mm(input, self.weight, self.bias, None, None)` call, which looks like markdown to me).
Also, why is it calling `.cuda()` to move tensors to a CUDA device? I suppose this is because it's based on HIP, which comes with its own set of problems, but that's ROCm for the masses, I guess.
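For what it's worth, PyTorch's ROCm builds reuse the `cuda` device type and the `torch.cuda` namespace for AMD GPUs, so `.cuda()` in ROCm code is expected. Below is a minimal sketch of checking which backend a given build targets. The helper name `describe_backend` is my own; it relies only on `torch.version.hip`, which is a version string on ROCm builds and `None` otherwise.

```python
import torch

def describe_backend() -> str:
    """Report which GPU backend this PyTorch build was compiled against."""
    if torch.version.hip is not None:
        return f"ROCm/HIP {torch.version.hip}"
    if torch.version.cuda is not None:
        return f"CUDA {torch.version.cuda}"
    return "CPU-only build"

# On ROCm builds, AMD GPUs are still addressed through the "cuda" device type,
# so .cuda() / device="cuda" works unchanged. Fall back to CPU when no GPU exists.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.randn(4, 8).to(device)

print(describe_backend(), x.device.type)
```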
Also, `tgemm.mm` has to be a torch module (at first I thought it was some low-level library they were giving a preview of, since there is already a ROCm-torch...), which is evident from the table just before the summary. That table also smells like they are mostly focused on inference...
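For comparison, the usual PyTorch pattern registers `weight` and `bias` as parameters in `__init__` and does the matrix multiply in `forward`, like `nn.Linear`. A rough sketch, assuming the wrapper follows that shape: `GemmLinear` is a hypothetical name, and `torch.addmm` stands in for `tgemm.mm`, whose signature (including the two trailing `None` arguments) I don't know.

```python
import torch
from torch import nn

class GemmLinear(nn.Module):
    """Hypothetical nn.Linear-style wrapper; torch.addmm stands in for tgemm.mm."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        # Parameters are registered in __init__, as nn.Linear does.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, input: torch.Tensor) -> torch.Tensor:
        # Computes bias + input @ weight.T, the same result as F.linear.
        return torch.addmm(self.bias, input, self.weight.t())

layer = GemmLinear(8, 4)
x = torch.randn(2, 8)
y = layer(x)
```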
EDIT: it seems the official ROCm-torch is also based on HIP.