Well, I work on this stuff, but I'm mostly sharing this to spread awareness that you don't need a multimillion-dollar rack of NVIDIA GPU machines to do inference with surprisingly powerful models these days. Not that long ago, you'd have needed a far more expensive multi-kilowatt workstation to run this sort of thing at a useful speed.