Lots of AI HW is focused on RAM (512GB!). I have a cost-sensitive application that needs speed (300+ TOPS), but only 1GB of RAM. Are there any HW companies focused on that space?
Like others have said, basically traditional GPUs (the RTX 40/50 series in particular; the 20/30 series have much weaker tensor cores).
In terms of software, recent NVIDIA and AMD research has focused on fast evaluation of small (~4-layer) MLPs with FP8 weights, for things like denoising, upscaling, radiance caching, and texture/material BRDF compression and decompression.
NVIDIA has just put out some new graphics API extensions and samples/demos for loading a chunk of neural net weights and performing inference from within a shader.
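To give a feel for how tiny these nets are, here's a rough NumPy sketch of that kind of inference: a ~4-layer MLP whose weights are stored quantized and dequantized per layer, much as a shader would. The layer widths, activation, and int8-with-scale quantization are my assumptions standing in for FP8; this is illustrative, not NVIDIA's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize(w, bits=8):
    # Crude stand-in for FP8 storage: per-layer symmetric int8 + scale.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A small MLP like those used for radiance caching / texture decompression.
# Sizes are made up: 16 inputs -> three 32-wide hidden layers -> RGB output.
dims = [16, 32, 32, 32, 3]
layers = []
for fan_in, fan_out in zip(dims[:-1], dims[1:]):
    w = rng.normal(0, fan_in ** -0.5, (fan_in, fan_out)).astype(np.float32)
    layers.append(quantize(w))

def mlp_forward(x):
    # Weights stay quantized "in memory"; dequantize layer by layer.
    for i, (q, scale) in enumerate(layers):
        x = x @ dequantize(q, scale)
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

out = mlp_forward(rng.normal(size=(4, 16)).astype(np.float32))
print(out.shape)
```

The point of the tiny width is that the whole weight set fits in on-chip memory, so per-pixel inference is mostly a handful of small matrix multiplies that map well onto tensor cores.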
Just buy any gaming card? Even something like the Jetson AGX Orin boasts 275 TOPS (but they add in all kinds of different subsystems to reach that number).
The problem with that TOPS figure is that it includes ~100 TOPS from the "Deep Learning Accelerator" coprocessors, which have a lot of awkward limitations on what they can do (and terrible software support). The GPU itself is Ampere generation, but there's no exact consumer GPU equivalent.