However, I do have to ask: ~2x faster for fp16 -> fp8 is expected, right? It's still not as good as the "realtime" or "lightning" options, which basically have to be 5-10x faster. What's the ideal product use case for just ~2x faster?