
OK, that makes sense; so basically, if I want to give this a shot, I would just need to read llm_chat.js and the TVM docs, and effectively translate llm_chat.js to my language of choice?


I think what would instead be needed is native wgpu runtime support for TVM, like the existing Vulkan implementation in TVM. It would then naturally link against any runtime that provides webgpu.h.

Then, yeah, llm_chat.js would be the high-level logic that targets the TVM runtime, and it could be implemented in any language the TVM runtime supports (that includes JS, Java, C++, Rust, etc.).
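To make the split concrete, here is a minimal sketch of what that "high-level logic over the runtime" looks like. This is NOT the actual web-llm or TVM API: the module class is a stand-in for a loaded TVM runtime module, and the function names (prefill, decode) are illustrative assumptions. The point is that the generation loop itself is plain host-language code, portable to any language that can call into the runtime.

```python
from typing import List


class FakeModule:
    """Stand-in for a compiled model loaded through a runtime.

    Real code would instead load a TVM module and look up its packed
    functions; here we fake the logits so the sketch is self-contained.
    """

    def prefill(self, tokens: List[int]) -> List[float]:
        # Fake: ingesting the prompt yields all-zero logits (argmax -> 0).
        return [0.0] * 32

    def decode(self, token: int) -> List[float]:
        # Fake: the "model" always predicts previous token + 1 (mod 32).
        logits = [0.0] * 32
        logits[(token + 1) % 32] = 1.0
        return logits


def argmax(logits: List[float]) -> int:
    return max(range(len(logits)), key=lambda i: logits[i])


def generate(mod, prompt: List[int], max_new: int, eos: int) -> List[int]:
    # This loop is the kind of logic llm_chat.js carries: prefill the
    # prompt once, then decode token by token until EOS or the budget
    # runs out. Nothing here is JS-specific.
    tok = argmax(mod.prefill(prompt))
    out: List[int] = []
    for _ in range(max_new):
        out.append(tok)
        if tok == eos:
            break
        tok = argmax(mod.decode(tok))
    return out
```

With the fake logits above, `generate(FakeModule(), [1, 2, 3], 4, eos=31)` returns `[0, 1, 2, 3]`; with a real runtime module the same loop would drive the compiled model instead.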

Supporting native WebGPU is an interesting direction. Feel free to open a thread on the TVM Discuss forum; there may be fun things to collaborate on in OSS.


How big is the "runtime" part? My use case would basically be: run this in a native app that links against WebGPU (wgpu or Dawn). Is there a reference implementation of this runtime that one could study?


The TVM runtime is pretty compact (roughly 700 KB to 2 MB, depending on which dependencies are included). You can check in with the TVM community and bring up the question there; I think there might be some common interest. There are runtime implementations for Vulkan and Metal that can be used as references.



