Do you think requiring the use of their packages is too limiting for widespread usage? Seems like you're forced to use Rust or C atm.
This seems like a limitation that sits in a somewhat unusable place: for something simple and platform-specific (e.g. an HTTP transform) we can just use JS, where the boot-time perf makes up for the execution perf, and for something more serious like a full-fledged API, 120ms should be more than enough time (and we can preemptively scale as long as we're not already at 0)
The way to think about Hyperlight is as a security substrate intended to host application runtimes. You're right that the Hyperlight API only supports C and Rust today, but you can use it to load, for example, Python or JS runtimes, which can then execute those languages natively.
But we might be able to do even better than that by leveraging Wasm Components [1] and WASI 0.2 [2]. Using a VM guest based on Wasmtime, it suddenly becomes possible to run functions written in any language that can compile to a Wasm Component, all using standard tooling and interfaces.
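To give a sense of what that unlocks, here's a minimal sketch of embedding a component with plain Wasmtime's Rust API. To be clear, this is not Hyperlight's guest API (which isn't published yet); the "guest.wasm" file and its exported "transform" function are hypothetical stand-ins:

    use wasmtime::component::{Component, Linker};
    use wasmtime::{Config, Engine, Store};

    fn main() -> wasmtime::Result<()> {
        // Enable the component model (WASI 0.2 is built on it).
        let mut config = Config::new();
        config.wasm_component_model(true);
        let engine = Engine::new(&config)?;

        // "guest.wasm" is a hypothetical component, compiled from
        // any language with component tooling (Rust, Python, JS, ...).
        let component = Component::from_file(&engine, "guest.wasm")?;
        let linker = Linker::new(&engine);
        let mut store = Store::new(&engine, ());
        let instance = linker.instantiate(&mut store, &component)?;

        // Call a typed export: here, string -> string.
        let transform = instance
            .get_typed_func::<(String,), (String,)>(&mut store, "transform")?;
        let (out,) = transform.call(&mut store, ("hello".to_string(),))?;
        transform.post_return(&mut store)?;
        println!("{out}");
        Ok(())
    }

The point is that the host only sees the typed component interface, so it doesn't care what language the guest was written in.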
I believe the team has a prototype VM guest based on Wasmtime working, but they still need a little more time before it's ready to be published. Stay tuned for future announcements?
This is probably the most streamlined/all-inclusive solution of all the ones I've seen, but this has definitely been an extremely saturated space in 2024.
What are the other viable players? As mentioned in another thread on this post, DuckDB is not "production-ready." That has been a non-starter for us at work.
That's why our current approach is to build missing or partially functional features ourselves so we can move fast. For example, DuckDB doesn't read Iceberg tables according to the spec, can't write to them, etc.
Rust definitely is harder to pick up. I've been working on my Rust, and on paper I'd like to use Rust for everything, but I can move 10x faster in Go for far less than 10x the runtime overhead.
For TPU pods they use a 3D torus topology with multi-terabit cross connects. For GPUs, A3 Ultra instances offer "non-blocking 3.2 Tbps per server of GPU-to-GPU traffic over RoCE".
Is that the worst for training? Namely: do superior solutions exist?