Show HN: Carton – Run any ML model from any programming language (carton.run)
196 points by vpanyam on Sept 28, 2023 | 53 comments
The goal of Carton is to let you use a single interface to run any machine learning model from any programming language.

It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes.

Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve.

At a high level, the way Carton works is by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. There are lots more details linked in the architecture doc below.

Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This is meaningful because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc without changes. This lets you keep up with cutting-edge advances, but decouples them from your application.

I’ve been working on Carton for almost a year now and I’m excited to open source it today!

Some useful links:

* Website, docs, quickstart - https://carton.run

* Explore existing models - https://carton.pub

* Repo - https://github.com/VivekPanyam/carton

* Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE...

Please let me know what you think!




Just some random brain dump: Why limit to ML models?

Perhaps we can (should?) have some universal package hub, where you can package and push a "thing" from any language, and then pull and use it from any other language. With some metadata describing the input/output schema. The underlying engine can use WASM or containers or something like that.


..isn't this just Docker?


Well... yeah, kind of.

I guess some parts that are missing are having a schema for the CMD part, and being able to easily generate APIs for various languages from that schema


Dynamic libraries, command-line executables, …


So this means if I want to use an ML model I made in Python, but don't want to code the rest of the application in Python, I can do that?


Yes, that's a use case Carton supports.

For example, if your model contains arbitrary Python code, you'd pack it using [1] and then you could load it from another language using [2]. In this case, Carton transparently spins up an isolated Python interpreter under the hood to run your model (even if the rest of your application is in another language).

You can take it one step further if you're using certain DL frameworks. For example, you can create a TorchScript model in Python [3] and then use it from any programming language Carton supports without requiring Python at runtime (i.e. your model runs completely in native code).
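
To make that concrete, the loading side looks roughly like this from Python (a simplified sketch assuming the cartonml package from the docs below; the model path, input name, and shape are illustrative and depend on what you packed):

    import asyncio
    import cartonml as carton
    import numpy as np

    async def main():
        # Load a packed model (a local .carton file or a URL, e.g. from carton.pub).
        model = await carton.load("/path/to/my_model.carton")

        # Inputs and outputs are dicts of tensors; the key "input" is illustrative.
        out = await model.infer({"input": np.zeros((1, 8), dtype=np.float32)})
        print(out)

    asyncio.run(main())

The equivalent load call in the other language bindings looks similar, which is the point: your application code doesn't change when the underlying framework does.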

[1] https://carton.run/docs/packing/python

[2] https://carton.run/docs/loading

[3] https://carton.run/docs/packing/torchscript


Seems almost too good to be true, but I really hope it's not. How does it handle things like CUDA dependencies? Can it somehow make those portable too? Or is GPU acceleration not quite there yet?


Thanks :)

It uses the NVIDIA drivers on your system, but it should be possible to make the rest of CUDA somewhat portable. I have a few thoughts on how to do this, but haven't gotten around to it yet.

The current GPU-enabled Torch runners use a version of libtorch that's statically linked against the CUDA runtime libraries. So in theory, they just depend on your GPU drivers and not your CUDA installation. I haven't yet tested on a machine that has just the GPU drivers installed (i.e. without CUDA), but if it doesn't already work, it should be very possible to make it work.


That’s awesome! Thanks for making this


This looks interesting - I use ONNX to call my PyTorch models from .NET, but so far it's meant I haven't been able to test out JAX-based libraries since they don't have ONNX export, and it has also meant I had to write C# boilerplate code to preprocess my input data into the form required by the model.

Potentially this could be a lot better but I’d be curious what speed overhead the IPC layer adds. At least with ONNX you get a nice speed bump :)


Any plans to support Windows? That would make Carton the ultimate library to embed LLMs into desktop applications


I'm definitely open to it if there's interest (or if someone wants to help), but I don't have plans to implement Windows support myself at the moment.

The currently supported platforms [1] were mostly driven by environments I've seen at various tech companies.

I do have active plans to support inference from WASM/WebGPU so maybe that could be a good entrypoint to Windows support.

--

[1] Currently, the supported platforms are:

* `x86_64` Linux and macOS

* `aarch64` Linux (e.g. Linux on AWS Graviton)

* `aarch64` macOS (e.g. M1 and M2 Apple Silicon chips)

* WebAssembly (metadata access only for now, but WebGPU runners are coming soon)


is this ancillary to what [these guys](https://github.com/unifyai/ivy) are trying to do?


That seems different to me. OP is talking about using ML models outside of Python (well, in Python, too). That link seems to be talking about using ML models across frameworks (PyTorch, TensorFlow, JAX, etc) in Python.


got it. went through both of the codebases. what you say is the case. thanks!


This HN post looks really weird on mobile (no, not the website, HN itself)


Is this the same as Nvidia's Triton?


I think this Carton project is at a lower level than Triton. With Triton you'd start the Triton server and then make requests against it, while Carton is more like a library that you include in your application/library and use from the same language you're writing that application/library in.


True!


When will you release a Java client?


I'd love to see this for golang (even without GPU support).


Maybe I'm missing something here, but isn't this largely achieved by ONNX already?

[0] https://onnx.ai


That's a good question! There's an FAQ entry on the homepage that touches on this, but let me know if I can improve it:

> ONNX converts models while Carton wraps them. Carton uses the underlying framework (e.g. PyTorch) to actually execute a model under the hood. This is important because it makes it easy to use custom ops, TensorRT, etc without changes. For some sophisticated models, "conversion" steps (e.g. to ONNX) can be problematic and require additional validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration.

> With that said, we plan to support ONNX models within Carton. This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM).

More broadly, Carton can compose with other interesting technologies in ways ONNX isn't able to because ONNX is an inference engine while Carton is an abstraction layer.


> This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM)

If someone already has an ONNX model, there's already an in-browser capable ONNX runtime: https://onnxruntime.ai/docs/get-started/with-javascript.html...

(It does use some parts compiled to WASM under the hood, presumably for performance.)


ONNX Runtime doesn't convert models; it runs them, and it has bindings in several languages. And most importantly, it's tiny compared to the whole Python package mess you get with TF or PyTorch.

If Carton took a TF/PyTorch model and just dealt with the conversion into a real runtime, somehow using custom ops for the bits that don't convert, that would be amazing though.


There's an ONNX runtime, but to use the runtime you do need to convert your model into the ONNX format first. You can't just run a TF or PyTorch model using the ONNX runtime directly. (At least last time I checked.) Unfortunately, this conversion process can be a pain, and there needs to be an equivalent operator in ONNX for each op in your TF/Torch execution graph.
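
For concreteness, the conversion step for a PyTorch model looks roughly like this (a toy sketch; real models with custom or exotic ops are exactly where this export step falls over):

    import torch

    # A toy model standing in for a real one. Export traces the graph and
    # maps every op to an ONNX operator, which is where unsupported or
    # custom ops cause the conversion to fail.
    model = torch.nn.Linear(4, 2)
    example_input = torch.randn(1, 4)

    torch.onnx.export(model, example_input, "model.onnx")

Only after this step can ONNX Runtime load and run the resulting model.onnx.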


> From any [*] programming language.

[*] If "any programming language" is Python or Javascript.


This is a reasonable approach for systems that are allowed to load binaries (either the running artifact is a binary or semi-binary (a WASM executable), or it can load .so / .dll files from user-provided places).

It basically rests on the promise that you can package CUDA / PyTorch / a Python interpreter alongside the host language in some way, and use it.

This is true for Android, not true for iOS, true for almost all desktop systems, somewhat true for the web (packaging PyTorch + a Python interpreter in WASM: the latter is easy, the former I'm unsure about), and probably not true for FaaS environments (such as Cloudflare Workers or AWS Lambda).


It's gonna fall apart in a spectacular way when they try to marshal data across compiled language boundaries.

This is the actual hard problem in this domain, not packaging a model file in a zipfile.


Make it for Go, and I am sold. Running ML models in Go services is still an unsolved problem.


We have a similar high-performance AI stack written in Go that's capable of loading many different models from different frameworks. It's the work of several years. Just saw your comment and thought about our company-internal talk of releasing everything under an open source license. Thanks for reminding me :) What are your use-cases?


Wow, make it open source quickly!!! :hype:. Our current setup is a classic Python REST API for model serving, but we have very low latency constraints. As such, rewriting in a more performant backend language, e.g. Go or Rust, would substantially reduce resource usage (by reducing the need for horizontal scaling). Pre-baked model serving frameworks, e.g. Nvidia's Triton, aren't an option, since we have to query a feature store and do some input feature tracking in between. Go seemed like an efficient, developer-friendly choice, but there aren't any well-maintained model inference libraries in Go to this day...


We used Triton Inference Server (with a Golang sidecar to translate requests) for model serving and a separate Go app that handled receiving the request, fetching features, sending to Triton, doing other stuff with the response, serving. This scaled to 100k QPS with pretty good performance but does require some hops.

In general writing pure Go inference libraries sucks. Not easy to do array/vector manipulation, not easy to do SIMD/CUDA acceleration, cgo is not go, etc. I wrote a fast XGBoost library at least (https://github.com/stillmatic/arboreal) - it's on par with C implementations, but doing anything more complex is going to be tricky.


Cool, thanks for sharing!


I’ve also run models in Go: transformers, even T5. There wasn’t that much overhead, maybe some annoying compilation stuff, but nothing crazy.

This was TensorFlow btw, which has Go bindings.

It is a smart & worthwhile move; we also needed to drop Python for performance/cost gains.


Eh, awesome! Seems like it's this one, right? https://github.com/galeone/tfgo. Quite a lot of stars.


I think just native https://pkg.go.dev/github.com/tensorflow/tensorflow/tensorfl... but tfgo looks interesting.

Actually, the docs around this weren’t great. Took the train-in-Python & inference-in-Go approach, and only for versions greater than TF2.
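
For reference, the export half of that approach is just writing a SavedModel from Python; something like this toy sketch (the module and export path are made up):

    import tensorflow as tf

    class ToyModel(tf.Module):
        """Stand-in for whatever model you actually trained."""
        def __init__(self):
            super().__init__()
            self.w = tf.Variable(tf.random.normal([4, 1]))

        @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
        def __call__(self, x):
            return tf.matmul(x, self.w)

    # Write a SavedModel that the Go bindings (or tfgo) can load for inference.
    tf.saved_model.save(ToyModel(), "export/model")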


Write a blog post then about this! I can tell you it is hardly a solved problem.


This seems to be a reasonable approach for Go, but you do need to carry a lot in your containerized environment (Go tends to have very lean containers, and this approach requires a fat container with CUDA, PyTorch, Python, etc.).


Slightly related dumb question, I saw on GitHub that TensorFlow has Java support. Does anyone actually use TensorFlow with Java?


Aruba Networks does.


> Carton wraps your model with some metadata and puts it in a zip file

Why a zip file?


In addition to the benefits mentioned in the sibling comment, zip files let you seek to and access individual files in the archive without extracting all files (vs tar files for example).

This lets us do things like fetch model metadata [1] for a large remote model, by only fetching a few tiny byte ranges instead of the whole model archive.

It also means you can include sample data (images, etc) with your model and they're only fetched when necessary (for example with stable diffusion: https://carton.pub/stabilityai/sdxl)
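
To make the "seek to individual files" point concrete, here's what that looks like with Python's zipfile module (file names are just for illustration):

    import zipfile

    # Only the central directory plus the requested entry gets read;
    # large members (e.g. model weights) are never touched.
    with zipfile.ZipFile("model.carton") as zf:
        print(zf.namelist())           # list entries without reading their data
        meta = zf.read("carton.toml")  # illustrative metadata file name

For a remote model behind HTTP, the same idea works with range requests against the central directory and the individual entries.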

[1] https://carton.run/docs/metadata


Zip-file-as-a-container-format seems pragmatic: it's a way to bundle multiple files into one file (easier to manage than scattering multiple files), it avoids introducing a new proprietary format, it can optionally be compressed, and support for reading and writing the container format is already widespread.

To give two examples of prior art, it worked for Quake 3 data files (.pk3) & geospatial data files (.kmz)

Maybe it's not the best choice but it doesn't seem like a bad one.


Also .docx, I believe.


It's a fairly common way of bundling multiple files into one that has large support and usually "good enough" compression.

It's hardly revolutionary to do this, here are some common examples of things that are zip files but don't label themselves as such:

- .jar

- .odt, .ods, .odp, .docx, .xlsx, .pptx

- .epub

- .apk

- .crx, .xpi


"...run any machine learning model from any programming language*."

*As long as that language is python or rust.

What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.


Replying to this to explain the downvotes.

We all think this. My initial thought was that this is probably a startup selling PyTorch-as-a-Service, and I did not bother to read the article. It turns out that I was wrong, and this might even be useful -- if not for the implementation, then perhaps for the idea.

However, it turns out to make Hacker News a nicer space if we follow these guidelines:

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.


It's not a shallow dismissal.

The selling point of this thing is cross-language interoperability, and while they advertise it, they don't deliver.

Sorry, but if your "any language" is "Python or Javascript" your project hasn't even reached the proof of concept stage, it's just a vague idea at this point.

Supporting C++ and C will be 90% of the work and the real challenge.


The shallow dismissal that I was referring to is:

> What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.

The title of the post might be click-bait, but there is an obvious asterisk on the homepage of Carton, and even at a quick glance it is obvious that only very few languages are supported. The claim is so obviously false that I don't mind. I would not expect support for INTERCAL or Awk.

Yes, it does not deliver, but that does not warrant the personal attack. The author of Carton actually already had internships at Google and Facebook, and currently works at Uber.


Maybe you and I have different understandings of what "proof of concept" means, but if you're supposed to deliver cross-language interoperability and have successfully delivered it in three different languages with wildly different runtimes, I'd consider that a successful proof of concept. And since you've demonstrated that the bindings work for at least two other languages, it's more or less trivial to get it working for N other languages, so this is clearly beyond the proof-of-concept stage at this point and trying to reach maturity instead.


I gotta agree here, I don't think the process of porting this to a wide array of languages is trivial.

Additionally, I would have some serious performance concerns when it comes to marshaling data across language boundaries.


OP has already worked at both Facebook and Google, it's doubtful they need any more resume-bolstering.



