Show HN: Carton – Run any ML model from any programming language (carton.run)
196 points by vpanyam on Sept 28, 2023 | 53 comments
The goal of Carton is to let you use a single interface to run any machine learning model from any programming language.

It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes.

Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve.

At a high level, the way Carton works is by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. There are lots more details linked in the architecture doc below.

Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This is meaningful because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc without changes. This lets you keep up with cutting-edge advances, but decouples them from your application.

I’ve been working on Carton for almost a year now and I’m excited to open source it today!

Some useful links:

* Website, docs, quickstart - https://carton.run

* Explore existing models - https://carton.pub

* Repo - https://github.com/VivekPanyam/carton

* Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE...

Please let me know what you think!




Just some random brain dump: Why limit to ML models?

Perhaps we can (should?) have some universal package hub, where you can package and push a "thing" from any language, and then pull and use it from any other language. With some metadata describing the input/output schema. The underlying engine can use WASM or containers or something like that.


..isn't this just Docker?


Well... yeah, kind of.

I guess some parts that are missing are having a schema for the CMD part, and being able to easily generate APIs for various languages from that schema


Dynamic libraries, command-line executables, …


So this means if I want to use an ML model I made in Python, but don't want to code the rest of the application in Python, I can do that?


Yes, that's a use case Carton supports.

For example, if your model contains arbitrary Python code, you'd pack it using [1] and then you could load it from another language using [2]. In this case, Carton transparently spins up an isolated Python interpreter under the hood to run your model (even if the rest of your application is in another language).

You can take it one step further if you're using certain DL frameworks. For example, you can create a TorchScript model in Python [3] and then use it from any programming language Carton supports without requiring Python at runtime (i.e. your model runs completely in native code).
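
To make that concrete, the loading side looks roughly like this from Python (a simplified sketch assuming the cartonml package from the docs below; the model path, input name, and shape are illustrative and depend on what you packed):

    import asyncio
    import cartonml as carton
    import numpy as np

    async def main():
        # Load a packed model (a local .carton file or a URL, e.g. from carton.pub).
        model = await carton.load("/path/to/my_model.carton")

        # Inputs and outputs are dicts of tensors; the key "input" is illustrative.
        out = await model.infer({"input": np.zeros((1, 8), dtype=np.float32)})
        print(out)

    asyncio.run(main())

The equivalent load call in the other language bindings looks similar, which is the point: your application code doesn't change when the underlying framework does.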

[1] https://carton.run/docs/packing/python

[2] https://carton.run/docs/loading

[3] https://carton.run/docs/packing/torchscript


Seems almost too good to be true, but I really hope it's not. How does it handle things like CUDA dependencies? Can it somehow make those portable too? Or is GPU acceleration not quite there yet?


Thanks :)

It uses the NVIDIA drivers on your system, but it should be possible to make the rest of CUDA somewhat portable. I have a few thoughts on how to do this, but haven't gotten around to it yet.

The current GPU-enabled Torch runners use a version of libtorch that's statically linked against the CUDA runtime libraries. So in theory, they just depend on your GPU drivers and not your CUDA installation. I haven't yet tested on a machine that has just the GPU drivers installed (i.e. without CUDA), but if it doesn't already work, it should be very possible to make it work.


That’s awesome! Thanks for making this


This looks interesting - I use ONNX to call my PyTorch models from .NET, but so far it's meant I haven't been able to test out JAX-based libraries since they don't have ONNX export, and it has also meant I had to write C# boilerplate code to preprocess my input data into the form required by the model.

Potentially this could be a lot better but I’d be curious what speed overhead the IPC layer adds. At least with ONNX you get a nice speed bump :)


Any plans to support Windows? That would make Carton the ultimate library to embed LLMs into desktop applications


I'm definitely open to it if there's interest (or if someone wants to help), but I don't have plans to implement Windows support myself at the moment.

The currently supported platforms [1] were mostly driven by environments I've seen at various tech companies.

I do have active plans to support inference from WASM/WebGPU so maybe that could be a good entrypoint to Windows support.

--

[1] Currently, the supported platforms are:

* `x86_64` Linux and macOS

* `aarch64` Linux (e.g. Linux on AWS Graviton)

* `aarch64` macOS (e.g. M1 and M2 Apple Silicon chips)

* WebAssembly (metadata access only for now, but WebGPU runners are coming soon)


is this ancillary to what [these guys](https://github.com/unifyai/ivy) are trying to do?


That seems different to me. OP is talking about using ML models outside of Python (well, in Python, too). That link seems to be talking about using ML models across frameworks (PyTorch, TensorFlow, JAX, etc) in Python.


got it. went through both of the codebases. what you say is the case. thanks!


This HN post looks really weird on mobile (no, not the website, HN itself)


Is this the same as Nvidia's Triton?


I think this Carton project is at a lower level than Triton. With Triton you'd start the Triton server and then make requests against it, while Carton is more like a library that you include in your application/library and use from the same language you're writing that application/library in.


True!


When will you release a Java client?


I'd love to see this for golang (even without GPU support).


Maybe I'm missing something here, but isn't this largely achieved by ONNX already?

[0] https://onnx.ai


That's a good question! There's an FAQ entry on the homepage that touches on this, but let me know if I can improve it:

> ONNX converts models while Carton wraps them. Carton uses the underlying framework (e.g. PyTorch) to actually execute a model under the hood. This is important because it makes it easy to use custom ops, TensorRT, etc without changes. For some sophisticated models, "conversion" steps (e.g. to ONNX) can be problematic and require additional validation. By removing these conversion steps, Carton enables faster experimentation, deployment, and iteration.

> With that said, we plan to support ONNX models within Carton. This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM).

More broadly, Carton can compose with other interesting technologies in ways ONNX isn't able to because ONNX is an inference engine while Carton is an abstraction layer.


> This lets you use ONNX if you choose and it enables some interesting use cases (like running models in-browser with WASM)

If someone already has an ONNX model, there's already an in-browser capable ONNX runtime: https://onnxruntime.ai/docs/get-started/with-javascript.html...

(It does use some parts compiled to WASM under the hood, presumably for performance.)


ONNX Runtime doesn't convert models; it runs them, and it has bindings in several languages. And most importantly, it's tiny compared to the whole Python package mess you get with TF or PyTorch.

If Carton took a TF/PyTorch model and just dealt with the conversion into a real runtime, somehow using custom ops for the bits that don't convert, that would be amazing though.


There's an ONNX runtime, but to use the runtime you do need to convert your model into the ONNX format first. You can't just run a TF or PyTorch model using the ONNX runtime directly. (At least last time I checked.) Unfortunately, this conversion process can be a pain, and there needs to be an equivalent operator in ONNX for each op in your TF/Torch execution graph.
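
For concreteness, the conversion step for a PyTorch model looks roughly like this (a toy sketch; real models with custom or exotic ops are exactly where this export step falls over):

    import torch

    # A toy model standing in for a real one. Export traces the graph and
    # maps every op to an ONNX operator, which is where unsupported or
    # custom ops cause the conversion to fail.
    model = torch.nn.Linear(4, 2)
    example_input = torch.randn(1, 4)

    torch.onnx.export(model, example_input, "model.onnx")

Only after this step can ONNX Runtime load and run the resulting model.onnx.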


> From any [*] programming language.

[*] If "any programming language" is Python or Javascript.


This is a reasonable approach for systems that are allowed to load binaries (either the running artifact is a binary or semi-binary (a WASM executable), or it can load .so / .dll files from user-provided places).

It basically rests on the promise that you can package CUDA / PyTorch / a Python interpreter alongside the host language in some way, and use it.

This is true for Android, not true for iOS, true for almost all desktop systems, somewhat true for the web (packaging PyTorch + a Python interpreter in WASM: the latter is easy, the former I'm unsure about), and probably not true for FaaS environments (such as Cloudflare Workers or AWS Lambda).


It's gonna fall apart in a spectacular way when they try to marshal data across compiled language boundaries.

This is the actual hard problem in this domain, not packaging a model file in a zipfile.


Make it for Go, and I am sold. Running ML models in Go services is still an unsolved problem.


We have a similar high-performance AI stack written in Go that's capable of loading many different models from different frameworks. It's the work of several years. Just saw your comment and thought about our company-internal talk of releasing everything under an open source license. Thanks for reminding me :) What are your use-cases?


Wow, make it open source quickly!!! :hype:. Our current setup is a classic Python REST API for model serving, but we have very low latency constraints. As such, rewriting in a more performant backend language, e.g. Go or Rust, would substantially reduce resource usage (by reducing the need for horizontal scaling). Pre-baked model serving frameworks, e.g. Nvidia's Triton, aren't an option, since we have to query a feature store and do some input feature tracking in between. Go seemed like an efficient, developer-friendly choice, but there aren't any well-maintained model inference libraries in Go to this day...


We used Triton Inference Server (with a Golang sidecar to translate requests) for model serving and a separate Go app that handled receiving the request, fetching features, sending to Triton, doing other stuff with the response, serving. This scaled to 100k QPS with pretty good performance but does require some hops.

In general writing pure Go inference libraries sucks. Not easy to do array/vector manipulation, not easy to do SIMD/CUDA acceleration, cgo is not go, etc. I wrote a fast XGBoost library at least (https://github.com/stillmatic/arboreal) - it's on par with C implementations, but doing anything more complex is going to be tricky.


Cool, thanks for sharing!


I’ve also run models in Go: transformers, even T5. There wasn’t that much overhead, maybe some annoying compilation stuff, but nothing crazy.

This was TensorFlow btw, which has Go bindings.

It is a smart & worthwhile move; we also needed to drop Python for performance/cost gains.


Eh, awesome! Seems like it's this one, right? https://github.com/galeone/tfgo. Quite a lot of stars.


I think just native https://pkg.go.dev/github.com/tensorflow/tensorflow/tensorfl... but tfgo looks interesting.

Actually, the docs around this weren’t great. Took the train-in-Python & inference-in-Go approach, and only for versions greater than TF2.
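
For reference, the export half of that approach is just writing a SavedModel from Python; something like this toy sketch (the module and export path are made up):

    import tensorflow as tf

    class ToyModel(tf.Module):
        """Stand-in for whatever model you actually trained."""
        def __init__(self):
            super().__init__()
            self.w = tf.Variable(tf.random.normal([4, 1]))

        @tf.function(input_signature=[tf.TensorSpec([None, 4], tf.float32)])
        def __call__(self, x):
            return tf.matmul(x, self.w)

    # Write a SavedModel that the Go bindings (or tfgo) can load for inference.
    tf.saved_model.save(ToyModel(), "export/model")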


Write a blog post then about this! I can tell you it is hardly a solved problem.


This seems to be a reasonable approach for Go, but you do need to carry a lot in your containerized environment (Go tends to have very lean containers, and this approach requires a fat container with CUDA, PyTorch, Python, etc.).


Slightly related dumb question, I saw on GitHub that TensorFlow has Java support. Does anyone actually use TensorFlow with Java?


Aruba Networks does.


> Carton wraps your model with some metadata and puts it in a zip file

Why a zip file?


In addition to the benefits mentioned in the sibling comment, zip files let you seek to and access individual files in the archive without extracting all files (vs tar files for example).

This lets us do things like fetch model metadata [1] for a large remote model, by only fetching a few tiny byte ranges instead of the whole model archive.

It also means you can include sample data (images, etc) with your model and they're only fetched when necessary (for example with stable diffusion: https://carton.pub/stabilityai/sdxl)
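
To make the "seek to individual files" point concrete, here's what that looks like with Python's zipfile module (file names are just for illustration):

    import zipfile

    # Only the central directory plus the requested entry gets read;
    # large members (e.g. model weights) are never touched.
    with zipfile.ZipFile("model.carton") as zf:
        print(zf.namelist())           # list entries without reading their data
        meta = zf.read("carton.toml")  # illustrative metadata file name

For a remote model behind HTTP, the same idea works with range requests against the central directory and the individual entries.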

[1] https://carton.run/docs/metadata


Zip-file-as-a-container-format seems pragmatic: it's a way to bundle multiple files into one file (easier to manage than scattering multiple files), it avoids introducing a new proprietary format, it can optionally be compressed, and support for reading and writing the container format is already widespread.

To give two examples of prior art, it worked for Quake 3 data files (.pk3) & geospatial data files (.kmz)

Maybe it's not the best choice but it doesn't seem like a bad one.


Also .docx, I believe.


It's a fairly common way of bundling multiple files into one that has large support and usually "good enough" compression.

It's hardly revolutionary to do this, here are some common examples of things that are zip files but don't label themselves as such:

- .jar

- .odt, .ods, .odp, .docx, .xlsx, .pptx

- .epub

- .apk

- .crx, .xpi


"...run any machine learning model from any programming language*."

*As long as that language is python or rust.

What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.


Replying to this to explain the downvotes.

We all think this. My initial thought was that this is probably a startup selling PyTorch-as-a-Service, and I did not bother to read the article. It turns out that I was wrong, and this might even be useful -- if not for the implementation, then perhaps for the idea.

However, it turns out to make Hacker News a nicer space if we follow these guidelines:

> Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something.


It's not a shallow dismissal.

The selling point of this thing is cross-language interoperability, and while they advertise it, they don't deliver.

Sorry, but if your "any language" is "Python or Javascript" your project hasn't even reached the proof of concept stage, it's just a vague idea at this point.

Supporting C++ and C will be 90% of the work and the real challenge.


The shallow dismissal that I was referring to is:

> What I think is that this is nothing more than a resume-bolstering effort that doesn't really need to exist and probably won't once OP lands a role at whatever FAANG company they're trying to impress.

The title of the post might be click-bait, but there is an obvious asterisk on the homepage of Carton, and even at a quick glance it is obvious that only very few languages are supported. The claim is so obviously false that I don't mind. I would not expect support for INTERCAL or Awk.

Yes, it does not deliver, but that does not warrant the personal attack. The author of Carton actually already had internships at Google and Facebook, and currently works at Uber.


Maybe you and I have different understandings of what "proof of concept" means, but if you're supposed to deliver cross-language interoperability and have successfully delivered it in three different languages with wildly different runtimes, I'd consider that a successful proof of concept. And since you've demonstrated that the bindings work for at least two other languages, it's more or less trivial to get it working for N other languages, so this is clearly beyond the proof-of-concept stage at this point and trying to reach maturity instead.


I gotta agree here, I don't think the process of porting this to a wide array of languages is trivial.

Additionally, I would have some serious performance concerns when it comes to marshaling data across language boundaries.


OP has already worked at both Facebook and Google, it's doubtful they need any more resume-bolstering.



