Pretty neat! We've been using Lambda at Roboflow for serving low-volume CV models for a couple of years (and my understanding is AWS's SageMaker Serverless is a Lambda wrapper), and it's really good for low-volume and bursty use cases. The latency is surprisingly not bad. It gets really expensive relative to GPUs under high load, though (especially predictable high load like monitoring security cameras 24/7), so our biggest enterprise customers end up running things in a Kubernetes cluster.
There are a few serverless GPU companies like Banana.dev and Modal; I really want to give them a shot. Anyone have experience using them in prod?
We've been building with Modal over the past few months (though no prod-scale tests yet) and were slightly disappointed by very long (10-20 second) cold start times. In the long term we're more interested in inference servers that use compiled/optimized models instead of running plain old PyTorch (which on its own adds another few seconds to cold start).
We are adding support for inference servers to Pipeless. We started with ONNX Runtime and its OpenVINO, CoreML, CUDA, and TensorRT execution providers. Some people have suggested I also integrate with the Triton server, but I still need to dig into that and check its license. The good part is that there is no cold start right now, at the cost of keeping some resources allocated from node startup.
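For anyone unfamiliar with how those execution providers work: ONNX Runtime takes an ordered preference list and falls back to the first providers actually available on the node. Here's a minimal sketch of that selection logic (`pick_providers` is a hypothetical illustrative helper, not a real Pipeless or ONNX Runtime function; the provider names are the standard ORT ones):

```python
def pick_providers(preferred, available):
    """Keep the preferred execution providers that are actually available,
    preserving preference order; fall back to CPU if none match."""
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]


# Typical preference order: try TensorRT, then CUDA, then plain CPU.
preferred = [
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

# On a CPU-only node, only the CPU provider survives the filter.
print(pick_providers(preferred, ["CPUExecutionProvider"]))

# With ONNX Runtime itself you'd pass the list straight to the session:
#   import onnxruntime as ort
#   sess = ort.InferenceSession(
#       "model.onnx", providers=ort.get_available_providers())
```

The nice property of this scheme is that the same model file runs everywhere; the provider list just degrades gracefully on nodes without a GPU.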
After ChatGPT was announced, I found many cool projects that simplify integrating LLM capabilities into your services. However, I didn't find many equivalents in the vision ecosystem.
Getting started in AI + vision with just 3 commands is amazing! I will definitely try it for some personal projects with IP cameras.
This looks like a really cool project; would you be open to us PR'ing support for the 50k fine-tuned models on Roboflow Universe[1] via an `inference`[2] integration?
Definitely! It's something I've been thinking about doing; I just haven't found the time yet. I think letting people automatically load models from Roboflow Universe would be awesome!