Launch HN: Slai (YC W22) – Build ML models quickly and deploy them as apps
130 points by Mernit on March 3, 2022 | 54 comments
Hi HN, we’re Eli and Luke from Slai (https://www.slai.io/hn/62203ae9ee716300083c879b). Slai is a fast ML prototyping platform designed for software engineers. We make it easy to develop and train ML models, then deploy them as production-ready applications with a single link.

ML applications are increasingly built by software engineers rather than data scientists, but getting ML into a product is still a pain. You have to set up local environments, manage servers, build CI/CD pipelines, and self-host open-source tools. Many engineers just want to leverage ML for their products without doing any of that. Slai takes care of all of it, so you can focus on your own work.

Slai is opinionated: we are specifically for software developers who want to build models into products. We cover the entire ML lifecycle, all the way from initial exploration and prototyping to deploying your model as a REST API. Our sandboxes contain all the code, dataset, dependencies, and application logic needed for your model to run.

We needed this product ourselves. A year ago, Luke was working as a robotics engineer, working on a computationally intensive problem on a robot arm (force vector estimation). He started writing an algorithm, but realized a neural network could solve the problem faster and more accurately. Many people had solved this before, so it wasn’t difficult to find an example neural net and get the model trained. You’d think that would be the hard part—but actually the hard part was getting the model available via a REST API. It didn’t seem sensible to write a Flask app and spin up an EC2 instance just to serve up this little ML microservice. The whole thing was unnecessarily cumbersome.

After researching various MLOps tools, we started to notice a pattern—most are designed for data scientists doing experimentation, rather than software engineers who want to solve a specific problem using ML. We set out to build an ML tool that is designed for developers and organized around SWE best practices. That means leaving notebooks entirely behind, even though they're still the preferred form factor for data exploration and analysis. We've made the bet that a normal IDE with some "Jupyter-lite" functionality (e.g. splitting code into cells that can be run independently) is a fair trade-off for software engineers who want easy and fast product development.

Our browser-based IDE uses a project structure with five components: (1) a training section, for model training scripts, (2) a handler, for pre- and post-processing logic for the model and API schema, (3) a test file, for writing unit tests, (4) dependencies, which are interactively installed Python libraries, and (5) datasets used for model training. By modularizing the project in this way, we ensure that ML apps are functional end-to-end (if we didn't do this, you can imagine a scenario where a data scientist hands off a model to a software engineer for deployment, who's then forced to understand how to create an API around the model, and how to parse a funky ML tensor output into a JSON field). Models can be trained on CPUs or GPUs, and deployed to our fully-managed backend for invoking via a REST API.
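Here's a simplified sketch of how those pieces fit together (the model, names, and file layout below are illustrative, not our exact API):

    # (1) training section: train the model and return it to the platform
    import torch
    from torch import nn

    def train(dataset):
        model = nn.Linear(in_features=3, out_features=1)
        # ... training loop over `dataset` would go here ...
        return model

    # (2) handler: pre-/post-processing and the API schema around the model
    def handler(model, payload):
        features = torch.tensor(payload["features"], dtype=torch.float32)
        prediction = model(features)              # raw tensor output
        return {"prediction": prediction.item()}  # JSON-friendly response

    # (3) test file: a unit test that exercises the model + handler together
    def test_handler():
        model = train(dataset=None)
        result = handler(model, {"features": [0.1, 0.2, 0.3]})
        assert "prediction" in result

    # (4) dependencies (e.g. torch) and (5) datasets live alongside this code
    # in the sandbox.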

Each browser-based IDE instance (“sandbox”) contains all the source code, libraries, and data needed for an ML application. When a user lands on a sandbox, we remotely spin up a Docker container and execute all runtime actions in the remote environment. When a model is deployed, we ship that container onto our inference cluster, where it’s available to call via a REST API.

Customers have so far used Slai to categorize bills and invoices for a fintech app; recognize gestures from MYO armband movement data; detect anomalies in electrocardiograms; and recommend content in a news feed based on previous content a user has liked/saved.

If you’d like to try it, here are three projects you can play with:

Convert any image into stylized art - https://www.slai.io/hn/62203ae9ee716300083c879b

Predict Peyton Manning’s Wikipedia page views - https://www.slai.io/hn/6215708345d19a0008be3f25

Predict how happy people are likely to be in a given country - https://www.slai.io/hn/621e9bb3eda93f00081875fc

We don’t have great documentation yet, but here’s what to do: (1) Click “train” to train the model; (2) Click the test tube icon to try out the model - this is where you enter sentences for GPT-2 to complete, or images to transform, etc; (3) Click “test model” to run unit tests; (4) Click “package” to, er, package the model; (5) Deploy, by clicking the rocket ship icon and selecting your packaged model. “Deploy” means everything in the sandbox gets turned into a REST endpoint, for users to consume in their own apps. You can do the first 3 steps without signup and then there’s a signup dialog before step 4.
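Once deployed, calling the endpoint from your own app looks roughly like this (the URL, auth header, and payload schema below are placeholders - the real values come from your deployed sandbox):

    import requests

    # Placeholder endpoint, key, and schema - copy the real values from your
    # deployment.
    resp = requests.post(
        "<your-model-endpoint-url>",
        headers={"Authorization": "Bearer <your-api-key>"},
        json={"text": "Complete this sentence..."},
    )
    print(resp.json())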

We make money by charging subscriptions to our tool. We also charge per compute hour for model training and inference, but (currently) that's just the wholesale cloud cost—we don't make any margin there.

Our intention with Slai is to allow people to build small, useful applications with ML. Do you have any ideas for an ML-powered microservice? We’d love to hear about apps you’d like to create. You can create models from scratch, or use pretrained models, so you can be really creative. Thoughts, comments, feedback welcome!




Congrats on the launch. I'm quite impressed.

Here are my unordered thoughts.

So it seems a lot like an improved Colab with a deployment stage, which sounds good to me, though it will be much more expensive than Colab.

I like the pitch of SWEs doing ML instead of pitching towards Data Scientists. As a Data Scientist turned SWE, I still miss the Jupyter-like cell-based execution (you said it exists, but I couldn't find it).

In general I'm quite sceptical when it comes to online IDEs. However, for text- and image-based models it might be enough (since you don't need too much code).

There might be a valid niche between Colab on the one side and building it yourself with the AWS CLI on the other.

I wonder, though: in your target market it really isn't such a big deal to spin up a REST API. There are no Lambdas with GPUs yet (though that should just be a matter of time), and you can use something like AWS Batch for remote training. It will come down to: is it more convenient to code in your IDE while you handle Lambda, Batch, Docker, and CD, or to code in your own IDE and handle that stuff yourself?

Wish you all the best!


Hi, thanks for checking it out! You bring up some great points.

When it comes to SWEs doing ML, our goal is to bring together a bunch of apps that get people most of the way to something they can bring into production.

I appreciate your skepticism of online IDEs, and it’s unlikely we’ll ever completely replace notebooks or whatever IDEs people prefer hacking on locally. Instead, we’d like to take a hybrid approach, in which our online IDE is sufficient for making minor tweaks to a model after it’s in production, but the bulk of development can still happen locally, with changes pushed to Git and synced with our online IDE.

It’s true that SWEs can deploy their own APIs, but that’s more of an annoyance than people give it credit for. At a high level you’re just setting up an API, but really you’re also going to set up a versioning system, a Dockerfile, and monitoring, and all of that adds up to a lot of cognitive overhead.

BTW - the Jupyter-like cell execution can be turned on by clicking the “Interactive Mode” button on the bottom right corner of the IDE.


Makes a lot of sense! It really is true that ML models kind of live in their own little world with their training loop and should interact with the rest through a REST API. Now and then new data gets added for training, the API needs to change a bit, maybe we tweak the labels. You managed to encapsulate that part. I might port one of our text models to it to try it out :)


Would love to hear how it goes! Reach out if you have any questions - we've ported over quite a few text models.


One other quick question: have you built the IDE yourself, or are you using a provider? It looks pretty cool.


The core code editor itself uses Monaco (the same thing under the hood of VSCode), but everything else is custom (e.g. the file browser, language server, syntax highlighting, tabs, etc.)


Firstly, I can't believe you have enough instances to resist the HN hug of death, with so many people presumably running tests. So that is impressive.

Secondly, I ran the train -> test cycle and I didn't see any error metrics. Is the idea that if we were spinning up our own we would be outputting these ourselves? Or would we have trained up the model somewhere else and we would transfer it to SLAI to do a final test and then package it?


We tried to keep our testing workflow as flexible as possible. There are a couple of use cases that we wanted to allow:

- User is working with a pre-trained model that already went through extensive testing during training. In this case our test utilities are useful as e2e tests. Once you integrate the model into your handler, you can specify a bunch of test cases to be sure your API is going to behave as expected (like a unit test).

- User wants to train the model on our platform - they can add error metrics directly in their training script and prevent the model from being saved if any error metric exceeds a certain threshold (rough sketch below). They can then additionally use the test.py script to run tests against the model + handler.
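A rough sketch of that second pattern (the dataset layout, metric, and threshold are placeholders):

    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_absolute_error

    MAX_VAL_ERROR = 0.15  # placeholder threshold

    def train(dataset):
        X_train, y_train, X_val, y_val = dataset  # assumed dataset layout
        model = LinearRegression().fit(X_train, y_train)
        val_error = mean_absolute_error(y_val, model.predict(X_val))
        if val_error > MAX_VAL_ERROR:
            # Raising here means no model is returned, so nothing gets saved.
            raise ValueError(f"MAE {val_error:.3f} exceeds {MAX_VAL_ERROR}")
        return model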


This is essentially HuggingFace models + AWS CDK deployed over Lambda. They are your biggest competition, but there is likely room for more. I think the key difference here is the training part, which can be done by SageMaker. If AWS makes it user-friendly, they will be a serious threat. Good luck!


We think developer experience is the factor that has been sorely lacking from the ML tooling space. I’m curious - how has your experience with SageMaker been?


I heard that SageMaker wasn't great and was not prioritized by Amazon anymore. Definitely could be wrong.


Sagemaker is fine. Works with Huggingface not quite out of the box, but close.


This seems pretty cool! I deployed a model to a REST endpoint and am trying to test it out now using a Jupyter notebook running Python.

Two things that happened to me:

1) I wasn't able to install `slai` using pip from PyPI. I ended up downloading the source tarball from https://pypi.org/project/slai/#files and installing locally.

2) I am following the example for how to "Integrate" my model using Python under the "Metrics" tab. However, the call to `model = slai.model("foobarbaz")` is failing. It looks like the regex check for `MODEL_ROUTE_URI` from line 21 in `model.py` doesn't like my custom email address :(. For example, the following model endpoint isn't valid according to the regex: "s@slai.io/foo-bar-baz/initial" (My custom email is very similar to `s@slai.io`). I'll post the regex below.

`MODEL_ROUTE_URI = r"([A-Za-z0-9]+[\._]?[A-Za-z0-9]+[@]\w+[.]\w{2,3})/([a-zA-Z0-9\-\_]+)/?([a-zA-Z0-9\-\_]*)"`
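For reference, here's a minimal repro (the second address is just a longer local part for contrast):

    import re

    MODEL_ROUTE_URI = r"([A-Za-z0-9]+[\._]?[A-Za-z0-9]+[@]\w+[.]\w{2,3})/([a-zA-Z0-9\-\_]+)/?([a-zA-Z0-9\-\_]*)"

    # The email group needs at least two characters before the "@"
    # ([A-Za-z0-9]+ ... [A-Za-z0-9]+), so a one-character local part never matches.
    print(re.search(MODEL_ROUTE_URI, "s@slai.io/foo-bar-baz/initial"))    # None
    print(re.search(MODEL_ROUTE_URI, "abc@slai.io/foo-bar-baz/initial"))  # match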

Just wanted to let you know! Looking forward to experimenting with this more.


The second SDK issue w/ the regex is resolved. Just bumped latest version to 0.1.70. If you upgrade you should be good to go. Still haven't been able to reproduce your issue with pip.


Can confirm it works now! Thanks.


Thanks for the heads up - yep, looks like a bug in our SDK. Should have a new version out shortly that handles it.

Weird that you weren't able to install the slai SDK via pip - we just released a new version of the SDK this morning, so I'm not sure if that's related. I'll take a look this afternoon.

Thanks for trying it out!


Congratulations on the launch, guys. The product need seems clear to me and is a pain point that I've felt most acutely in side projects I've worked on outside of our company's dedicated CI infrastructure.

Are you planning any git or IDE integration? Most of the magic here seems to happen in the backend with easier training, scheduling, and inference. Could this be enabled locally so devs can iterate in an environment that's more comfortable for them?


Hey - thanks!

Yeah, we've been thinking about this quite a bit. We've explored a couple of options - I think our first pass is going to be a way to synchronize an external git repository with a sandbox. Would love to hear your thoughts on what kind of workflow might make the most sense.

I think long term we'll also add VSCode integration through an extension, but that might be a few months out.


Very interesting!

So when a user trains a model, you spin up a Docker container with everything in it, bind the container's ports to the host, and add it to some key-value store that a reverse proxy references. Is that correct?

Sorry, I'm just really curious. It's a really interesting project. Do you guys have anything open source?


Much of the complexity in the sandbox is in ensuring that the development environment behaves as it would in production - but also that it loads as fast as possible. So we have to do a bit of work behind the scenes involving dynamically linking libraries, provisioning Kubernetes resources, etc. But generally that's about right.

We haven't open-sourced any of it yet, but there are definitely a few components of our system that we'll open source once we feel they're stable enough.


I'm sorry, I don't have any experience with Kubernetes.

What benefit would Kubernetes bring to this architecture? You can create and destroy Docker containers using the API.

What do you guys use Kubernetes for?


Kubernetes gives us a ton of extra tools to manage the lifecycle of our sandboxes, deployed models, asynchronous training jobs, etc.

Internally, each pod is just running a Docker image. You could probably throw something together with Docker/the Docker API - but in our case we needed a bit more control.


Hey Eli!

Thanks for the helpful responses!

I sent you an email regarding a possible internship opportunity. Are you guys open to interns?


Awesome! This, or something like it, is going to bring ML to the (SWE) masses. Congrats, good luck, and thanks!


Heyo,

I looked a bit more at your service, since I'm migrating one of our text classification models anyway right now. I decided against using it, but maybe my reasoning could still be helpful (and I see a lot of potential, so I want to help).

What I am using instead is a combination of AWS Batch and Colab. My reasoning:

# Local Development

Yes, it is true that ML code can be quite well separated from the rest. But then there is data - the extract and load step. I know you have bindings to Postgres, for example, but I wouldn't trust you with my DB.

And always moving files over could be done (we keep a backup anyway), but it would be more of a hassle. Also, even for the actual ML code, it is nice to have it in an actual IDE with a good debugger, etc. I prefer to write the ML code locally and then just package it and send it away to be trained.

Also, yes, there is a cost to set up the infrastructure, but I prefer to solve that with code generation/templates and libraries (to send your Docker image to Lambda for inference and to Batch/Colab for training). It is a cost that is paid once and then never again.

# Price

Your GPU instance costs 1 dollar an hour, which is about 3 times as much as a p2.xlarge spot instance (which I assume is the closest comparison). Colab, of course, is 10 bucks a month / free. This is ignoring AWS credits for now. (It would also be good to know which exact GPU you provide.)


Congrats on the launch!

Overall I like the idea, and I agree with you: either the tools are too focused on data scientists, or there is a lot of DevOps involved to get things started.

I work in the field, so I have some questions:

- Are there any plans to connect the project into a git repo?

- Is there any option for me to pass trained binaries to your product? For example, I have a beast of a machine and can easily train things locally, but I'd like to host the inference with you guys.

- Do you intend to allow automated testing and linting?


Hi, thanks for the questions! We're planning on adding a way to synchronize a git repo with a sandbox - that should be out by April.

Right now, you can upload a trained binary to a sandbox and return it from the train function, and then use it for inference. So it's a bit manual at this point, but we're planning on improving that workflow shortly.

We built linting and testing into the sandbox, but testing is currently triggered manually - we're planning on building both into our CI/CD system (scheduled training)


This is cool. It took me a while to figure out that you want people to click the test button on the sidebar to try it out, not the "Test model" button that runs unit tests in the bottom right. Unit tests might benefit from a different kind of icon. I tried the "Interactive Mode" toggle button too, and that didn't do anything obvious.


Thanks for checking us out! We sort of anticipated that — we just launched the unit test feature this week, and it’s not easily distinguished from the test panel. We’ll probably rename the test panel to something more descriptive, like API runner. Interactive mode is a feature for setting interactive breakpoints in your code; it’s really useful for iterating on individual blocks of code without having to run the entire app E2E.


Your website pitches this product SO well. Kudos.


Thanks, that means a lot! We've spent a lot of time thinking about the pitch, given how many other ML tools are out there. We're hoping to strike a chord with simplicity and developer experience.


This is pretty cool! Especially the opinionated structuring part.

SageMaker now allows you to download your running code and Docker container (https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangle...). It also allows you to simulate running locally: https://github.com/aws/sagemaker-tensorflow-training-toolkit

More than anything else, this is basically just a way to calm worries about lock-in. Google ML resisted this for a long time, but even they finally had to do it: https://cloud.google.com/automl-tables/docs/model-export

Are you planning something similar?


We have been planning an "eject" feature that would let you develop in our app, but then export your artifact as a Docker image you can spin up in your own cluster (or whatever). This is a necessity for customers who require on-premise deployments.

However, this is probably a few months out since we're currently focused on startups/developers that don't have that requirement.


Sorry, I haven't looked at this properly yet - keen to know if I can upload a custom pre-trained model built with any of the popular libraries (PyTorch, Keras, etc.) and just do the deployment as an app with Slai?


Yep! You can upload pre-trained models — just upload a pickled binary of your model into the “data” section of the sandbox and then load and return the object in the train function.

We chose to do it this way to ensure that the binary you upload is properly tracked in our versioning system, and that it can be integrated into your handler.
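Roughly, the train function just becomes a loader - something like this (the file name and path are illustrative):

    import pickle

    def train(dataset):
        # Load the pre-trained binary uploaded to the sandbox's data section
        # (path is illustrative) and return it - whatever you return is what
        # gets versioned and integrated with your handler.
        with open("data/my_pretrained_model.pkl", "rb") as f:
            model = pickle.load(f)
        return model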


Superb! Can we implement a paper like this? https://github.com/nv-tlabs/editGAN_release


Really interesting - could definitely support the model itself. Might require a bit of conversation to figure out how to port over the tool they use to interact with the model.


How do you guys compare with SageMaker? It also lets you bring in custom containers for the training and (batch/real-time) inference phases.


On paper, SageMaker does everything, but they don’t do many of those things well. I think SageMaker is a great product for enterprises who want to maximize the products procured from a single vendor — it’s easy to buy SageMaker when all of your infra is already on AWS.

It’s fairly painful to productionize a model on SageMaker — they make you think about a lot of things and fit them into AWS primitives. Besides the code for the model, we don’t force users to think about anything. Our focus is helping engineers get models into production, not reading documentation.

Using our tool, you can fork a model and deploy it to production right away — there’s no time spent battling AWS primitives. We’re focused on developer experience above everything else, which means we require sandboxes on our platform to be consistent and reproducible.


I had a similar idea in 2018, turning AI models into API endpoints, but I did not do anything. :cry:


Cool product! Are you guys using WASM under the hood?


Hi - Luke (CTO) here. Thanks! Yeah, we're using WASM for some things in the editor, like syntax highlighting. Planning on moving most of the network logic into WASM shortly as well.


Great - I sent you a note over on LI. Hoping we can jam a bit on the network logic and the tooling available to fast-track things.


How is your product different from SageMaker? Why can't I replicate the same functionality with SageMaker endpoints?


Amazing. About time someone built a good replacement for SageMaker! Congrats on the launch.


What is your plan? How am I going to be able to automatically feed the model with data?


We have a data integrations feature that allows you to connect external data sources (e.g. S3) with your sandbox.

You can basically combine this with "Scheduled Training", and it'll retrain your model on a schedule, pulling in the new data. This is V1 and does not yet handle more complex, event-based retraining. That's something we know is crucial for tons of use cases, and we're planning on adding it in the coming months.

Happy to chat and get your feedback on what kind of event sources you might want to trigger the model to retrain on new data.


Congratulations on the launch. How’s Slai different from HuggingFace?


Hi, thanks! The main difference is that HuggingFace contains a huge repository of pretrained models, whereas we're providing the scaffolding to build your own end-to-end applications. For example, in Slai you could embed a HuggingFace model (or maybe two models) and combine them into one application, along with API serialization/deserialization, application logic, CI/CD, versioning, etc.

You can think of us as a store of useful ML-based microservices, not just a library of pre-trained models.
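As an illustrative sketch (the model choices and handler shape below are placeholders, not a fixed API), a single handler could chain two off-the-shelf HuggingFace pipelines:

    from transformers import pipeline

    # Two pretrained HuggingFace pipelines combined into one microservice.
    summarizer = pipeline("summarization")
    sentiment = pipeline("sentiment-analysis")

    def handler(payload):
        summary = summarizer(payload["text"], max_length=60)[0]["summary_text"]
        label = sentiment(summary)[0]
        return {"summary": summary,
                "sentiment": label["label"],
                "score": label["score"]}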


Sounds good, but you can actually train and deploy with HuggingFace.

https://huggingface.co/autonlp

Looking forward to seeing your success, good luck.


We would like to talk to you. We are in San Jose, CA


How do you guys compare with SageMaker?


We would like to talk to you


Hi! Feel free to shoot us an email, founders@slai.io



