We're experimenting with `nbdev`[0], especially in our effort to support fast.ai's[1] latest course, 'fastbook'[2], on iko.ai[3], to test its notebooks faster. Scheduling notebooks on our platform is a breeze[4][5]: we could launch all 20 notebooks quickly, even manually, and check their output while they ran. Some failed that way because they required user interaction (a FileUpload widget, for instance), so we decided to use fixtures.
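To give an idea, here's a minimal sketch of that kind of batch run, assuming papermill; the `notebooks/` directory and the `upload_path` parameter are illustrative of the fixture idea, not fastbook's actual interface:

```python
from pathlib import Path

import papermill as pm
from papermill.exceptions import PapermillExecutionError

# Execute every course notebook headlessly; papermill writes the
# executed copy (with outputs) alongside the original, so failures
# can be inspected afterwards.
for nb in sorted(Path("notebooks").glob("*.ipynb")):
    out = nb.with_name(nb.stem + ".executed.ipynb")
    try:
        pm.execute_notebook(
            str(nb),
            str(out),
            # The fixture idea: a parameters cell lets the notebook
            # read a local file instead of blocking on a FileUpload
            # widget. `upload_path` is a hypothetical parameter name.
            parameters={"upload_path": "fixtures/sample_image.jpg"},
        )
    except PapermillExecutionError as exc:
        print(f"{nb.name} failed: {exc}")
```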
Your platform looks interesting. Definitely going to check it out. I've recently been working on a project for my employer to turn jupyter notebooks into deploy anywhere, run anywhere functions.
Thanks. Not interesting enough for you to immediately sign up, so let's figure out where we screwed up...
>I've recently been working on a project for my employer to turn jupyter notebooks into deploy anywhere, run anywhere functions.
What's the "job to be done"? What will this solve for you, and what's your current workflow? If it's similar to our needs, you're welcome to become our cherished customers. We'll socially distantly throw flowers and rice and hand sanitizer.
For us, for many years, there were always "chasms" between domains and universes in our team. Data scientists lived in notebooks, but they struggled to set up their environments: library versions, dependencies, NVIDIA drivers, etc. They'd lose time setting up their laptops and workstations, then were afraid to change anything. It also led to "it works on my machine" whenever they wanted to try a notebook from another team member, who also happened to have a different environment.
We then had a beast of a workstation for heavier compute jobs. It was tragicomic to coordinate. People had to be on-premises. They also had to assign different ports to run their Jupyter notebooks (this was before JupyterHub, etc.). They had to coordinate: "I'm running a job; can you not launch your training until I'm done?"
Notebooks flew around, then got committed to version control, but some data scientists were not familiar with git.
Experiment tracking was ad hoc, done when people remembered: in physical notebooks, logs, spreadsheets, random text files and notes, "memory", or not at all because they forgot.
Deployment was manual: build a quick Flask application, get the weights, put them somewhere, spin up a VM on GCP, scp the weights over, and so on. Data scientists had to tap someone on the shoulder to deploy their model.
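To illustrate the shape of that manual setup, a minimal sketch (the pickled `model.pkl` and the `/predict` route are placeholders, not our actual service):

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Weights scp'd onto the VM by hand -- the step that always needed
# "tapping someone on the shoulder".
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.route("/predict", methods=["POST"])
def predict():
    features = request.get_json()["features"]
    return jsonify({"prediction": model.predict([features]).tolist()})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```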
In other words, pretty much every mistake and bad habit.
Then there was a change in the company, and we started this to execute our projects in a more systematic way. The way I used to describe it: "I want models to become espresso cartridges plugged into the machine."
Around 2019, we started to push for more remote and more asynchronous work. It bothered us that people suffered through commutes (sometimes up to six hours daily); it was nonsense. We built whatever enabled that. Some members had bad internet connections, so we worked on caching and compressing static files, reconnecting Jupyter servers seamlessly, scheduling long-running notebooks and visualizing their output even when the front-end lost its connection to the kernel, etc.
We wanted to be able to manage data, collaborate on notebooks, run training, track experiments, deploy, and then monitor models, and we went at it chunk by chunk. We looked at several other products and platforms, but they were either too restrictive or handled only one piece of the cycle (despite claiming complete project-lifecycle management), and one of the problems is precisely that "fragmentation". Or they were internal platforms at companies that relied heavily on ML (Facebook's FBLearner Flow, Uber's Michelangelo, Airbnb's Bighead), which we couldn't access.
There were also products that solved what were, in our opinion, the wrong problems. I think the reason is that the people who start them come to ML from a web-dev background and believe the ecosystem is what it is because CI/CD and DevOps were lost on the "ML people", and if only they could see the light. That, or their hypothesis is that better stylesheets will solve the cluster-mess, when in fact they'll just make a beautiful mess.
- [0]: https://github.com/fastai/nbdev
- [1]: https://www.fast.ai/
- [2]: https://github.com/fastai/fastbook
- [3]: https://iko.ai
- [4]: https://iko.ai/docs/notebook/#long-running-notebooks
- [5]: https://pbs.twimg.com/tweet_video/Entg8COXcAIDdTI.mp4