I am a huge fan of Observable and I think it has massive potential beyond visualizations. To me it's the holy grail of a development environment: there is no separation between development and output. There is no toolchain separating source and binary; in fact, the binary is partially recomputed on changes.
On top of that, it is forkable for others to build upon. It supports comments, and it's reactive. When I explain it to people I say it's like a combination of Excel + VS Code + Google Docs + GitHub. There is nothing like it!
I've been trying to see how far I can push it for non-visualization software. Turns out... pretty far....
I am so productive in Observable because there is no context switch between development, using the product and debugging; it's all just one thing. Amazing, utterly amazing. I hope the Observable team can accommodate my use cases, as it's clearly not what they had in mind.
I also found Observable and spent a week or so rebuilding https://nopaint.art in it! I really liked how you could export a Notebook and it would pull in all the dependencies to make a SPA. I think it would be such a beautiful way to develop my software with other people. (Edit: I only built a prototype that has not been released, but I still believe it would be viable over the current game engine I'm using, especially for collaboration.)
I really like Observable a lot, and have made a couple of brief, messy visualisations with it (messy in terms of code at least).
I hope at some point they can offer some sort of offline support. The focus on ease of sharing is great, and the volume and breadth of content has really helped me to get more proficient with d3. The inability to open and edit notebooks offline/locally is a real inconvenience, however.
I have a small personal site on Netlify (with Next.js) and going forward I'm looking at hosting the sort of stuff I have on Observable there, simply as it's easier to work on when offline. The quality of the community and resources is a really valuable thing, but the need to be online is a real drawback at times.
If you inspect the source in the browser, you will see that the notebooks are hosted on the same server.
P.S. If you want an online interactive notebook on your own site, I guess that is their business model: either viral sharing of free notebooks or paid private team sharing. (I was on a team plan during the development of the project. But once the project is done, there is no need for us to stay on the team plan and be interactive. I am glad that they support offline production notebooks.)
P.P.S. If the Observable team is listening, here is a business idea for them: Observable playground as a service. It would be less of a technical challenge than a business development challenge.
I'm a big fan of Observable, having used it to prototype and learn a number of different visualizations. Note that you don't just have to use D3, but can use other visualization libraries as well (e.g. Vega-Lite, Highcharts).
I also want to shout out Mike Bostock, one of the company founders (and creator of D3). I emailed him randomly to ask for some help with a d3 package and he replied the next morning.
Busy creators who nevertheless make themselves available to engage with the community always impress me!
He responded to me too, almost a decade ago when I was first getting into the field and looking for mentors. He had some great tips too (basically, to stop doing what I was doing, privately emailing devs asking for mentorship, and instead get involved with the public open communities via GitHub, Stack Overflow, sharing demos, etc). Obvious advice in hindsight, but it took me a while to internalize that.
One of the improvements on the computational notebook concept is that it seems to compute a data flow graph so it knows what cells to update when the data changes, without you needing to re-run cells in a particular sequence. The cells can also be out of order.
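To make the data flow idea concrete, here is a minimal sketch in Python. It is not Observable's runtime; the `Notebook` class and its methods are hypothetical names invented for illustration. Each cell declares which other cells it reads, and redefining a cell recomputes only that cell and whatever depends on it, in dependency order rather than in the order the cells were written.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


class Notebook:
    """Toy reactive notebook (hypothetical): each cell is a function of other named cells."""

    def __init__(self):
        self.cells = {}   # name -> (function, list of input names)
        self.values = {}  # name -> last computed value

    def define(self, name, func, inputs=()):
        """Define or redefine a cell, then recompute it and everything downstream."""
        self.cells[name] = (func, list(inputs))
        self._recompute(name)

    def _downstream(self, changed):
        """Return the changed cell plus every cell that depends on it, directly or not."""
        dirty = {changed}
        grew = True
        while grew:
            grew = False
            for name, (_, inputs) in self.cells.items():
                if name not in dirty and dirty.intersection(inputs):
                    dirty.add(name)
                    grew = True
        return dirty

    def _recompute(self, changed):
        dirty = self._downstream(changed)
        # Order by dependencies, not by the order in which cells were defined.
        graph = {name: set(inputs) for name, (_, inputs) in self.cells.items()}
        for name in TopologicalSorter(graph).static_order():
            if name not in dirty:
                continue  # untouched cells keep their cached value
            func, inputs = self.cells[name]
            if all(i in self.values for i in inputs):
                self.values[name] = func(*(self.values[i] for i in inputs))
            else:
                # An input is not available yet, so drop any stale value.
                self.values.pop(name, None)


nb = Notebook()
# Cells can be declared out of order: "total" comes before its inputs exist.
nb.define("total", lambda price, qty: price * qty, inputs=["price", "qty"])
nb.define("price", lambda: 3)
nb.define("qty", lambda: 4)
print(nb.values["total"])  # 12
nb.define("price", lambda: 5)  # only "price" and "total" are recomputed
print(nb.values["total"])  # 20
```

A circular dependency would make `TopologicalSorter` raise `CycleError`; this sketch leaves that unhandled.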
The editor interface is pretty slick, and I wish it were open source so I could use it in other projects.
The data flow is really well implemented with Observable. The fact you don't need to go through and repeat the execution of a number of downstream cells when you make a change is really useful.
Another great advantage they have is the data map on the right-hand margin. It's a good way of seeing how complex a notebook is getting as well as being a really efficient way of navigating through the data flow.
I really like Observable and have used it quite a bit, but I absolutely hate that it is the default documentation form for D3, because D3 is a javascript library, and Observable is not javascript. At the very least it would be nice if there was a way to take an example and press a button to get the same thing in javascript.
Yeah I pulled out d3 for a project recently, having used it before–quite a bit at one point–but not for several years. I just wanted to browse some basic examples for the current version, to refresh my memory and familiarize myself with the current API, but the reliance on Observable drove me up the wall. Even a lot of the old examples from bl.ocks.org just redirect to Observable now.
The presentation is one more layer to understand between "here's the code" and "here's what the code produces". I do generally like the notebook format for exploratory data work, but it's not at all ideal as part of a workflow for writing code that will fit into a larger application, in my experience.
A lot of the code you can write does look like JS, though. You can use the libraries that may be common to a JS dev. It's not perfect, but there's enough overlap and the differences are understandable and documented[1].
Also having learned a lot of D3 by reading through Observable tutorials, I'm curious what you've seen that doesn't seem to work when you port it to JS?
One thing I'm aware of is that you have to move a lot of the plotting code into the .then() method of a promise when you're loading data.
My gripe with Observable is that I can't fit it into a Python-based workflow, i.e., a machine learning or GPU-training workflow. It is a nice place to create explainers, though. I almost think of it as being like a Medium alternative.
For Python, Datapane may solve your use-case (I am one of the founders). It's an open source framework for creating HTML reports from text, dataframes, plots (altair/bokeh/plotly/folium), and files. We provide a free public platform for sharing reports on the web, and you can check out some of the things our users are building here: https://datapane.com/gallery/
It's different to something like Observable in that you import it into your existing Python environment (Colab, Jupyter, Airflow, etc.), build the reports from your assets, and push them up -- as opposed to having to do the analysis itself in our interface.
With the growth expectations that come with the funding amount they just collected, I'd count on it. There is just so much more demand for data visualization and notebook-style projects in the Python community compared to the JS community.
Could you say more about your workflow? Our platform[0] may suit you:
- Collaborative no-setup notebooks: different images with the most popular libraries pre-installed. See each other's cursors, very useful for troubleshooting, pair programming.
- Long-running notebooks: GPU ready. You can schedule notebooks right from the notebook view so you don't context switch. You can close your browser, disconnect from the internet, or shut your laptop and still see the notebook's execution.
- Automatic params/metrics/model tracking: no boilerplate.
- One click deployment: clicking a button deploys a model and gives you a nice "REST" API endpoint you can invoke.
If you're doing anything related to ML and use notebooks, you probably had the above problems. We're building around Jupyter notebooks because in our experience, our machine learning projects were not held back for lack of slick stylesheets.
You can run just about anything that can be run in the browser (all code in an Observable notebook executes in your browser). Though depending on the flavour of packaging/module system a given package uses, you may need to figure out a correct magic incantation to import it.
You can't self-host with Observable. For many, that means it's not a feasible replacement or improvement over an existing literate programming environment.
It is a bit silly to downvote this for not being true, when it is factual.
The Observable runtime is open source, and trivial to integrate into your existing react/whatever webapp.
There is also a vscode plugin that allows for live editing and rendering.
It doesn't give you the ObservableHQ interface, but tbh I like working with files more.
It also allows me to git everything.
I can see the ambiguity in what you've quoted, but it's clarified in the sentence that immediately follows it. It was referring to the "literate programming environment".
The definition of literate programming environment doesn't include the requirement of a GUI.
Knuth's original literate programming environment consisted of a command-line tool that split files mixing Pascal and TeX into separate Pascal and TeX sources, to be compiled separately.
Which means that, according to your definition, the original literate programming system isn't literate programming.
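As a rough illustration of that kind of splitting tool, here is a minimal Python sketch. It does not use WEB's real syntax: the `@code` and `@doc` markers and the `split_literate` function are hypothetical, chosen only to show that a plain command-line step, with no GUI, is enough to separate program text from prose.

```python
def split_literate(source):
    """Split a mixed source into (code, prose) using hypothetical @code/@doc markers."""
    code, prose = [], []
    target = prose  # text before any marker counts as prose
    for line in source.splitlines():
        marker = line.strip()
        if marker == "@code":
            target = code
        elif marker == "@doc":
            target = prose
        else:
            target.append(line)
    return "\n".join(code), "\n".join(prose)


mixed = "@doc\nAdd two numbers.\n@code\ndef add(a, b):\n    return a + b\n@doc\nDone."
code, prose = split_literate(mixed)
print(code)   # the program text, ready for a compiler or interpreter
print(prose)  # the documentation, ready for a typesetter
```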
I think you're fundamentally misunderstanding the architecture, capabilities and relationships between observable and observablehq.
Not having a button next to each cell to edit it in place, is not going to impact your productivity in any way, when you can just edit the cell in the window to the left.
Whether it's below or left isn't a meaningful difference, and shouldn't have an impact on whether or not it's a suitable replacement or superior/inferior to other notebooks.
Hope you'll look at it again in the future. In a sense it's your loss, so good luck and take care.
I don't dispute that you might find a different notebook environment more suitable. That's totally up to taste and preferences, and silly to argue about.
But observable (the reactive notebook technology) can do everything a jupyter notebook can do, you just need to write a bit of communication code to hook up a websocket. (For me sandboxing the ui is a feature not a bug.)
And you can also self host it on a machine of your choice, and edit it locally.
Observable supports direct DB connections [0], but these only work in private notebooks at the moment.
At Splitgraph [1] we're building a "data delivery network" (DDN, like a CDN but for databases) that looks like a big Postgres database. It works really well with the Observable Postgres client in private notebooks -- you configure it like you would for any Postgres connection. For public notebooks, you can use our HTTP API for sending SQL queries directly to the DDN. Here's an example [2] using that SQL-over-HTTP API to plot some Covid data.
The idea is not surprising: using database indexes and a backend to send the frontend necessary data to render on demand. DB and the backend are containerized so the installation, data loading and authoring are all one command.
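A minimal sketch of the index-plus-backend idea, using Python's built-in `sqlite3` purely as a stand-in. This is not Kyrix's implementation: the `points` table, the index name, and the `make_db` and `fetch_viewport` helpers are all assumptions made for illustration. The point is only that the backend answers each pan/zoom request with the rows inside the visible window instead of shipping the whole dataset to the browser.

```python
import sqlite3


def make_db(points):
    """Load (x, y, label) rows into an in-memory table with an index on the coordinates."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE points (x REAL, y REAL, label TEXT)")
    db.executemany("INSERT INTO points VALUES (?, ?, ?)", points)
    db.execute("CREATE INDEX idx_points_xy ON points (x, y)")
    return db


def fetch_viewport(db, x_min, x_max, y_min, y_max):
    """Return only the rows that fall inside the current pan/zoom window."""
    query = (
        "SELECT x, y, label FROM points "
        "WHERE x BETWEEN ? AND ? AND y BETWEEN ? AND ? "
        "ORDER BY x, y"
    )
    return db.execute(query, (x_min, x_max, y_min, y_max)).fetchall()


db = make_db([(0, 0, "a"), (5, 5, "b"), (50, 50, "c")])
visible = fetch_viewport(db, 0, 10, 0, 10)
print(len(visible))  # 2: the point at (50, 50) is outside the window
```

A production system would more likely use a spatial index and a distributed database; the two-column index here is only the simplest thing that shows the shape of the query.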
For creators, we offer D3 and JSON-based declarative primitives that enable creation of pan/zoom like visualizations very easily (e.g. 10s of lines of JSON for a 15-zoom-level vis). You can check out an interactive demo here, which visualizes 1.88 million wildfires: http://wildfire.kyrixdemo.live/
Since big data vis is very use case dependent, I'd like to reference two other tools that might be useful:
Vega-Lite and Falcon author here. Many people ask for scalable visualizations and I think declarative visualization approaches are really promising since systems can transparently optimize data movement and rendering. I have done some more experiments to scale Vega such as https://github.com/vega/scalable-vega but I think what we need is a system that automatically offloads heavy computations to a backend such as https://github.com/omnisci/jupyterlab-omnisci.
I've used Vega-Lite and d3 before and have appreciated both. Vega-Lite seems to be great for rapid prototyping and d3 for really refined, intricate and more complex plots.
I checked out the Falcon documentation on GitHub and currently don't have a great understanding of (a) what it would be like to "write in Falcon" and (b) what its intended use case is and how it differs from existing libraries.
This feels close to a lot of our thinking and why some of those pieces were originally written :) We (the Graphistry team) wrote the original Arrow JS implementation to help us bridge GPU visual analytics components in the browser with real-time GPU clusters in the data center, and our backend is DSLs like dataframes to make it easier to do that.
An interesting thing to me here is the layering of DSLs. E.g., SQL enables user-defined functions like filters, that push down to multi-GPU columnar analytics with the rest of the pipeline... and GPU arrow dataframes for zero copy / streaming to combine it all together. People are posting in this thread about 1M rows, but this stuff is built for 1B+. The DSLs mean both analysts and devs work at high levels, and underneath, supercomputing.
Fun historical note wrt JS vs Python for GPU: both have different strengths/weaknesses.. but are basically fine-enough long-term, with tweaking. We started w/ proving out JS GPU dataframes on OpenCL to be more open + viz friendly, and after Nvidia liked using our platform, their RAPIDS.ai team spun up to bring the idea in a more corporate controlled & funded way to Python. That's where the community resources are, so we jumped on board, and every month is now quite a trip. GPU SQL, GPU streaming, etc :) JS does inlining and async better than Python, while Python has the data ecosystem, so I've been eagerly anticipating JS folks stepping up where we had to leave off.
It's exciting to see it all come together -- imo, still early days for what's possible!
Thanks for the thoughts. Graphistry looks really slick and interesting. One thing I'm curious about: how do you compare Graphistry with OmniSci and similar GPU-based solutions? You seem to focus on a niche: graph visualizations. That's one differentiator I can tell. Do other solutions support graphs too? If so, how do you compare Graphistry with them?
Btw, if you referred to the 1M-row example I posted - we actually can do much larger than that. A recording of a visualization with 1B reddit comments is here: https://youtu.be/ccES97ni_vI Behind the scenes we do indexing with Citus, which is a distributed version of PostgreSQL. Our cloud budget can only afford hosting a small demo 24/7 so that's why. Also, because OP talked about loading data into the web vis tools - 1M can already break many web tools out there.
You can tell our target use case is different than yours, and than OmniSci's. It's great to see solutions being developed in a one-size-doesn't-fit-all world.
- omnisci SQL. in contrast, rapids.ai opens layers below (cudf arrow, dask, ...) that enable cooperating solns on top (blazingsql, cugraph, custreams, prefect, ...) that are faster + easier for their domains, w fallback to general dataframes/sql.
-- omnisci is governance by a VC co, while rapids.ai is by nvidia (who wants to sell hw, not sw) and more OSS partners
Omnisci did good engineering, so it does have strengths. ex: its geospatial visual analytics means it's a good esri alternative consideration, as it is more polished than manually stitching together cuspatial + blazingsql + leaflet etc. Likewise, commercially polished for hostile enterprise environments (procurement, ...).
re:scale, see rapids tpcx-bb numbers ('big data'), I think on 10TB datasets. it shows scale + cost effectiveness wins vs others. less obvious, out-of-core so can do TBs even on one GPU, and full tpcx-bb needed the above versatility where sql is a kludge.
re:graph vs table, if you do just points and no edges, the node table is just a regular table you can do regular tabular data analysis + viz in. ex: load in samples scored by some ml model (x/y plot w lots of data columns for each point), then connect nearest neighbors to make it into an interactive graph. we are doing more and more here in practice, it's fun :)
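The "connect nearest neighbors" step can be sketched in a few lines of plain Python. This is a brute-force illustration, not Graphistry's or RAPIDS' implementation; `knn_edges` is a hypothetical helper, and a real pipeline at the scales discussed here would use a GPU or approximate nearest-neighbor library instead.

```python
import math


def knn_edges(points, k=2):
    """Return directed edges (i, j) linking each point to its k nearest neighbors."""
    edges = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        others = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        others.sort()
        edges.extend((i, j) for _, j in others[:k])
    return edges


# Each row of the node table is an (x, y) position, e.g. a 2D embedding of ML scores.
nodes = [(0, 0), (1, 0), (5, 0), (6, 0)]
print(knn_edges(nodes, k=1))  # [(0, 1), (1, 0), (2, 3), (3, 2)]
```

The node table stays a regular table; the edge list is simply a second table derived from it, which is what turns the scatter plot into a graph.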
I saw you mention Falcon and NanoCubes so had to check out Kyrix. I think this is a really important space. I used to assume Palantir had solved this but now that they're public it seems to be more of a smoke and mirrors startup.
3 questions:
1. Why isn't there a demo I can just click and visit?
2. Why not TypeScript? (I think this would be a worthy short term and long term investment —for this project and beyond— to make the upgrade)
3. Why Docker? (This one I'm probably wrong about and am just a curmudgeonly old gray beard). Every time I see Docker I say "nope". I get its use for massive clusters, but for running on one machine I dislike it (at that point I'd prefer just to run an image on a cloud VM).
1. In the original comment there is a link to a demo created by Kyrix: http://wildfire.kyrixdemo.live/ Are you looking for a different type of demo?
2. Honest answer: we are very understaffed academics who also need to write papers and theses. We want to convert to TS, that is just one item on the wish list.
3. We want everyone to be able to spin up a Kyrix app on their laptop using 3 CLI commands (you can try, instructions are in the README). With a backend and a database comes the cost of complex installation. Docker helps make everyone's dev environment consistent so it's easier to troubleshoot.
1. Ah! Sorry I missed that and went to the GitHub. Might be helpful to also add that link to the "website" field of the GitHub?
2. I get it. I never had the "pleasure" of being an academic, but got the chance to be a software engineer working alongside grad students for a couple of years and it felt like I imagine what taking a tour of a sweatshop factory floor would be like. (a huge exaggeration, but I saw a lot of low pay and lots of time on pdfs and not enough on code)
3. Yeah, I get it, but maybe allowing everyone to run it on their own VM would be better (and providing a one click "click here to get your own droplet on digital ocean running kyrix" sort of thing). I've found that Docker suffers from the XKCD problem (10 different environments to support... what if there was just 1? ...11 different environments to support!). I've almost never had a pleasant experience using Docker (and anything that requires downloading a 1GB+ image I don't consider pleasant). But again, I'm probably just being a grump and the other grads I worked with all seem to like it. Another idea: could you do this with SQLite? Have a more slimmed down version that didn't require Docker + those dependencies.
Thanks for the answers! Very cool stuff and it's a very interesting problem.
I've found 10MB (gzipped) to be a good rule of thumb. Anything bigger, query a backend. Storing bigger things in ephemeral tab memory can be painful if you refresh the browser a lot.
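One way to apply that rule of thumb before deciding where the data should live is to measure the gzipped size of the serialized payload. A small Python sketch follows; the function names are hypothetical, and reading "10MB" as 10 * 1024 * 1024 bytes and JSON as the wire format are both assumptions.

```python
import gzip
import json

LIMIT_BYTES = 10 * 1024 * 1024  # assumed reading of the 10MB rule of thumb


def gzipped_size(rows):
    """Size in bytes of the rows once serialized to JSON and gzip-compressed."""
    return len(gzip.compress(json.dumps(rows).encode("utf-8")))


def should_query_backend(rows, limit=LIMIT_BYTES):
    """True when the compressed payload is too big to ship to the browser in one go."""
    return gzipped_size(rows) > limit


rows = [{"x": i, "y": i * 2} for i in range(1000)]
print(should_query_backend(rows))  # False: a thousand small rows compress far below the limit
```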
It's a bit buried, but if you click through the link about how today is a big day for Observable, you'll notice it links to an article announcing that they raised their Series A today: $10.5M from Sequoia and Acrew.
I wonder why it's buried... the comments on this thread don't seem to be on a single topic, so it looks like most people have struggled to find the point too
Observable is fantastic, but as others have hinted at, IMO, I think it's trying to lock you in. I want the raw JS, in one click, so that I can build out a visualization locally, just hacking on HTML/JS. Steps to doing this are highly obfuscated when they could be a single button.
Overall you can see its evolution from prior sites- someone spent a huge amount of time demonstrating cool visualizations, then tried to make it all "proprietary". Nothing wrong with that, but why not make people addicted to the service by giving them the raw drug, rather than making them hate you for dancing through hoops?
Again- huge fan of access to many cool visualizations that have greatly improved what I can express visually for presentations etc., but very unhappy that I can't do this so much easier for my particular workflow. I'd love to be shown otherwise, but I've also spent a lot of time looking for answers, reading forum responses, etc., and they seem to all reflect wanting you to go through them, not along for the ride with them.
For instance, I host content on my personal website [0] that is written using Observable, and in fact the site provides guides to allow anyone to export content[1] and embed it elsewhere online.
My content can be entirely under my control if I save my libraries out of the Observable API as a tarball[2].
I genuinely think the differences between Observable and 'pure' javascript are driven by the vision for a more powerful way of using javascript for rapid prototyping, rather than any attempt to make it proprietary. I think these ideas are revolutionary and Observable is one of my favourite working environments as a data scientist.
I also think that Mike Bostock has shown good faith, and dedication to the open source community through decades of work, which he's always given away for free.
With all that said, if ObservableHQ and its API were to disappear overnight, the lack of an editing environment would make it difficult to update existing content and so I think there is space for an offline version of the editor. As far as I understand it, nothing's stopping a third party building this.
Thanks for the links, I completely concur with the goodness of everything you've raised.
To clarify where I'm coming from, my use-case is (fully) offline. I want to mock the visualization in the service then get the raw, static JS offline to be used as a template. I'll write a Ruby wrapper around the data I'm exploring and gradually parameterize the template as I better understand it, and what my data looks like in it. I don't want to dance with details of the JS, I want to dance with details of the data and parameters, so get me out of requiring online hooks ASAP. I know this is possible because prior versions of Observable let you do exactly what I want, but now I get some super friendly/not friendly interfaces that obfuscate what I'm familiar with.
As a separate issue not fully related: ultimately my goal is a static visualization for scientific publications or presentations (I can export SVG from Observable, so my prior arguments don't hold here). Science has a very long shelf-life; in my field we routinely reference papers and their figures 200+ years old. While I absolutely love the idea of "living publications" I also fear life that depends on others' services. If the big people can't play nice (coughing in your direction Google services gone dark, I can think of various examples that have screwed over scientists), then you can see my caution in adopting Observable's all-in-dynamic approach.
I agree that Observable is fantastic, but also wish it was more open.
I'm building something similar called Starboard Notebook[0] that has a different set of trade-offs. It ends up being something in between Jupyter and Observable:
* One of the goals is to build Jupyter how it would have been if it was designed for the web (only).
* It's open source [1], plays nice with git (the format is plaintext), and supports local viewing & editing [2]
* Because of that you can host it yourself, put it on your blog / github pages, anywhere.
* There is little magic, it actually is just Javascript at its base. This means you can use standard browser APIs and HTML, and when you are ready to "graduate" the notebook implementation that should be straightforward. In my eyes notebooks are only for the first 20% of the work that does 80% of the job for small applications/experimentation (which is often where it ends anyway).
* You can "build the ship as you sail": you can load new cell types dynamically at runtime. This is also how Python is supported (through WebAssembly).
* You can have interop with Python and Javascript which is really powerful. Example: Create a drag and drop form using HTML+JS, then process the dropped CSV file using Pandas and visualize using matplotlib.
I think the support for HTML,CSS,JS is unlike any other notebook, as well as the possibility to change the runtime at runtime (importing by URL new language plugins / other functionality). This makes it great for documentation (with examples you can actually execute), as an output format for automated reports, as a scriptable Tensorboard, as a platform for interactive articles, and for educational purposes (tutorials, homework).
Python works for stuff that was written for it purposefully.
Some python libraries work great: numpy, matplotlib, pandas. But many others are not supported directly and can be installed through micropip, but that's quite confusing! These are the issues with Pyodide currently:
* All python code that is executed is synchronous, which means it can not make requests (or call sleep). You can actually make a request using pyodide.open_url('path'), but that makes a synchronous request which isn't really a good idea for anything but small files. I believe asynchronous Python is possible, recent versions of emscripten support it, but it needs someone to put the pieces together (which is not easy!)
* Some packages are huge without being split up. Scipy is actually the only one that's really problematic, I believe it's around 80MB? It should be possible to split it up (into scipy.interpolate, scipy.stats, etc)
* Micropip is asynchronous, so you get a promise when you use micropip.install('my-package'), but you can't "await" it.
* Loading Python initially freezes the browser for a second or two.. not a great user experience.
* Libraries which are not pure python currently need to be manually made compatible with patches - including torch.
Python in the browser (not just in Starboard) needs more love. This is powered by Pyodide[0] which has been making steady progress, but the project is without corporate backing since Mozilla's change of direction. Perhaps Observable can allocate some of their funding towards supporting Python in their notebooks too through this project? Also consider this a call to action for other contributors who want to see Python in the browser become a reality :)
Tableau offers templated visualizations. You generally do not need to program to create stuff. Observable is a notebook environment where you can write JS (mostly) to create and tweak highly customized visualizations. It's similar to Jupyter on a very high level (they all have reactive cells for example), but for JS and visualizations.
Some more questions:
1) Assuming the bulk of folks use jupyter notebooks for analysis, why not integrate with jupyter or use d3.js in jupyter directly? I assume there would be a learning curve for data manipulation on js where analysts/scientists would prefer python
2) Assuming that this isn't even meant for the jupyter audience, who is it meant for within a company?
I definitely think it's a good point to support Python/R for data manipulation, and also allow JavaScript for complex visualizations.
Their current target for paid customers seems to be infoviz folks at news outlets (the CTO and co-founder was at NYTimes' data viz team for a number of years): https://observablehq.com/teams
That doesn't seem like a huge market, but perhaps they're starting with this audience segment and will be branching into more DS-type features (e.g. supporting a Python kernel, trying to be a more feature-rich Jupyter).
I mean this is my question as well: how are they making money? As of right now Observable seems like a mix of use cases like Medium-like explanation articles, vis debugging/authoring and interactive journalism that don't seem to generate much money (ofc I can be wrong).
I totally agree with you that the DS community is much much larger than the vis community. With the new funding round I think they should start doing something with DS.
It looks like their current plan for monetization is a per user subscription fee for added features on the notebook: https://observablehq.com/teams
Given the marketing on the page, it also looks like it's targeted at business use cases (i.e. a bunch of data scientists who want to visualize something in d3 but also want to collaborate).
You can install npm libraries. You can connect to DBMS servers (not sure about arbitrary web servers). I believe they offer iframes to facilitate embedding but not frontend components.
Interactive vis is not very well integrated with the frontend ecosystem IMHO. Integrating a very interactive D3 vis with React, for example, can be painful.
I've been trying to see how far I can push it for non-visualization software. Turns out... pretty far....
Firebase + Stripe => https://observablehq.com/embed/@tomlarkworthy/saas-tutorial
On Demand Minecraft Servers => https://observablehq.com/@tomlarkworthy/minecraft-servers
Zero deploy serverless cells => https://observablehq.com/@tomlarkworthy/serverside-cells
With serverless cells I can serve HTTP endpoints from a notebook! E.g. custom HTML like:
https://serversidecells-ibyw6dtm4q-ey.a.run.app/notebooks/@t... is generated from a notebook!