Iodide: An experimental Mozilla tool for data exploration on the web (hacks.mozilla.org)
300 points by pablobaz on March 13, 2019 | 43 comments



It is funny how the World Wide Web was invented for the purpose of improving scientific publishing. It revolutionized everything but scientific publishing. Scientists still seem to publish their research as PDFs, with citations written at the end of the page.


As an aside, you may be interested in this article from The Atlantic:

https://www.theatlantic.com/science/archive/2018/04/the-scie...

It basically makes the case that things like Jupyter notebooks are the future of scientific publishing.

And here's the accompanying HN discussion for convenience: https://news.ycombinator.com/item?id=16764321


I'd say open access journals are a pretty big change to scientific publishing, and that started with the preprint archives around 1991 with arXiv.

Beyond that, the web wasn't created to improve publishing; it just did so as a side effect. It was started for information exchange within collaborations, and it has certainly done that.

Experimental collaborations at CERN in the '80s were not too large, at most around 50 authors (NA31 < 40, NA32 < 50). By the next generation it was up to 350 authors distributed over 32 locations, and the current collaborations are at around 3,000 over 180 locations.

The web is a crucial component of the information systems that allow these collaborations to function at all.


Of course it is; science works differently than it did 30 years ago. But the output, in the end, is still papers that look not just like they did 30 years ago, but like they did 100 years ago!

With LaTeX, we have a powerful tool at hand that lets every single student produce a perfectly typeset PDF that looks timeless.

It doesn't even matter whether your favourite open access journal renders the paper as HTML -- they just took the LaTeX and transformed it.

It also doesn't matter that we have meta search engines for literature references (for instance http://inspirehep.net/ or https://ui.adsabs.harvard.edu/), or that we do have archives for uploading scientific data (https://zenodo.org/ or in general https://www.re3data.org/): The papers are still non-interactive, still linear (not clickable hypertext).

Science is old-fashioned and slow-moving...


The output of science is human knowledge; the printouts are only the transport layer. For conveying results, the fact is that making graphs with sliders is usually not worth the effort: you can always make a static figure that tells the same story, and usually with much less effort spent making sure your sliders don't show anything distracting.

You seem to object simply on the basis that an existing system is old. Do you also find it objectionable that some typefaces we use can be traced back thousands of years?


Just as "a picture is worth a thousand words", I claim a movie can be worth a thousand pictures. Of course it heavily depends on the research field. I recently worked in fluid dynamics, where it is natural to watch movies, yet publications only have still images, which is always worse. Everybody finds them worse, but the community nevertheless sticks to an outdated way of sharing results.


Same for Java - it was invented for applets and ended up being used for everything except applets.


I think it was actually first created for devices (specifically set-top boxes) but never really found a home there; then someone wrote a browser in it (HotJava) that supported executable content, and it took off from there.

https://en.wikipedia.org/wiki/Java_(programming_language)#Hi...


Instead of being targeted directly at scientific markets, I'd love to see a more generalized 'smart document' version of this. The promise of compound, programmable documents from HyperCard, Glue, OpenDoc, and even OLE is something I've been looking for for decades.

I'd love to see someone come up with a path that takes something like this or Jupyter, targets a wide market, and extends the capabilities in an accessible way. I think you'd blow half-steps like Airtable out of the water.


"Over the next couple months, we added Numpy, Pandas, and Matplotlib"

The Python data stack running in WebAssembly!


And it's also available standalone for other applications!

https://github.com/iodide-project/pyodide

Hats off to the team, this is some really neat stuff and I'm happy to see it coming from Mozilla.
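
To make the "standalone" point concrete, here's a rough sketch -- nothing below is Iodide- or Pyodide-specific, which is really the point: once Pyodide has loaded NumPy and Pandas into the WebAssembly runtime, ordinary code like this runs client-side with no kernel process behind it.

    # Ordinary NumPy/Pandas code; under Pyodide the same lines run in the
    # browser once the packages have been loaded into the wasm runtime.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"x": np.linspace(0, 10, 50)})
    df["y"] = np.sin(df["x"]) + np.random.normal(scale=0.1, size=len(df))

    print(df.describe())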


It's impressive how streamlined matplotlib plotting is and how well it works -- you just import matplotlib and start plotting, and you can have multiple interactive plots alongside each other. Especially compared to Jupyter, where it's unfortunately sort of clunky: interactive plots first have to be enabled with %matplotlib notebook, and then you can only have one at a time; you have to manually "freeze" the previous one before you start a new one, or weird stuff (overplotting) happens.

To be clear, I love Jupyter, and I think it's especially great for teaching beginners (especially because it has "real" cells, as in GUI widgets, though I do get the appeal of a flat plain text format for advanced users). But plotting has been a bit of a stumbling block, whereas the way it works in Iodide blew me away.
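
For anyone who hasn't tried it, a minimal sketch of the kind of plotting I mean: two independent figures from plain matplotlib calls, no backend magics (in Jupyter you'd first run %matplotlib notebook and then juggle one live figure at a time).

    # Plain matplotlib: two separate figures, no %matplotlib magic needed.
    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 2 * np.pi, 200)

    fig1, ax1 = plt.subplots()
    ax1.plot(x, np.sin(x), label="sin(x)")
    ax1.legend()

    fig2, ax2 = plt.subplots()
    ax2.plot(x, np.cos(x), label="cos(x)")
    ax2.legend()

    plt.show()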


This is the future I want: tools so anybody who has the inclination can publish interesting things for anyone to see.

Imagine if web servers were just standard containers and services which anyone could upload applications to. That would take us to a decentralised internet once again. No more centralized, fossilized web applications with stacks so impossibly complex that only the original developers can change them. All the power and innovation would be put in the hands of the content creators once more, and the internet would just be infrastructure. Sound familiar? I hope so.

Time to program like it's 1999.


<blink> here we go! </blink>


There is a sense in which this is pretty cool... even very cool. And I'm all for anything that makes scientific / data-oriented analysis and exploration more accessible.

But on the other hand, I remain unsure that it makes sense to continually try to push everything into the browser. Take:

> Iodide documents live in the browser, which means the computation engine is always available. Whenever you share your work, you share a live interactive report with running code. Moreover, since the computation happens in the browser alongside the presentation, there is no need to call a language backend in another process. This means that interactive documents update in real-time, opening up the possibility of seamless 3D visualizations, even with the low-latency and high frame-rate required for VR.

I mean, yeah, OK, there are aspects of this that make sense. But there are always tradeoffs. Take, for example, this point: "there is no need to call a language backend in another process". This also means that you're limited to the processing power available on your local machine, which is, by the way, shared with everything else running on your computer.

I'm not saying that Iodide is bad, mind you. But it - like any other tool - may not be appropriate for everything. As a corollary to that, I think it might be a fun experiment to see what it would take to provide an ability to move the computationally expensive parts between the local computer and a remote $BEEFY_HOST in a seamless way.


Interesting to see this tool along with Observablehq.


I like that when scientists don’t like a tool they write a white paper complaining about it (the HAL paper linked in this article) =D


> Pyodide: The Python science stack in the browser

You have convinced me to try this now!


The SWE in me is very enthusiastic about this. I love the push that Mozilla is making with WASM in really practical and tangible ways.

That being said, I have a hard time seeing myself moving from Python to JS for any 'real' data science work when I get a lot of the more advanced viz features here for free from libraries like plotly, without having to touch JS and with similar or better performance. Not to mention, I don't have any desire to move away from PyCharm after getting comfortable with and really taking advantage of its full feature set.

I think for small projects this could be very useful, especially with GDocs style real time collaboration, but for now I can't see myself or my teammates taking advantage of this.

That being said, I vote full steam ahead and let's see where this takes us!


If I'm reading the end of the post correctly, they actually compiled Numpy, Pandas, SciPy, and scikit-learn to WebAssembly, and there's a video at the end of the post showing a Python-only visualization running in the browser.


You are reading that correctly.


(disclaimer: I work on Iodide)

+1 - this is the right attitude to have! If you have a productive data science team, you should continue to use the tools you have - we'd be the first to tell you that. Re: the JS thing, we've found through user testing and our own experience as data scientists that folks don't really want to use JS for anything, and we want to meet some segment of data scientists where they are. As explained in the post, JS as a language and ecosystem isn't quite there in the way that it needs to be. In a few years, it may be, if you follow along w/ the various TC39 proposals out there.

To clarify, we do have support for Python / NumPy / Pandas / Matplotlib / others with Pyodide, which is natively supported in Iodide with the `%% py` delimiter. It doesn't address all the other workflow issues you have, of course, and there are some differences b/t a server kernel and what we have.
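
(For anyone curious what that looks like in practice, here's a minimal sketch -- the %% py delimiter mentioned above is the only Iodide-specific bit, and the numbers are toy data.)

    # In an Iodide notebook this body would sit below a "%% py" cell
    # delimiter; the code itself is ordinary Pandas/Matplotlib.
    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.DataFrame({"year": [2016, 2017, 2018, 2019],
                         "papers": [12, 18, 25, 31]})  # toy numbers
    data.plot(x="year", y="papers", kind="bar", legend=False)
    plt.ylabel("papers")
    plt.show()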


I agree: you’d have to quite literally force me to use JS to do data science stuff.

Surely there’s a better solution here? Maybe something like pandoc but for presenting science stuff in the browser? Do your work in whatever language, write up results and graphs using preferred tools and packages, output some kind of format that can be parsed to produce this stuff?


Iodide does actually support that workflow, both in JS and our wasm python stack (Pyodide). We have some data scientists internally who have made Iodide reports using only pandas & matplotlib. It's still early days for this tech, but it's quite doable.


I want to see APL ported to this, specifically Dyalog APL. APL is such a great math/data language, and with Co-dfns for GPU computing it would be great. Shen [1] would also be cool on this.

[1] www.shenlanguage.org


Feel free to reach out to us on Gitter https://gitter.im/iodide-project/iodide if you're interested in doing this. If you know anything about compilation, it's pretty straightforward, depending on the language & what's happening under the hood. Here are our docs on language plugins:

https://iodide-project.github.io/docs/language_plugins/


I know very little about compilation, but I'm going to look at the link later, and get back to you. Thanks for the links!


I just read the documentation of JSMD, which is Iodide's flavor of Markdown [1].

So they made a Markdown syntax which is capable of running JS and Python, and they named it JSMD.

Just nitpicking, but they probably want to come up with a more suitable name than JSMD. I'm genuinely amazed by this project, though.

[1]: https://iodide-project.github.io/docs/jsmd/


(iodide dev here) Yeah, that's a good point. We called it JSMD before we even thought of starting Pyodide, and in hindsight it seems a bit constricting.


Thanks for your comment. It is probably not too late to change the name because it's still in alpha. It may be misleading for newcomers once JSMD supports even more languages through plugins.

Other than that, I love the design of the syntax; raw cells especially seem great to me. I'm often surprised by the fact that neither Markdown nor Jupyter has an official way to insert comments. The fetch cell is also a smart idea.


I like how the first comment at the end of the page asks:

> It’s like JupyterHub, but for people who only know javascript?

That also was my feeling. And since Jupyter is language agnostic, why not just run a JavaScript kernel, such as https://github.com/n-riesco/ijavascript . It looks somewhat as if Iodide is exactly that.


Isn't the big difference that all of this runs your code on the front end? (Or am I misunderstanding how these work?)


I was trying to make coroutines work through the javascript event loop, such that you could basically "await" any javascript Promise (or wrap a Promise in an awaitable).

So you'd have things like 'await sleep(1)' which would be implemented with a Promise and setTimeout. Or 'await fetch("/api/v1/list")'.

Unfortunately I haven't gotten very far yet.
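
The shape I'm after, in plain asyncio terms, is a callback-style timer wrapped in a Future you can await. In the sketch below loop.call_later stands in for JS setTimeout; wiring this to the browser's event loop is exactly the part I haven't cracked yet.

    # Sketch only: wrap a callback-based timer in an asyncio Future so it
    # can be awaited; loop.call_later is a stand-in for JS setTimeout.
    import asyncio

    def sleep_via_callback(loop, seconds):
        future = loop.create_future()
        loop.call_later(seconds, future.set_result, None)
        return future

    async def main():
        loop = asyncio.get_running_loop()
        await sleep_via_callback(loop, 1)
        print("slept 1 second via a callback-backed Future")

    asyncio.run(main())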


My preferred way to do this would be a compiler pass that rewrites everything to use the explicitly managed stack. I had started on writing that transform, but ran out of time. That's currently the major blocker for making the Julia/Iodide integration work well.


I'm pretty sure you can await any javascript Promise!

For example, running `await Promise.resolve(3)` in Chrome's js console prints `3`.


Also MDN has a fantastic Promise article with examples about wrapping a function with a promise.[0]

[0] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...


It's a little unclear, but I think he wants to integrate Python (Pyodide) coroutines into the JS event loop-- so async/await of Python code works seamlessly in the browser, and you can await Python code from within JS (and vice versa).

Any JS Promise is awaitable from JS; JS async/await is (more or less) syntactic sugar on top of promises.


Sorry, I meant to use "await" in Python to await Javascript Promises.


After having a quick look, I found that they use Emscripten, which is capable of compiling to JS (asm.js) as well as WebAssembly. So the whole Pyodide stack can be compiled to JS.

I'm wondering what the performance looks like when running the Pyodide stack on JS.


asm.js is not the JS you use; it follows a different spec, and it's way faster than plain JS but slower than WASM.


I'm not sure how that matters. It is still JS; it can run on any JS engine even without transpiling. I just wanted to know the exact performance difference between the two compiled outputs.


What is the difference with Jupyter/JupyterLab?


Jupyter follows Mathematica's input and output cell format. JupyterLab uses split windows and is closer to Iodide from what I can tell, but Iodide is running in WebAssembly: they ported Python and associated libraries to WebAssembly. I am still looking, though.



