
> Every cell can contain text, data or formulae; every cell, row and column may be endlessly multiplied and referenced. These two qualities make spreadsheets an indeterminate material matrix — the textured all-over-ness of a Pollock painting. Or the empty space of a desert landscape in whose expansive lines could be written every story.

> Spreadsheets can render scenarios with total variability, but the complexity needed to turn every product, object, idea or structure in a spreadsheet into a twiddlable dial or live display often suffocates the insight in a sandstorm of choking numbers. …

There seem to be quite a few recent tools which try to solve this problem by replacing the grid paradigm with something a bit more structured. The main ones I’m aware of are https://inflex.io/ and https://www.trymito.io/, but there are many more, and I even had a go at making one myself. I’m not optimistic about their chances in general, though. Traditional spreadsheet UIs are immensely flexible, and great for small calculations and anything involving tables or lists. They also happen to be utterly awful at anything even remotely large-scale, but by the time people figure that out, it’s usually too late to switch as the sunk-cost fallacy kicks in.

On the other hand, what are the alternatives? Programming languages require a fairly significant expenditure of effort to learn, and don’t give nearly the amount of interactivity that spreadsheets do. Even environments like Jupyter notebooks, or the MATLAB IDE, don’t come close. Besides, in the hands of the unskilled — and even the skilled, really — programs for data analysis can become nearly as messy as spreadsheets, especially with popular languages like Python and MATLAB.

For these reasons, though I utterly despise spreadsheets, I am also beginning to despair of ever successfully replacing them with something better: spreadsheets are just too convenient, so why would anyone use anything else? Excel is always going to be more convenient in the moment than any more principled tool, precisely because it is infinitely flexible and has no restrictions. People don’t like friction in their UX when they just want to do a few calculations. There is a path to wide adoption for tools like Mito (linked above), which give programmers a more spreadsheet-like interface and so integrate nicely into existing workflows. But this approach is in itself limiting; I want a tool I can open and use right now, not one where I have to make a whole new Python environment and notebook and so on just to do a simple calculation. Alas, I see no way to get wide adoption, or perhaps even adoption by myself, for any ‘better spreadsheet’ implementation.




Hey, I'm one of the founders of Mito (https://www.trymito.io/). This is a super interesting perspective. I agree with a lot of your thoughts and wanted to respond to a few in particular.

> They also happen to be utterly awful at anything even remotely large-scale.

I think there are a few reasons why spreadsheets struggle to scale to large datasets and complex analyses.

When it comes to data size, legacy spreadsheets like Excel were just built for an age with different expectations about data size, and it's hard to upgrade that monstrous code base. That's why Mito uses Python to perform all of the transformations. Python still has limitations, but it works for tens of millions of rows of data.

Complex analyses are the other big cause of pain when using spreadsheets. Specifically, spreadsheets can quickly get super messy when they mix tabular data with single-cell results. Once the structure of the spreadsheet loses consistency, it takes a lot more mental effort to untangle it.

These complexities arise because Excel is super un-opinionated about what types of analyses make sense for a spreadsheet and how those analyses should be structured. Because Mito is designed specifically for working with tabular data through pandas dataframes, we're able to make design decisions that enforce a bit more structure on the analysis. 1) All data in Mito must be tabular -- it both preserves the structure of the spreadsheet and fits the ideals of pandas dataframes. 2) Every edit you apply in Mito applies to the entire column (or dataframe for ops like filter, sort, pivot, etc.).

The result of 1 + 2 + the fact that Mito generates the equivalent pandas code for every edit makes it fairly easy to understand what transformations are applied to the data at any given time.
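A minimal sketch of how points 1 and 2 combine with code generation, assuming each edit is recorded as a step scoped to a whole column or dataframe and rendered to one line of pandas code. The `Step` class, its step kinds, and the emitted strings are hypothetical illustrations, not Mito's actual internals; the code only builds pandas source text, so it runs without pandas installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One user edit, always scoped to a whole column or a whole dataframe."""
    kind: str       # hypothetical step kinds: "add_column", "filter", "sort"
    df: str         # name of the dataframe variable the edit targets
    column: str     # column the edit creates or reads
    expr: str = ""  # pandas expression or comparison, as source text

    def to_pandas(self) -> str:
        """Render this edit as one line of equivalent pandas code."""
        if self.kind == "add_column":
            # A formula is applied to every row at once, never to a single cell.
            return f"{self.df}['{self.column}'] = {self.expr}"
        if self.kind == "filter":
            # Filters act on the whole dataframe, keeping the rows that match.
            return f"{self.df} = {self.df}[{self.df}['{self.column}'] {self.expr}]"
        if self.kind == "sort":
            return f"{self.df} = {self.df}.sort_values(by='{self.column}')"
        raise ValueError(f"unknown step kind: {self.kind}")


def generate_code(steps: list[Step]) -> str:
    """Concatenate the pandas code for every recorded edit, in order."""
    return "\n".join(step.to_pandas() for step in steps)


steps = [
    Step("add_column", "df", "total", "df['price'] * df['qty']"),
    Step("filter", "df", "total", "> 100"),
    Step("sort", "df", "total"),
]
print(generate_code(steps))
```

Because no step can target a single cell, the recorded list stays a linear, replayable script, which is what keeps the generated code easy to read.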

In practice, we see that complexity explosion is the result of combining data exploration and analysis. In the exploration phase, users apply temporary filters, column transformations, etc., but they don't want to take those transformations with them. Which work was exploratory and which was analysis is often not known until after the analysis, so it's a hard problem to design for, but it's something we spend a lot of time talking about. Our most recent work to address this area of complexity is optimizing the pandas code that we generate. We can use obvious cues, like the user deleting a column or dataframe they had previously created, to tell us that the work was only exploratory and is no longer wanted. As a result, we can safely delete the Python code used to create those columns/dataframes.
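A minimal sketch of the deletion cue described above, assuming each recorded step notes which column it writes and which columns it reads. A column whose first write is a create, whose last write is a delete, and which no step ever reads is treated as exploratory, and every step writing it is dropped. The `Step` fields and the pruning rule are hypothetical and deliberately conservative; this is not Mito's actual optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded edit (hypothetical fields, for illustration only)."""
    kind: str                           # "create", "delete", or "modify"
    column: str                         # the column this step writes
    reads: frozenset[str] = frozenset() # columns this step depends on
    code: str = ""                      # pandas code emitted for this step


def prune_exploratory(steps: list[Step]) -> list[Step]:
    """Drop every step that writes a column created, then deleted, unread."""
    # Group the writes to each column, preserving their order.
    writes: dict[str, list[Step]] = {}
    for s in steps:
        writes.setdefault(s.column, []).append(s)
    # Every column that any step depends on.
    read = set().union(*(s.reads for s in steps))
    # Exploratory: the user made it, ended by deleting it, and nothing used it.
    exploratory = {
        col
        for col, ws in writes.items()
        if ws[0].kind == "create" and ws[-1].kind == "delete" and col not in read
    }
    return [s for s in steps if s.column not in exploratory]


steps = [
    Step("create", "margin", frozenset({"price", "cost"}),
         "df['margin'] = df['price'] - df['cost']"),
    Step("create", "tmp", frozenset({"price"}), "df['tmp'] = df['price'].round()"),
    Step("delete", "tmp", code="df = df.drop(columns=['tmp'])"),
]
for step in prune_exploratory(steps):
    print(step.code)  # only the 'margin' line survives
```

Requiring that no step reads the column is what makes the deletion safe: if another column had been derived from `tmp` before it was deleted, `tmp` would be kept. Checking that the first write is a create also avoids dropping the deletion of a column that was in the original data.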

> I want a tool I can open and use right now, not one where I have to make a whole new Python environment and notebook and so on just to do a simple calculation

I totally agree with this! Even as the creator of Mito, if I have to do some quick ad-hoc analysis, I'll end up opening Excel instead of launching Jupyter and then Mito. We're looking into ways of improving this, though! One idea is to create a command like mito <file path> that automatically launches your Jupyter server and opens the file in Mito. Another is to add support for JupyterLab Desktop so you can get closer to launching with the click of a button.
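A minimal sketch of the mito <file path> idea, assuming a CSV input: write a one-cell notebook that loads the file with pandas and opens it with `mitosheet.sheet(df)`, then start JupyterLab on that notebook. No such `mito` command exists here; `write_launch_notebook` and `launch` are hypothetical names, and the notebook is written as raw nbformat v4 JSON rather than through the `nbformat` library.

```python
import json
import subprocess
from pathlib import Path


def write_launch_notebook(data_path: Path, notebook_path: Path) -> Path:
    """Write a one-cell notebook that loads data_path and opens it in Mito."""
    source = (
        "import pandas as pd\n"
        "import mitosheet\n"
        f"df = pd.read_csv({str(data_path)!r})\n"
        "mitosheet.sheet(df)"
    )
    notebook = {
        "cells": [{
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": source,
        }],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }
    notebook_path.write_text(json.dumps(notebook, indent=1))
    return notebook_path


def launch(data_file: str) -> None:
    """Hypothetical `mito <file path>` entry point."""
    data_path = Path(data_file).resolve()
    nb = write_launch_notebook(
        data_path, data_path.parent / (data_path.stem + "_mito.ipynb")
    )
    # `jupyter lab <notebook>` starts a server and opens that notebook.
    subprocess.run(["jupyter", "lab", str(nb)], check=True)
```

The user still has to run the cell once JupyterLab opens; removing that last step would need a kernel-side hook, which is beyond this sketch.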

Lastly, I'd love to engage with you more about this since you clearly have a lot of interesting thoughts. If you want, reach out to me aaron <@> sagacollab (dot) com.


I completely agree with your assessment of why spreadsheets fail. Completely unstructured data plus a mixture of exploration and analysis is a recipe for disaster.

> These complexities arise because Excel is super un-opinionated about what types of analyses make sense for a spreadsheet and how those analyses should be structured. Because Mito is designed specifically for working with tabular data through pandas dataframes, we're able to make design decisions that enforce a bit more structure on the analysis. 1) All data in Mito must be tabular -- it both preserves the structure of the spreadsheet and fits the ideals of pandas dataframes. 2) Every edit you apply in Mito applies to the entire column (or dataframe for ops like filter, sort, pivot, etc.).

I tend to agree with this too, though there are cases where either (1) or (2) may need to be relaxed. Personally, I think static type checking will also turn out to be useful for enforcing structure: it’s nice to have things like built-in support for units, or enumerations for categorical data, or even just a guarantee that each column holds the same type of data throughout. (This is also why I’m uncomfortable with building a spreadsheet on Python, for all the advantages such an approach has.)
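A minimal sketch of the kind of structure described, in plain Python: each column declares one element type (an `Enum` for categorical data) and an optional unit tag, and arithmetic refuses to mix units. The `Column` class, `Region` enum, and unit strings are hypothetical. Python only catches these errors at runtime, when `validate` or `add` actually runs, which is exactly the gap a statically typed design would close.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Region(Enum):
    """Categorical data as an enumeration: only these values are valid."""
    NORTH = "north"
    SOUTH = "south"


@dataclass
class Column:
    """A named column that must hold a single element type throughout."""
    name: str
    dtype: type                 # every value must be an instance of this type
    unit: Optional[str] = None  # hypothetical unit tag, e.g. "kg" or "USD"
    values: list = field(default_factory=list)

    def validate(self) -> "Column":
        """Raise TypeError if any value is not of the declared type."""
        for i, v in enumerate(self.values):
            if not isinstance(v, self.dtype):
                raise TypeError(
                    f"{self.name}[{i}]: expected {self.dtype.__name__}, "
                    f"got {type(v).__name__}"
                )
        return self


def add(a: Column, b: Column, name: str) -> Column:
    """Element-wise sum of two columns that refuses to mix units or lengths."""
    if a.unit != b.unit:
        raise ValueError(f"cannot add {a.unit} to {b.unit}")
    if len(a.values) != len(b.values):
        raise ValueError("columns have different lengths")
    summed = [x + y for x, y in zip(a.validate().values, b.validate().values)]
    return Column(name, a.dtype, a.unit, summed)


region = Column("region", Region, values=[Region.NORTH, Region.SOUTH]).validate()
mass = Column("mass", float, "kg", [1.5, 2.0])
extra = Column("extra", float, "kg", [0.5, 1.0])
print(add(mass, extra, "total").values)  # [2.0, 3.0]
```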

> In practice, we see that complexity explosion is the result of combining data exploration and analysis. In the exploration phase, users apply temporary filters, column transformations, etc., but they don't want to take those transformations with them. Which work was exploratory and which was analysis is often not known until after the analysis, so it's a hard problem to design for, but it's something we spend a lot of time talking about.

Making a UI that is good for both data exploration and more in-depth analysis is an interesting problem, and I’m not convinced we’ve found a good solution yet. Spreadsheets are good for the former but not the latter; programming is good for the latter but not the former. Embedding a spreadsheet in a notebook interface seems a reasonable compromise, but I’m sure it’s possible to find something better and more tightly integrated.

> Lastly, I'd love to engage with you more about this

Sure, thanks! I’ll send you an email now.



