I don't know if anyone will find this useful, but here's something I learned literally this weekend.
Sort rows/cells into groups based on the value of a cell.
=FILTER(Stories!B2:D13,Stories!F2:F13=A2)
First parameter, "Stories!B2:D13" is a group of cells showing some stories.
Second parameter, "Stories!F2:F13=A2" is the column where each cell is compared to the value of A2. Rows that match are then copied into wherever the =FILTER formula is placed.
I use it to take a list of Stories and sorts them into Sprints automatically. That's useful for program increment planning, etc.
The other useful Excel thing I learned recently is:
That says: If the cell A2 is not blank, append its value onto the url given, and show that as a link with the text PROJECT- with the value of A2 appended.
I know I should have probably done something cooler with Emacs and org-mode, but I have to share it with a lot of business folks.
If by some chance either of those are useful to you, I hope they work OK for you :)
Thanks, I really find these examples useful. For me, the QUERY function in Google Sheets has been really useful too. Same thing with IMPORTRANGE. It really helps with documenting what you did to get a count of rows.
Pivot tables powered by SSAS cubes may be the best self serve analytics tool I've ever seen (where "best" is measured by how much users actually use it). The ability to meet business users where they are is huge for actual adoption, especially compared to something like Looker (which is hot garbage for other reasons as well). Plus, 9 times out of 10 people want to apply additional lightweight transformations/calculations to the data, so nothing beats being in Excel already.
Too bad OSX support is non existent and writing MDX is a pain in the fucking ass.
I agree and I do this via Power BI. If you import data into a Power BI report, create a data model with calculated measures (in DAX, not MDX), and publish it to the online service, then users can click on "Analyze in Excel" and it downloads an Excel workbook with a pivot table connected to that data model. I provide this to the PMs for the product I work on and they're able to answer a lot of their questions just by pivoting instead of having to write bespoke SQL.
Oh that's cool - does it work for business users on OSX? I feel like the fall of Thinkpads was the final nail in the coffin. We tried running a dedicated windows machine that people could remote into, but it was just too much friction.
Edit: One tool that looks promising is Equals (equals.com), but I haven't had a chance to play with it directly to see how it compares.
Assuming the company has a Microsoft 365 subscription, Mac users can just use the Excel web client in their browser to access the data via pivot tables.
Power Pivot is the name for the tabular data modeling feature in Excel. It works similarly to Power BI or SQL Server Analysis Services, sharing the underlying database technology. You can load data from queries and files, transform it, add relationships, hierarchies, and calculated measures, and connect pivot tables to it.
I worked at a company that had this (OLAP Cube + Pivot Tables) as one of my first internships and it was mind blowing. I bring this up in terms of amazing analytics tools in the data space somewhat often. Even highly developed BI tools like metabase can't handle dimensions as well as the pivot table.
I used to work on a BI and reporting platform on the Microsoft stack (SSIS + SSAS + SSRS)..
Even got decent at writing MDX.
Thing is, you needed beasty servers to get good performance (event with loads of cube modeling optimizations), but that was rarely the case...
This was in the physical servers era, mind you, even started on 32 bit Win Server, which choked up pretty fast...
I switched company around when tabular OLAP models started replacing multi-dimensional models, so I never got to understand those.
Back in the mid 2000s, our killer app was an example app from MSDN magazine. It basically embedded Excel in a simple web page and we could use the pivot tables against SSAS cubes. We did a little web work for permissions and to save views, but probably had less than a total week of dev work.
Have you tried Easy Data Transform? It is a lightweight ETL tool for doing data transformations (such as pivot) on Excel, CSV and various other file formats. Runs natively on Windows and Mac.
The ETL part is not really the problem business users seem to have, but rather needing the ability to use pre-built measures / dimensions natively in Excel.
It does, but everything is translated to raw sql then pushed to the database layer, which means anything with a meaningful amount of data runs like dogshit. I haven't touched MDX in a long time, but my recollection is that OLAP cubes make a lot of these pivot-table type queries a lot more performant.
Also, while it may seem like a minor thing, not being connected to live source introduces a significant amount of friction and room for human error. Adding a new filter or measure = new copy of a file that you need to keep track of, refreshing with a new month of data = a new copy of a file, etc.
> everything is translated to raw sql then pushed to the database layer
All ROLAP-kind of BI tools do that (including PowerBI when it uses direct-query connection mode), it is expected that underlying data sources are fast enough to handle these aggregate queries very quickly. In fact this approach may be used even with non-OLAP databases (like PostgreSql or SQLServer) and specialized analytical datastore is needed only for really big datasets (BigQuery, Snowflake, ClickHouse etc). In many cases correct usage of report parameters that can filter DB records by indexed columns OR usage of pre-aggregated materialized views, or tuning of SQL query generation (say, avoid JOINs and SQL-calculations when they are not needed for the concrete report) can solve performance issues.
This doesn't mean that Excel's PivotTable (and SSAS cubes) is good and ROLAP-kind pivot tables are bad because their applications are different. In cases when pivot tables should show actual (near real-time) data and this is main purpose of this kind of reports in BI tools; when users need to explore some dataset in a disconnected mode they always may export concrete report's data to Excel - in fact, some BI tools can export their internal pivot table into Excel file with pre-configured PivotTable.
Oh yeah. Any manual data update has potential to go wrong.
Especially since in my exp the most common way to do this is to paste the new data over previous sheet in an excel, and hope all the formulas still work.
Kind of fine, but let’s hope there aren’t any new categories that weren’t there last month!
> Kind of fine, but let’s hope there aren’t any new categories that weren’t there last month!
if it does mess up your formulas hopefully it does it in a way that you actually notice!
hyperbole aside, I don't think it's entirely Looker's fault that business users can't seem to get the hang of it, but I think the delta between what users "should use" and "actually use" is large enough that the tool just isn't worth it.
One of the "crimes" (he types hyperbolically) of Microsoft Office is that is basically sucked all the air out of the room for anything else in the non-graphical artist office productivity area that wasn't Office or a pretty direct knock-off.
The spreadsheet model is a good example (even if Excel is probably the best thing in Microsoft Office. But it also means that if you can't make Word do a good enough job for desktop publishing you generally have to go to InDesign which is probably way overkill if yoiu're not a publishing professional.
To be honest i saw a teacher in high school working on his own textbook)the second revision of an already published textbook) in word and while it had the classic wysiwyg experience, it was typographically okay, almost ready to be printed.
Word, or even something like Google Docs which lacks some features in areas like Section numbering, isn't terrible. I published a book using Google Docs and basically decided anything it couldn't do I didn't need or could handle manually. But, if I were actually come up with a wish list for a low-end publishing platform it would probably look a bit different than Word.
I’d believe it if you count every random plot people make as “design.”
It blows my mind how much people do in excel. On one hand it’s pretty cool how much it can do but you end up with these monster spreadsheets that should really be their own program of some sorts.
One example I think about a lot (because I use the end product a lot) is Fangraphs’ ZiPs model that predicts baseball stats for upcoming seasons. It’s 20 years old and, from what I’ve been told, a massive Excel book with some VB. The creator was a stats major but they didn’t do much programming in his curriculum so Excel was the option he was most comfortable / productive with. Yet, he’s doing a massive analysis over every player in the league using decades of historical data. The thought of doing that in Excel makes my stomach churn but if it works for him then I guess it’s good enough!
Excel is really amazing for doing adhoc type reporting/analysis. I think there is still a gap in the market for a product between Excel and a full blown app with a sql database. Excel falters at larger data sets but I don't find the existing tools in the marketplace very compelling. I always thought there was a lot more momentum in the past with Visual Basic and MS Access. You could build some very interesting tooling that bridge the gap between an Excel workbook and a full blown custom application.
These days I hear a lot about Tableau but the few times I have tried it out I was not a fan. It definitely handles larger datasets that Excel cannot but everything feels hidden behind menus that are unintuitive. In fairness I have never spent the proper time to sit down with it and maybe this kind of workflow works really well for business users.
Never knew Improv existed. There are tools that are similar to the vision of Improv. I use Anaplan at work everyday and it is exactly what a modern cloud based SaaS version of Improv would feel like. It is a multi billion dollar company and it worked because it did not go after the spreadsheet space but played along nicely with it.
The Improv article concludes "the key strategy mistake was to try to market Improv to the existing spreadsheet market. Instead, if the product were marketed to a segment where the more structured model was a ‘feature’ not a ‘bug’ would have given Lotus the time to learn and improve and refine the model to a point where it would have satisfied the larger market as well." and Anaplan seems to not have made this mistake. They have carved out a niche in the EPM (Enterprise Performance Management) market.
Improv's model was poor though, it was based on cell positions and it was easy to double-count. Basically was bad at various semantic aspects. I worked on a small conpetitor back in the day and we competed on modelling, there were a number of others too.
I've also worked on Anaplan and and its modeling is also much better, so please don't lower it to any version of Improv! It isn't, or at least wasn't a little while back, good at cross-metric ("line item" to them") calculations and presentations, but still easier to get something complex correct.
The real mistake of Improv was that it wasn't 1-2-3 so Lotus didn't know how to narket it or sell it.
Instead, pivot tables are more analogous to crosstab queries which creates summarized vertical columns. It "pivots" data groupings by rotating from horizontal to vertical.
The older versions of SQL dialects that didn't have the newer cross tab syntax required convoluted CASE syntax to "simulate" pivot tables which didn't really work that well since one had to know ahead of time -- all the unique values -- to put in each CASE condition branch.
If you are grouping over them, they shouldn't be columns in the first place, the original SQL is "right". Rows and columns aren't symmetric!
Reshaping data should be a presentation-time decision, not a query-time decision. You have a dataset, a relation in the algebraic sense, and you are choosing to display it in some way: as a table, as a pivoted table, as a pie-chart...
Conflating the two is a consequence of spreadsheets having an unbeatable UX but a terrible data model that lets you treat rows as columns and vice-versa.
For business analyst purposes, the reshape should be a display-time activity. Finance models may start with billions but tend to present a manageable quantity to the user.
The chain of calculations back to any stored data is the real key, along with what your interactivity needs for multi-user update are.
Cross tabulations are a fundamental part of statistics, business and scientific analysis. That they aren’t supported by the relational model just means that an RDBMS is not a full fledged analytical tool.
See my comment above on multidimensional axes for presentation vs primary key components of a relation.
More interestingly, SQL has a hard time with the contextual calculations (actuals come from this table and are aggregated, forecasts come from those 12 tables and are computed by the following formula series, now apply that logic across N variables for actual and forecast) and a really painful time defining computed rows (like fields in some rows being ratios of the same field in other rows).
That's what the modeling tools like Anaplan, Tidemark and some others excel at.
You're falling into the trap of rows and columns for visual layout vs tuples and fields for semantic layout. The axes of a multidimensional spreadsheet (there can be any number) are better thought of as primary key components and the cells of the sheet as non-key attribute values.
> should be a presentation-time decision, not a query-time decision
Agree, but sometimes you just need to shove the results of a SQL query into an Excel file and you don't want to get fancy.
You're either 1) overwriting Sheet B and then using a pivot table in Sheet A to get the final presentation, 2) pivoting in the code/program executing the query before writing to Excel or 3) pivoting in SQL and skipping the code and Excel pivot table altogether.
I run into this a lot with data used for financial modeling, or financial reporting that heavily relies on using dates/categories as column/row headers.
Goupby may be the higher abstraction, but it's not necessarily better.
Dimensional models are basically modules (generalised linear spaces), a groupby can do the same things but doesn't really give much useful structure to work with (at best the result is ordet independent, most of the time).
This is also why sums and counts tend to be more useful than averages.
They're different. Group by only gets you so far. Actual modeling of business scenarios deals with a lot of irregularity and the specialized spreadsheet-like tools were mostly much much better at expressing this.
I have not used spreadsheets very often since the mid-1990's. One of the reasons: I was excited by the potential of Improv, but disappointed when I realized that it had no future. Little did I realize that pivot tables were a different take on the concept!
(The other reason for abandoning spreadsheets was performance. I forget how good/bad Improv was in this respect, but I doubt that I would have stuck with spreadsheets since the data sets I was dealing with weren't really appropriate for them.)
I remember going to the UK launch of Lotus Improv as a newbie journalist. It was really notable how both how flashy and professional it was compared to other products (in retrospect, I'm presuming that was Jobs' influence). Nonetheless, I think they really struggled to explain pivot tables, and why, in itself, that feature was sufficient to move to a new application and a new hardware platform.
Improv included an excellent animated presentation which explained its features perfectly. The problem was to get new users to sit their asses down and actually watch it.
In hindsight, this was obviously a UX failure - but I equally blame users who lacked any attention span.
You may want to read about DataPivot, developed by Brio Technology during the same period as Lotus. Brio had a patent on the pivot data aggregation algorithm, I'm not sure how Lotus's pivot table worked...
I use postgres in the role of a better* spreadsheet.
Huge asterisk: Better is a very subjective, Ad-hoc data entry is terrible by comparison, but bulk operations are much nicer. The real reason for the change was that I was starting to loathe the spreadsheet data model. I am not fond of how easy it is screw up data in the big bag of cells data model that spreadsheets offer. The core feature I really wanted is row level security, for rows to to stick together.
The missing part "better spreadsheet" for me is some standard, simplified reactive evaluation declaration syntax. I'm probably wrong, of course.
Oracle has a pretty crazy feature where it handles a query like a temporary spreadsheet where you can enact reactive queries upon [1], that I haven't found in other SQL DBs.
I know you can do that using window functions + recursive CTEs but under this "reactive evaluation of a sub-query" use case they tend to get ugly and incomprehensible real fast.
The linked article specifically mentions Lotus Improv as the app that had this functionality in it. Interesting how Steve Jobs was able to get Lotus to make it a NeXT exclusive app initially.
I remember in Google Ads, we’d build all these cool looking dashboards but whenever we did user research it was pretty apparent that all they wanted was their data in a pivot table…
All these cool-looking dashboards are just too inflexible. You cannot add your own aggragates beyond trivialities. You cannot just "color that one value that bugs you". You cannot just generate a readable report plus some explanatory text.
Spreadsheet export + pivot table gives you all that. Doable for any moderately competent office drone without a round-trip through some endless backlog-spec-sprint-program-test-respec-sprint-... loop
To be somewhat constructive: What you rather should have done is not create more elaborate dashboards. What imho the world needs is an easy way to use a spreadsheet tool to generate and publish a dashboard. A "make web dashboard" button right next to the print button. With auto-updates when input data changes of course.
Yes, I have. What Excel is still lacking is an easy solution for the input side. You can bind tons of data sources, but all are weird, hard-to-use, manual. There is no easy "grab this from that website, get the current data of what I just pasted there, mash it together, publish it"
Hell, it cannot even do proper CSV import. You need to reformat your CSV to match the locale Excel is running under!
Uh? Are you sure you've actually used Excel? The CSV import is highly configurable and leads you immediately into Power Query where you can massage the data any way you want.
The LibreOffice CSV import is configurable. The Excel one isn't.
You can do things in PowerQuery, but that is far from obvious and still buggy. Not to mention all the woes after import, like date/time auto-interpretation and autocorrections that cannot be switched off.
I stand by what I said. Excel imports are a huge mess.
Would you like me to send you some online tutorials on how to import CSVs into excel? Because at this point it's just crazy. Are you using excel 2009? Do you not know about the "Data" tab in the ribbon? There's a whole dialog to complete with several options when you import a CSV file.
That's fair, but fortunately I'm not planning on doing either. (Well, I am still implementing ~all of Excel's formulas for compatibility, but not the the UI/UX...)
People don't really consume data, they read documents. I think that's (part of) the vision these projects lack.
Yeah, I misread the post I replied to: I’ve been on a bunch of internal dashboard projects that were in danger of losing focus and turning into full-fledged visualization platforms.
There's a place for well-crafted analytics dashboards in today's business, too. They're mostly tailored to specific user requirements/use-cases and look nothing like the flashy stuff one sees on dribbble or elsewhere.
Tailored analytics dashboard can solve many pain points of Excel + Spreadsheets if done well.
If ~1k people need to access the same data each day and 'analyze' it for similar things (patterns/outliers/seasonalities etc.) then a good dashboard will be quicker, better and cheaper than 1k office workers trying to create pivot tables.
If that dashboard is tailored to the use case, then those 'color that one value that bugs you' can oftentimes be implemented within minutes after hearing a good use-case from a user. I say that from experience.
And from experience, I'd say that most Excel users know the basics of basics. I'd bet that 90%+
The problem here is that you usually do not have ~1k users with all the same requirements. You have 200 groups of average 5 users each, all with their own department-specific, country-specific or workflow-specific requirements. Of course a central solution will be better and cheaper. But it will never be quicker, because you will take ages to just gather requirements from all 200 distinct user groups. As soon as you have those requirements, they will have changed already, so you are working on yesteryear's problems.
And of course, given a working system, the users can drop you a quick email, explain their problem (yes, in an ideal world they could do that, and you would understand them right away...) and you implement a 5min change. In reality however, their problem will first have to be specified in a user story, with a ton of clarification requests until the story is really understood by the dev team, then you need goodwill, time and money for the implementation. And maybe their problem can only be solved by an ugly hack, a weird special case for the ternary currency and ages-old lunar-calendar-based tax-system of lampukistan. Would that really be quicker than just the lampukistan team throwing together a few formulas and be done faster than the initial email? Even when multiplied by the special requirements of the other 100 country sales teams?
Also, I've had similar change requests where is was explicitly asked to provide a spreadsheet prototype of what the statistics should look like. Well, thanks, why again do we need a dev team?
I know that spreadsheets suck. They are ugly, undebuggable hacks, always and without exception. You need tons of time to implement in hours what would be a quick one-liner SQL query. With terrible error behaviour, weird edge cases and hell knows how many hidden bugs when the locale uses the lampukistan-currency-separator instead of a decimal dot...
...Except that they provide those office drones with velocity, which, as the usual wisdom around here goes, is everything.
Basically the users were trying to tell you (sounds like you never got the message) that your dashboards were insufficient at answering the questions people had to answer in their workplace by their bosses every day.
Nobody cares that they looked cool (highly subjective, anyway) if they can't be used to get work done. Where your team thought you were adding value, you were just wasting time.
According to the CEO, Microsoft summoned him to Redmond and made an insultingly low offer to buy the company. They threatened to build their own product and put Brio out of business if he refused. He refused and MS later added pivot tables to Excel.
The article mentions Lotus, who apparently built a similar product around the same time. The Brio founders were involved with a company called Metaphor, and maybe some of the ideas were developed there...
Brio products shipped with a sample database, the contents of the CEO's wine cellar. That exact data could be seen on early Microsoft office boxes...
But Stac had patents, lawyered up, and sued Microsoft for patent infringement. Microsoft didn't lie down, but after a court ruled that both of them needed a license to ship products, Stac went to the largest OEM customers and offered a license. It was either take the patent license or not ship, and the pressure from the OEMs on Microsoft resulted in a settlement that was not terrible for Stac.
This is one of the best examples I know of how software patents can be good for competition.
The settlement was alright, but Microsoft's inclusion of DoubleSpace still inevitably killed Stac (and even larger potential profits) as they were unable to pivot when the technology became unviable.
> This is one of the best examples I know of how software patents can be good for competition.
Small companies and startups with products in the market (before existing incumbents have products) should be able to get software patents to defend their innovation.
Big companies should be able to have patents, but not use them against smaller players.
Patent trolls with no products in the market shouldn't be allowed to have patents.
Universities and university researchers should be able to have patents, but they should be forced to license at not-unreasonable terms to startups.
My thoughts on the matter, anyway. The point is to encourage innovation, especially by enabling small players bringing new stuff to market. Give them a small shield against the big incumbents.
As somebody who's been at this for a couple of decades, this strikes me as a rather well balanced view. What I can't understand is how no patent reform discussions have anything like this kind of balance. It's all "stronger patents" or "weaker patents." Nobody seems to understand how and when they work vs. don't work.
Arguably this, and similar cases, is what motivated Microsoft to become an aggressive patent filer. I know that at various times in my tenure there (99–13) that I was on teams where patent filing was encouraged
A similar story could be told about Steve Jobs and Apple. Anecdotally at least, Steve hired the patent litigators (who had represented Apple in a patent infringement lawsuit with Taiwanese MP3 maker Creative Technologies over the iPod) to patent the iPhone. But that was more targeted than the spray and pray approach.
Brio had a patent on their pivot table aggregation algorithm, perhaps Microsoft did something differently in their implementation. I've never used an Excel pivot table.
OK, fair enough. We don't know how long it took to get approved. In theory they could have gone after MS. I vaguely remember that our rival Business Objects had some BI patents and sued us, and we sued them over pivot.
I am completely on board with the idea of Richard Stallmans of abolishing software patents BUT it is simply an ideal that I do not see ever happening on a large scale. The fact is that when it comes to business, they can help. It is just a shame when there is the potential to leverage them against smaller players.
Things that work on one scale for good can also be leveraged for bad on another scale.
I think it is worth noting that these patents weren't good at all for competition. Stac Electronics proved to be in no way capable of competing with MS Excel, pivot tables are technically not that hard and the lawsuit might have stopped MS bringing them to a huge number of people faster.
They did nothing much, and if they'd done more they'd have retarded progress.
Oh they made a disk compression program. Happily, the argument stays almost exactly the same - substitute NTFS for MS Excel. The reason I like that anti-patent argument is because, as far as I've seen, it applies to literally everything in software outside some ultra-niche applications of software that only handfuls of companies use.
The patents from that era were largely junk (famously, Amazon had a patent on having a button that you clicked to buy something). They're all for basic techniques that were going to get figured out one way or another. They were most useful as a tool for stamping on other companies that were figuring out the same techniques at about the same speed for the same reasons. IE, were no use at all for spurring innovation. The idea that MS should owe some company $100 million because they used a specific compression algorithm is serious because of the amount of money it cost them but otherwise stupid.
What part did MS steal? The pivot table concept (as in summarising data by categories of rows and similar operations) existed for a really long time, so I'm guessing you're not referring to that?
You could say they just reverse-engineered Brio's product, and therefore it wasn't theft.
But the presence of the Brio sample wine DB appearing on MS Office boxes showed that they bought a copy of DataPivot and used that DB when developing their clone. They didn't even bother hiding the evidence.
BTW, DataPivot was released in 1991, and the only other references I've seen for prior art are a Lotus product released about the same time.
It's not clear that you understand the question you are answering. The "product" being referred to here is a centuries-old concept. There is nothing about pivot tables themselves that Microsoft would need to reverse engineer.
The comment you are responding to is not challenging your use of the word "stole" — it is asking you what exactly MS is alleged to have stolen, reverse engineered, or otherwise gleaned from your former employer.
Lotus Improv actually predates Brio's product. I remember clearly when Improv came out -- it was a really exciting breakthrough at the time. But the big problem IMO was that it wasn't part of Lotus 123. So when Excel added pivottables directly into the spreadsheet that made it vastly more useful (to me at least).
Despite being an engineer and well versed in SQL, the ability to do things quickly in Excel, including whipping up pivot tables, has been a big boost to my career, especially as I moved more to product and business management.
Pivot tables let you super quickly see how things break down across dimensions and play with that analysis in a way that makes for rapid decision making that's not matched by much else other than established tools for well-defined spaces.
I feel the opposite. I rapidly upload my data to bigquery and start slicing.
I can fire out SQL queries way faster than I can click around in excel... and it is reproducible since it's easier to copy-paste a query from history than redo all formatting in an excel sheet later on.
I can see how all pivot table capabilities can be replicated via SQL but I think some of that becomes very expensive and error prone while being simple drag-drop in Excel/Sheets.
The ease of doing it is a key feature. if I have to build a certain report because it's my job, I will do it whatever it takes. If I am just doing extra due diligence for myself, I may not do it if it takes hours of SQL crafting.
As much as I don't like working in R, the pivoting/reshaping functions in tidyr are hard to beat. It's much more ergonomic than the similar functions in Python
Weren't those functions moved to tidyr? Anyway they should be taken as a model for how to implement a sane interface for data reshaping in other languages. You can say what you want about R/Wickham, but he's not afraid to thinker with his modules until they find the perfect user experience
Yes, the newer "pivot_wider" and "pivot_longer" are just much cleaner and nicer than the 25 different half-broken way you can use in pandas. Luckily I don't have to do a lot of data engineering anymore, because R is just not fun for large projects while pandas is very unergonomic
I did and honestly both ways I tried (pivot and melt) looked disappointing compared to tidyr. I just don't like the reliance on indices (and the stateful API that comes with it) in many operations, like in pivot
Back in the days I recall there was some plug-in to query Pivot tables in postgres, but it was such a pain to use that a two step approach (query the column names, and then generate a second query based on these column names) with Python/SQLAlchemy glue code was easier and possibly even more performant. I hope the situation has improved, as indeed pivot tables are so useful
>> Smart folks like duckdb realize the utility (and the pain of doing this in normal SQL)
I used to love the Microsoft Access visual query tool. Super intuitive but maybe a little too abstract for normal people. It would also produce SQL which was how I learned a little bit of that.
It's a drag-and-drop UI for grouping items in a table by category. Plenty of quick videos on YT since business users generally struggle to grasp it. Here's one example: https://www.youtube.com/watch?v=qu-AK0Hv0b4
It's paradoxically very useful and complete garbage at the same time, IMHO
It's just that the limitations feel like whoever was implementing it walked away from it before they were finished to work on a different project. It could be much better than it is. It's way too brittle and the UX for more advanced aggregations is downright painful.
Ya… I fee that. The only high quality ones on the web too are proprietary. Im using the plotly react pivot that gets the job done, but doesnt even compete with excel… 2010 edition . Would love a killer end all pivot integration.
I have no idea what the ideal UX is, i just know that whatever I got aint it
this probably is more about specific implementations than pivot table in general, but I often find it's "lacking" when wanting to do some more advanced things with it.
To name a few:
The way how the "values" are generated is very limited in both Excel and Google Sheets (in different ways).
The way how filter/sorting works with pivot table isn't the most straightforward or flexible.
the "UI" elements (headers, styles, etc.) is very hard to control. I often find myself creating a pivot table, and copy it somewhere else, and manually fix bunch of stuff -- which kinda defeats the point of it (since I can no longer dynamically update it).
Disclaimer: I'm by no means a spreadsheet expert so I may just miss something.
I’m pretty close to a spreadsheet expert, I categorize pivot tables as just quick and dirty adhoc/disposable data visualization and never would use it as something for presentation or where styling mattered.
I usually use it to quickly find unexpected values in the underlying data columns. Although with spill formulas like =unique() I’m using it less and less for this.
I find often people wanting more out of it, are really asking for Power Query. Which is there, just a lot of people are intimidated by. (Maybe not HN, but general population)
Agreed, current implementations are totally brittle and far from "feature complete" relative to what an experienced / power user would expect coming in
It actually bothers me so much I've decided to write my own spreadsheet engine. This is one of the pain points I want to fix
You may try these online pivot tables https://www.seektable.com where you can re-order rows/columns simply with a click on the header, apply filters via simple input where you can specify which items to keep or exclude.
The second most powerful is probably the Solver, that few programming languages (like prolog) include equivalent functionality out of the box. That's one thing that consistently gets me crawling back from Python/Pharo to Calc.
It seems no one mentioned this. Joel Spolsky had this famous "lecture": You Suck at Excel with Joel Spolsky [1], where the talk about pivot table starts at 50:41 [2].
If you've never really grokked spreadsheets like Excel, this is a great place to start.
Although I don't have Office installed, most of what Joel demos can be done in Numbers if you are so inclined. It's not nearly as powerful as Excel, but for my needs it's capable enough.
As regards Pivot Tables, I think Numbers had them at some point early in its history, but it wasn't 1.0 - Notably, Apple's implementation was significantly different than Excel and people complained, loudly enough that it was rewritten to be more compatible with MS in a later version.
The elephant in the room is (unsurprisingly) SPSS, which I had the unfortunate luck to encounter in my Research Methods class during grad school. Since that time (2010-2012), many of the other frameworks and tools have emerged to either fill in gaps or do the kinds of analysis that even Excel isn't well suited for.
I like pivot tables. When I was developing my own database technology, I found that it could create pivot tables very efficiently. I put a lot of work into refining that feature; but almost everyone I show the demo to does not seem to see much value in it. What am I missing?
Here is a short video where I use it to analyze data within an 8 million row table:
I'm building an open-core spreadsheet currently, and pivot tables are by far our number-one data transformation feature. By like 3x-1x. This is compared to even formulas (shockingly! but this is also a function of our audience).
In the hundreds of data-science-in-Excel files I've seen, I can't think of a single one that doesn't make use of a pivot table in some way.
I was working at Lotus when Improv came out. From what I remember, it was popular but as you started adding dimensions memory use blew up. This was in the early days of Windows and 4mb of memory.
I don't remember Brio's pivot tables "blowing up" per-se, but I suppose computers had a lot more memory by the time I joined. They used a patented algorithm to create the pivot structure by aggregating a result-set from an SQL query.
I once read about several companies whose entire product was an Excel addin that did ... pivot tables. The average businessperson had never heard of the pivot table functionality built into Excel, and they were an easy sale for the magical demo of a pivot table on their own data.
Great feature but also a very dangerous one. Few cases of people sharing a spreadsheet with an innocuous summary pivot table, not realising they also, in the process, share the full dataset that supports the pivot with sensitive data.
I am regarded as a decent Excel user, and in my field, Finance, that is saying at least something. However, I had never tried power pivot until today. I was a bit annoyed that it can't handle many to many relationships. Time to dust off my SQL skills and do it the old fashioned way...
I had a table of sales transactions, and a table of stock balance. I wanted to join them on the item sold so per item I had stock balance and a sales value per sku. I was suprised it wouldn't do it. It returned in less than a second as a sell query.
The workaround for many to many relationships isn’t too hard.
What you have to do is create another table containing unique values of items sold, and then make 1:many relationships from that table to the other two.
You can easily make the unique value table by copying and pasting all of the items sold into a single column on a new sheet, highlight them all, and then Data -> Drop Duplicates.
It’s a little annoying, but not hard.
But I have line level data that needs summing in the sales table. The workaround is one table of stock, one of sales and a sumif on the sales to get the stock. But now I don't have a pivot table
That’s what Power Pivot is for. Once you create the additional table and relationships as I described, you should be able to create a pivot table using data from multiple tables. No SUMIFs required. Unless I’m misunderstanding something.
Pivot tables are amazing to hand off to competent users to let them poke around at their data, but sometimes I would not want them to swap rows and columns because it would wreck some calculation I was doing. So, I used some vba to lock down the certain functionality. I think this page has some of the techniques I used: https://spreadsheetplanet.com/lock-pivot-table-excel/
In Excel, you can have the source data for the pivot table be an SQL query -- then the pivoting is just for presentation purposes when you get down to the very last layer of groupby's, so to speak
You can also use SQL queries as sources for regular tables in Excel (i.e. no pivoting)
One of my first roles out of undergrad was to take reports that would take an analyst literally days or weeks to generate into a set of SQL queries that got them very close to the answer with pivot tables (or just tables). Their job went from working on reports to changing a couple parameters in the queries like "current month" or "forecast version", which only took a couple minutes, leaving them with plenty of time to think about new reports to generate, how else to improve current reports, etc. Still one of the most personally satisfying things I've ever done.
Granted we can't expect all business users to learn to write SQL (spoilers: they never will). I believe we collectively need a more robust solution than having an SQL angel come by and write queries for business users... it feels obvious and within reach, but no one has really done it yet.
If only generating pivot tables in SQL were uniformly easy. Postgres's "crosstab" is awkward at best, forcing you to specify column names every time makes exploration unpleasant. In the end I found it easier to do crosstabulation in the UI (or at least, not in SQL).
common use case: get a metric truckload of data from a column store binned per minute (or 5), give the giant sheet to users, users use pivot tables to create hourly/daily/weekly reports as they see fit. (not all users are dumb it turns out)
Well, another nice thing with pivot tables, in excel at least, is I can just follow the data connection to its home and do whatever I want with that. RBQL in rainbowcsv is my usual.
I use pivot tables all the time. The concept is brilliant, but the Excel UI leaves a lot to be desired.
At first, you're amazed at the flexibility, but once you become comfortable, you suddenly hit the limitations. Can't sort by a calculated column, can't categorize without adding columns in the data, etc.
I looked at Quantrix for a while, and it was a bit too complex for practical purposes. I wonder if there are any decent PivotTable tools out there?
I once wrote a time-and-attendance app for a small company I worked for (using Rails 2.x). I put an export function in it, so that the office manager could dump the week's hours, pull it into Excel, and run pivots on person-time-per-project (to pay employees) and project-time-per-customer (to bill customers). After a few weeks, it was clear that the office manager wasn't "getting" how to do this, despite a couple of explanations, so I removed the export, and simply generated both pivot tables in a page. This solved the payment/billing problem so well, that, last I heard, they were still using this application 8 years after I left.
I found it too cumbersome, limited and not very intuitive to do it in SQL with the PIVOT operator so that I created my own API to be able to build a pivot table from a simple SQL statement https://github.com/squashql/squashql-showcase/blob/main/TUTO.... Roughly speaking, it consists in a single SQL statement and a list of headers for the rows and columns.
It is probably just a side effect of how I read the docs, but every time I need one of these I have to re-learn how, and for some reason the tooling in Excel always wants to do some 'other' orientation of my data. If I consider my data as a set of rows (instances) against columns (values) it somehow winds up mis-interpreting the values as the instances.
Mind you, this also happens when I select regions of cells for charting. Maybe order of selection matters.
One problem with pivot and its reverse operation is the poor terminology surrounding them. In addition to pivot itself, people used unstack, stack, gather, melt, reshape, transpose, … A single reversible concept like pivot_wider and pivot_longer from tidyverse, really helps clarity and I wish other libraries also start using this better terminology.
Pivot tables are powerful because they operate on the principle of multidimensional analysis, an intuitive framework for measuring (facts) by categories (dimensions). Facts and dimensions are abstracted into an organized calculation model also known as cubes or OLAP or star schema.
> Rather than enter formulas, users would be able to point and click to get those summary statistics. The Lotus team called this tool “flexible views,” but today similar tools are called “pivot tables” in both Microsoft Excel and Google Sheets.
> The Lotus team showed Jobs an early prototype. “Steve Jobs thought it was the coolest thing ever,” Salas, now a professor at Brandeis University, tells Quartz. Jobs then convinced Lotus to develop the pivot table software exclusively for the NeXT computer. The software came out as Lotus Improv, and though the NeXT computer was a commercial failure, Lotus Improv would be hugely influential.
Joel Spolsky had this to say about Improv though [0]:
> When we were designing Excel 5.0, the first major release to use serious activity-based planning, we only had to watch about five customers using the product before we realized that an enormous number of people just use Excel to keep lists. They are not entering any formulas or doing any calculation at all! We hadn’t even considered this before. Keeping lists turned out to be far more popular than any other activity with Excel. And this led us to invent a whole slew of features that make it easier to keep lists: easier sorting, automatic data entry, the AutoFilter feature which helps you see a slice of your list, and multi-user features which let several people work on the same list at the same time while Excel automatically reconciles everything.
> While Excel 5 was being designed, Lotus had shipped a “new paradigm” spreadsheet called Improv. According to the press releases, Improv was a whole new generation of spreadsheet, which was going to blow away everything that existed before it. For various strange reasons, Improv was first available on the NeXT, which certainly didn’t help its sales, but a lot of smart people believed that Improv would be to NeXT as VisiCalc was to the Apple II: it would be the killer app that made people go out and buy all new hardware just to run one program.
> Of course, Improv is now a footnote in history. Search for it on the web, and the only links you’ll find are from very over-organized storeroom managers who have, for some reason, made a web site with an inventory of all the stuff they have collecting dust.
> Why? Because in Improv, it was almost impossible to just make lists. The Improv designers thought that people were using spreadsheets to create complicated multi-dimensional financial models. Turns out, if they asked people, they would discover that making lists was so much more common than multi-dimensional financial models, and in Improv, making lists was a downright chore, if not impossible.
I don't know if pivot tables are cool; one big problem that people often overlook is that they have to be recalculated ("refreshed") manually; this can lead to significant errors.
Conditional sums are not "complex formulas", they are often easier to understand and debug than pivot tables -- and they are recomputed with each change, which will eventually save your ass.
It would be interesting to know why Microsoft was so incredibly stubborn to not allow to filter by colors / collect value of color in a cell.
I am very, very aware that this is the wrong way to do things, but users really prefer to take their list and color their wrong cells red and the correct cells in green instead of using a separate column for "status".
Of course a separate column with status then allows to have the 99999 different type of statuses that are created.. but is just clunkier.
And I know that you can now (after how many years?) filter by colors, what again is clunky if there are multiple colors, but you cannot get the cells color without custom VBA.
A simple "cellcolor()" formula would allow to make faster color filters.
It is very funny that every organization seems to have a "database" which is a list of stuff. And "big data" is when this list does not fit to Excel anymore.
this is unfairly downvoted, IMHO -- i doubt it will take "zero clicks" -- we're more likely replacing clicks with prompts -- but LLMs can definitely help bridge the gap between data wrangling and presentation.
replies to this comment seem to overvalue the challenges without acknowledging that those may eventually be overcome.
search-driven analytics is not really a new thing and products in this space were before LLM-era. This kind of interface can be useful for some categories of users but it is not a game-changer - prompts cannot replace Excel and its pivot tables, and in fact typing prompts may be even more complicated for users than good old 'clicks'.
Dunno, seems to me that asking (writing in this case) is not final solution. In fact, in most cases, typing down a question will take longer than looking in a well created table.
This chat-like back-and-forth, will take forever to understand some dataset properly. One question answered will give birth to 5 additional questions and so on.
Sort rows/cells into groups based on the value of a cell.
First parameter, "Stories!B2:D13" is a group of cells showing some stories.Second parameter, "Stories!F2:F13=A2" is the column where each cell is compared to the value of A2. Rows that match are then copied into wherever the =FILTER formula is placed.
I use it to take a list of Stories and sorts them into Sprints automatically. That's useful for program increment planning, etc.
The other useful Excel thing I learned recently is:
That says: If the cell A2 is not blank, append its value onto the url given, and show that as a link with the text PROJECT- with the value of A2 appended.I know I should have probably done something cooler with Emacs and org-mode, but I have to share it with a lot of business folks.
If by some chance either of those are useful to you, I hope they work OK for you :)