Hacker News new | past | comments | ask | show | jobs | submit login
ExcelCompare: Command line tool and API for diffing Excel Workbooks (github.com/na-ka-na)
59 points by jsvine on May 9, 2016 | hide | past | favorite | 15 comments



I work at a certain custodial bank and it's scary just how much of our business depends on Excel VBA. Something like this would be amazing as we have virtually no SCM / revision control for our critical spreadsheets.

Does anyone else know of any neat "can't-live-without" Excel utilities or add-ons that would make a software engineer's life a lot easier? A few weeks back someone posted ThingieQuery[0], which also looks fantastic. The IDE for VBA is awful, and I'll take any improvement I can get.

[0]: https://news.ycombinator.com/item?id=11583488


If you think the IDE for VBA is awful, get ready for the tooling for Google Apps Script. I happened to be reading the first few chapters of a new book, "Going GAS: From VBA to Google Apps Script"[1]. He praises the VBA IDE, much better than the thing Google offers for GAS. But he sees the writing on the wall, and that VBA's days are numbered.

"Why Transition from VBA?

In many ways, VBA has been a victim of its own success. Its tight integration with Office and very usable and immediate development environment make it hard to beat. However, it’s been with us since 1991, the same year that Tim Berners-Lee created the first website, which is still running today, but for historical rather any aesthetic or functional reasons.

Although VBA is as far removed from its 1991 forefather as today’s HTML5 sites are from that first website, every version of Office for as long as I can remember has come with a threat that perhaps VBA will not be supported. Office 2008 for Mac did not support VBA, but it was back again by public demand in Office 2011. Office 2016 has just been released, and we can all breathe a sigh of relief to see that VBA is still there. But Microsoft’s focus is shifting to Office 365 from Office for the desktop. According to Satya Nadella of Microsoft, “the most strategic developer surface for us is Office 365.”"

[1] http://www.amazon.com/Going-GAS-Google-Apps-Script/dp/149194...


Fellow (ex-)banker turned founder here. I can relate to this... and it is very scary (the amount of spreadsheet craziness I've seen on trading floors is insane). But I still believe that Excel is a powerful and power-enabling tool. And I know from my own experience how fragmented the "Excel tools industry" is.

We've been working on a fully fledged version control system for spreadsheets for ~2 years now and finally getting somewhere with Pathio[0] (we used to be called SpreadGit but that wasn't ideal for many reasons).

It's taken us a while to get here as our aim was always to build a system that: - works with all Excel file formats (including the binary formats) - works with VBA - integrates seamlessly into (non-technical) people's workflow (which is different to how developers work) - is fast (I personally worked on 100mb spreadsheets when I was at UBS...) - gets the diffing right (which is different in two dimensional space where inserting/deleting rows/columns has a knock of effect on potentially a large number of formulas which are just... noise)

Feel free to drop me a note to see if we can be of any help to you - We also have an on-premise version (thanks to the guys at Replicated[1]).

[0]: http://www.pathio.com [1]: https://www.replicated.com


How about getting rid of VBA entirely and replacing it with a visually designed workflow? EasyMorph[1] does data transformation outside Excel (so it's not a plugin or add-on), but then results can be inserted into a sheet of existing spreadsheet.

[1] http://easymorph.com (I'm the founder).


Part of the issue is that the team I'm on consists of a majority of finance majors who only know VBA, and maybe just a few C++ devs (I'm pretty much the only non-finance person on the team). This is sort of those "when all you have is a hammer, everything looks like a nail" issues. Simple one-off VBA tools find a little bit of traction and then, through patchwork, evolve into something critical yet lack proper up-front and long-term-planning design. They're all smart folks of course, but I feel like so much more could be accomplished with better tooling.

Your product looks really cool; I've actually been looking for something that will do ETL tasks and I've only used Informatica and SSIS, which obv aren't free. Will be checking it out!


This might be changing though. I'm following a CQF program, and while the tools are traditionally excel, VBA and C++ - Yves from Python Quants is there lecturing about using python.


Is EasyMorph similar to OpenRefine?


Yes, it's somewhat similar. EasyMorph is a hybrid tool -- you can do data preparation like with OpenRefine, but at the same time you can re-run the same transformation process for another file or with a different parameter, which makes EasyMorph similar to ETL.


It's interesting to see this written in Java, since the Excel application can be reached as a .NET object[0]. On top of that, Compare-Object is a built-in function for Powershell. Sure, with this built in Java you can use it to diff Excel spreadsheets on CentOS, but how often do you have those there? But in .NET most of the pieces are probably already there, so less work may have been needed to produce the same tool.

[0] http://www.madwithpowershell.com/2013/11/using-excel-functio...


It's for more than just for Excel and Windows.


Microsoft finally added the ability to compare two workbooks in Excel 2013:

https://support.office.com/en-us/article/Basic-tasks-in-Spre...

Not a command line tool, and no API, but they delivered the basic feature. As far as I can tell, it isn't used very much.


Good strategy is to move all data transformation into Excel add-in Power Query and leave raw data in original format. No more copy/ paste of data. "Fat finger" events are minimized. Using PQ gets you moving towards more robust business process, data management and IT architecture.


OOC, how much of the complexity for implementing this was getting the data out of the formats, and how much was implementing the diff once you had the data?


It looks like the Apache POI library gets the data out of the formats. The tool itself seems to compare the data cell-wise (i.e. A1 against A1 - no alignment seems to be attempted) and there doesn't seem to be any handling for merged cells, charts, pivot tables, filters, data validation, etc.


They should add this to Diffoscope <https://diffoscope.org/>.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: