Hacker News new | past | comments | ask | show | jobs | submit login
Regular expression functions in Excel (microsoft365.com)
128 points by thunderbong 10 months ago | hide | past | favorite | 86 comments



It’s interesting to see the give and take between google sheets and excel. Google sheets came on the scene shooting for total backwards compatibility and then proceeded to develop some really interesting innovations. Now we see new features emerging on both sides that are quickly replicated by the other player. Notably off the top of my mind:

* spill formulas - google first, now supported in MSFT

* # notation - MSFT enhancement to spill formulas not yet adopted in google

* regular expressions - google first, now in Excel

* check boxes - google first, now supported in excel

There must be others. I would expect competitive dynamics where each side tries to build extensions that can’t be replicated on the other side


Coincidentally using the same function names as Google sheets did for years. :D


Why would spreadsheet software try to differ in formula names? You want users to be able to switch to your platform with little friction.


It used to be the other way around, with Excel leading the way.


I've been using Sheets's regex functions for over a year, they're really helpful for basic data transformation.

Excel also got a "split" function much later than Sheets did.

I was excited to read on the Sheets blog that Sheets finally will have a table functionality, which Excel has always had: https://workspaceupdates.googleblog.com/2024/05/tables-in-go...


This is one of those fringe features that 1% of users are clamoring for and 99% of users should never ever use. I’m not surprised it was never at the top of the Excel team’s TODO list.


Yeah the problem is that those 1% of users are probably writing the most critical spreadsheets, and when companies using GSuite think about moving to Office365 the more pain it is to migrate spreadsheets the worse for Microsoft.


So why are only 2/3 of the formulas named the same? Excel's new REGEXTEST [1] is REGEXMATCH in Sheets [2]

[1] https://support.microsoft.com/en-us/office/regextest-functio...

[2]https://support.google.com/docs/answer/3098292?hl=en


Good ideas should have a provisional naming period where a standardize name can be agreed on. Anything else leads to fragmentation and lock in due to domain specific terminology.


Or Excel mimicking the broken calendar of Lotus 123.


And it was an excellent decision, see rationale: https://www.joelonsoftware.com/2000/06/03/strategy-letter-ii...

TBH, they should have dumped during switch to OOXML, that was a wasted opportunity. On the other hand, that's hard sell to business.


Well, excel translates the formula names... (One of the many reasons I switched over to using windows in english, rather than german)


huh, TIL that regex wasn't a thing in vanilla excel. Thought that'd be a basic thing that's included.


I realized that long ago... it's incredible that it took this long given how common it is in programming.


The strategy of freezing VBA and BAT scripting, offering better alternatives, and hoping they'd go away on their own hasn't worked.


You can using VBA functions and VBScript.RegExp. This is restricted to Windows though. Microsoft wants to get rid of VBScript, so maybe they are going through the most common use cases in Excel.


Have they officially said they want to get rid of VBScript?



Yes, but that doesn't include getting rid of VBA. They probably do want to do that though, to allow mobile apps and web-based Excel to run interactive sheets using Javascript. Of course they've been trying for years to allow Javascript in Excel, but at least in the first versions that ran using Internet Explorer in the background. I'm not sure if that has evolved further now.


You never expect basic ergonomics from an ALL CAPS language. They all suck.


The reason is MS is gradually deprecating VBScript. Today it's the VBScript.RegExp reference that is being used in VBA.


Could we get some easy aliasing of REGEXREPLACE to reRepl and picking a regex engine that matches the syntax rules you're used to in a the next decade or so?

> Try asking Bing Copilot for regex patterns!

Or maybe embed a cheaper and more reliable solution like https://regex101.com?


You could quite easily store a lambda formula (that just passes on the arguments) as a named function. Very neat trick for organizing and re-using formulas in excel.


Any information on which standard they have implemented (POSIX BRE, ERE, PCRE, ...)? Since they are Microsoft, I would not be surprised if there is none.


> All regular expressions for this function, as well as REGEXEXTRACT and REGEXREPLACE use the PCRE2 'flavor' of regex.


Okay, thanks! Now next question which comes into my mind: is there info about the regex engine they are using? I would expect there is some (proprietary?) C++ library also used in other MS products or are they even using a FOSS licensed one?


Seems likely that if they explicitly say they're supporting the PCRE2 syntax it is because they are using the BSD licensed libpcre.

Reinventing a regular expression system is very far down on the list of things I'd ever want to do. Those things are filled with dragons and require years of refinement to get the bugs out.


According to the documentation links in the article, it's using PCRE2.


At last the production reliability of Excel meets the clarity of regular expressions and the simplicity of matrix slices


Good gravy.

Things im terrified of:

- making an error in a financial model - checking a model - making an error in a regexes - checking a regex

Thins I love; - dumb excel stuff

This is a great time to be alive, and at arms length from finance.

To all the people who will deal with the fallout- I salute you.

This is awesome.


I am terrified of the same things. I got given a validated financial model for something ages ago and found a major Excel fuck up in it. When I say validated, it was run through an expensive validation company who charged a lot of money and they didn't even notice it.


Oh I know that fear. You can’t trust anything anymore.

It’s so easy to just miss something. A date function which is off.

Missing something in excel is a big deal for a firm which is mean to do it - and it will always happen. And that’s the experts.

I’m betting Someone was too tired, and uninterested and just followed the script.

Hopefully that experience put the fear of God into some analyst and associate.

Pretty much the only thing that I can rely on when it comes to modeling.

Knowing that someone is as terrified as I am and has put the work in to not be embarrassed.

And let’s agree - it is work.

I remember seeing a date function used by another firm for the first time.

It too me at least an hour to decipher it.

It was cool, I learnt a lot. But it takes time.

Regexes are cool, and I guess people will be learning a lot.


Almost a decade ago I worked for a bank and in our little corner of the organization I was rapidly becoming the "Excel Guy". While I was happy to help people, the prospect of being pigeonholed into such a role terrified me and I switched jobs. That wasn't the only reason, but it was a major factor.


This is why I pretend to not be good at Excel. Once word gets out, it becomes your job.


Just to be clear, was the POC financial model implemented in Excel?

If so, did it at least have embedded VB or was it all cell logic?


There was VBA and cell logic. The sheet was validated by a third party then someone ported it to a proper language (incorrectly). That model was not validated.


Why do you call it "a major Excel fuck up" when the error arose from the language port?


Did they not compare data input/output of the excel original versus the port?


Nope


never too late to do something that should have been since day one


Finally, an excel formula I will actually remember


truly amazing times: Excel gets regex support (search-in-selected-cells is not available though, as well as search by regex)


LibreOffice has had regex for donkeys years.


I read the title, and was actually surprised how in 2024 this is a headline. I'm surprised that this has not been a feature since forever as well.


Apple’s Numbers spreadsheet has had regex support for a while as well.


which is probably the one good thing you can say about Apple's Numbers


I can say the UX of Apple Numbers is the best, much above the usability I've found of Excel. Much less powerful, but it is much easier to use Apple Numbers


easier to use for power users or for beginners?


As someone who regularly flits between Google Sheets, LibreOffice Calc, and Numbers regularly (with my most time in Google Sheets and Numbers), and as someone who's made some extremely complicated spreadsheets in all three, I have to say I vastly prefer Numbers if I can get away with it. It has a lot of issues that can make some stuff hard fast (no array formulas is a big one), but I find it significantly easier and faster to prototype in Numbers regardless.

I think the main bit I love so much about it is having actual tables instead of the Infinite Grid that most spreadsheet software uses. You get named ranges for free, and it makes semantical sense too, among a good number of other benefits (sheet organization, refactoring, simpler styling...).

There are some really nice things that Google Sheets does, and I've done a few fancy things with App Script which isn't too bad, and I do really like QUERY though I wish it was a bit higher power. I just always find myself missing the UX of Numbers, though.


Hope the office copilot can do this for me


Does excel have gpu accelerated calculations, seems an obvious thing to add if not?


It may appear obvious, but only a very limited subset of real-life spreadsheets could meaningfully take advantage of it. Excel does not support GPU-accelerated calculations.


doesnt the back-end of excel have a columnar data model that can store millions of records in a single excel file?


The front-end of excel doesn't allow the creation of more than 1048576 rows in a sheet. (1<<20)


What's the joke for this - now you have three problems?


Now you’ve got two problems


Yet another meta breaker - this is another move aimed at the casual audience to bring in the party gamers and make the program an absolute mess competitively.


Solid reference!


Use Libre Office if you value your freedom.


I love freedom and I love open source, but the problem is that Libre Office is too far from MS Office in terms of UX and functionality.

If your usecases are not complicated, Libre office could work well. Otherwise your efficiency will be behind those that you can achieve with MS Office.


Oh man Libre Office is trash and so are the leadership/maintainers. No tears.

I once got banned for asking to put 'text size' on the main screen for the powerpoint knockoff.

Text size seems pretty important to have. You shouldn't have to google how to change the text size.

I'm deeply convinced there is a Microsoft plant undermining everything.


Currently at least text properties and size is on the main screen in the current defaults for Impress and, just like PowerPoint, only made visible and editable when you click something which has a text size. Maybe it was different in the timeframe you were talking about (barring any actual information on the request) but this combined with how abrasive your comment is leaves little reasoning for why it's horrible and casts serious doubts the ban was actually related to asking about interface enhancements even if that's when you were banned.

FWIW I prefer PowerPoint over Impress as well (particularly when it comes to the browser side of things on the go), using one or the other is just not a vendetta of mine.


My comment can make Impress better.

Yours continues to enable the status quo.

"oh he made a great point, but he said it snarkey! Guess M$ can just keep being the only company with text size"


> I'm deeply convinced there is a Microsoft plant undermining everything.

It's conspiracy theory thinking, but I'm getting there, too. These things have been too close to being complete replacements for commercial products for so long, but somehow still fall over on problems that have been complained about for a decade or more.

There's not one of them that some massive corporation with sights set on adding a letter to faang couldn't pick up and turn into a legitimate competitor in a month or six, khtml-style. Instead they often sit on moribund subpages of some larger project website, with a blog updated once every year or two.

These projects have to be targets for sabotage by their commercial competitors, just as government initiatives are, just cheaper. For e.g. Adobe it's less than a rounding error to e.g. spend 500K/year supplying a developer to e.g. Scribus[0] to make sure that it remains difficult to contribute to, or makes bad architecture choices, and that hypothetical developer could be one of its biggest actual contributors.

Maybe it's just because they have to spend a lot of time chasing Gtk, which makes it another redhat problem?

edit: The 95% state of all a lot of these FOSS packages is also evidence that there are zero tech billionaire philanthropists. It would take a total of one of them to grab all of these projects and wrangle them into good form.

[0] This is an actual hypothetical, I'm not making an accusation about Scribus. Between the GIMP and Inkscape, they literally are the only people who made a real effort with color for years. Inkscape openly said that their software was only for making images for the web (as opposed to print), as if there were a rational reason to cripple their product and narrow people's interest in it. Now that deviantart is gone, will anyone care about Inkscape anymore? Will the dopamine hit hobbyists get from sharing generative art cause Inkscape to be totally left behind? Why is it hard to design a form for your office's paperwork in Inkscape, a vector drawing program, if forms are just straight horizontal lines and text? Why aren't they trying to merge with Scribus (and LibreOffice Impress/Draw) and create a complete pdf solution? I have no idea.

I'm being very ranty here, and I do want to say that I very much appreciate the work that people are doing for free, in their spare time, for others.


No lookahead/lookbehind? Boo


Now I can finally parse (X)HTML with Excel ;)


Now I need to build a regex for zalgoifying text in Excel.


Congrats on catching up to Perl! Maybe it took so long because Bill was on vacation on Epstein island.


Perl's UI still has some work to do if we're making incomprehensible comparisons.


I am waiting for the day I can use Excel as a REST API client, so I write the ID of some record in a cell, and Excel does all the work of calling the API repeatedly and fills the rows and sheets with the required information.

This will kill a few Python jobs and make it a very popular REST client =)


Can you do this with the =WEBSERVICE() function? https://support.microsoft.com/en-us/office/webservice-functi...


Can I parse response (JSON, XML, whatever) into cells without VBA?


With some effort, probably: =FILTERXML() https://support.microsoft.com/en-us/office/filterxml-functio...

Could also do some crude/fragile parsing with aforementioned regex functions.


I've actually done that. I needed to create a lot of json documents and POST them somewhere (it was metadata for some Chef cookbooks), and a lot of the information was similar with a few variations. Excel lets me create what I call a "conductor's score" of that information. (In music, the conductor's score has all the instruments on one page. Each row is a different instrument, and each column is the same measure [point in time] in the music.) In other words, I can easily look for commonalities, differences, and presence of the data as I update it.

I was just going to copy/paste the fields into JSON which is repetitive and error prone, and realized I could automate it. So I wrote VBA to output a json version of that. (The annoying part of writing that in VBA is that JSON data needed a lot of double-quotes, but then you have to use escape sequences for every double-quote.)

Once I had the JSON data generated, I was going to post each one manually, so again, I looked and realized you can POST the data directly from VBA. Added a button to the page and you could update the data in Excel and click a button to POST it.

Of course, once I turned it over to someone else, they were like "what the hell" and started doing it a different way. lol



I've done this in Google Sheets, with "Google Apps Script", the most horridly named thing I think I've ever worked with.

It's just (utterly ancient) Javascript (I think ECMAScript, technically, since there's no DOM, no browser, etc.), bolted on. I wrote in modern JS and used Babel to transpile the source back into the stone age.

… I'd much rather write Python.


Until they add python into excel, then there will be python jobs again :P

https://support.microsoft.com/en-gb/office/get-started-with-...


Every self-respecting corporate org will block execution in the cloud + no IDE (you edit the code inside the cells), its dead on arrival


How about Excel as an API server, too?


You've been able to do that for ages, a client automating Excel through COM and listening for events in Excel from script: https://news.ycombinator.com/item?id=37229850

Microsoft's advice page for wrapping Office in a web application fronted by ASP/ASP.Net: https://support.microsoft.com/en-us/topic/considerations-for...

And that has links to documentation about Excel Web Services of old (on-premises SharePoint days, not sure if it translates to today's cloud), which is SOAP/HTTP: https://learn.microsoft.com/en-us/sharepoint/dev/general-dev...


The "new hotness" for today's cloud are all the Excel APIs in the Microsoft Graph:

https://learn.microsoft.com/en-us/graph/api/resources/excel?...


MS has had ODBC integration for ages. That is a more rational way to connect Excel with data sources.



Kudos using "regex" instead of "regexp", as it is much nicer to say out loud.


Am I the only one who distrusts domains with numbers in it?


I am confused between all these AI announcements from Microsoft and the exciting new regex support in "preview" (aka soon to be dropped)




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: