
I really don't get the hate for regexps. They aren't very hard to learn and they are hugely productive. I use regexp search in my text editor constantly. It bothers me that I don't have regexp-enabled search on everything. Web browser, I'm looking at you!

And regexp-based search-and-replace! Swoon! Regexp has made me significantly more productive and less error prone. I haven't made copy-pasta data errors in quite a long time because of it.

Yes, a particularly hairy regexp looks like line noise. What poorly written code doesn't? I think people get caught up in the idea that they have to try to write one regexp to rule them all and perpetuate until the end of time. Sure, yeah, if you don't understand how to write regexp and have to look up a cheatsheet all the time, I can see how you would want to avoid touching it ever again. But that applies to SQL or CSS or whatever braindead config file format we're using on our project today, or any other language that isn't in your wheelhouse.

Seriously, learn regexp already.




The hate is for when they are used in code where a parser would be better (e.g. parsing an e-mail address).

I don't think anybody has an issue with regex search in an editor, but we are talking more about regex as a core language feature.

Even in an IDE there are times when regex is the wrong solution. For example, regex-based refactoring can introduce bugs in most languages where parser-based refactoring wouldn't.
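To make the refactoring point concrete, here is a minimal sketch in Python (not the Perl used elsewhere in this thread). The function names and the sample source are made up for illustration, and the standard-library tokenizer stands in for a real parser-based refactoring tool, which would also understand scopes. The regex rename rewrites the word inside the string literal; the token-aware rename does not.

```python
import io
import re
import tokenize


def rename_with_regex(source: str, old: str, new: str) -> str:
    """Rename by pattern only: every whole-word occurrence is replaced."""
    return re.sub(rf"\b{re.escape(old)}\b", new, source)


def rename_with_tokenizer(source: str, old: str, new: str) -> str:
    """Rename only NAME tokens, leaving strings and comments untouched."""
    lines = source.splitlines(keepends=True)
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    hits = [t for t in tokens if t.type == tokenize.NAME and t.string == old]
    # Splice from the end so earlier column offsets stay valid.
    for tok in reversed(hits):
        row, col = tok.start  # row is 1-based, col is 0-based
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


# Hypothetical input: the identifier also appears inside a string literal.
SOURCE = 'count = 1\nprint("count is", count)\n'
REGEX_RESULT = rename_with_regex(SOURCE, "count", "total")
TOKEN_RESULT = rename_with_tokenizer(SOURCE, "count", "total")
```

With this input, `REGEX_RESULT` changes the program's printed message as a side effect of the rename, while `TOKEN_RESULT` keeps the string `"count is"` intact.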


I think it's a far more offensive grievance to write something like an e-mail address spec that isn't able to be parsed by regular expressions.

A problem that I'm sure is compounded by some people's lack of experience using them.


Grammars were invented in Perl 6 for exactly that. Perl 5 has a module[0] for it, implemented using recursive regexes.

Example of LaTeX parser:

    use Regexp::Grammars;
    my $parser = qr{
        <File>
        <rule: File>       <[Element]>*
        <rule: Element>    <Command> | <Literal>
        <rule: Command>    \\  <Literal> <Options>? <Args>?
        <rule: Options>    \[  <[Option]>+ % (,)  \]
        <rule: Args>       \{  <[Element]>*  \}
        <rule: Option>     [^][\$&%#_{}~^\s,]+
        <rule: Literal>    [^][\$&%#_{}~^\s]+
    }xms;
[0] http://search.cpan.org/~dconway/Regexp-Grammars-1.041/lib/Re...


A regex is really just a highly compact and powerful domain-specific language and engine for parsing. People often don't understand the DSL adequately and its use can become problematic, but properly applied it can dramatically ease certain tasks.

That said, all you really need to know is that it boils down to the exact same underlying issues that cause people to like/dislike any language over another. It's exposure and preference. People prefer verbosity/conciseness/terseness, structure/versatility, speed/ease of coding, and the things other than what they prefer are obviously wrong.


Sounds like throwing the baby out with the bathwater.


Can you please let me know what the "baby" is in this case? Nobody is seriously against regexes for all cases (all developer editors have a regex search, for example).

However, they aren't even always the clearest way to parse regular languages (depending on the language, automata can be more clear than regexes or vice versa).

Perl regexes can parse many context-free languages, but again they're not always the clearest way to represent them.

Despite the original quote, I use awk as a standard part of my toolbox. I'm not going to write a web-browser in it though. Nearly every time someone says "now you have 2 problems" or "considered harmful" it's a reaction to overuse or misuse, not a literal call to abandon it altogether.


> not a literal call to abandon it altogether.

No, that's literally every case I've seen for the use of the "now you have two problems" phrase. It's like someone has a bot on the lookout, alerting them to mentions of regexp so they can run in and say "now you have two problems!"

I think it's another symptom of the common trend of modern programmers not wanting to do any programming anymore. Breathe a mention that you might want to implement your own text editor control because you haven't been able to find one that suits your needs and you immediately get jumped on with "don't reinvent the wheel!" Spend a small amount of your spare time toying around with toy programming languages and it's, "you have too much time on your hands!"--as if being massively overworked on business interests is some kind of virtue. Make a puzzle game without using Unity or some other overblown framework on your small project and it's "you're nuts!"

No, literally, I got called insane once for not using Angular on my personal website that mostly just amounts to a list of links to my social media profiles.


Regexps are like farts. Everyone likes their own but is disgusted by everyone else's.


Please, let's try to keep the jokester replies to Reddit. I'm trying to understand why people who are ignorant of what regexp can do for them are so dead-set against learning regexp.


> Please, let's try to keep the jokester replies to Reddit.

You realize the linked article is entirely about a joke about regular expressions? I can't imagine a thread where a joke about regular expressions would be more appropriate.


Nobody is against learning regexp (except lazy people, perhaps). But a lot of people are against using regexp as a default solution. Here's a pretty common case: see if the string you have contains a delimiter.

First you think, okay, I can use a function like strchr() or index(). It'll immediately return the location of the delimiter. Can't get much simpler or more efficient than that!

  $loc = index($_,$delimiter)
But wait. What if my string has quotes or spaces before the beginning of the string or delimiter? I don't want any of that crap. Now I need to write a bunch more parsing code - or, I can use a regex!

  $delimiter = "=";
  $_ = q| " my key = something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ok, looks good. Let's try a couple different delimiters.

  $delimiter = "_";
  $_ = q| " my key = something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  
  
Hmm... no output at all. Weird. Oh! index() will normally return -1 on failure, but $-[] doesn't get set if the match fails. We forgot to change the delimiter in the input string. Ok, try again:

  $delimiter = "_";
  $_ = q| " my key _ something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ah, that's better. Let's try another delimiter.

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  Quantifier follows nothing in regex; marked by <-- HERE in m/^\s*"?\s*(.+?)\s*(+ <-- HERE )/ at -e line 1.
Holy shit, a fatal error? Hmm, it was just a delimiter change.... welp, looks like the regex parser thinks '+' is part of the pattern syntax. Need to escape it so it's matched literally instead of being read as a quantifier:

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  /^\s*"?\s*(.+?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ok, it's working again. But what if there was no key entered at all - just the delimiter and the rest of the string, like if a filesystem path was entered?

  $delimiter = "/";
  $_ = q|/path_to_a_file.txt|;
  /^\s*"?\s*(.+?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  
  
Crap. The delimiter is there, but my regex is broken again, because (.+?) expects at least one character before the delimiter. Time to fix it again:

  $delimiter = "/";
  $_ = q|/path_to_a_file.txt|;
  /^\s*"?\s*(.*?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  0
  0
There! Whew. That didn't take too long. Let's just hope nothing else unexpected happens, huh?

index() and rindex() would not have had all these issues - they would have returned a location if the delimiter existed at all, or -1 if it didn't, and wouldn't run into interpolation issues, etc. All of these bugs (AND MORE!) can be solved by just writing a parser, or using a couple index() and rindex() calls, or restricting the format of the string to more rigid rules. But by using regex's, we've doomed ourselves to more unexpected issues in the future.
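For comparison, a minimal sketch of the plain-search approach described above, written in Python rather than Perl, with `str.find` standing in for index(). The function names and the sample list are hypothetical. It only locates the delimiter and does not try to trim quotes or whitespace around the key, so it covers less than the regex walkthrough does.

```python
import re


def find_delimiter(text: str, delimiter: str) -> int:
    """Plain substring search: offset of the delimiter, or -1 if absent."""
    return text.find(delimiter)


def find_delimiter_regex(text: str, delimiter: str) -> int:
    """Regex equivalent; the delimiter must be escaped to match literally."""
    match = re.search(re.escape(delimiter), text)
    return match.start() if match else -1


# The same inputs used in the walkthrough above.
SAMPLES = [
    (' " my key = something in the string " ', "="),
    (' " my key = something in the string " ', "_"),  # delimiter absent
    (' " my key + something in the string " ', "+"),  # regex metacharacter
    ("/path_to_a_file.txt", "/"),                     # nothing before delimiter
]
RESULTS = [find_delimiter(text, delim) for text, delim in SAMPLES]
REGEX_RESULTS = [find_delimiter_regex(text, delim) for text, delim in SAMPLES]
```

Both lists come out the same here, but only because the regex version remembers to call `re.escape` and to check for a failed match; the plain search has neither failure mode to forget.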


This is a ludicrously contrived strawman.


Not as ludicrously contrived as an html parser using regexps.


Dude, code however you feel like. I'm not getting into a troll fest about why it's stupid to use regex's for everything.


I have never seen anyone argue for using regexp for "everything", but I see on a very regular basis people arguing they should apparently never be used for anything. Even the simplest of questions on StackOverflow get answered with the ever condescending, "what are you trying to do?" Followed by "now you have two problems." Followed by "use a parser." Followed by silence on how that specifically applies.

I called it a ludicrously contrived strawman because your proposed remedy to "just write a parser" is not any simpler of a task than the one you mocked up for regexp. There are still plenty of bugs you get to write and miss for several hours when you write any nontrivial software.


1. Nobody on HN is saying to never use a regexp.
2. It's not a ludicrous example.
3. A parser is not a ludicrous way to solve the above problem.
4. It's not a straw man because it's not an irrelevant argument set up to be defeated; it is specifically an example of how EITHER using some simple functions OR a parser would be less problematic in practice than the gradual bit rot of erroneous use of the extremely powerful and unnecessary regular expression.
5. The software becomes nontrivial when you complicate it with regular expressions.
6. Where are these examples of people telling you never to use regular expressions?
7. How is it you run into this on a very regular basis?
8. If it's a simple question it probably has a simple answer, and regular expressions are not simple, as my example has shown.
9. There's a reason this phrase is a truism, and it doesn't need a mathematical proof to be accepted as a truism.
10. Code however you want dude, it doesn't matter what a bunch of people on StackOverflow or any other website say, except that
11. If a lot of people keep saying the same thing, there might, just might, be some merit to it.



