
And for the next ten years, flawed attempts at imitating this will show up in production code all around the world...

I'm not saying he shouldn't have (I've certainly learned something I didn't know), but let's face it, posting this on StackOverflow is like handing a loaded gun to a bunch of children and telling them not to pull the trigger.



  but let's face it, posting this on StackOverflow is like 
  handing a loaded gun to a bunch of children and telling 
  them not to pull the trigger.
No, let's not face it. How is sharing well-thought-out, well-designed solutions to problems ever a bad thing? Sure, there are a handful of junior coders who now feel overly confident and will mess this up, but they'll learn.

And then there is an equal pool of good developers that are now even better thanks to the info-share.

I acknowledge I'm being pedantic, but the "let's all nod and pretend we are way smarter than THAT group over there" mindset gives me brain-diarrhea... You see it in every community (reddit, slashdot, HN, digg, etc.), and I've never seen it help anybody accomplish anything, anywhere... ever.

</takes off his internet-police hat>


No, using regexes for this is terrible. I could write you a program that parses HTML using GOTOs, but that has been recognized as bad programming practice. I'm not going to do your research for you, but this is not opinion; it's the result of people studying large code bases and their defect rates. The same goes for overly long functions, bad variable naming, poor commenting, and so on.


A correctly written regex is not a bad programming practice. The insinuation that most programmers can't write or debug a regex correctly is disingenuous.


First, correctly written code can be bad practice. Regexes are a powerful tool and have appropriate uses. I disagree that this is one of those cases, but at -4 on my previous comment, I guess most don't agree. Second, I would bet the majority of programmers are mediocre with regular expressions at best, and even worse at reading regexes written by other programmers, which contributes to code maintenance issues.

Finally, I may have been "incorrect" but "disingenuous" is an insult. I'll be charitable and assume you're using the word wrong.


Perhaps it's a difference of experience, but I really haven't met a professional programmer who doesn't understand regexes but still insists on using them for non-trivial tasks. Thus my usage of "disingenuous", because I have trouble believing that such people exist, and I felt you were trying to make a point insincerely, perhaps out of confirmation bias. I apologize if it came across as an insult -- it wasn't intended as one.


No harm, no foul. They really do exist, unfortunately.


Using regexes for doing what is terrible? Using regexes for a lexer is quite common, and if you read his code, that's what he is doing: tokenizing with regexes and then feeding the tokens to the parser.
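A minimal Python sketch of that lexer/parser split, assuming a toy subset of HTML (bare open and close tags plus text; no attributes, comments, or entities). The names `Token`, `tokenize`, `Element`, and `parse` are hypothetical and are not taken from the code under discussion. The regexes only find flat token boundaries; nesting is handled by a stack in the parser.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # "OPEN", "CLOSE", or "TEXT"
    text: str


# Toy token set (assumption): bare tags and text only. No attributes, comments,
# entities, or self-closing tags, all of which real HTML requires.
TOKEN_SPEC = [
    ("CLOSE", r"</[A-Za-z][A-Za-z0-9]*>"),
    ("OPEN", r"<[A-Za-z][A-Za-z0-9]*>"),
    ("TEXT", r"[^<]+"),
]
MASTER = re.compile("|".join(f"(?P<{kind}>{pat})" for kind, pat in TOKEN_SPEC))


def tokenize(source: str) -> Iterator[Token]:
    """Lexer: regexes only decide where each flat token begins and ends."""
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        # lastgroup is the name of the alternative that matched.
        yield Token(match.lastgroup, match.group())
        pos = match.end()


@dataclass
class Element:
    tag: str
    children: list = field(default_factory=list)


def parse(tokens: Iterator[Token]) -> Element:
    """Parser: a stack of open elements handles nesting, outside the regexes."""
    root = Element("#root")
    stack = [root]
    for token in tokens:
        if token.kind == "TEXT":
            stack[-1].children.append(token.text)
        elif token.kind == "OPEN":
            element = Element(token.text[1:-1].lower())
            stack[-1].children.append(element)
            stack.append(element)
        else:  # CLOSE: must match the most recently opened element
            name = token.text[2:-1].lower()
            if len(stack) == 1 or stack[-1].tag != name:
                raise SyntaxError(f"unexpected </{name}>")
            stack.pop()
    if len(stack) != 1:
        raise SyntaxError(f"unclosed <{stack[-1].tag}>")
    return root
```

For example, `parse(tokenize("<p>Hi <b>there</b></p>"))` returns a root whose only child is a `p` element containing the text `"Hi "` and a `b` element, while `"<p><b></p>"` raises `SyntaxError` in the parser, not the lexer.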


  I'm not saying he shouldn't have (I've certainly learned
  something I didn't know), but let's face it, posting this
  on StackOverflow is like handing a loaded gun to a bunch
  of children and telling them not to pull the trigger.

True, but that's the case for posting anything non-trivial where Google can find it. It's also isomorphic to using non-trivial programming languages or patterns in industrial programs.

At a certain point, you have to let go of the idea that if you hide anything requiring thought and judgment from people, they won't hurt themselves.

Some people are going to foul things up no matter what you do or don't tell them, and some will excel even if you make it hard for them to find knowledge. The interesting group are those who are open to being influenced by what you have to share.

I suggest that even if only a small proportion of SO readers walk away enlightened, this article is a win. The folks who use it to justify their own failed attempts to manage arbitrary HTML with regexen would have pooched the task in some other way even if they stuck to using a parser library.


I disagree. He went through a lot of effort to lay down the foundations of parsing HTML with regexps. If anything, he's demonstrated exactly how difficult doing this is and how finicky the solution is. In short, he's stripped the sexy out of using regexps to parse HTML: we now know it can be done, but it's a pain in the ass, and purpose-built parsers are faster and easier to use. So there should be no illusion that regexps are a shortcut in this situation.

At least that's my takeaway.
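For contrast, a minimal sketch using Python's standard-library `html.parser.HTMLParser`, chosen here only as one example of a purpose-built parser (the thread doesn't name one). `LinkCollector`, `collect_links`, `naive_links`, and `SAMPLE` are hypothetical names. The sample puts one link inside an HTML comment and a `>` inside a quoted attribute, which is where a quick regex and a real parser disagree.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> start tags, as reported by the parser."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def collect_links(html: str) -> list[str]:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def naive_links(html: str) -> list[str]:
    # The kind of quick regex the thread warns about: it has no idea what a comment is.
    return re.findall(r'href="([^"]*)"', html)


SAMPLE = '<!-- <a href="/hidden"> --><a title="a > b" href="/real">link</a>'
print(collect_links(SAMPLE))  # the commented-out link is skipped
print(naive_links(SAMPLE))    # the commented-out link is picked up too
```

The parser reports the comment through a separate `handle_comment` callback, so the link inside it never reaches `handle_starttag`; the regex sees only characters.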


  > You can write a novel like tchrist did
  > [...] – meder Nov 20 '10 at 19:36

  > That was kinda my point, actually. I
  > wanted to show how hard it is. – tchrist
  > Nov 20 '10 at 19:38
Seems that you are correctly channeling the author.


He keeps saying that, but he also keeps railing against anyone who suggests that it has anything to do with HTML not being a regular language.
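Roughly, the "not a regular language" argument is about nesting: a regular expression in the formal sense (equivalently, a finite automaton) cannot count unbounded depth, so any single such pattern only handles nesting up to a depth fixed in advance. A minimal sketch, using a hypothetical helper `bounded_nesting_pattern` and a toy language of text wrapped in `<b>...</b>` pairs, shows the pattern growing with every extra level. Perl's recursive subpatterns (such as `(?R)`) go beyond regular languages, which is likely why the regularity argument alone does not settle what Perl patterns can do.

```python
import re


def bounded_nesting_pattern(max_depth: int) -> str:
    """Regex for text wrapped in <b>...</b> pairs nested at most max_depth deep.

    Each extra level of depth needs a larger pattern; no single pattern built
    this way covers every depth, which is the practical face of "not regular".
    """
    pattern = r"[^<>]*"  # depth 0: plain text, no tags
    for _ in range(max_depth):
        pattern = rf"(?:<b>{pattern}</b>|[^<>]*)"
    return pattern


depth_two = re.compile(bounded_nesting_pattern(2))
for sample in ["x", "<b>x</b>", "<b><b>x</b></b>", "<b><b><b>x</b></b></b>", "<b><b>x</b>"]:
    print(sample, bool(depth_two.fullmatch(sample)))
```

The depth-2 pattern accepts the first three samples and rejects both the depth-3 sample and the unbalanced one; handling depth 3 means building a new, longer pattern.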


How many problems would a novice have if he decided to parse HTML with regexes in Perl?


Which one is this: a serious expression of concern or some kind of a lightbulb joke?


Pretty sure it's a reference to this famous[1] joke:

>> Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.

[1] http://regex.info/blog/2006-09-15/247


The correct response to that saying is to point out that regular expressions are significantly simpler than general-purpose, Turing-complete programming languages.
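To make "significantly simpler" concrete: a regular expression in the formal sense can be recognized by a deterministic finite automaton, which reads the input once, keeps only a current state, and always halts. A minimal sketch, with a hypothetical example pattern and state names, hand-translates `[0-9]+(\.[0-9]+)?` into a transition table and checks it against Python's `re`. Practical engines such as Perl's and Python's add backtracking features (backreferences, and in Perl recursion) that fall outside this model.

```python
import re

PATTERN = r"[0-9]+(\.[0-9]+)?"  # example pattern (assumption): unsigned decimal

# Hand-built DFA for PATTERN. The only memory is the current state name.
TRANSITIONS = {
    ("START", "digit"): "INT",
    ("INT", "digit"): "INT",
    ("INT", "dot"): "DOT",
    ("DOT", "digit"): "FRAC",
    ("FRAC", "digit"): "FRAC",
}
ACCEPTING = {"INT", "FRAC"}


def classify(ch: str) -> str:
    # ASCII digits only, matching [0-9] (str.isdigit would accept other scripts).
    if "0" <= ch <= "9":
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


def dfa_accepts(text: str) -> bool:
    """One left-to-right pass, constant memory, always terminates."""
    state = "START"
    for ch in text:
        next_state = TRANSITIONS.get((state, classify(ch)))
        if next_state is None:  # no transition: reject immediately
            return False
        state = next_state
    return state in ACCEPTING


for s in ["0", "42", "3.14", "3.", ".5", "1.2.3", "12a", ""]:
    assert dfa_accepts(s) == (re.fullmatch(PATTERN, s) is not None)
```

There is no stack, no recursion, and no loop that can run longer than the input, which is exactly what a general-purpose language does not guarantee.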



