Source of the famous “Now you have two problems” quote (2006) (regex.info)
159 points by tambourine_man on Aug 19, 2015 | 60 comments



It really bothers me that Google has allowed groups to deteriorate so much. At one time, it was actually a decent archive of Usenet history, but since then, search has deteriorated, and apparently, posts are sometimes missing.

Perhaps the people who posted on Usenet back then and are now famous didn't want posts from their college days in the spotlight, and their friends at Google implicitly let the site go to hell. </conspiracy-theory>

Edit: Hmm, I may be wrong. A while back, many of the links in this list of "memorable Usenet moments" [1] were broken, but they seem to work now.

Edit #2: Okay, I was not wrong. For example, take a look at the link on that page to "December 1982: First thread about AIDS" [2]. The link takes you to a Usenet post that doesn't even mention AIDS, in the newsgroup fa.telecom.

[1] https://support.google.com/groups/answer/6003482?hl=en

[2] https://groups.google.com/forum/#!msg/fa.telecom/EmQ-s_EGgSA...


Some Google usenet fun. Go to the main Google Groups page [1].

Use the search functionality to search for "tim smith csh callan". You get one result, which is a 2007 post from comp.os.linux.advocacy where someone is quoting a 1984 post of mine that was in net.unix-wizards. Note that my 1984 post is not found.

Now go to the Google Groups version of net.unix-wizards [2].

Search there for "tim smith csh callan". Now the above mentioned 1984 post is found, along with another 1984 post.

Lest you think that there is some problem when searching from the main page, click on the "Search all groups" link on the net.unix-wizards search results page, and it only finds the 2007 COLA post that quoted my 1984 post.

A search from the main Google search page, as opposed to the search within groups, finds the first 1984 post as the first result.

I've seen vast numbers of posts become unfindable by search, and then weeks or months later become findable again. For instance, there was a long time when if you searched for "Bill Gates" in Google's usenet archive, it would only return something like a dozen posts.

To put it bluntly, Google's handling of the usenet archives has been negligent and/or incompetent.

[1] https://groups.google.com/forum/#!overview

[2] https://groups.google.com/forum/#!forum/net.unix-wizards


I've seen vast numbers of posts become unfindable by search, and then weeks or months later become findable again.

I think this is an effect of the way Google searches/indexes things; I am equally frustrated by pages that disappear from Google's web search and may or may not come back eventually (although I've seen more disappear than come back...). Remember that they're running a huge distributed system, so consistency/completeness is probably relaxed in order to optimise other things they believe are more important. It's the same reason that even when Google says there are X results for a search, you often cannot view them all.

(Not that I'm actually agreeing with this behaviour, however. It's less noticed on the web where there tends to be a lot of redundant/similar information, but still not desirable at all.)


It was once possible to access Google's archive of Usenet without Javascript. And there were "heavy" and "light" versions of the messages. The heavy versions have an enormous amount of Javascript, CSS and HTML cruft.

For example, http://groups.google.com/group/comp.unix.wizards/msg/24222e5...

However, later they switched to HTTPS and #! URLs. Around this time I remember getting $CLASSPATH errors. Perhaps this is evidence to support your incompetence argument?

The "content" here is nothing but some plain ASCII Usenet posts. How difficult is it to serve plain text?

Anyway, today the same URL has been converted to this:

https://groups.google.com/forum/#!msg/comp.unix.wizards/bllj...

As I said in an earlier thread, Google itself developed a proposal to deal with this #! URL problem and advises webmasters to revise these AJAX URLs to "escaped_fragment"-style URLs:

http://developers.google.com/webmasters/ajax-crawling/docs/s...

But apparently when the webmaster is Google, the specification does not apply.
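
For reference, the rewrite the spec defines is mechanical. Here's a minimal sketch in Perl, assuming an illustrative URL (the real message links above are truncated) and using URI::Escape to percent-encode the fragment's special characters:

  # Hedged sketch of the spec's "#!" -> "?_escaped_fragment_=" rewrite.
  # The URL is illustrative, not one of the truncated links above.
  use URI::Escape qw(uri_escape);

  my $pretty = 'https://groups.google.com/forum/#!msg/net.unix-wizards/example-id';
  (my $crawlable = $pretty) =~
      s{#!(.*)\z}{'?_escaped_fragment_=' . uri_escape($1, '%#&+ ')}e;
  print "$crawlable\n";
  # -> https://groups.google.com/forum/?_escaped_fragment_=msg/net.unix-wizards/example-id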

Years ago, I made my own archives of some important comp and net groups. Google is not reliable. This stuff should be placed with the Internet Archive.


> Edit #2: Okay, I was not wrong. For example, take a look at the link on that page to "December 1982: First thread about AIDS" [2]. The link takes you to a Usenet post that doesn't even mention AIDS, in the newsgroup fa.telecom.

I think that help article is just messed up. That section of the list seems to be doing something weird with the permalinks. The full links still work... see, e.g., the link in this 2002 MetaFilter post about the AIDS post. Still works: http://www.metafilter.com/22004/First-mention-of-AIDS-on-Use...

edit: though it's worth noting every other link in that list I've tried (even the ones with the weird permalinks) has worked correctly so far


That thread seems largely incomplete though. It contains 4 posts on Google Groups, but notice how the comments on MetaFilter seem to refer to more.


FWIW, there were issues with missing content even in the days of Dejanews.


True, especially in the really old archives... but there's a big difference between content missing from the archive and being unable to find content you know is in the archive because of an apparently broken UI.


There was a company called AIDS that eventually changed its domain name from AIDS.COM...


The problem with usenet is that it's usenet. It will never change. The poor experience, the slow updating, the almost non-existent moderation, the impossible spam filtering, endless abuse, nothing to stop stuff like alt.tasteless invading alt.cats -- again, the difficulty of searching it well, etc.

I think a lot of people were seeing what web forums were doing (look at Slashdot or Metafilter from that era) and decided that supporting usenet was betting on the wrong horse. I don't really blame them. What we can do with completely controlled systems on our own software and on our own servers vastly surpasses what usenet was capable of.


The complaint here isn't that Google is failing to fix today's Usenet, but that their historical archive of Usenet content going back to the early '80s, which they acquired from DejaNews, has a pretty broken interface (and is worse than it was before they bought it).


I was confronted with a problem, so I thought I would use Java. Now I have a problem factory.


When all you have is a HammerFactoryImpl, every problem looks like an INailIter


Why restrict things to Hammer and Nails - you need to abstract that a bit further.

Why not IDirectedForceApplier and IDirectedForceRecipientIter :-)

Maybe they should introduce a "Name not sufficiently abstract" syntax error into the Java compiler! :-)


Of course the currently trending answer would be to use monads. Preferably implemented in Go.


I remember a colleague who wrote Common Lisp in the style of Occam - with concurrent process and communication channels. His code actually looked a lot like some Go code I've seen - hardly surprising I guess given the common ancestry to CSP.


Btw, the proposed implementations of CSP in Hoare's book are in Lisp...

http://www.usingcsp.com/cspbook.pdf

Page vi:

> The proposed implementations are unusual in that they use a very simple purely functional subset of the well-known programming language LISP. This will afford additional excitement to those who have access to a LISP implementation on which to exercise and demonstrate their designs.


Hum... Haskell monads are very concrete when compared to the Java factories. They represent abstract stuff, but always a single well defined abstract concept. They do not make the code more generic.

Maybe you can replace those with lenses, or applicatives in general. People do overuse those.


java.classes.factories.problem.module


You made me laugh out loud. Hahaha. Genius. Don't get me wrong, I love Java, but some of its patterns make me wish I was programming C++.


TLDR: Everyone always attributes it to JWZ, but it was actually coined by David Tilbrook.

It's not mentioned in the blog post, but David told me in 2007 that he said it at a EurOpen conference in Dublin circa 1985.

David also told me this is only his second favorite quote; the one he was most proud of, from 1981, was: "Software is the only business in which adding extra lanes to the Golden Gate bridge would be called maintenance."


Keep in mind that at the time, people were using regexes for tree structures like the DOM, and for calculating time.

Manipulating the DOM with regexes is generally known to be impossible these days, and using regexes for time only happens on the Unix command line, where there aren't better alternatives.


I assure you that people are still manipulating DOM with regexes.


Time for the classic bobince Stack Overflow post:

http://stackoverflow.com/questions/1732348/regex-match-open-...


I know this is a famous meme, but why, in response, does no one mention that Sizzle, the selector engine in jQuery, is based on regexes? (https://github.com/jquery/sizzle/blob/master/src/sizzle.js) It suggests to me that a very large proportion of DOM querying code in existence uses regexes.
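
Concretely, the kind of thing those regexes do is tokenize selector strings (a much more regular language than HTML itself). A hedged sketch of that flavor of tokenizing, illustrative only and not Sizzle's actual patterns:

  # Tokenize a simple CSS selector with one regex (illustrative only).
  my $selector = 'div.note > a[href]';
  my @tokens   = $selector =~ /([.#]?[\w-]+|\[[^\]]*\]|[>+~,])/g;
  print "@tokens\n";   # div .note > a [href]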


That's one of my favorite pages in all the internet.


Some people, when confronted with a problem, think “I know, I'll use the Banach–Tarski Theorem.”


Now they have their original problem (assumed to contain a sphere), an additional sphere, and a dependency upon the axiom of choice.


True story, did you know you can have paradoxical sets without choice? You just need infinity.

https://en.wikipedia.org/wiki/Paradoxical_set

http://www.math.hmc.edu/~su/papers.dir/banachtarski.pdf


While I don't dispute your claim, neither of the links substantiates it. The Wikipedia article says that you only need the axiom of infinity, but mentions only Banach–Tarski (which does require choice); and the minor thesis says on p. 2 that "the philosophy adopted in this paper will be the unquestioned acceptance of Choice as a useful foundation in our work".

Do you have any other references for this? (I'm interested, not snarking.)


The paper is kind of long. Look at Theorem 4 in it.

https://www.math.hmc.edu/funfacts/ffiles/30001.1-2-8.shtml


It's as versatile a joke as the old Slashdot Meme (apparently originally from South Park, which I did not know):

1. Do some stupid thing

2. ????

3. PROFIT!

In fact:

1. Use regexp

2. ????

3. PROFIT!

http://knowyourmeme.com/memes/profit

(While we're at it:

I had a problem and decided to use regexp. Now I have two first world problems.)


The South Park underwear gnomes joke was a parody of dot-com startups (i.e. around 1999/2000) who didn't have a clear business model.

https://www.youtube.com/watch?v=tO5sxLapAts


Interesting that the 1999/2000 dot-com crowd were aiming to turn a profit eventually, even if they didn't know how. Seems like even that isn't a requirement in the current bubble... ;)


According to the linked article the episode was from 1998, which actually makes it fairly early in the game even for poking fun at stupid dotcom sites.


The modern version:

   1) Acquire millions of users
   2) Get acquired for billions of dollars
   3) Not my problem


I enjoyed some of the remarks about Postscript in the comments, particularly this one http://regex.info/blog/2006-09-15/247#comment-18269 with its link to "a C-like syntax to PS compiler... called PdB" (http://compilers.iecc.com/comparch/article/93-01-152).

Some of the binaries for PdB appear to be still available, but it looks as if the source was never published? I like the idea of an alternate world in which Postscript, rather than Javascript, became the new universal "assembly language" for compilers to target. I imagine that may have been what Sun were aiming for with their Network Extensible Window System (https://en.wikipedia.org/wiki/NeWS). My only encounter with that was via Xnews, about which I remember little except that it was very slow.


Arthur van Hoff wrote PdB, and we used it to develop HyperLook (née HyperNeWS, née GoodNeWS). You could actually subclass PostScript classes in C, and vice versa!

I still have source code of PdB, although you'll have to ask Arthur for permission if you want a copy.

I wrote a PostScript "pretty plotter" in PostScript -- actually an interactive visual programming environment and debugger for NeWS:

http://www.donhopkins.com/drupal/node/97

Check out the NFS version 3 spec (NeFS) -- sorry but the pages are ordered backwards for some reason (well it is based on PostScript):

http://www.donhopkins.com/home/nfs3_0.pdf

The Network Extensible File System protocol (NeFS) provides transparent remote access to shared file systems over networks. The NeFS protocol is designed to be machine, operating system, network architecture, and transport protocol independent. This document is the draft specification for the protocol. It will remain in draft form during a period of public review. Italicized comments in the document are intended to present the rationale behind elements of the design and to raise questions where there are doubts. Comments and suggestions on this draft specification are most welcome.

1.1 The Network File System

The Network File System (NFS) has become a de facto standard distributed file system. Since it was first made generally available in 1985 it has been licensed by more than 120 companies. If the NFS protocol has been so successful, why does there need to be a NeFS? Because the NFS protocol has deficiencies and limitations that become more apparent and troublesome as it grows older.

1. Size limitations. The NFS version 2 protocol limits filehandles to 32 bytes, file sizes to the magnitude of a signed 32 bit integer, timestamp accuracy to 1 second. These and other limits need to be extended to cope with current and future demands.

2. Non-idempotent procedures. A significant number of the NFS procedures are not idempotent. In certain circumstances these procedures can fail unexpectedly if retried by the client. It is not always clear how the client should recover from such a failure.

3. Unix bias. The NFS protocol was designed and first implemented in a Unix environment. This bias is reflected in the protocol: there is no support for record-oriented files, file versions or non-Unix file attributes. This bias must be removed if NFS is to be truly machine and operating system independent.

4. No access procedure. Numerous security problems and program anomalies are attributable to the fact that clients have no facility to ask a server whether they have permission to carry out certain operations.

5. No facility to support atomic filesystem operations. For instance the POSIX O_EXCL flag makes a requirement for exclusive file creation. This cannot be guaranteed to work via the NFS protocol without the support of an auxiliary locking service. Similarly there is no way for a client to guarantee that data written to a file is appended to the current end of the file.

6. Performance. The NFS version 2 protocol provides a fixed set of operations between client and server. While a degree of client caching can significantly reduce the amount of client-server interaction, a level of interaction is required just to maintain cache consistency and there yet remain many examples of high client-server interaction that cannot be reduced by caching. The problem becomes more acute when a client’s set of filesystem operations does not map cleanly into the set of NFS procedures.

1.2 The Network Extensible File System

NeFS addresses the problems just described. Although a draft specification for a revised version of the NFS protocol has addressed many of the deficiencies of NFS version 2, it has not made non-Unix implementations easier, nor does it provide opportunities for performance improvements. Indeed, the extra complexity introduced by modifications to the NFS protocol makes all implementations more difficult. A revised NFS protocol does not appear to be an attractive alternative to the existing protocol.

Although it has features in common with NFS, NeFS is a radical departure from NFS. The NFS protocol is built according to a Remote Procedure Call model (RPC) where filesystem operations are mapped across the network as remote procedure calls. The NeFS protocol abandons this model in favor of an interpretive model in which the filesystem operations become operators in an interpreted language. Clients send their requests to the server as programs to be interpreted. Execution of the request by the server’s interpreter results in the filesystem operations being invoked and results returned to the client. Using the interpretive model, filesystem operations can be defined more simply. Clients can build arbitrarily complex requests from these simple operations.


PdB = "Pure dead Brilliant" = "awesome"

Quite a remarkable bit of code that - I loved HyperNeWS.


My favorite variation:

Some people, when confronted with a problem, think "I know, I'll use multithreading". Nothhw tpe yawrve o oblems.


So, I guess I'm a monster raving loon [1]. Oh well, I like Forth too, but I've always found PostScript easier to read and program.

1) http://regex.info/blog/2006-09-15/247#comment-3085


My first experience with this joke was:

https://xkcd.com/1171/

>If you're having perl problems I feel bad for you son, I had 99 problems then I used regular expressions and now I have 100


I really don't get the hate for regexps. They aren't very hard to learn and they are hugely productive. I use regexp in my text editor search constantly. It bothers me that I don't have regexp-enabled search on everything. Web browser, I'm looking at you!

And regexp-based search-and-replace! Swoon! Regexp has made me significantly more productive and less error-prone. I haven't made copy-pasta data errors in quite a long time because of it.
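
A minimal sketch of the kind of capture-group replace I mean (the data is invented for the example):

  # Turn "Last, First" into "First Last" with one capture-group
  # replace instead of hand-editing every occurrence.
  my $line = 'Tilbrook, David';
  $line =~ s/^(\w+),\s*(\w+)$/$2 $1/;
  print "$line\n";   # David Tilbrook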

Yes, particularly hairy regexp looks like line noise. What poorly written code doesn't? I think people get caught up in the idea that they have to try to write one regexp to rule them all and perpetuate until the end of time. Sure, yeah, if you don't understand how to write regexp and have to look up a cheatsheet all the time, I can see how you would want to avoid touching it ever again. But that applies to SQL or CSS or whatever braindead config file format we're using on our project today, or any other language that isn't your wheelhouse.

Seriously, learn regexp already.


The hate is for when they are used where a parser would be better (e.g., parsing an e-mail address).

I don't think anybody has an issue with regex search in an editor; we're talking more about regex as a core language feature.

Even in an IDE there are times when regex is the wrong solution: e.g., regex-based refactoring can introduce bugs in most languages where a parser-based refactoring wouldn't, as the sketch below shows.
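
A tiny, deliberately contrived sketch of that failure mode (names invented):

  # A textual rename of $count also mangles $counter; a parser-based
  # rename knows they are different identifiers.
  my $code = 'my $count = 0; my $counter = 0;';
  (my $renamed = $code) =~ s/\$count/\$total/g;
  print "$renamed\n";   # my $total = 0; my $totaler = 0;  -- oops

(A \b word boundary rescues this particular case, but the general point stands: text-level rewrites don't understand identifiers or scopes.)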


I think it's a far more offensive grievance to write something like an e-mail address spec that can't be parsed by regular expressions.

A problem that I'm sure is compounded by some people's lack of experience using them.


Grammars were invented in Perl 6 just for that. Perl 5 has a module [0] for that, implemented using recursive regexes.

Example of LaTeX parser:

    use Regexp::Grammars;
    my $parser = qr{
        <File>
        <rule: File>       <[Element]>*
        <rule: Element>    <Command> | <Literal>
        <rule: Command>    \\  <Literal> <Options>? <Args>?
        <rule: Options>    \[  <[Option]>+ % (,)  \]
        <rule: Args>       \{  <[Element]>*  \}
        <rule: Option>     [^][\$&%#_{}~^\s,]+
        <rule: Literal>    [^][\$&%#_{}~^\s]+
    }xms;
[0] http://search.cpan.org/~dconway/Regexp-Grammars-1.041/lib/Re...
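
And a hedged usage sketch: per the module's docs, a successful match leaves the parse tree in %/, keyed by rule names (the input string here is just an example):

    # Match some LaTeX-ish input against the grammar above and dump
    # the resulting parse tree.
    use Data::Dumper;
    my $input = '\textbf{hello} world';
    if ($input =~ $parser) {
        print Dumper(\%/);
    }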


A regex is really just a highly compact and powerful parser domain-specific language and engine. People often don't understand the DSL adequately, and its use can become problematic, but properly applied it can dramatically ease certain tasks.

That said, all you really need to know is that it boils down to the exact same underlying issues that cause people to like or dislike any language over another. It's exposure and preference. People prefer verbosity/conciseness/terseness, structure/versatility, speed/ease of coding -- and whatever they don't prefer is obviously wrong.


Sounds like throwing the baby out with the bathwater.


Can you please let me know what the "baby" is in this case? Nobody is seriously against regexes in all cases (every developer's editor has a regex search, for example).

However, they aren't even always the clearest way to parse regular languages (depending on the language, automata can be clearer than regexes, or vice versa).

Perl regexes can parse many context-free languages, but again, they're not always the clearest way to represent one.

Despite the original quote, I use awk as a standard part of my toolbox. I'm not going to write a web browser in it, though. Nearly every time someone says "now you have 2 problems" or "considered harmful", it's a reaction to overuse or misuse, not a literal call to abandon the tool altogether.


> not a literal call to abandon it altogether.

No, that's literally every case I've seen for the use of the "now you have two problems" phrase. It's like someone has a bot on the lookout, alerting them to mentions of regexp so they can run in and say "now you have two problems!"

I think it's another symptom of the common trend of modern programmers not wanting to do any programming anymore. Breathe a word about wanting to implement your own text editor control because you haven't been able to find one that suits your needs and you immediately get jumped on with "don't reinvent the wheel!" Spend a small amount of your spare time toying around with toy programming languages and it's "you have too much time on your hands!" -- as if being massively overworked on business interests is some kind of virtue. Make a puzzle game without using Unity or some other overblown framework on your small project and it's "you're nuts!"

No, literally, I got called insane once for not using Angular on my personal website that mostly just amounts to a list of links to my social media profiles.


Regexps are like farts. Everyone likes their own but is disgusted by everyone else's.


Please, let's try to keep the jokester replies to Reddit. I'm trying to understand why people who are ignorant of what regexp can do for them are so dead-set against learning regexp.


> Please, let's try to keep the jokester replies to Reddit.

You realize the linked-to article is entirely about a joke about regular expressions. I can't imagine a thread where a joke about regular expressions would be more appropriate.


Nobody is against learning regexp. (Except lazy people, perhaps.) But a lot of people are against using regexp as a default solution. Here's a pretty common case: see if the string you have contains a delimiter.

First you think, okay, I can use a function like strchr() or index(). It'll immediately return the location of the delimiter. Can't get much simpler or more efficient than that!

  $loc = index($_,$delimiter)
But wait. What if my string has quotes or spaces before the key or the delimiter? I don't want any of that crap. Now I need to write a bunch more parsing code -- or I can use a regex!

  $delimiter = "=";
  $_ = q| " my key = something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ok, looks good. Let's try a couple different delimiters.

  $delimiter = "_";
  $_ = q| " my key = something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  
  
Hmm... no output at all. Weird. Oh! index() will normally return -1 on failure, but $-[] doesn't get set at all if the match fails. And we forgot to change the delimiter in the test string. Ok, try again:

  $delimiter = "_";
  $_ = q| " my key _ something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ah, that's better. Let's try another delimiter.

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  /^\s*"?\s*(.+?)\s*($delimiter)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  Quantifier follows nothing in regex; marked by <-- HERE in m/^\s*"?\s*(.+?)\s*(+ <-- HERE )/ at -e line 1.
Holy shit, a fatal error? Hmm, it was just a delimiter change... welp, looks like the regex parser thinks '+' is part of the pattern syntax. Need to quote it so the interpolated delimiter isn't treated as a metacharacter:

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  /^\s*"?\s*(.+?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  3
  10
Ok, it's working again. But what if there was no key entered at all - just the delimiter and the rest of the string, like if a filesystem path was entered?

  $delimiter = "/";
  $_ = q|/path_to_a_file.txt|;
  /^\s*"?\s*(.+?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  
  
Crap. The delimiter is there, but my regex is broken again, because it expected a (.+?) before the delimiter. Time to fix it again:

  $delimiter = "/";
  $_ = q|/path_to_a_file.txt|;
  /^\s*"?\s*(.*?)\s*(\Q$delimiter\E)/;
  $keyloc = $-[1];
  $delimloc = $-[2];
  print $keyloc; print $delimloc;
  0
  0
There! Whew. That didn't take too long. Let's just hope nothing else unexpected happens, huh?

index() and rindex() would not have had all these issues -- they would have returned a location if the delimiter existed at all, or -1 if it didn't, and wouldn't run into interpolation issues, etc. All of these bugs (AND MORE!) can be solved by just writing a parser, or using a couple of index() and rindex() calls (see the sketch below), or restricting the format of the string to more rigid rules. But by using regexes, we've doomed ourselves to more unexpected issues in the future.
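
For comparison, here is the index() version of that first step, sketched minimally in the same style as above:

  $delimiter = "+";
  $_ = q| " my key + something in the string " |;
  $delimloc = index($_, $delimiter);
  print $delimloc;
  10
The delimiter is found literally, so there is nothing to escape, and a missing delimiter just comes back as -1 instead of a fatal error or a silently unset variable.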


This is a ludicrously contrived strawman.


Not as ludicrously contrived as an HTML parser using regexps.


Dude, code however you feel like. I'm not getting into a troll fest about why it's stupid to use regexes for everything.


I have never seen anyone argue for using regexp for "everything", but I see on a very regular basis people arguing they should apparently never be used for anything. Even the simplest of questions on StackOverflow get answered with the ever condescending, "what are you trying to do?" Followed by "now you have two problems." Followed by "use a parser." Followed by silence on how that specifically applies.

I called it a ludicrously contrived strawman because your proposed remedy to "just write a parser" is not any simpler a task than the one you mocked up for regexp. There are still plenty of bugs you get to write and miss for several hours when you write any nontrivial software.


1. Nobody on HN is saying to never use a regexp.

2. It's not a ludicrous example.

3. A parser is not a ludicrous way to solve the above problem.

4. It's not a straw man, because it's not an irrelevant argument set up to be defeated; it is specifically an example of how EITHER using some simple functions OR a parser would be less problematic in practice than the gradual bit rot of erroneous use of the extremely powerful and unnecessary regular expression.

5. The software becomes nontrivial when you complicate it with regular expressions.

6. Where are these examples of people telling you never to use regular expressions?

7. How is it you run into this on a very regular basis?

8. If it's a simple question, it probably has a simple answer, and regular expressions are not simple, as my example has shown.

9. There's a reason this phrase is a truism, and it doesn't need a mathematical proof to be accepted as a truism.

10. Code however you want, dude; it doesn't matter what a bunch of people on StackOverflow or any other website say, except that...

11. If a lot of people keep saying the same thing, there might, just might, be some merit to it.


It's included here (along with some other gems): http://install.lon-capa.org/bugzilla/quips.cgi?action=show



