This one comes up every 3-4 years or so in sysadmin communities, and I read it every single time, because it's worth it.
It's one of those things that I highly doubt would have occurred to me to even check, or give even a moment's thought to, under normal circumstances.
I was looking for another famous sysadmin story, where the guy who also happens to be a top Linux developer (so maybe Alan Cox?) rescues a deeply broken Linux system where even glibc is no longer accessible by manipulating inodes in a running process. Or something.
Over the years, my Google-fu has failed me. Any clue? :)
I remember it happened at NYU a couple of years ago and they turned it into a kind of ad-hoc social network/partyline. I wonder if anyone archived those emails? I suppose they deserve to remain "private."
One listserv (can't remember which) made up a list for people who complained like this instead of following the unsubscribe instructions. The admins would remove complainers from the normal lists and add them all to one mailing list, where the only emails they got were each other's demands to be taken off the mailing list, with unsubscribe instructions added to the beginning and end of every single email.
Ha. There is no explanation of why the mailing lists were named "Bedlam" though, and I doubt non-native readers know what it refers to. To quote Wikipedia [0]:
"Bedlam may refer to:
Bethlem Royal Hospital, London hospital first to specialise in the mentally ill and origin of the word "bedlam" describing chaos or madness"
I also found that to be evidence of pretty horrific architecture in Exchange. Two actual recipient lists with a secret internal one? Bloating headers to 13K? At the very least, it seems to me like they chose to put the distribution logic at the wrong layer...
Thanks for the link. I was surprised that it was written by Larry Osterman. I enjoy listening to his stories about Microsoft. Have you seen his Channel 9 videos [0]? I really enjoy the checking in videos with Erik Meijer.
If only every bug report that I received had been processed by a geostatistician... Usually I get a "hey, I can't get X to work". One of three responses from me usually fixes it: "Is your computer on?", "are you online?", and "try hitting refresh".
I am actually surprised the sysadmin in this scenario thought it was a bad thing that the statistics department did their research and presented a well documented error.
Well, technically, the geostatistician (Did I spell that right?) was doing research that was orthogonal to the actual problem and its symptoms. In this case, the results were sufficiently odd that they sort of pointed in the right direction, but I've been sent off on wild goose chases by people skillfully applying their own particular set of skills before.
On the other hand, there's the word document with nothing but a screen shot showing half of a useless error message.
Reminds me... when I post a support request to Google Apps, the issue description header says "in as much detail as possible"... but the field is limited to 1000 characters. When you're dealing with anything other than simple first-level support issues, a user simply can't put in a usefully descriptive amount of detail...
seq has the -s flag, which obviates the need for paste in that command:
$ seq -s + 10 20
10+11+12+13+14+15+16+17+18+19+20
But I agree that the paste is very useful.
# a few random samples for an IN SQL statement
$ shuf -i 1-500000 -n 5 | paste -s -d ,
371492,250061,266669,455846,295852
# we can even get PI
$ ( seq -s + -f '4/%g' 1 4 100000 && seq -s - -f '4/%g' 3 4 100000 ) | paste -s -d - | bc -l
3.14157265358979523735
units is nice, but there isn't much help, and the syntax isn't always easy to remember. It was fun to play with for a while, but wolframalpha.com is better.
Shouldn't this account for a round trip, and the speed through copper (~2/3 the speed of light)? That would lower the radius to much less than 500 miles.
I had this thought when reading this before as well. I imagine that the "3 milliseconds" they determined from testing was a typical number, maybe the median/mean, and that the actual timeout varied considerably depending on CPU load at that particular moment. Add in a number of retries for the server to attempt sending each email, and the effective timeout might have been a few milliseconds more... or at least it must have been, because `(2 * 500 miles) / (2/3 speed of light)` works out to about 8 milliseconds (where the 2X is for the round trip, and 2/3 is a rough multiplier for the speed of light traveling in either copper or optical fiber).
I felt for the author as I got deeper into the FAQ, and recognized this pattern of cynicism, then decided the author was so generous and thorough, not out of obligation (make the emails stop!), but because that is the type of detailed person he is -- and good at dinner parties too!
Writing stories for a technical audience is tricky. I've been doing it for going on 10 years now, and I'm still not very good at it.
A critical rule, however, is to omit detail (a reader is unlikely to question an explanation they make up themselves) and, most importantly, to omit details you know to be wrong (it is impossible to nitpick a statement that is never made).
An odd feature of our campus network at the time was that it was 100%
switched. An outgoing packet wouldn't incur a router delay until hitting
the POP and reaching a router on the far side. So time to connect to a
lightly-loaded remote host on a nearby network would actually largely be
governed by the speed of light distance to the destination rather than by
incidental router delays.
He knew this was largely wrong, and it didn't really improve the story, yet he said it anyway. It should have been summarized in a single sentence, leaving out all the problematic assertions that the Slashdot trolls leaped on.
Hi, ceequof, the original author here. I agree with you completely in concept; it was a stupid thing to include and I should have cut it.
But as I wrote in the FAQ, I fired off that email in under an hour in reply to a fast-moving thread on an email list where people knew me by reputation and wouldn't question my skills; the totality of my "research" was trying to reproduce the original numbers from memory ("500 miles" stuck in my head, but the distances to the places I remembered pinging did not); and I didn't ask anyone else to edit it for me.
All of that would have been ridiculously unprofessional of me as a writer for something intended for as wide an audience as it went to. But I had no idea it would be forwarded so much and so often (nor so many years later, now decades after the original event!).
Reminds me of the old saying, "it's better to stay silent and be thought a fool, than to open your mouth and confirm it."
It's also a reason why short business emails are better than longer ones. You can always go more in depth. It takes skilled restraint to touch on only the most relevant details without losing the larger point.
Yes, I found this to be one of the most refreshing technical anecdotes I've ever read. The tone and style actually put a smile on my face as I read. I enjoyed how the author guided us through the process of discovery, one which we all know so well, driven by an insatiable curiosity to go continually deeper down the rabbit hole until we find the bottom.
Harumph. :/ Yes, I'm a terrible story-teller for this reason. To me, the details (especially in making sure the numbers line up with reality) are important.
In Real Life, I totally agree the details are important. And I think I have evidence of this: at Google, where they had peer bonuses where one engineer could give money to another as a pat-on-the-back, I got dozens for my post-mortems.
Post-mortems are a case where you must have both story and correct details: lack the first, and you won't create change because the people who need to know in order to implement the required recommendations won't read the whole thing (or retain it later); lack the second, and really, how can anyone trust your recommendations?
Here, I was just trying to quickly bang out a funny anecdote. The things that stuck in my mind I could use to reverse-engineer numbers. I did this because—at the time I worked the incident—I was working with real numbers, so the story needed them for verisimilitude, to give a sense of what I was wrestling with. If I'd had any clue this mail would have taken on such a life of its own, I would have been more careful with them and gotten a tech reviewer and copy editor before posting.
This gets posted on some forum or another several times a year; for a long time I had a Google Alert on it and would hop in threads whenever it happened, since it always followed a common pattern:
1. Someone posts a link to the story, but not my canonical copy with a link to the FAQ.
2. More trusting and/or less-technical respondents upvote or forward or Like or +1 or quasisuperplauditize or whatever the medium has until it gets notice from...
3. ...less trusting and/or more-technical types, who expose the "flaws", most of which are covered in the FAQ.
4. Someone thinks to do a Google on "500 mile email", which returns as the top two results my canonical copy and the FAQ, and posts a link.
5. Most people lose interest while a few continue to squabble over ever-finer details.
Depending on the point at which I jumped in, I could affect the speed of the above cycle, but it never changed the cycle itself. The fun of the story is following me through the emotional cycle I felt when I worked the issue, from the initial "no way" to "you're having me on, right?" to "maybe...", to "dear God, this is actually happening", to "I must be going crazy", and finally to "Eureka!"
My intervention in the above cycle really wasn't adding that much to the enjoyment of the story, so I stopped doing it. (I'm not sure it's adding anything today, either, but Hacker News is an important enough forum for people I respect and care about that I thought I'd break my vow and rejoin the fray this once.)
Another of the 10,000 here - this is such a delightful story.
Also just discovered the "units" conversion program and disappointed that the default Mac library has only 586 units. And shockingly there don't seem to be compatible libraries out there.
As I wrote in the FAQ, I decorated my own units.dat (units.lib in some implementations) with lots of stuff because I like easily editing units. (Nowadays I use Emacs Calc, but I still add a bunch of my own units, like the binary prefixes like mebi, gibi, etc.)
I suspect that a lot more can convert millilightseconds to miles out-of-the-box now at least in part because of the popularity of this story over the past 13 years.
Thanks for a good read. It's strange to think about a time when there were a myriad of incompatible networks, and their different capabilities could be exploited.
Since I've seen a few comments about units not having lightseconds, here are a few ways to add the missing unit if you don't have it.
1) Add this line under the lightyear definition in /usr/share/misc/units.lib (or wherever `man units` says the standard units library is, under the FILES section):
lightsecond lightyear / 365.25 / 24 / 60 / 60
2) If you're on a mac and use homebrew just `brew install gnu-units` and then run `gunits`
But then it sent him off in a direction not worth going. He literally started to map out how far emails would go if they succeeded, while the whole time the error was in the timeout.
TTL is involved when dealing with routed networks. The farther the destination, the more hops you normally get along the way. If the starting TTL is low, you won't reach the destination. So TTL values can cause problems like this, although the radius wouldn't be so precise. Damn statisticians!
Via 'man units': "The conversion information is read from a units data file that is called 'definitions.units' and is usually located in the '/usr/share/units' directory."
Some distributions only support lightyear, so adding this line to your units file (which you can find with `man units`) will give you support for lightseconds:
I had the same thing happen to me. From the manpage I gathered that units uses the definitions in /usr/share/misc/units.lib; by running `grep light /usr/share/misc/units.lib` I found I only had lightyear and its shortcut ly defined. I added lightsecond, and since the milli prefix is already defined, it worked a treat.
Absolutely a good read. Sometimes this kind of reading can help with a completely different problem. Sometimes you're dealing with another problem, then you remember this story, and you figure out what's wrong because there are some similarities. I remember fixing a problem with PostgreSQL by remembering a story about Unicode and Postfix: different domain, but a similar problem.
If you're a sysadmin and someone brings in a consultant who gets root access and upgrades the whole OS to a new operating system which then almost takes out email.. wouldn't that be a problem?
If I were the sysadmin and that happened, I would need to have a meeting with some people. What's the point of being a sysadmin if the operating system is randomly going to be completely changed without someone telling you?
I have a fair amount of built up rage. This seems like one of those situations where it is actually your responsibility to rip people a new one.
Every time I read this I am reminded of units(1) util, which is super useful and I always forget about and revert to Google. But yeah, that connect timeout to 500 mi correlation is fun too.
Once a year is about the right frequency. Recurring stories are one way in which a community shares and perpetuates its culture with newcomers. Some of them are a delight to read on that yearly cadence, like the SR-71 story about a pilot and his copilot becoming a crew.
That said, it's wise to consider the frequency with which such things appear, individually and in total. Too much repetition and focus on memes becomes dysfunctionally self-obsessive. Not sure what the right answer is, but I can probably deal with once per year, short time on front page, and small % of total content.
This is an interesting idea. Have a system where a community can mark something as important, and to have it automatically reposted at preset intervals. Community members could be allowed to additionally repost, or the system can politely say it's already archived and will be shared again on such & such date. Use it as a way to reinforce community history.
Gotta "love" something that is highly reactive to test engineers...
The book also has, IIRC, a passage about a group of chemists sitting down for lunch when something on one of the shelves started getting uppity. They barely had time to dive under the table before the container started bouncing around the room.
This doesn't apply very well... HN is heavily archived... this comic is about being rude to people for not knowing about something, not justifying shoving the same cyclical content in people's faces repeatedly.
The new stories are okay, but... eh, it's not the same thing. It's more brute-force dickery between management and IT instead of the subtle interaction it used to be.
Wow, I must have bad timing. I've had an account here for almost all of those, and I think I was probably lurking for the 1 or 2 occurrences when I did not have an account, but don't remember seeing it before.
I don't think this is a bad thing. It was either 1 or 2 years ago when I first read about this - newcomers to the community have to find out about things in one way or another.
> And also being a good system administrator, I had written a sendmail.cf [...]
Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
> ... that used the nice long self-documenting option and variable names available in Sendmail 8 rather than the cryptic punctuation-mark codes that had been used in Sendmail 5
Good system administrators stick to conservative, portable subsets of configuration and scripting languages, rather than bleeding edge stuff.
When they deviate, they have a clear plan. They document their choice to use something new and shiny, and they keep it separated from the default system configuration.
Since SunOS came with Sendmail 5, the upgraded Sendmail 8 should have been installed in some custom location with its own path so that it coexists with the stock Sendmail, and is not perturbed if the OS happens to upgrade that.
A good sysadmin would stick that in some /usr/local/bin type local directory, and not overwrite /usr/bin/sendmail.
The consultant was not wrong to update the OS. People have reasons to do that. The consultant should have consulted with the sysadmin, of course. But even in that event, it might not have immediately occurred to the sysadmin what the implication would be to the sendmail setup.
Goodness, you're determined to find fault, aren't you? (For the record in re your comment later about my "basis to call [myself] a good system admin", those claims were a) jokey, and b) fairly well-substantiated by my reputation by that time, I should think. I was published by that point and had been on several conference committees along with many who'd be reading that mailing list; I hardly needed to peacock like you seem to think I was doing.)
But I think your criticisms seem a little uninformed (or possibly over-informed by later practice to the point where you aren't considering this in the context of mid-1990's practice). Let's see...
> > And also being a good system administrator, I had written a sendmail.cf [...]
> Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
I didn't say "from scratch". I used the m4 macros to create a cf, like everyone did at the time. Using the default file would only work if you still used email programs that read raw mbox files, had no email lists, and needed no interesting aliasing or vacation-script behavior. Oh, and ran in an environment where it was reasonable to assume someone's canonical email address could be found via the equivalent of `echo "${USER}@${HOST#.}"`.
Very few production systems could get away with that; writing a sendmail.cf was standard practice. And with m4, you usually spoke of "writing" a file where today we'd call it "configuring" a file; either way it was taking boilerplate and replacing bits with things that were right for your situation. I assume you wouldn't have had an issue with my writing that I'd "configured" the sendmail.cf. That's all I did.
> > ... that used the nice long self-documenting option and variable names available in Sendmail 8 rather than the cryptic punctuation-mark codes that had been used in Sendmail 5
> Good system administrators stick to conservative, portable subsets of configuration and scripting languages, rather than bleeding edge stuff.
Hmm, you either weren't administering SunOS in the mid-90's or you're forgetting some details. SunOS still came with Sendmail 5 years after best practice was to use Sendmail 8. Check out the page count of the O'Reilly Sendmail book of the time: it was longer than both the prior and later editions because it had to document both versions. I'm not entirely certain SunOS (as opposed to Solaris) was ever upgraded to Sendmail 8 in the distribution; obviously the people still using SunOS that late were change-averse.
"Bleeding edge" != "the version that all but the most conservative holdouts are using". Also, remember that this was the same period we were doing the rsh/rlogin conversion to SSH. Sendmail 5 still had known security issues that were fixed in Sendmail 8. We were used to replacing system components when what the OS vendor was shipping us was literally dangerous to run.
And Sendmail 8's Sendmail 5 compatibility mode was simply there for testing; it was never intended to be used in production long-term, so using a least-common-denominator sendmail.cf wouldn't have been "conservative and portable"; it would have been risky, bordering on malpractice.
> Since SunOS came with Sendmail 5, the upgraded Sendmail 8 should have been installed in some custom location with its own path so that it coexists with the stock Sendmail, and is not perturbed if the OS happens to upgrade that.
> A good syadmin would stick that in some /usr/local/bin type local directory, and not overwrite /usr/bin/sendmail.
Again, either you didn't run this installation in the mid-90's or you're forgetting some details. /usr/lib/sendmail (notice the "lib"! Your referring to "/usr/bin/sendmail" suggests to me you definitely weren't running SunOS 4 or have forgotten details; sendmail was never in /usr/bin) couldn't be left alone, as other tools hardcoded that path. The actual executable was there, so symlinking couldn't be used to get around that.
> Say what? Nobody writes a sendmail.cf from scratch, unless they are crazy.
The point moreover was that he had a custom version of the config file (not just default).
Yes, sites have necessary customizations in sendmail.cf. These do not have to be rewrites that use shiny new syntax.
My biggest problem with the author was not that he uses his admin blunders as a basis to call himself a good sysadmin, but that he assumed that the stats people were idiots who don't know anything about `puters or networks.
I was not surprised by the 500 mile claim. It strikes me as obvious that the 500 miles has to do with some combination of network topology and propagation delays, those being approximately the same in every direction.
Yes, networking does work "that way": farther places take more time to reach than nearer ones, broadly speaking. (Of course, it's faster to reach something 12,000 km away with no packet switch in between than something 50 miles away with switching. That doesn't eliminate the generality.)
It was also obvious why they didn't report the problem instantly; you cannot instantly know that mail isn't reaching beyond 500 miles without gathering data and correlating to a map, which takes time. Instantly, you can only know data points like "I can't mail to users@example.com". You know that if a stats person gives you a number, it was based on data, and not just a couple of data points. The head of the stats department isn't going to give you a number that isn't factual and backed by science. Of course stats people pride themselves on their data analysis; they are not just going to relay a couple of data points with no analysis attached.