Does email address obfuscation actually work? (superuser.com)
95 points by ivoflipse on Jan 21, 2011 | 53 comments



Doesn't it depend on what you mean by "works"?

It probably cuts down on spam, but I'm sure it also cuts down on legitimate inquiries that didn't get made because the person who wanted to contact them said "fuck it" rather than typing the e-mail by hand. (You could argue that lack of e-mail from those people isn't a great loss, but whatever.)

Isn't spam detection at this point good enough that this isn't a concern anymore? gmail has let precisely three bits of spam slip through in the past two months.


I can't remember getting a false negative, but unfortunately I'm starting to get more and more false positives for spam with Gmail. All emails from LinkedIn, my alumni association and recently Facebook go to spam, even after clicking "Not spam" repeatedly. The annoying thing is that these are emails that I received in my inbox, opened and didn't mark as spam for years, but they are now considered spam with apparently no way for me to get the engine to learn.


That's a concern, true. Some e-mails from my family have ended up there, because they live in Russia, write in Russian, and are therefore suspicious.


Gmail lets you use filters to prevent emails that match your rules from being marked as spam.


But this is Google. The expectation is that they will watch us online and understand our every desire!


I smell sarcasm in your comment. But the point is that yes, I expect Google/Gmail to understand that when I'm marking an email as "Spam" or "Not Spam" by clicking the buttons they put there, similar emails should be flagged (or not) accordingly next time one comes in. If clicking these buttons doesn't do anything, they should just get rid of them.

Sure, I could use epochwolf's suggestion with filters, but filters are for granular sorting and personal preferences (adding labels, skipping the inbox, forwarding, etc.) that would be pretty much impossible for a "machine" to guess. As far as I'm concerned, clicking "Not Spam" should be very similar to creating a filter that says "every email that looks like this (same address, same kind of content, etc.), do NOT mark it as spam". If this is not what's happening, these buttons are just useless.


Truth told, there is sarcasm and truth.

I do want google to take all of the data they keep on my searches and all the little scribbled notes their furtive servers make on my whereabouts as I stroll around the web and put it to use to make my experience better.

Did google show me an ad at http://troutpeeler.com? [1] Did an email come to me from do_not_reply@troutpeeler.com 5 minutes later about an order from me? That isn't spam! And now they know I'm interested in at least one sort of fish peeler. Take that into consideration when a commercial mail for fried fish skin enthusiasts shows up at my inbox.

[1] This is meant to be a fictitious instance of a web site and a product. Heaven knows I'm rubbish at making up unused names, so there probably is a Trout Peeler website somewhere and a group of people that make fried fish skins. I don't want to hear about it. Ick.

[2] I wonder how many people with browser extensions have to click on http://troutpeeler.com to trick a name squatting robot into reserving the name.


The top rated answer has a couple of interesting techniques to get around this problem.

Also, not everyone has a Gmail account. I'm sure virtually everyone on HN does, but some of us also wind up making websites for people who are less savvy. So techniques for obfuscation could still be useful.


> Also, not everyone has a Gmail account.

True, but I thought spam detection was something that was 99% effective with bayesian filters. Sure, not all of your clients will have them, but I figure that people who are truly worried about this sort of thing will either be running their own mail servers, or will pay someone to do it for them.


I frankly cannot understand why, but on my old Yahoo account at least 50% of the spam gets through, including the messages in suspicious Russian (which makes zero sense to me, since I do not speak the language and have always flagged them as spam).

So, either the gmail engineers are better than most, or it is still not trivial to get it right.


I see a lot of people saying "isn't spam detection good enough that we can forget about this?" without adding "spam detection on Gmail," apparently. My default installation of SpamAssassin certainly doesn't catch spam as well as Gmail. So I'd say that this stuff is not super easy right now--it's only super easy if you want to give Google your e-mail.


SpamAssassin is rules based, working from a known corpus, with rule modifications sent out from time to time. To get better detection, you can run something like dspam or crm-114 alongside it. These are statistics based and, with a short training period, can get close to Gmail's accuracy.
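
To illustrate what "statistics based" means here, the following is a toy naive Bayes classifier with add-one smoothing. It is only a sketch of the general idea and makes no claim about how dspam or crm-114 work internally; the class name `TinyBayes`, its methods, and the training messages are all made up for the example.

```python
import math
from collections import Counter


class TinyBayes:
    """Toy word-count classifier (hypothetical, for illustration only)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, label, text):
        # The "short training period": count words per class.
        self.docs[label] += 1
        self.words[label].update(text.lower().split())

    def score(self, label, text):
        # Log prior plus smoothed log likelihood of each word.
        # Assumes at least one message of each class has been trained.
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total = sum(self.words[label].values())
        logp = math.log(self.docs[label] / sum(self.docs.values()))
        for word in text.lower().split():
            logp += math.log((self.words[label][word] + 1) / (total + vocab))
        return logp

    def classify(self, text):
        return max(("spam", "ham"), key=lambda label: self.score(label, text))


bayes = TinyBayes()
bayes.train("spam", "cheap pills buy now")
bayes.train("spam", "buy cheap watches now")
bayes.train("ham", "meeting notes attached")
bayes.train("ham", "lunch at noon tomorrow")

print(bayes.classify("buy cheap pills"))   # spam
print(bayes.classify("meeting at noon"))   # ham
```

Unlike a fixed rule set, the word counts here come entirely from the mail the user has labelled, which is why such filters improve with training.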

The other benefit to gmail is shared inoculation. By the time you get to your mailbox, a thousand others may have reported it as spam, and it is already marked as spam on your inbox. With a small, personal mailbox, or even a companywide deployment, you may not get enough benefit from shared inoculation.


> SpamAssassin is rules based, on a known corpus with rule modifications sent out from time to time. In order to get better detection, you can run something like dspam or crm-114 along with, which are statistics based, and with a short training period, can get close to gmail's accuracy.

FWIW I do train my SA installation (actually amavisd-new) with sa-learn.

> The other benefit to gmail is shared inoculation.

A very good point. My mail server has very, very few users.

On the other hand, I am using DNSBLs, Razor, and Pyzor through SA, so you might think I'd benefit more from that. (Which is not to say SA isn't blocking a ton of spam--it's just not doing as well as Google: one every six weeks or something from Gmail, a handful every day from SA.)


greylisting can also help.

I run greylisting -> dspam (toe) -> tmda for anything dspam flags and have had 1 missed spam this week. Looking through the tmda queue, I see no false positives. I have had false positives in the past, but once they reply (similar to spamarrest), it lets them through. I do run a few DNSBLs, but really haven't seen much need to add more. I'd say 80% of the spam I used to receive was eliminated with greylisting. It's probably 60% now, as spammers are starting to hack actual servers that will retry rather than sending mail from botnets.


I personally hate greylisting. I use IMAP IDLE to get notified of new mail immediately, so I don't want the MTAs to bounce mails around. Especially since SpamAssassin works so well.

What I do is the following: SpamAssassin rules + Bayes (currently about 200k mails trained, 25% of them spam) + URIBL. I think I could even tune that with Razor and/or Pyzor, but I get too little spam to actually care.

Since then, I've given up hiding my mail. On a long enough timescale, the probability that spammers get your email address is 1, so why bother?


Greylisting only affects the first time a sender/receiver pair sends you a message, and then only every 31 days if they haven't refreshed the 'seen timer'.
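
A minimal sketch of that bookkeeping, under stated assumptions: the `Greylist` class and its `check` method are hypothetical names, the five-minute minimum retry delay is an assumed default, and real implementations typically also key on the connecting client's IP address. Only the sender/receiver pair and the 31-day refresh window come from the description above.

```python
class Greylist:
    """Toy greylisting state machine (hypothetical, for illustration)."""

    def __init__(self, min_delay=300, expiry=31 * 24 * 3600):
        self.min_delay = min_delay   # seconds a new pair must wait (assumed)
        self.expiry = expiry         # 'seen timer' lifetime in seconds
        self.first_seen = {}         # pair -> time of first attempt
        self.last_pass = {}          # pair -> time of last accepted mail

    def check(self, sender, recipient, now):
        pair = (sender, recipient)
        passed = self.last_pass.get(pair)
        if passed is not None and now - passed <= self.expiry:
            self.last_pass[pair] = now   # known pair: refresh the timer
            return "accept"
        first = self.first_seen.setdefault(pair, now)
        if now - first >= self.min_delay:
            self.last_pass[pair] = now   # a retry after the delay passes
            del self.first_seen[pair]
            return "accept"
        return "tempfail"                # ask the sending server to retry


g = Greylist()
print(g.check("a@example.org", "me@example.com", now=0))    # tempfail
print(g.check("a@example.org", "me@example.com", now=600))  # accept
print(g.check("a@example.org", "me@example.com", now=700))  # accept
```

A legitimate mail server retries after the temporary failure and gets through; a sender that never retries is filtered out without any content inspection.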

As for the email I receive, 99% is from people I've corresponded with before. Sometimes, though, I'm on the phone with someone who says "I just sent you my contact information," and it is a little disconcerting to have to wait 5-10 minutes that first time.

I know spammers have my email address. I don't want to have to waste the time looking through the spam folder any more than I have to. The way it is now, I have a high probability that every email in my inbox is an actionable email, and I'm not stuck shuffling through a junk folder.

But you do have a very good point regarding one of the downsides of greylisting.


Aren't there alternatives that share the learned rules?

And if not... maybe someone should make one.

The obvious downside is that someone could poison the well. I don't know how to go about fixing that off the top of my head, but I've been drinking. :)


How about statistical filtering---to see how well your newly imported rules do?


I use JavaScript obfuscation: a tiny WordPress plugin I wrote that essentially embeds the email address on the web page in scrambled form, and then transforms it to a clickable `mailto:` link client-side in the user's web browser. A message describing how to contact me is seen if the visitor has JavaScript disabled.

In theory an address harvester could easily work around such simple obfuscation, but in practice this seems to raise the bar just enough to make it not worthwhile for them. Email addresses I've posted in plaintext receive much more spam than the one I have obfuscated on my web page. (But greylisting plus SpamAssassin seems to take care of that spam quite well, so it's not as though the obfuscation itself is my only line of defense.)
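
As a rough illustration of the server-side half of such a scheme, the sketch below scrambles an address and emits placeholder markup that a small client-side script could later decode into a `mailto:` link. The choice of ROT13, the function names, and the `data-scrambled` attribute are all assumptions made for the example; the plugin described above may work quite differently.

```python
import codecs
import html


def scramble(address):
    """ROT13 the letters; '@' and '.' are left untouched."""
    return codecs.encode(address, "rot_13")


def unscramble(scrambled):
    """ROT13 is its own inverse, so decoding is the same rotation."""
    return codecs.decode(scrambled, "rot_13")


def placeholder_markup(address):
    # The visible text is the fallback for visitors without JavaScript.
    scrambled = html.escape(scramble(address), quote=True)
    return (
        f'<span class="contact" data-scrambled="{scrambled}">'
        "Enable JavaScript to see how to contact me."
        "</span>"
    )


markup = placeholder_markup("me@example.com")
print(scramble("me@example.com"))   # zr@rknzcyr.pbz
print(markup)
```

Because ROT13 leaves the `@` in place, a harvester may still pick up an address-shaped string, but it is the wrong address; a scheme that also hides the `@` would not even look like an address.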


Cory Doctorow wrote about this in a column last year: http://www.guardian.co.uk/technology/2010/dec/21/keeping-ema...

He's had the same public email address published without obfuscation for more than 10 years. He describes his anti-spam setup which has served him reliably and argues that the nuisance imposed on correspondents by scrambling his address isn't worth it.


Here's my technique. I use +folder notation (me+folder@gmail.com) for my publicly listed email address.

Any spambot that regexps \W or [a-z0-9\.] will get screwed up by the + that appears in the address. However, it's still a totally legal address, it delivers correctly, is clickable in browsers, etc.
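
A small demonstration of the effect, assuming a harvester whose pattern leaves `+` out of the local-part character class. Both patterns are illustrative guesses, not taken from any real spambot.

```python
import re

# Hypothetical harvester pattern: the local part allows only [a-z0-9.].
NAIVE = re.compile(r"[a-z0-9.]+@[a-z0-9.-]+\.[a-z]{2,}")

# A pattern that also allows '+', '_' and '-' in the local part.
PLUS_AWARE = re.compile(r"[a-z0-9.+_-]+@[a-z0-9.-]+\.[a-z]{2,}")

page = "Contact: me+folder@gmail.com"

naive_hits = NAIVE.findall(page)
aware_hits = PLUS_AWARE.findall(page)

print(naive_hits)   # ['folder@gmail.com']
print(aware_hits)   # ['me+folder@gmail.com']
```

The naive pattern does not fail outright: it silently harvests `folder@gmail.com`, which is someone else's mailbox, so the mail never reaches the published address.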

It's not that I get a low level of spam at this public address - I get ZERO spam. Zero in the last 5 years. I get a small amount addressed to the plain me@gmail.com address, but it seems like spambots just don't pattern match for this. Anyway, it's still a gmail address, and their spam filtering might have something to do with this.


I've done my best to not obfuscate my email address. If the situation dictates, I will even post it on Twitter without obfuscating it in any way. I probably do this at least once per week. We have plenty of defined support vehicles, but every so often someone needs extra-special help and I do my best to oblige.

I probably get 30 or so pieces of spam per day, all of which ends up in my Junk folder with no effort on my part. This has remained constant over the years. I get a lot more random cr@p on my personal address ever since I was placed on a list of "people who will blog about your stuff." (which is emphatically not the case).

I also created a personal FAQ (http://www.jeff-barr.com/?page_id=670) to cut down on the amount of random stuff that I get.


Mine does (see profile).

Gmail does a great job of keeping spam out of my box, but I still have to go through the spam folder occasionally to look for false positives (I get a couple a year), so I'd prefer as little spam in there as possible.

As for obfuscating, I use methods which require the reader to think, not to apply a rule. All the obfuscation methods presented in the superuser answer can be algorithmically beaten. Mine can't. It requires understanding the words I used. That is, unless it gets copied enough times to be recognized as a standard pattern for obfuscation. But if that happens, I can just change it to something else that requires reading and understanding.


While your obfuscation method is probably reasonable, your email address itself is probably an issue. It's not hard for a spam company to send email to every basic pattern of two letters and a last name. When you can send a thousand messages for a penny, it's acceptable for 99% to be sent to the wrong address.

PS: My first Gmail account got spam before I started to use it.


I use three methods for protecting my email address:

* Spamgourmet for websites I don't trust. some_random_text.underwater@spamgourmet.com will forward six emails to my real address.

* A some_random_text@my-domain.com address with a catch-all for mail I really need.

* ReCAPTCHA's mailhide <http://www.google.com/recaptcha/mailhide/> for posting my email on the crawlable web.

I have noticed that most spam I do get is not from online businesses, but from businesses I've given my email to in person (e.g. conventions, competitions).


I came up with this question when I was reading my RSS feeds when I got to an "Ask Engadget" post that requested users to email questions to "ask [at] engadget [dawt] com". It seemed pretty silly for a popular website to obfuscate their email address in such a weak way, and it got me to thinking whether or not the widely-used technique is effective at all.

Does anyone have spam problem with a published email address? Have you had success reducing spam by obfuscating it?


Anecdata: Spam just isn't really a problem. I've been using the address dmd@3e.org very publicly (as in, unobfuscatedly posting to usenet, forums, etc) for about 15 years now. It currently forwards to gmail, where I get roughly 3000 pieces of spam mail each day (yes, that's a new spam every 30 seconds, round the clock). Of those, typically 0 or 1 end up in my inbox. I have, in the past, done spot-checks of my spam folder to check for false-positives, and have never found a single one. Gmail is really good at spam filtering.


Anecdata II: I get about 1,500 spams a day to my Google-managed email address, and have found a worrying number of emails in my spam folder. It works really well for you, sadly less well for me :-/


> have found a worrying number of emails in my spam folder

Sorry, but just to be sure - am I getting it right, those emails were false-positives?


> Sorry, but just to be sure - am I getting it right, those emails were false-positives?

Yup. For the most part, ones that I'm fine with - e.g. order confirmations from stores - but I've found the odd one in there from someone the first time they email me, most frustrating.

On the other hand, I once nearly missed out on a fun blind date after my initial email was helpfully filtered by Hotmail into the girl-in-question's spam folder. And it didn't even mention Viagra!


Same here. Gmail has been my spam-filter for years. I have no problem putting my e-mail addresses in clear anywhere.


For the sake of the argument: if you get 3000 messages of spam per day it means that it takes a lot of effort to check them all.

Also, as you receive a lot of spam, the distribution is highly skewed towards spam, so it is unlikely you find a false positive because there are a lot more true positive, not because the false positives are absent.

So there is a chance you are getting false positives, but have failed to spot them (I know it happens to me).
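
The base-rate argument can be put into numbers. The sketch below computes the chance that a random spot-check of the spam folder turns up a misfiled legitimate message. The 3000 figure is from the thread; the one misfiled message and the sample of 100 are assumptions chosen for illustration, and `chance_of_spotting` is a hypothetical helper.

```python
from math import comb


def chance_of_spotting(folder_size, misfiled, sampled):
    """Probability that a random sample of `sampled` messages from a folder
    of `folder_size` contains at least one of the `misfiled` messages."""
    miss_all = comb(folder_size - misfiled, sampled) / comb(folder_size, sampled)
    return 1 - miss_all


# 3000 spam plus 1 misfiled legitimate message, skimming 100 at random.
p = chance_of_spotting(folder_size=3001, misfiled=1, sampled=100)
print(f"{p:.3f}")   # 0.033
```

Under these assumptions a spot-check finds the false positive only about 3% of the time, so repeated clean spot-checks are weak evidence that false positives are absent.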


> (I know it happens to me).

How?


I find out by other means, e.g.:

- I sent you the link on Gmail.

- No you didn't.

- Check your spam folder.

- I just did!

- Check again!

- Alright... ah, err, yes you did.


A few comments on this down memory lane: http://news.ycombinator.com/item?id=1463579.


I don't bother obfuscating. I don't even use spam filtering anymore. I get about 5 spam emails per day max. I think it helps that my address starts with a "j" so most open relays / spam zombie machines have been shut down before they get to my letter of the alphabet in their email lists.


The first comment is great:

http://techblog.tilllate.com/2008/07/20/ten-methods-to-obfus...

Different approaches to obfuscating the email address and comparing spam levels.


Back when I saw that I wrote a little js test page to generate a combo of all of them :) The result is pretty nasty.

http://icefox.github.com/js_email_link_hack/


Not to self promote, but many years ago my site was the first to generate "email icons," which are just images of your email address and do a good job of fooling most email harvesters. I'm surprised it hasn't caught on more. http://services.nexodyne.com/email/


- You can't click on them

- You can't copy them

- You can't type them

- You can't embed them in text

(Though, the latter two problems apply to some of the methods discussed here as well)

If they did catch on widely, it would be trivial for email address harvesters to read them (see: increasingly-complex captchas)


True, though you can't click on what people are currently doing either...or copy it (I mean kind of, but not really).

I do agree that email address harvesters could eventually be made to read them though, and that would be a big weak point if they ever did catch on.


The CSS-based techniques both work in IE6, which is pretty amazing. And you can combine them.

But don't use inline styling. Bury the rules in the cascade with some parent class so they can't be easily filtered.


the most upvoted answer there is what a real answer should be :)


Probably about as well as an elephant-repellant


I don't understand the point anymore. Spam filters are much much much better than spammers these days.


Not thunderbird's filter, unfortunately.


I've had my public email address, with no obfuscation, on quite a few very high traffic pages.

Remarkably little spam gets through to me. I actually see more spam in my non-public gmail account where, I suspect, the spammers are simply name guessing.

Long story made not much shorter -- I would guess that scraping isn't that big of a strategy any more.


Yeah, same here. It's not a high-traffic website by any means, but my @lastname.ca email is listed on lastname.ca, trivially encoded with character entities. The volume of spam I get through there is dwarfed by the volume of spam I get from spammers name-guessing emails at my university's email servers.
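
For reference, a sketch of what "trivially encoded with character entities" can look like, and how little it takes to undo. The `entity_encode` helper and the example address are made up; only the standard library's `html.unescape` is a real API.

```python
import html


def entity_encode(address):
    """Replace every character with its decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


encoded = entity_encode("me@example.ca")
decoded = html.unescape(encoded)

print(encoded[:17])   # &#109;&#101;&#64;
print(decoded)        # me@example.ca
```

A browser renders the encoded form as the plain address, and any harvester that decodes entities before matching sees it too, so the low spam volume reported here suggests many scrapers simply do not bother.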


It always fascinates me that nobody ever brings up regular expressions in these kinds of discussions.

Spammers literally sit around all day figuring out ways to deliver more spam, I'm fairly certain they've spent the 30 minutes it takes to craft a regular expression to harvest the easy 80% of these 'obfuscated' e-mails.


Sending spam is a means; the goal is to have people click on ads, buy stuff, download malware, etc.

The distribution of the probability that users click an ad is highly skewed. I suspect that it has probabilities close to zero for people who use "these 'obfuscated' e-mails".

If that is the case, spamming those users does not make economic sense.


Sure, it's a means.

So is scraping - a means to sell big lists of e-mail addresses to people who buy them for many thousands of dollars, with little regard to where they came from - only that they receive e-mail.


I was going to come in to post this. Regular expressions are the first thing I think of when it comes to this, and making a pattern to match [AT] instead of @ is just a few keystrokes away. Make up a few permutations of popular replacements and you've just defeated 90% of email obfuscation in 15 minutes.

And yet I see seasoned programmers who should ostensibly know a little about regex using this kind of obfuscation all the time!
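
To show how few keystrokes are involved, here is a sketch of such a pattern. The set of bracket styles and the "dot"/"dawt" spellings are assumptions chosen to cover a few popular replacements, and `harvest` is a hypothetical helper, not code from any actual harvester.

```python
import re

# "[at]", "(at)", "{at}" with optional surrounding whitespace.
AT = r"\s*[\[({]\s*at\s*[\])}]\s*"
# "[dot]", "(dot)", "[dawt]" and similar.
DOT = r"\s*[\[({]\s*(?:dot|dawt)\s*[\])}]\s*"

PATTERN = re.compile(
    rf"([a-z0-9._+-]+){AT}([a-z0-9-]+(?:{DOT}[a-z0-9-]+)+)",
    re.IGNORECASE,
)
DOT_RE = re.compile(DOT, re.IGNORECASE)


def harvest(text):
    """Return the plain addresses recovered from bracket-style obfuscation."""
    found = []
    for match in PATTERN.finditer(text):
        domain = DOT_RE.sub(".", match.group(2))
        found.append(f"{match.group(1)}@{domain}")
    return found


print(harvest("Email questions to ask [at] engadget [dawt] com."))
# ['ask@engadget.com']
print(harvest("Reach me: jane (AT) example (dot) org"))
# ['jane@example.org']
```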


"I don't have to outrun the bear, just you."

So long as you're in the other 20%, you'll probably do well.



