
What if Google just immediately fetches images for all received messages, regardless of deliverability? They can dedupe wherever possible, but the sender still ends up with no information about whether the message was delivered, much less read.



> What if Google just immediately fetches images for all received messages, regardless of deliverability?

I'm trying to work out whether this is more useful as a way to get Google to DoS themselves or as a way to get them to DoS other people's websites. Either way, isn't this a gift to troublemakers?

Of course, Google would probably develop an automated defence against such attacks quickly if they happened in practice, but any such defence would seemingly have to involve not caching all the images in advance, which would defeat the original point.


I'm fairly sure sending an email is more expensive than sending a GET, so it would be more effective for an attacker to make the requests directly than to use this to get Google to proxy an attack.

I also strongly suspect that Google's crawling infrastructure is more than capable of fetching a bunch of images for every single message Gmail receives.

But even if I'm wrong about the above, Google is perfectly capable of throttling its fetching to mitigate the load. (The problem really ends up looking an awful lot like crawling the internet, which is an area where Google has a bit of experience.)
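The kind of per-origin throttle being suggested here is commonly implemented as a token bucket. A minimal sketch (class names and rate limits are purely illustrative, not anything Google actually uses):

```python
import time

# Sketch: a token bucket per origin host, the kind of throttle
# a mail image fetcher could apply to avoid hammering one server.
class TokenBucket:
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst          # tokens/sec, max tokens
        self.tokens, self.last = burst, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets = {}

def may_fetch(host, rate=10, burst=20):
    # One bucket per origin host; limits here are hypothetical.
    bucket = buckets.setdefault(host, TokenBucket(rate, burst))
    return bucket.allow()
```

With `rate=0` and `burst=3`, for example, the first three calls to `may_fetch` for a host succeed and the rest are refused until tokens refill.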


In reality, I'd be less worried for Google and more worried for whoever is hosting moderately large images that get linkjacked in numerous variations (http://www.example.com/largeimg.png?randomnumber=72435).

Google can't tell, a priori, whether a series of similar e-mails sent to many thousands of Google Mail addresses, each containing similar-but-different image links like the one above, is a genuine mailing going out to someone's list or a DDoS of www.example.com in which Google is about to become an unwitting participant.

By the time they've worked out whichever trick is being used this time (much as they adapt to changing black-hat SEO tactics, but probably only make major changes every few months), it's not hard to see a hostile party busting the bandwidth cap for anyone on a basic, low-volume hosting plan.


Why involve Google? Aren't sites on basic, low-volume hosting plans easy to knock over, without resorting to DDoS tactics? And if you're trying to knock over bigger sites, it doesn't seem like Google would make a very good DDoS platform in any case, since the requests would be originating from a relatively small range of IPs that a bigger site could just ban. Presumably the only reason they wouldn't want to ban the requests is if they're actually the ones sending the emails in the first place, so the problem sort of solves itself.


This is an old problem with an old solution. If you have an expensive-to-generate resource that you don't want automatically retrieved en masse, you use robots.txt to deny access to it.
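For example, a robots.txt entry like the following (paths purely illustrative) tells well-behaved crawlers to skip an expensive endpoint, assuming the fetcher in question honours robots.txt:

```
User-agent: *
Disallow: /expensive/
Disallow: /largeimg.png
```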


And it could create a disincentive to load up an email with unique images, since as soon as you send the email out, all of those Gmail addresses are coming right back at your server to request the images.


And then you can cause Google to DDoS someone else's site by sending out spam containing lots of image URLs.


You can somewhat do this with the current system. I had no problem sending an email with ten 10 MB images; Google happily fetched all ten off my server.

I'm not sure if they limit it at some point, but if a server accepts URLs such as:

http://i.imgur.com/9Y5FDz7.jpg
http://i.imgur.com/9Y5FDz7.jpg?1
http://i.imgur.com/9Y5FDz7.jpg?2
etc.

Google would fetch each separately. Send this out to a bunch of people, and it seems problematic. I'm going to be optimistic and assume they built in some sort of limiting, but who knows.


Users would have to have previously agreed to "Always load images from domain.com"


The cost of a new domain per e-mail campaign would be trivial.

(and "normal users" do click the show images links)


We manage opt-in mailing lists for customers of restaurant chains, and for well-timed, well-targeted campaigns they get open rates in the 30%-40% range, with the majority of opens within 15-30 minutes of the send anyway. It takes more resources for us to handle the outbound mail load than the inbound image requests, since the URLs contain enough info to do a trivial regexp-based rewrite and fetch the images from a cache. I don't think handling a 100% open rate as soon as the mail was delivered would be even remotely a challenge.
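A minimal sketch of that regexp-rewrite-then-cache approach, assuming a hypothetical URL scheme of `/img/<campaign>/<recipient>/<asset>` (the scheme, pattern, and cache contents are all illustrative, not the commenter's actual system):

```python
import re

# Hypothetical URL scheme: /img/<campaign_id>/<recipient_id>/<asset>.
# The per-recipient segment exists only for open tracking; the bytes
# served are identical, so the cache key drops it.
PATTERN = re.compile(r"^/img/(?P<campaign>\w+)/(?P<recipient>\w+)/(?P<asset>[\w.]+)$")

cache = {"promo42/header.png": b"..."}  # pre-warmed, keyed per campaign+asset

def handle(path):
    m = PATTERN.match(path)
    if not m:
        return None
    # Record the open (recipient id) out of band, then serve from cache.
    key = f"{m['campaign']}/{m['asset']}"
    return cache.get(key)
```

Every recipient-specific URL for a campaign then resolves to the same cached object, so inbound image requests cost roughly a regexp match and a dict lookup each.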


Just curious, how big are those lists? There is a big difference in a restaurant with 10,000 customers and an e-commerce site with 3,000,000.


Yeah, good luck with that. Instantly, every hacker in the world would use Gmail as a DDoS amplification tool.


This is the correct answer. If google does not do this, well, we know whose side they're on.


PG recommended this method years ago for fighting spam; it puts the load back on the spammers. Not sure if anyone has ever tried it at scale.



