Yeah, I am not sure where the original claim is coming from. It isn't that hard for Google to simply follow the links and compare the files as part of the caching process. So unless marketers start customizing the images in addition to the links, there isn't any reason why Google can't cache the images together.
And even if marketers do start customizing images, hasn't Google gotten pretty good at comparing very similar files? Isn't that how Google Music works without having a copy of every single individual upload?
will work just fine. So, Google will follow the link and compare the files and... wait a second, the information we care about - that slg read my email - has already been transmitted.
They are unlikely to do that though: it would waste a lot of bandwidth requesting images that users are never going to see because they'll delete the mail before opening it. Google may have bandwidth and server resources to cobble dogs with, but I assume they are not going to waste them like that. Also, if they did, you could easily perform a DoS attack (or just give someone a big bandwidth bill) by sending out a pile of email with an image tag pointing to a large object on a competitor's web servers.
I don't understand how that helps. If I send an email to user_x@gmail.com with an embedded image at http://example.com/my_spam_images/image_for_user_x.jpg and Google makes a request to that, I know the mail has been delivered and the user in question exists and is seeing my ads, because a request for image_for_user_x.jpg showed up in my logs.
Now, when user_y also receives my spam and I get a request for image_for_user_y.jpg and I just serve the same file, Google is probably gonna deduplicate them on their cache or CDN or w/e, but only after they've sent me the request and confirmed that someone read my email.
I'm not trying to overload Google's storage capability here (lol), I'm just interested in the information leak.
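To make the leak concrete, here is a minimal sketch of what the sender's side might look like. It assumes the per-recipient URL scheme from the example above (image_for_<user>.jpg); the function names and the regex are hypothetical, and the requested paths would come from whatever access log the sender keeps.

```python
import re

# Hypothetical URL scheme taken from the example: one image path per recipient.
TRACKING_PATH = re.compile(
    r"^/my_spam_images/image_for_(?P<user>[A-Za-z0-9_.-]+)\.jpg$"
)

def recipient_from_request(path):
    """Return the recipient token encoded in a requested path, or None."""
    match = TRACKING_PATH.match(path)
    return match.group("user") if match else None

def confirmed_recipients(requested_paths):
    """Collect every recipient whose personalised image was fetched at least once."""
    seen = set()
    for path in requested_paths:
        user = recipient_from_request(path)
        if user is not None:
            seen.add(user)
    return seen
```

The bytes served never enter into it: the request path alone identifies the recipient, so deduplicating the image after the fetch doesn't take anything back.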
Similar to how they detect spam by comparing similar emails (e.g., same sender, largely the same content, etc.). So the first maybe 1,000 or so see the spam in their inbox, but after enough people report it, everybody else gets it in their spam folder.
So they could learn that for all these emails with largely the same content, this one image has a slightly different URL, but the image is always the same (or similar). So as with spam, the marketers might see the first few "opens", but once Google learns that they are all similar anyway, they won't see any more.
Now, I don't know if that's what they are doing, but it's certainly possible.
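Purely as an illustration of the idea, and not a claim about what Google actually does, a sketch could look like this. It assumes emails have already been grouped into a "campaign" by some similarity check, and that `fetch` is whatever function retrieves a URL; the class name and the threshold are made up.

```python
import hashlib
from collections import defaultdict

class CampaignImageCache:
    """Stop fetching per-recipient URLs once a campaign's image proves identical."""

    def __init__(self, fetch, threshold=3):
        self.fetch = fetch                # callable: url -> bytes (assumed)
        self.threshold = threshold        # fetches to observe before trusting the cache
        self.digests = defaultdict(list)  # campaign id -> digests seen so far
        self.bodies = {}                  # campaign id -> cached image bytes

    def get(self, campaign_id, url):
        if campaign_id in self.bodies:
            # Served from cache: the sender never sees this request.
            return self.bodies[campaign_id]
        body = self.fetch(url)
        seen = self.digests[campaign_id]
        seen.append(hashlib.sha256(body).hexdigest())
        # Only cache once enough fetches have returned byte-identical content.
        if len(seen) >= self.threshold and len(set(seen)) == 1:
            self.bodies[campaign_id] = body
        return body
```

With a threshold of 3, the sender would see the first three "opens" of a campaign and nothing after that. An exact hash only catches byte-identical images, so "similar" images would need some fuzzier comparison.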
Here's the social workaround to that: send a few e-mail campaigns with "time sensitive offers" with a timer that starts on retrieval of the image, with an "oh, by the way, please note that for Gmail users it starts on delivery as Gmail loads the image right away".
People love their e-mail offers. The type of users e-mail marketers want the most - namely the ones that respond to their offers very well - would be up in arms if Gmail made them start missing out on offers.
And even if marketers do start customizing images, hasn't Google gotten pretty good at comparing very similar files?
For image tracking pixels, I'd just start returning PNGs of random dimensions with completely random pixel data. Ta-da, unique images that aren't similar at all (unless they wanna start doing wavelet transforms or something).
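Something like this would do it, as a rough sketch using only the Python standard library. The function names and the 16-pixel size cap are arbitrary choices of mine; it writes an 8-bit RGB PNG by hand rather than relying on an imaging library.

```python
import os
import random
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def _chunk(kind, data):
    """Build one PNG chunk: length, type, data, then CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)

def random_png(max_side=16):
    """Return PNG bytes with random dimensions and random pixel data."""
    width = random.randint(1, max_side)
    height = random.randint(1, max_side)
    # Each scanline is a filter byte (0 = none) followed by 3 bytes per RGB pixel.
    raw = b"".join(b"\x00" + os.urandom(width * 3) for _ in range(height))
    # IHDR: width, height, bit depth 8, colour type 2 (RGB), no interlace.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    return (PNG_SIGNATURE
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw))
            + _chunk(b"IEND", b""))
```

Every call gives a different file, so neither an exact hash nor a naive pixel comparison will match two responses.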
Succeeded in what, confirming that Google is still in operation? Please note that Google doesn't even have to confirm the receiving email address is valid in order to get the image.
I just checked and Google rejects an e-mail after the RCPT TO: stage if the recipient address doesn't exist so they would not receive the message content.
$ telnet gmail-smtp-in.l.google.com 25
Trying 74.125.142.26...
Connected to gmail-smtp-in.l.google.com.
Escape character is '^]'.
220 mx.google.com ESMTP nh2si25383829icc.26 - gsmtp
EHLO myhostname.mydomain.com
250-mx.google.com at your service, [my.ip.was.here]
250-SIZE 35882577
250-8BITMIME
250-STARTTLS
250-ENHANCEDSTATUSCODES
250 CHUNKING
MAIL FROM: <myusername@gmail.com>
250 2.1.0 OK nh2si25383829icc.26 - gsmtp
RCPT TO: <non-existant-address@gmail.com>
550-5.1.1 The email account that you tried to reach does not exist. Please try
550-5.1.1 double-checking the recipient's email address for typos or
550-5.1.1 unnecessary spaces. Learn more at
550 5.1.1 http://support.google.com/mail/bin/answer.py?answer=6596 nh2si25383829icc.26 - gsmtp
Easily is a bit of a stretch, because most users are on NAT setups and they would need to go into their router settings and know how to set them up to allow the external request to get through. So, yeah easily if (a big if) you know how to do that, or if (another big if) you are on a machine that is directly visible on teh interwebs.
Of course, if Google instantly downloaded every single image reference in every email accepted by their MX, there is no longer any useful information in that fact.
Wait, are you saying Google will try to guess and reconstruct "Dear Marge" in the same font as "Dear John" instead of requesting both from the origin server? What if the image they didn't download includes a picture of a baby instead?
Ok, but then Google's still making a request to, here, a user-specific URL. The spammer may not know where you are, but they now know that your address exists.
They don't normally get bounce requests, though -- a tracking image is easier.
You've probably noticed that most spam comes from "borrowed" email addresses, not ones the spammer actually controls. If anyone ever sends a ton of spam with your email address on it (this has happened to me) it really drives the point home.