
jhpacker · 2024-10-31T18:26:41 1730399201

Cloudflare Radar, which is presumably a much bigger and better sample, reports Bytespider as the #5 AI crawler behind FB, Amazon, GPTBot, and Google: https://radar.cloudflare.com/explorer?dataSet=ai.bots And that's not including most of the highest-volume spiders overall, like Googlebot, Bingbot, Yandex, Ahrefs, etc.

Not to say it isn't an issue, but that Fortune article they reference is pretty alarmist and thin on detail.

jsheard · 2024-10-31T18:29:15 1730399355

The difference is that, AFAIK, those bigger AI crawlers do respect robots.txt. Google even provides a way to opt-out of AI training without opting-out of search indexing.
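For anyone who wants to see what such an opt-out looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `allowed` helper are made up for illustration; `Google-Extended` is the token Google documents for the AI-training opt-out, and `Bytespider` is the crawler discussed above. This only shows what the file asks for. Whether a crawler honors it is a separate question.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: opt out of AI training crawlers, keep search indexing.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str = "https://example.com/page") -> bool:
    """Return True if the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "Google-Extended", "Bytespider"):
        print(agent, allowed(agent))
```

With these rules, Googlebot falls through to the wildcard entry and stays allowed, while the two named tokens are disallowed.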

yazzku · 2024-10-31T18:31:22 1730399482

And how much do you trust that shit? Has anyone set up a honeypot as an experiment?

BXlnt2EachOther · 2024-10-31T19:00:45 1730401245

possibly unpopular opinion, I trust the bigger companies more than small ones on stuff like this. It would be so much easier to not offer anything, rather than intentionally create a potemkin setting and risk the blowback that would occur if discovered. Hopefully this comment does not age poorly.

full disclosure: worked there [edit: google] a while ago, not in search, not in AI.

Arnt · 2024-10-31T19:03:57 1730401437

You can trust Google to do what it says, and yes I've seen Google obey robots.txt. You can't trust Google to do what you think is right.

yazzku · 2024-11-01T17:38:51 1730482731

No, you can't: https://apnews.com/article/828aefab64d4411bac257a07c1af0ecb

Arnt · 2024-11-02T15:31:09 1730561469

I'm a bit in a hurry, don't have time for close reading. Does that article say some Google apps (notably Maps) store locations on your device even if you have configured them to not store it in your Google account? I may miss something, don't have time to read between the lines today.

jhpacker · 2024-09-25T08:22:42 1727252562

There's nothing in the law that says one-click.

It says, "A prominently located direct link or button which may be located within either a customer account or profile, or within either device or user settings."

I think the interpretation that one-click sub == one-click unsub comes from this passage:

"The ability to cancel or terminate an automatic renewal or continuous service pursuant to subdivision (c) or (d) shall be available to the consumer in the same medium that the consumer used in the transaction that resulted in the activation of the automatic renewal or continuous service, or the same medium in which the consumer is accustomed to interacting with the business, including, but not limited to, in person, by telephone, by mail, or by email."

The idea being that one-click is a medium, which doesn't seem to be the intent here.

jhpacker · 2024-08-30T06:11:14 1724998274

With GA4, the tracker code is loaded from www.googletagmanager.com (even if the tag isn't loaded via a GTM container). The measurement requests can be sent to (region1|www).google-analytics.com or analytics.google.com (to share cookies with Google login better).
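A rough sketch of how you might sort requests into those two groups, e.g. when reading a HAR file or proxy log. The hostnames come straight from the description above; the function name, the labels, and the example URL paths are my own assumptions. Note that the loader host also serves regular GTM containers, so a "loader" hit alone doesn't prove GA4 is in use.

```python
import re
from urllib.parse import urlparse

# Host the GA4 tag script is loaded from (per the comment above).
LOADER_HOST = "www.googletagmanager.com"

# Hosts that measurement requests can be sent to (per the comment above).
COLLECT_HOST_RE = re.compile(
    r"^(?:(?:region1|www)\.google-analytics\.com|analytics\.google\.com)$"
)


def classify_ga4_request(url: str) -> str:
    """Label a URL as 'loader', 'measurement', or 'other' based on its host only."""
    host = (urlparse(url).hostname or "").lower()
    if host == LOADER_HOST:
        return "loader"
    if COLLECT_HOST_RE.match(host):
        return "measurement"
    return "other"


if __name__ == "__main__":
    # Example URLs are illustrative; only the hostnames matter to the classifier.
    for u in (
        "https://www.googletagmanager.com/gtag/js?id=G-XXXX",
        "https://region1.google-analytics.com/g/collect",
        "https://analytics.google.com/g/collect",
        "https://example.com/",
    ):
        print(classify_ga4_request(u), u)
```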

jhpacker · on July 5, 2023

https://www.quantable.com/blog Analytics experiments and opinion, with around 10 years of deep-dive back articles -- mostly Google Analytics, web performance, bots, SEO.

jhpacker · on July 4, 2023

One of the sites (coop.se) in this decision did use a server-side GTM container to mask the IP before it was sent to Google. They were still told to stop using GA, though they weren't fined. The DPA said that the _gads, _ga, and _gid cookies were enough to be identifiable. I don't follow the logic there, but that rules out using a proxy for compliance (at least done as coop did it).
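To make "mask the IP" concrete, here is a minimal sketch of the kind of truncation a server-side proxy could apply before forwarding a hit. This is not coop's actual setup, which the decision doesn't spell out here; the prefix lengths (/24 for IPv4, /48 for IPv6) and the `mask_ip` name are assumptions chosen for illustration. As the decision shows, this step alone didn't satisfy the DPA, because the cookies were still forwarded.

```python
import ipaddress


def mask_ip(ip: str) -> str:
    """Zero the host part of an address (assumed /24 for IPv4, /48 for IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    # strict=False lets us build the enclosing network from a host address.
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(mask_ip("203.0.113.77"))           # 203.0.113.0
    print(mask_ip("2001:db8:abcd:1234::1"))  # 2001:db8:abcd::
```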

jhpacker · on July 4, 2023

Those looking for alternatives can take a look at my book which evaluates 15 different options: https://gaalternatives.guide

I also have a google sheet listing the basics of each of those tools: https://gaalternatives.guide/sheet

jhpacker · on July 4, 2023

I do know Plausible, and their motivation is to make a sustainable business providing basic web analytics, which is why they charge for their service and Google doesn't. The data they provide to the users of their service is like an order of magnitude less detailed than what Google provides.

I get the cynicism about the industry in general since Google led this merger between web analytics and advertising, but there are plenty of providers in the analytics space that aren't following that path.

jhpacker · on July 4, 2023

Cloudflare Web Analytics is extremely simplistic and does not allow for any persistent identification of users or storage of personal information. It uses HTTP Referrers to count visitors and that's it.
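As a rough sketch of what referrer-based counting can look like (my simplification, not Cloudflare's actual code): treat a page view as a new visit when there is no referrer, or when the referrer's hostname differs from the page's hostname. The function names and the sample data are hypothetical.

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlparse


def is_new_visit(page_url: str, referrer: Optional[str]) -> bool:
    """A page view starts a visit if it did not come from the same hostname."""
    if not referrer:
        return True  # direct navigation, bookmark, or stripped referrer
    return urlparse(referrer).hostname != urlparse(page_url).hostname


def count_visits(page_views: Iterable[Tuple[str, Optional[str]]]) -> int:
    """Count visits in a sequence of (page_url, referrer) pairs."""
    return sum(1 for page_url, referrer in page_views if is_new_visit(page_url, referrer))


if __name__ == "__main__":
    views = [
        ("https://example.com/", None),                              # direct: visit
        ("https://example.com/pricing", "https://example.com/"),     # internal: not a visit
        ("https://example.com/blog", "https://news.ycombinator.com/item"),  # external: visit
    ]
    print(count_visits(views))
```

Nothing here needs a cookie or a stored identifier, which is the point: the count is derived per request, so there's nothing persistent tying two page views to the same person.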

One could argue that since it's a US-based company it can't be Schrems II compliant, but you can make that argument about a lot of things.

openplatypus · on July 4, 2023

As a US-based company, they process (even if they don't store) the IP address. As such, the personal data of the EU users is transmitted under the control of the US Surveillance Act. No SCCs nor commercial contracts can shield this data.

You might have a legitimate interest in processing the IP, but because of the aforementioned issues, you cannot provide sufficient controls nor protection of Personal Data.

As such, using Cloudflare as your Data Processor exposes you, the Data Controller, to DPA scrutiny. As always with GDPR/DPA and the EU, whether it is illegal/non-compliant depends on each DPA.

https://medium.com/@christhaefner/shopify-illegal-in-germany...

jhpacker · on July 4, 2023

My opinion is that this applies to GA4 as well.

The decisions don't explicitly mention a version; they say that these particular sites "...shall cease to use the version of the Google Analytics tool used on 14 August 2020". They don't say if that's UA or GA4. The original complaints from NOYB refer to UA, but the issues cited in this decision would apply to GA4 as well.

So when the DPA says "Companies must stop using Google Analytics", there's no reason to think they only mean the version that was already shut off when they published that post.

jonasb · on July 4, 2023

I guess they can't ban a product for all eternity. In the decision [1] they are a bit more specific:

"This shall be done in particular by ceasing to use that version of the tool Google Analytics as used on August 14, 2020, if not sufficient protective measures have been taken."

[1] https://www.imy.se/globalassets/dokument/beslut/2023/beslut-...

jhpacker · on July 4, 2023

Most alternatives are not made by advertising companies, but they also frequently aren't free... Rolling your own from the ground up is not necessary or typically advisable when there are so many good options, including many self-hosted and open source options if you're wanting that level of control.

I usually describe the cost of GA as "subsidized by your customers' data".