Props to the author for ditching Google. I just want to point out that there are a LOT of non-Google analytics options out there; leaving Google Analytics really isn’t too hard if you’re willing to take the leap.
Plug: those who want a modern client-side analytics tool that’s free, self hosted, and open source might consider Shynet [0]. (Disclosure: I maintain it.) It’s a bit simpler/cleaner than Matomo, but exists in the same category.
Google Ads still dominates though, and if you're doing paid advertising, you're not just shooting yourself in the foot by leaving Google; you're lopping off a limb or two. I wish that weren't the case.
Do any non-Google analytics options integrate well with Google Ads, especially for retargeting?
For retargeting you would probably need the DoubleClick code on your site anyway. But at least for European users, you could only activate it after a clear opt-in via a consent manager.
Using marketing parameters (like utm-... in GA or Matomo, or their equivalents in any other tool) still gives you clear marketing performance metrics in your analytics solution of choice, though.
And a lot of "pure" tracking tools would, at least currently, not fall under the opt-in rules of the GDPR. But the moment you link them with your advertising profiles (like GA with the DoubleClick integration), your whole tracking setup becomes opt-in too. So you would probably lose at least some share of your traffic in analytics and no longer be able to do marketing analysis as precisely as before.
I had clients losing 80% of their traffic stats in analytics after shifting to opt-in.
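For anyone wondering what "using marketing parameters" boils down to, here is a minimal Python sketch (not how GA or Matomo implement it) that pulls the common utm_source/utm_medium/utm_campaign style parameters out of landing-page URLs and counts visits per campaign. The function names and example URL are made up for illustration.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

# Common UTM parameter names; utm_term and utm_content are optional.
UTM_KEYS = ("utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content")


def extract_utm(url: str) -> dict:
    """Return the UTM parameters present in a landing-page URL."""
    query = parse_qs(urlparse(url).query)
    # parse_qs maps each key to a list of values; keep the first one.
    return {key: query[key][0] for key in UTM_KEYS if key in query}


def visits_per_campaign(urls) -> Counter:
    """Count landing-page hits per utm_campaign ("(none)" if missing)."""
    return Counter(extract_utm(u).get("utm_campaign", "(none)") for u in urls)
```

Since the parameters live in the URL itself, this works the same whether the URL comes from a JS tracker or from a server log.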
Take a look at Adobe's Experience Platform. Adobe Analytics is great. We take the experience events and profile events and build a unified profile upon which you can build segments used for targeting in Adobe's Advertising Cloud, Adobe Campaign, Adobe Target, and Adobe Experience Manager, and probably more...
The Advertising Cloud is integrated with many third-party advertisers.
I'm just a software engineer, though. We probably demand an extremely high price.
I feel like this is meant to be used for a website. Do you have an article about how to use it from an app? Maybe I'd have to make the app hit some URLs that download 1 bit or something, if I want to track a certain action: app open, features used.
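One common way to do this from an app, sketched in Python under assumptions: the app requests a URL on a server you control, and the event name travels in the query string, so the server's access log records it. The endpoint `https://stats.example.com/event` and the `e` parameter are hypothetical; the server only has to answer with an empty response (e.g. 204).

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint on your own server; it only needs to log the request.
TRACKING_ENDPOINT = "https://stats.example.com/event"


def build_event_url(event: str, **props: str) -> str:
    """Encode an app event (e.g. "app_open") and extra props as query params."""
    return TRACKING_ENDPOINT + "?" + urlencode({"e": event, **props})


def send_event(event: str, **props: str) -> None:
    """Fire-and-forget GET; analytics failures must never crash the app."""
    try:
        with urlopen(build_event_url(event, **props), timeout=2):
            pass
    except OSError:
        pass  # offline, DNS failure, HTTP error or timeout: drop the hit
```

Usage would be something like `send_event("feature_used", name="export")` at the point in the app you care about.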
The thing I'm interested in when using Google Analytics is tracking user path to see the bounce rate in a checkout process for example. You can calculate conversion rates for different user segments. People who just want to see "how many visits I got" don't benefit from GA. Developers often miss the point of GA, because they do not work in sales.
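To make the segment/funnel idea concrete, a minimal sketch of step-to-step conversion rates per segment, assuming you already have, for each session, a segment label and the set of checkout steps it reached. The step names and session format are hypothetical, and this is not GA's own funnel report.

```python
from collections import defaultdict

# Hypothetical checkout funnel, in order.
FUNNEL = ["cart", "shipping", "payment", "confirmation"]


def step_conversion(sessions):
    """sessions: iterable of (segment, set_of_steps_reached).

    Returns {segment: [cart->shipping, shipping->payment, payment->confirmation]},
    each rate being sessions reaching a step / sessions reaching the previous one.
    """
    reached = defaultdict(lambda: [0] * len(FUNNEL))
    for segment, steps in sessions:
        for i, step in enumerate(FUNNEL):
            if step not in steps:
                break  # treat the funnel as strictly sequential
            reached[segment][i] += 1
    return {
        segment: [cur / prev if prev else 0.0 for prev, cur in zip(counts, counts[1:])]
        for segment, counts in reached.items()
    }
```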
> The thing I'm interested in when using Google Analytics is tracking user path to see the bounce rate in a checkout process for example
There are lots of ways to get this data direct from the servers, and much, much more reliably and effectively.
> You can calculate conversion rates for different user segments.
Maybe. You only have data on those people who don't have adblockers that block GA. Which is a sizable segment of the internet. Also that they're not using a VPN, etc. Stats collected via GA are inherently less reliable than stats collected from your own site because GA is easily blocked.
Plus, there have been reports since forever that GA's stats are just not that reliable in the first place.
I've experienced this myself - I used to admin a WP blog, and the numbers from the site log and GA were around 20% different.
If you're relying on GA stats to calculate the results of any A/B testing, then you need to allow for an error factor of at least +/- 25% (i.e. if variant A converts 10% better than variant B, you have no idea whether that's a real thing or a product of GA giving you inaccurate data; you'd need at least a 25% swing to begin to think it might be a real customer preference).
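The rule of thumb above amounts to comparing the relative lift between variants against an assumed error margin. A small sketch, where the 25% default is the figure from the comment rather than the output of a statistical test (a proper significance test would also account for sample size):

```python
def relative_lift(conv_a: float, conv_b: float) -> float:
    """Relative difference of variant A's conversion rate over variant B's."""
    if conv_b == 0:
        raise ValueError("variant B has no conversions to compare against")
    return (conv_a - conv_b) / conv_b


def beats_error_margin(conv_a: float, conv_b: float, margin: float = 0.25) -> bool:
    """True if the lift is larger than the assumed +/- margin (default 25%)."""
    return abs(relative_lift(conv_a, conv_b)) > margin
```

With conversion rates of 5.5% vs 5%, the lift is 10%, which by this rule would not be distinguishable from measurement error.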
> Developers often miss the point of GA, because they do not work in sales.
Yes. But have you listened to their objections, rather than dismissing them as "not working in sales"? Developers do have some knowledge of this subject. And there are lots of ways of getting this data that don't increase page load and compromise security the way GA does.
> Maybe. You only have data on those people who don't have adblockers that block GA. Which is a sizable segment of the internet. Also that they're not using a VPN, etc. Stats collected via GA are inherently less reliable than stats collected from your own site because GA is easily blocked.
You'd be surprised how many marketers don't understand this key point. Those that do just don't care. I'd argue it's part of why the internet is being dumbed down (because people without adblockers are the only ones with a voice).
Interesting point. "our audience is all older people with little education, we should cater to them" -> no, actually that's just the only bit of your audience who let you track them, and you're annoying the younger, tech-savvy people who'd like your product if you stopped being technically ignorant ;)
Bounce Rate in the checkout funnel? Do you mean the "Exit Rate" at specific steps in the funnel?
Just to clarify: as far as I know from some years in the industry, every tool I've worked with defines a "bounce" as a single tracking hit from a user ID without any further measured activity.
So once a visitor has gone beyond the first measured page view (the entry to the site), leaving the site can't be called a bounce.
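A small sketch of the distinction, assuming each session is an ordered list of tracked page paths (a made-up format): a bounce is a session with a single hit, while the exit rate of a page is the share of its page views that were the last in their session. Leaving at the shipping step therefore counts as an exit, not a bounce.

```python
from collections import Counter


def bounce_rate(sessions) -> float:
    """Share of sessions with exactly one tracked hit."""
    if not sessions:
        return 0.0
    return sum(1 for s in sessions if len(s) == 1) / len(sessions)


def exit_rates(sessions) -> dict:
    """For each page: exits from that page / page views of that page."""
    views = Counter(page for s in sessions for page in s)
    exits = Counter(s[-1] for s in sessions if s)
    return {page: exits[page] / views[page] for page in views}
```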
Nonetheless: if one is interested in a privacy-aware replacement for GA with a focus on funnels, eCommerce and such, I would probably (up to a certain scale of traffic) recommend taking a look at Matomo (formerly Piwik). It can be run on your own server, has a lot of the basic functions of GA and a great API, and can be used for goal/path analysis as well as marketing performance reporting.
If a company/site reaches a certain scale, I would probably recommend a paid solution like Adobe Analytics (or paid GA), after evaluating the real needs of said company/site.
I run a fairly simple website supported by donations and affiliate income. Even then, Analytics offers more than view counts. Here are a few questions it can answer for me:
- Is supporting IE11 financially justified?
- Which articles generate the most income, and why?
- Which components are useful, and which are just noise?
- Where are my visitors from? (it affects how I can help them)
> Which articles generate the most income, and why?
Your affiliate partner will tell you which links bring what income assuming you're using unique links.
Otherwise the server logs will give you a good page view count. Sure, it'll include bots, but bots that have no interest in the content would tend to be randomly distributed across all articles and thus not skew the results too much.
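A rough sketch of counting page views per path from an access log in Common/Combined Log Format, assuming successful GET requests are what you want to count and query strings should be ignored. As noted above, bots are not filtered out here.

```python
import re
from collections import Counter

# Request part of a Common/Combined Log Format line, e.g.
# 203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /posts/a HTTP/1.1" 200 512
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*" (\d{3})')


def page_views(lines) -> Counter:
    """Count successful GET requests per path, ignoring query strings."""
    counts = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # skip malformed lines
        _ip, _time, method, path, status = m.groups()
        if method == "GET" and status == "200":
            counts[path.split("?", 1)[0]] += 1
    return counts
```

Feeding it `open("access.log")` gives per-article counts you can rank.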
> Which components are useful, and which are just noise?
Not sure what you mean by components? If you're talking about different links or forms (like a newsletter) then the forms would also generate log entries on the server and can be counted there.
> Where are my visitors from?
Server logs and GeoIP.
I see nothing here that justifies ratting out your users to Google just to save a tiny bit of effort.
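To illustrate the GeoIP part: a GeoIP database maps IP ranges to countries, and a lookup is essentially a range check. The sketch below uses a tiny, made-up table built from reserved documentation IP ranges as a stand-in; in practice you'd use a maintained database (e.g. MaxMind's GeoLite2) and its library instead.

```python
import ipaddress
from collections import Counter

# Made-up stand-in for a GeoIP database (real ones hold many thousands of
# ranges). These are reserved documentation ranges, not real allocations.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"), "ES"),
    (ipaddress.ip_network("198.51.100.0/24"), "MX"),
    (ipaddress.ip_network("203.0.113.0/24"), "DE"),
]


def country_for(ip: str) -> str:
    addr = ipaddress.ip_address(ip)
    for network, country in GEO_TABLE:
        if addr in network:  # False when mixing IPv6 addresses and IPv4 networks
            return country
    return "unknown"


def visitors_by_country(ips) -> Counter:
    """Count unique IPs per country."""
    return Counter(country_for(ip) for ip in set(ips))
```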
They can't tell me how much money those users actually bring me. 70% of my content is not monetised.
> Your affiliate partner will tell you which links bring what income
My affiliate partners go from Amazon to people who fax things around. Their data is not always accurate or complete, if they gather it at all. I used Google Analytics to spot many such problems in the past.
> the forms would also generate log entries
That would just confirm that a form was submitted. It won't tell me on which pages it performs poorly. It won't tell me that people opened a collapsible, but didn't like its contents.
> Server logs and GeoIP
The point isn't just to know where users come from, but how to better serve them. Server logs won't answer questions like "are there enough non-EU, Spanish-speaking visitors to warrant translating this page?"
> I see nothing here
It might have something to do with myopia. Your judgements are based on inaccurate preconceptions of how websites operate.
> That would just confirm that a form was submitted. It won't tell me on which pages it performs poorly
The browser sends you a referer header. You need to do some work with data analysis/transformation to derive insight from this (by comparing the number of visits on the parent page vs the number of form submission where the referer is that parent page) but it is absolutely doable. If you don't want to do something like this I'm sure self-hosted analytics tools like Matomo (formerly Piwik) can do this out of the box.
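A minimal sketch of that comparison, assuming you've already counted page views per path (from the logs) and logged the Referer header of each form submission. Same-origin requests normally carry the full page URL as Referer, but a stricter Referrer-Policy can strip it, so some submissions may go unattributed. Names are hypothetical.

```python
from collections import Counter
from urllib.parse import urlparse


def form_conversion(page_views: dict, submission_referers) -> dict:
    """page_views: {path: views}; submission_referers: the Referer header of
    each logged form submission. Returns {path: submissions / views}."""
    submits = Counter(urlparse(ref).path for ref in submission_referers if ref)
    return {path: submits[path] / views for path, views in page_views.items() if views}
```

Pages with plenty of views but a low ratio are the ones where the form performs poorly.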
> are there enough non-EU, Spanish-speaking visitors
Maybe some people have reasons to not want you to know that they are non-EU and Spanish speaking, and this is something new privacy regulations are attempting to address.
> Your judgements are based on inaccurate preconceptions of how websites operate.
My judgements are based on not wanting to be stalked on the web (especially by a third-party adtech company like Google), and it seems that, at least in Europe, the law agrees with me. Whether this makes your business unprofitable is not my concern (if your business model becomes unprofitable because of a reasonable request like not stalking users without their consent, then I'd argue it was never a good business model to begin with).
On that note, I guess let's agree to disagree and let the privacy regulators decide for sure (given their current track record I wouldn't worry for another few years).
Easy: use a first-party self-hosted tracker (Matomo/Piwik), anonymize the last digits of the IP, respect DNT and provide an opt-out (Matomo provides an embeddable widget) on your privacy policy page.
And bam, no legal need for cookie consent OR notification! No popup at all!
And you still get perfectly usable statistics for most applications.
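Here is a sketch of the two request-side pieces of that recipe, in Python rather than Matomo's own code: zeroing the last IPv4 octet (and, as an assumed equivalent, keeping only a /48 prefix for IPv6), and skipping tracking when the browser sends `DNT: 1`.

```python
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host part: last octet for IPv4, last 80 bits for IPv6.
    The /24 and /48 prefix lengths are chosen for illustration."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def should_track(headers: dict) -> bool:
    """Skip tracking when the browser sends "DNT: 1".
    Assumes header names are already normalized to "DNT"."""
    return headers.get("DNT") != "1"
```

Anonymizing before anything is written to disk means the full IP is never stored.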
One possible benefit that I don't see discussed much in this context is bypassing ad blockers. I run a tech focused website and Google analytics registers about 10k visits/month. I figure that a good chunk of my visitors have ublock and so they don't show up. Presumably, alternative analytics or self hosted analytics are not blocked, so I'd get more accurate stats. Is this a correct assumption?
As I also said in another comment, you are correct in assuming that you are bypassing ad blockers with this solution.
This solution takes your server log files and does analysis/reporting based on them. Since these logs are written when the browser requests the resources (pages), you get clear stats on how often your resources (pages/files) were requested.
As others have stated, nobody will filter out bots for you, so this will inflate the numbers with traffic from "non-users". But "adblock users" will also show up.
As always, web stats are only an approximation of the real world, and their analysis depends on a lot of factors. From experience with different clients, I know of cases where ad blockers kept up to 25% of traffic from showing up in analytics on sites with a more tech-savvy audience.
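A crude sketch of User-Agent-based bot filtering for log-based stats, assuming a short, hand-picked pattern list. Well-behaved crawlers identify themselves this way, but many bots fake browser User-Agents, so this catches only part of them.

```python
import re

# Short, hand-picked substrings seen in self-identifying crawler User-Agents.
# Real filters use much longer, maintained lists; bots that fake a browser
# User-Agent are not caught by this at all.
BOT_PATTERN = re.compile(
    r"bot|crawl|spider|slurp|curl|wget|python-requests|headless",
    re.IGNORECASE,
)


def is_probable_bot(user_agent: str) -> bool:
    """Treat missing/empty User-Agents and matching ones as bots."""
    return not user_agent or BOT_PATTERN.search(user_agent) is not None
```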
It’s true, but GA also does a pretty good job at filtering out bot traffic, for example, so it depends on your definition of more accurate. There are also ways to send GA hits through a subdomain to make it look self-hosted and bypass ad blockers.
> It’s true, but GA also does a pretty good job at filtering out bot traffic for example
Surely you jest. I had to quit using GA because the tech behemoth, with all its tens of thousands of software and algorithm experts, couldn't figure out how to filter referrer spam out of their analytics.
Such a simple problem you'd think they'd solve it in a day. I gave them 2yrs and they still couldn't solve that problem so I left.
The strange thing for me is, when I switched to custom first-party analytics, I stopped getting referrer spam altogether. I assume then that spammers explicitly optimise for GA tracking and ignore everything else. Which makes sense since a lot of them are targeting audiences that care about SEO and GA has a uniform tracking URL that they can flood without the cost of rendering webpages. The reason GA can't filter them out is because they're constantly working around each other.
Bypassing ad blockers is simple enough: just roll your own custom domain. I'm working on a blog post that covers how we (Fathom) did this with Caddy as a multi-region reverse proxy.
Thanks for the mention of Simple Analytics [1], Bartek. We love being a paid service, even though we know it doesn’t suit everybody’s needs. This way we don’t need to find any other way of making money (such as with our customers' data).
I noticed this post mentions GoatCounter but says that it doesn't meet his needs because it isn't self-hosted.
It is not by default, but you certainly can self host it, and the author has been quite open about that being a viable path for people interested in doing so. I self host GoatCounter myself and it works very well.
+1 to GoatCounter. Not affiliated with it, just a happy (free) user on a low-traffic hobby project. Simple and no BS. More than enough if you just want to know the big picture and care about privacy. This is the data I get (my data is public, but that’s up to you): https://slowernews.goatcounter.com/
Hey, yeah. This is my bad! I didn’t dig deep into GoatCounter but on second glance I can see the link to source. I amended my article to clarify what I know (not much)
Nice. Can it be used to compare this year's first-quarter mobile usage for Dutch-speaking visitors who came from Facebook with last year's same profiles who came from the newsletter?
GoatCounter and others use non-identifiable hashes to track a unique visit, but they only retain that hash for some time[1]. I think in your example, you'd have to use a solution that uniquely identifies a session and indeed, keeps track of it.
I'm not sure that's even necessary for the parent's requirement? You should be able to get the parent's data out of it since the location and campaign are stored; there just isn't a UI for it.
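As a sketch of what such a query could look like once the data is stored, assuming a hypothetical per-hit record with date, device, language and source fields (not GoatCounter's actual schema):

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    """Hypothetical stored pageview; not GoatCounter's actual schema."""
    day: date
    mobile: bool
    language: str  # e.g. derived from Accept-Language
    source: str    # e.g. from utm_source or the referrer


def quarter_count(hits, year: int, quarter: int, *, mobile: bool, language: str, source: str) -> int:
    """Count hits in a calendar quarter that match all the given filters."""
    first_month = 3 * (quarter - 1) + 1
    return sum(
        1
        for h in hits
        if h.day.year == year
        and first_month <= h.day.month < first_month + 3
        and h.mobile == mobile
        and h.language == language
        and h.source == source
    )
```

Comparing Q1 of two years is then two calls with different `year` and `source` values.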
Exactly. The marketing/communication/sales people need those bits in a UI; they need tools to analyze them easily, mark them for specific analyses and share reports.
Sure, and there's definitely value in that. I can't speak for other products, but GoatCounter intentionally doesn't try to solve every possible analytics use case.
Adding advanced features frequently comes with the trade-off that it makes things harder for people who don't use those features, so by limiting the scope it gets easier/better for some, and worse/harder for others. I found this is the case for software in general.
There certainly should be a UI to quickly compare how the current quarter is doing against the last quarter (I wrote some code for that last week, but it's not done yet), but adding extra search parameters like "from mobile" and "from Netherlands" seems awfully specific to me. While certainly useful for some, most people are not hard-core marketers, and stuff like that is just "noise" for a lot of people, which makes the tool harder to use for them.
Matomo can do all of that, and I also found it hard to use.
Or, tl;dr: Matomo is already good at being Matomo, let's do something different :-)
I'm not associated with them at all, but Fathom are worth looking into if you want an analytics platform that respects user privacy: https://usefathom.com/
Without launching into moral judgement, I’d just like to mention that people who block your analytics code from loading normally are often sending you a pretty strong signal that they would not like to be tracked. Do you have something that might take this into account?
I am also working on a similar product to Matomo. It's not free, but it provides some of Matomo's premium features for a much cheaper price: https://usertrack.net
I like this blog post and I support anyone who removes a third party tracker from their site.
There are companies that live and die by analytics and demographics but your personal blog doesn't need the information that GAnalytics sucks up.
GoAccess (suggested in the article) is a fine choice, although I found it did not do a good job of filtering out bot hits. For most people this might not be a big deal, but it annoyed me.
In the end I just wrote[0] a simple hit counter that triggers off a js beacon.
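For reference, a bare-bones sketch of such a beacon counter as a Python WSGI app (not the commenter's code; the `/hit` path and `p` parameter are made up). It reads the page from the query string, so it works both for an image-pixel GET and for `navigator.sendBeacon('/hit?p=' + encodeURIComponent(location.pathname))`, which sends a POST. The counts live in memory only.

```python
from collections import Counter
from urllib.parse import parse_qs

hits = Counter()  # in-memory only; persist it somewhere for real use


def beacon_app(environ, start_response):
    """WSGI app counting requests to /hit?p=<page> (GET or POST)."""
    if environ.get("PATH_INFO") == "/hit":
        params = parse_qs(environ.get("QUERY_STRING", ""))
        page = params.get("p", ["(unknown)"])[0]
        hits[page] += 1
        start_response("204 No Content", [])
        return []
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, beacon_app).serve_forever()` from the standard library will serve it.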
GoAccess is cool, but it won't help you much if you have a static website or if you serve the majority of requests from a CDN. In those cases, Matomo (previously Piwik) is a good solution for client-side JS-based analytics similar to Google Analytics.
I’m not familiar with GoAccess or Matomo, but after some Googling it looks like Matomo requires self-hosting its PHP/MySQL server.
If you are using a CDN then I’m assuming that multiple access.log files need to be aggregated at some point unless the CDN has a service that automates the process. Logstash and the full ELK Stack (or alternative) seem to be required when multiple heterogeneous servers are involved in serving content. Browser-side JavaScript analytics seems to avoid the DevOps surrounding the ELK Stack. GoAccess seems like a minimalist solution when you control a single httpd-style server and can run a local daemon to process the access.log file.
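If you do end up with several access.log files (one per server), merging them for a small setup is less work than a full ELK stack. A minimal sketch, assuming Common Log Format timestamps and that each file is already in time order (typical for access logs):

```python
import heapq
from datetime import datetime


def log_time(line: str) -> datetime:
    """Parse the [10/Oct/2020:13:55:36 +0000] timestamp of a CLF line."""
    stamp = line[line.index("[") + 1 : line.index("]")]
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def merge_logs(*logs):
    """Merge per-server logs (each already in time order) into one
    time-ordered stream without loading everything into memory."""
    return heapq.merge(*logs, key=log_time)
```

Open file objects can be passed directly, e.g. `merge_logs(open("a.log"), open("b.log"))`, and the merged stream fed to whatever builds the report.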
As a side note, it looks like you need an Enterprise account to use Cloudflare’s Logpush Service or Logpull API. Amazon’s CloudFront has an advantage here.
Avoiding Google Analytics is non-trivial for the lazy and/or price conscious.
How does Matomo work better with a static site, than GoAccess? GoAccess reads the server logs and creates metrics. Matomo requires javascript; that isn't 'static website'.
GoAccess reads the logs of a server you have access to. So if you are distributing from a CDN, or for instance GitHub or some other hosted solution, then you have no logs to read from.
Also, using JavaScript doesn't make a page non-static. "Static" usually refers to a page that is rendered in full before a request comes in, so the content doesn't change based on the request. Having JavaScript generate metrics doesn't make the page dynamic.
I use it on my blog, but I also believe that the numbers are completely inflated. I don't trust them. This has been discussed on GitHub a few times [1], so don't expect accurate numbers (yet).
It's also hard to see what's been going on recently on your server, because you only get totals. I'd love if I could change the time interval of the shown html stats.
I like the way GoAccess is going though and I hope it will improve.
Interesting, I've switched from Google Analytics to GoAccess ~5 years ago. I let both analytics run for a month and compared the results. The relative numbers were very similar (so I get the same information about what blog posts are most popular), but the absolute numbers were in fact lower for GoAccess. It might be because tech blog visitors are using AdBlockers more often (and hence block GA).
> I'd love if I could change the time interval of the shown html stats.
GoAccess displays the data that you pass it. So while it doesn't have a date filter option (at least the last time I checked), you can just filter your logs beforehand. There's an even simpler solution that I'm using: set logrotate to a specific time frame (e.g. weekly), so you can pass "access.log" to GoAccess to get only the latest stats. You can still pass "access.log*" to get ALL stats at once.
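For the "filter your logs beforehand" approach, a minimal sketch that keeps only lines from the last N days, assuming Common/Combined Log Format timestamps; lines it can't parse are dropped. You could write its output to a file, or pipe it into GoAccess (check the GoAccess docs for reading from standard input).

```python
from datetime import datetime, timedelta, timezone


def recent_lines(lines, days: int, now=None):
    """Yield log lines whose timestamp falls within the last `days` days."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    for line in lines:
        try:
            stamp = line[line.index("[") + 1 : line.index("]")]
            when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        except ValueError:
            continue  # no [...] timestamp, or an unparseable one
        if when >= cutoff:
            yield line
```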
About the numbers: thanks for your input. I guess you get more accurate results the more visitors you have. On my small blog I doubt the numbers and think that GoAccess does not filter out some bots. You can try to identify them and filter them out, but that takes some time.
However, even if the numbers may not be accurate, you still see overall trends, which is valuable.
And thanks for the log-rotation trick. I will definitely make use of it.
Bot filtering in analytics tools (GA, Adobe and the like) is mostly quite efficient, so you would expect lower traffic in these tools than in a tool using your server's log files.
On the other hand, a lot of browser plugins and privacy/incognito modes kill analytics but have no effect on your log files. This would also lead to higher numbers in your log files.
So I would expect 10% to 25% higher numbers from your log files, depending on the audience you serve and the overall traffic volume of your site.
At least those were the numbers some years back, when we did some additional backend tracking for some clients, where we linked the frontend tracking tool's ID (from the cookie) to the tracking hit sent from the backend with additional information. In 2015, between 7% and 19% of backend tracking hits had no frontend ID associated with them; in 2018 (before GDPR kicked in) it was between 15% and 27%. So we knew the share of tracking calls that had frontend tracking blocked.
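The measurement described above reduces to a simple ratio. A sketch, assuming each backend hit records the frontend tracking ID it could find (or `None` when there was none); the `frontend_id` field name is hypothetical:

```python
def blocked_share(backend_hits) -> float:
    """Share of backend tracking hits with no frontend tracking ID.

    backend_hits: iterable of dicts; "frontend_id" holds the ID read from the
    frontend tracker's cookie, or is None/missing if there was none.
    """
    hits = list(backend_hits)
    if not hits:
        return 0.0
    missing = sum(1 for hit in hits if not hit.get("frontend_id"))
    return missing / len(hits)
```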
Very interesting. I just checked my logfiles and as expected most traffic seems to originate from search crawlers, feeds and bots running through all kinds of exploit urls. Hard to tell, but a wild guess is that more than 75% of hits and visitors are bot-related.
I learned that it is possible to exclude bots through browsers.list [1], and in goaccess.conf you can exclude IP ranges. Unfortunately, updating those entries is very time-consuming and probably not worth it.
I worked on a library to do this for my own analytics thing[1]; adding a CLI so you can filter bots from logfiles isn't too hard: just need to parse the log lines to a http.Request{} with the correct fields filled in (mostly User-Agent, but also looks at IP address).
One of the signals is from various JS properties though (navigator.webdriver, window._phantom), so you'll miss out on that. I have some other ideas as well.
I’m not associated with Countly ( https://count.ly/ ), but I heard good things from my friends using it. It’s open source and makes money with enterprise edition.
Has anyone had the experience of seeing over-reporting of some metrics in Google Analytics as compared to an internal tracking system? Is this data generally seen to be 100% reliable?
Nice work! Every time I see these "de-Google-ing" posts, there are people in the comments asking "but can it do this?" or "can it integrate with that Google product?"
I'm working on a Google Analytics alternative myself [1] and we make it clear that it is not meant as a clone or a full blown replacement of Google Analytics.
Some people are fine running GA and are happy to integrate with Google Ads and the rest of the Google ecosystem.
On the other hand, some would prefer to focus more on their visitors' privacy, on not having to get cookie/GDPR consent, on having a faster-loading website, on supporting a more independent web, etc. Alternative solutions to Google products such as these are meant more for those use cases.
Very cool project! It is good that more people are trying to go Google-free. Your project does server-side analytics, though, while Google Analytics is client-side. Both have different advantages and disadvantages.
Server side analytics advantages:
- No overhead on your website, because no external resources are loaded
- Not affected by content blockers
- Can track resources other than web pages, like images or redirects
- Show resources that could not be found
Client side analytics advantages:
- Can capture more data about the user, like device size, browser and OS
- Better track user behavior, like time spent on each page
- Provide real-time feedback about visitors
- Usually easier to set up
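As an illustration of the time-on-page point, here is a common way it is derived from timestamped page views in one session (a sketch with a made-up input format): each page's time is the gap to the next view, and the last page stays unknown unless the client sends something like a heartbeat or an unload event.

```python
from datetime import datetime


def time_on_page(pageviews):
    """pageviews: [(path, datetime), ...] for one session, in order.

    Returns [(path, seconds or None)]: the gap to the next pageview, with
    None for the last page because nothing marks when the visitor left.
    """
    result = []
    for i, (path, ts) in enumerate(pageviews):
        if i + 1 < len(pageviews):
            result.append((path, (pageviews[i + 1][1] - ts).total_seconds()))
        else:
            result.append((path, None))
    return result
```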
I've been playing around with different analytics options on my blog recently: Netlify on the server side and Plausible Analytics on the client side. The results of a 7-day comparison turned out to be quite different [0].
The server-side tool shows many more views across all pages of my blog. I think this is because the pages are accessed many times by automated systems that don't run JavaScript and are therefore not counted on the client side. This happens when you post a link on Twitter or WhatsApp, for example, because their websites or clients then make a request to fetch the page's metadata and display a card preview. So it counts on the server but not on the client side. A perfect example is the draft article, which I only sent via WhatsApp to my wife for proofreading. The difference there is just 2 views (I guess one preview each on my phone and on hers). I sent all the other links to several people and posted them on Twitter and Reddit.
You can also see this effect somewhat in the referrers [1]. The server-side tool counted many hits from an RSS reader and Google Translate (my parents-in-law reading my blog in Bulgarian :D). The preview requests from Twitter and WhatsApp are probably counted as direct traffic.
I think client side analytics more realistically count the number of people reading my blog (even if not many). However, there are cases that can be captured only on the server side (for example Google Translate). So, the best way would be to use both or maybe even make a service that combines client and server side - I think there is a lot of potential there!
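A small sketch of the per-page comparison described above, assuming you've exported per-page view counts from both tools as dictionaries (a hypothetical format). Large gaps point at pages fetched by link previews, RSS readers and other clients that don't run JavaScript.

```python
def count_gap(server_counts: dict, client_counts: dict) -> dict:
    """Per page: (server views - client views, share of server views that
    the client-side tool never saw)."""
    report = {}
    for page, server in server_counts.items():
        client = client_counts.get(page, 0)
        gap = server - client
        report[page] = (gap, gap / server if server else 0.0)
    return report
```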
There's one more advantage to client-side analytics, though maybe you meant to cover it under "Usually easier to set up": it's the only option if you host your site on a platform like GitHub Pages. The problem is that you don't have access to server-side logs on these platforms.
Yes, but the data is used by you and only you. It's not shared with Google. It's not merged with data from other sites and used against your users. The collected data is under your control, and it's used solely for its intended purpose.
Google is only interested in selling you ads. They don't care whether you actually get anything out of their analytics product. If it's free, you ARE the product.
Can we not have this mindkill of a line posted in every frigging thread on HN about Google or FB?
Google IS interested in making Google Analytics better, because that ensures that more people use it on their websites, because then, with your cynical framing, Google can use it to sell you better ads.
It's bizarre how people here pretend like Google is simultaneously zealous about selling you ads but also stupid enough to not realize what parts of the Google eco-system synergize with the whole machinery.
I submitted a feature suggestion years ago to add an API for annotations which by now has been voted up a million times and they still ignore it. Really, they don't care about making products better.
Correction: If it's free, your users are the product.
And an addition: If you're enterprise and need Google 360, which is very much not free ($100k+/year I believe), not only are you paying - but your customers are still the product!
It is not. It only gives you the data in a way that is convenient for them to sell you ads, not in the way you want or the way that is best for your business.
[0] https://github.com/milesmcc/shynet