
Once upon a time we did analytics and error analysis by running shell scripts that executed awk, sed and grep over an Apache or Nginx access log or error log.

What I am trying to say is that you can still do analytics this way, even pretty advanced stuff with some more elaborate scripting, if you want. The only thing you need is the access log.
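For example, assuming the common "combined" log format (a sketch; the field positions shift if your log format differs), a couple of one-liners get you surprisingly far:

    # Top 10 requested paths from an Apache/Nginx "combined" access log
    awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

    # Requests per HTTP status code
    awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn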

Something which has been largely forgotten ever since tools like Urchin became a thing :)



Except when any of your pages are cached between the eyeball and your server, so your server logs don't capture everything that is going on. You can get fancy with web server logs, but depending on what you're trying to understand, they may not be the data you need.

<source: did fancy things with logs over the last 25 years, including running multiple tools on the same site in parallel to do comparisons (Analog, AWStats, Urchin, GA, Omniture, homegrown, etc.)>


If you control the cache layer, log it there. If you don't control the cache layer, does a read from the end user's cache really count as a separate visit anyway?
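(On the first point: if the cache layer happens to be Varnish, for instance, it ships varnishncsa, which emits NCSA-style entries, so the same awk/grep pipelines keep working there. A sketch:)

    # Sketch, assuming Varnish as the cache: daemonize varnishncsa and append
    # NCSA-format entries to a file the usual log tooling can read
    varnishncsa -D -a -w /var/log/varnish/access.log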


There are plenty of situations where the difference between someone visiting a page once and someone repeatedly looking at that page over a period of days (even if it is pulled from their browser cache) is important. Obviously it depends on what you're using the data to try to understand.


This is how you end up with no-cache assets on pages, so sites can keep track of actual traffic.
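(The trick being a tiny asset served with headers that forbid caching, so every page view reaches the origin log. A quick sanity check, with a hypothetical URL:)

    # Verify the tracking asset really is uncacheable (hypothetical URL)
    curl -sI https://example.com/t.gif | grep -i '^cache-control'
    # a no-cache setup would answer with something like:
    #   Cache-Control: no-store, no-cache, must-revalidate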


One of the greatest jobs I ever had from a technical perspective had terabytes of structured access logs hosted on-prem inside a VPN, with a few small bespoke tools to search through them (and many more pages of commands for common tasks not yet implemented in a UI).

Not a single line of tracking or analytics on the front end, we just tracked everything we cared about at the server level.


And most likely a compliance and legal nightmare waiting to drop on a DPO one day.


That place didn't have any European operations so no GDPR concerns¹, but for what it's worth it was completely... pseudonymous, I think, is the term we want? You couldn't link a server entry to an actual user account by any means², but you could group distinct server calls together as coming from the same person. These weren't "server logs" in the sense of IPs or user agents or that kind of thing. More like application logs with scrubbed/obfuscated user data just stored in gigantic text files.

¹ To those who would say it doesn't matter, I'd say that laws aren't laws if they can't be enforced and there's no enforcement mechanism for some EU bureaucrat to fine a company with no operations outside of the US.

² I'm sure the technical means existed to do it, especially if you already had access to the logs, but the point is we weren't explicitly storing any PII or any data linked to a real account. Just actions throughout the apps.
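Concretely, that kind of pseudonymization can be as simple as salting and hashing the user identifier before anything hits disk. A sketch, assuming a hypothetical layout where the third field of each entry is the user id:

    # Sketch (hypothetical layout: "timestamp action user-id rest..."):
    # swap the user id for a truncated salted SHA-256, so entries still group
    # by person but aren't directly linkable to an account
    salt='long-random-secret-kept-out-of-the-logs'
    while read -r ts action uid rest; do
        h=$(printf '%s%s' "$salt" "$uid" | sha256sum | cut -c1-16)
        printf '%s %s %s %s\n' "$ts" "$action" "$h" "$rest"
    done < app.log > app.pseudonymized.log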


However, if you do this, you will still need to comply with all relevant privacy laws.

For example, in the EU, you need user consent to use server logs that include IP addresses for analytics. You also need to provide post-consent opt-outs, privacy statements, and audit logs, and all of a sudden you're building another analytics tool.


How exactly does that work? You need consent for server logs? Am I able to run fail2ban without consent?


In the EU, IP addresses are personal data, and you need a legal basis for each form of processing. You could make an argument that Fail2Ban falls under legitimate interest, but there is now precedent that analytics must have user consent and that another legal basis will not be accepted.


No, logs don't require consent in that case; see Recital 49 of the GDPR.


> Urchin

Urchin was acquired by Google and was ultimately sunset in favor of Google Analytics. It supported local and hybrid analytics models; the latter arguably evolved into Google Analytics.



