
Do you really need to debug events from three and a half years ago? Full logs only need to stick around for as long as you're likely to want to debug them. Log rotation is a must (I've seen debug logs nobody reads sitting in the gigabytes ...). Past that, you can cherry-pick and store metadata about the events (e.g. X hits from userAgent Y on this day), with enough information to do trend analysis. It's still generally a good idea to keep backups of the old full logs in case you need to reload them to find that one thing you forgot to add to your metadata ...

If you genuinely do need all of the data back that far, you should look at partitioning the data so you're not indexing over millions of rows. How you do that depends on how you intend to use the data.
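To make the "X hits from userAgent Y on this day" idea concrete, here is a minimal sketch of that kind of rollup. The event shape (a dict with `timestamp` and `user_agent` keys) and the function name are assumptions for illustration, not anything from the original poster's schema.

```python
from collections import Counter
from datetime import datetime


def daily_user_agent_counts(events):
    """Count hits per (day, user_agent) pair.

    Each event is assumed to be a dict with an ISO 8601 'timestamp'
    string and a 'user_agent' string (hypothetical field names).
    """
    counts = Counter()
    for event in events:
        day = datetime.fromisoformat(event["timestamp"]).date().isoformat()
        counts[(day, event["user_agent"])] += 1
    return counts


# Made-up sample events, only to show the shape of the result.
sample_events = [
    {"timestamp": "2012-03-01T10:15:00", "user_agent": "Firefox"},
    {"timestamp": "2012-03-01T11:40:00", "user_agent": "Firefox"},
    {"timestamp": "2012-03-01T12:05:00", "user_agent": "Chrome"},
    {"timestamp": "2012-03-02T09:00:00", "user_agent": "Firefox"},
]

summary = daily_user_agent_counts(sample_events)
for (day, agent), hits in sorted(summary.items()):
    print(day, agent, hits)
```

The summary is one small row per day and user agent, which is what you would keep queryable once the full log lines have been rotated out.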



If the table is too large, make it smaller by moving some of the data to a colder, slower store. Whether that's JSON documents or gzipped text files on 6x USB drives, data retention is mostly cheap.

The question is: what information about logs from a year ago do you want available within a second? Aggregate that, and move the rest.
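A rough sketch of the "move the rest" step, assuming rows are dicts with an ISO 8601 `timestamp` field and that a gzipped JSON-lines file is an acceptable cold store. The function name, row shape, and file layout are all hypothetical; a real setup would read from and delete from the actual table.

```python
import gzip
import json
from datetime import datetime


def archive_old_rows(rows, cutoff, archive_path):
    """Write rows older than `cutoff` to a gzipped JSON-lines file.

    Returns the rows that are recent enough to stay in the hot store.
    The row format (dict with an ISO 8601 'timestamp') is an assumption.
    """
    keep, cold = [], []
    for row in rows:
        ts = datetime.fromisoformat(row["timestamp"])
        (cold if ts < cutoff else keep).append(row)

    # One JSON document per line, compressed; cheap to store, slow to query.
    with gzip.open(archive_path, "wt", encoding="utf-8") as archive:
        for row in cold:
            archive.write(json.dumps(row) + "\n")
    return keep


def read_archive(archive_path):
    """Reload archived rows, for the day you need that one forgotten field."""
    with gzip.open(archive_path, "rt", encoding="utf-8") as archive:
        return [json.loads(line) for line in archive]
```

Something like `archive_old_rows(rows, datetime(2012, 1, 1), "logs-2011.jsonl.gz")` would leave only the newer rows behind, while the archive stays around in case the aggregates turn out to be missing something.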



