Filter and aggregate after you emit your metrics, traces, log messages, etc., not before.
You can worry about data retention, rollups, and other strategies for limiting data storage separately from the systems that emit the data.
At least with the right data stores. I kind of like what OpenSearch and Elasticsearch do for this. In Elasticsearch you have a data stream that you configure to roll over based on time or data size. Once rolled over, the backing indices are read only; new data appends to the current one. You can then define lifecycle policies that decide what to do with the old ones, e.g. move them to cold storage, transform them with rollups, and eventually delete them.
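To make that concrete, here's roughly what a two-week retention setup looks like against the Elasticsearch REST API. This is a minimal sketch in Python, assuming a local cluster with security disabled; the "logs-app" policy, template, and stream names are just placeholders, not anything from a real setup:

    # Minimal sketch: a two-week retention lifecycle for a logs data stream.
    # Assumes Elasticsearch at localhost:9200 with security disabled; the
    # "logs-app" names are made up for illustration.
    import requests

    ES = "http://localhost:9200"

    # 1. Lifecycle policy: roll over daily or at 50 GB, delete after 14 days.
    requests.put(f"{ES}/_ilm/policy/logs-app-policy", json={
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        "rollover": {"max_age": "1d", "max_primary_shard_size": "50gb"}
                    }
                },
                "delete": {
                    "min_age": "14d",
                    "actions": {"delete": {}}
                }
            }
        }
    }).raise_for_status()

    # 2. Index template that makes matching indices a data stream and
    #    attaches the lifecycle policy to its backing indices.
    requests.put(f"{ES}/_index_template/logs-app-template", json={
        "index_patterns": ["logs-app*"],
        "data_stream": {},
        "template": {
            "settings": {"index.lifecycle.name": "logs-app-policy"}
        }
    }).raise_for_status()

    # 3. Appending a document auto-creates the data stream; only the newest
    #    backing index is writable, older ones go read only after rollover.
    requests.post(f"{ES}/logs-app/_doc", json={
        "@timestamp": "2024-01-01T00:00:00Z",
        "log.level": "ERROR",
        "message": "constraint violation in orders table",
    }).raise_for_status()

The nice part is that the thing emitting logs just appends to "logs-app"; rollover, cold storage, rollups, and deletion are handled entirely on the storage side.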
With application logging, you typically assign different log levels. Trace and debug are (or should be) disabled in production. Info can be quite noisy. Warn tends to be repetitive (because developers tend to ignore warnings and never fix them). Errors should be rare.
I have my system configured to start emailing me when errors get logged. An error means something is broken and needs to be fixed; zero tolerance on errors. When an error happens, all the other log information provides context, so there's value in retaining it. But only for a few days at best: long enough to survive a weekend or a holiday period like Christmas. After that it's just noise. I have a hard cut at about two weeks. Some places need to retain things longer for ass-covering reasons.
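For what it's worth, the error-email part doesn't need anything fancy. A minimal sketch with Python's stdlib logging, assuming the SMTP host and addresses are placeholders and that the real thing ships its normal output to Elasticsearch/OpenSearch rather than stdout:

    # Sketch of the "errors page me" setup with Python's stdlib logging;
    # the SMTP host and addresses are placeholders.
    import logging
    from logging.handlers import SMTPHandler

    root = logging.getLogger()
    root.setLevel(logging.INFO)  # trace/debug stay off in production

    # Normal output still goes somewhere searchable, with a short retention
    # window (stdout here; shipped to a log store in practice).
    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    root.addHandler(console)

    # Anything at ERROR or above additionally triggers an email, on the
    # assumption that errors are rare and always mean "go fix something".
    mailer = SMTPHandler(
        mailhost=("smtp.example.com", 25),   # placeholder mail server
        fromaddr="alerts@example.com",
        toaddrs=["oncall@example.com"],
        subject="Application error",
    )
    mailer.setLevel(logging.ERROR)
    root.addHandler(mailer)

    logging.getLogger("app").error("constraint violation")  # would email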
Data retention comes at a price of course. I've seen companies log ginormous amounts of data while ignoring all their errors. 30GB per day. Absolutely appalling. Me: it looks like your database layer is erroring non stop (constraint violations and worse); you might want to do something about that. Them: ah no, that's just normal, we just ignore it (PHP shop, incompetence was the norm). Me: so how do you know when something breaks?! Them: ......?!
My well-paid consulting gig was beating some sense into this operation after one of the managers noticed they were spending hundreds of thousands per year on this nonsense. My fee was a rounding error on that. Easiest job ever. But kind of cringeworthy once I started looking into what they were actually doing and why. Mostly it was just, "yeah, some guy set that up once, then we never looked at it again and he left. What are you going to do?!". There was a lot of that at this company. Absolutely nobody cared about the waste of resources or about getting any meaningful feedback from their logging. If that's your team, you need to do something about it. That's your job and you're not doing it well. If you need an external consultant to tell you that, you might want to reflect on whether things need a major shake-up.