Well, it's a tall order, but I think your best bet is to build something with the Elastic stack: Beats, Logstash, Elasticsearch, Kibana. We store logs long-term in Ceph (we contributed a RADOS plugin so Logstash can output directly into Ceph).
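To give you a feel for the archival side: this is not the actual RADOS output plugin, just a minimal sketch of the idea using the python-rados bindings, where each day's compressed log file ends up as one object in a Ceph pool (the pool and object names here are made up for illustration):

```python
# Minimal sketch, NOT the logstash RADOS plugin: write one day's compressed
# log file as a single object into a Ceph pool via python-rados.
# "log-archive" and the file path are illustrative names, not real defaults.
import rados

cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()
try:
    ioctx = cluster.open_ioctx("log-archive")   # hypothetical archive pool
    try:
        with open("/var/log/archive/app-2016.01.01.log.gz", "rb") as f:
            ioctx.write_full("app-2016.01.01.log.gz", f.read())
    finally:
        ioctx.close()
finally:
    cluster.shutdown()
```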
So, new logs go into Elasticsearch and get indexed; you can look at them via Kibana, or build custom dashboards, even directly off the Logstash firehose.
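Anything a Kibana panel shows you can also pull yourself. A minimal sketch, assuming daily logstash-YYYY.MM.DD indices, the official elasticsearch-py 8.x client, and illustrative field names (level, host):

```python
# Count ERROR-level entries per host over the last hour, straight from ES.
# Index pattern and field names ("level", "host") are assumptions about your
# Logstash mapping, not fixed defaults.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

resp = es.search(
    index="logstash-*",
    size=0,  # we only want the aggregation, not the hits
    query={"bool": {"must": [
        {"match": {"level": "ERROR"}},
        {"range": {"@timestamp": {"gte": "now-1h"}}},
    ]}},
    aggs={"by_host": {"terms": {"field": "host.keyword"}}},
)
for bucket in resp["aggregations"]["by_host"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```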
As logs get older, you can delete whole daily indexes from ES, and if you want to investigate/datamine/aggregate something, you can still grep the archived logs.
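Expiring old data is then just dropping whole daily indices. A minimal sketch, again assuming logstash-YYYY.MM.DD naming and the official Python client (in practice Elastic's Curator or ILM can do the same job):

```python
# Drop daily indices older than the retention window; the raw logs stay
# archived in Ceph, so nothing is lost, it just leaves the hot search tier.
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])
cutoff = datetime.utcnow() - timedelta(days=30)  # keep 30 days searchable

for name in es.indices.get(index="logstash-*"):
    try:
        day = datetime.strptime(name, "logstash-%Y.%m.%d")
    except ValueError:
        continue  # skip indices that don't follow the daily naming scheme
    if day < cutoff:
        es.indices.delete(index=name)
```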
The bottleneck will probably be Kibana (or the admin/operator looking at the end result), since all the other components scale out: Beats are already per-node, Logstash is stateless, so you can just run more instances behind a round-robin DNS name and Beats will pick one up - or of course you can use HAProxy to load-balance - and the Elasticsearch cluster can grow rather large too.
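The round-robin DNS trick is just multiple A records behind one name; a tiny sketch, using a hypothetical hostname "logstash.internal" and the standard Beats port 5044, to show what a shipper sees when it resolves that name:

```python
# Resolve a (hypothetical) round-robin DNS name: one sockaddr per A record,
# i.e. one per Logstash instance; a shipper just connects to one of them.
import socket

addrs = socket.getaddrinfo("logstash.internal", 5044, proto=socket.IPPROTO_TCP)
for family, _, _, _, sockaddr in addrs:
    print(sockaddr)
```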