Event correlations are a typical one. Think about ad tech: you want every click event to be hydrated with information about the impression or query that led to it. Both of those are high-volume log streams.
You want to end up with the results of:
```
select * from clicks left join impressions on (clicks.impression_id=impressions.id)
```
but you want to see incremental results - for instance, because you want to feed the joined rows into a streaming aggregator to keep counts as up to date as possible.
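To make "incremental results" concrete, here is a minimal sketch of that left join done one event at a time. The class and method names are made up for illustration, events are assumed to be plain dicts, and state grows without bound; a real system would presumably expire old impressions after some window.

```python
from collections import defaultdict


class StreamingLeftJoin:
    """Hypothetical sketch: joins clicks to impressions as events arrive."""

    def __init__(self):
        self.impressions = {}             # impression id -> impression row
        self.pending = defaultdict(list)  # impression id -> clicks that arrived first

    def on_impression(self, impression):
        """Store the impression; emit updated rows for clicks that arrived early."""
        self.impressions[impression["id"]] = impression
        early_clicks = self.pending.pop(impression["id"], [])
        return [(click, impression) for click in early_clicks]

    def on_click(self, click):
        """Emit a joined row right away; the impression side is None if unseen."""
        impression = self.impressions.get(click["impression_id"])
        if impression is None:
            # Remember the click so the row can be re-emitted once matched.
            self.pending[click["impression_id"]].append(click)
        return [(click, impression)]


join = StreamingLeftJoin()
join.on_impression({"id": 1, "ad": "shoes"})
matched = join.on_click({"impression_id": 1, "user": "a"})
unmatched = join.on_click({"impression_id": 2, "user": "b"})
late = join.on_impression({"id": 2, "ad": "hats"})
```

Each returned list is what you would hand to the downstream aggregator. The click for impression 2 is emitted twice: once with `None` on the right side (the left-join behaviour), then again once the late impression shows up.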
I was definitely under the impression that ad impressions and clicks would be written to databases immediately and queried from there.
I'm still having a hard time imagining in what case you'd need a "live" aggregating display that needed to join data from multiple streams, rather than just accumulating from individual streams, but I guess I can imagine that there are circumstances where that would be desired.
Live-updated aggregates are quite common in this area. Consider metered billing ("discontinue this ad after it has been served/clicked/rendered X times"), reactive segmentation ("the owner of a store has decided to offer a discount to anyone who viewed but did not purchase products X, Y, and Z within a 10-minute period"), or intrusion detection ("if the same sequence of routes is accessed in rapid succession across the webserver fleet, regardless of source IP or UA, send an alert").
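The metered billing case is the easiest of the three to sketch. This is only an illustration with invented names (`ServeCap`, `on_event`), assuming a single process sees every event for a given ad; across a fleet you would need to shard by ad or merge partial counts.

```python
from collections import Counter


class ServeCap:
    """Hypothetical sketch: signals when an ad reaches its serve limit."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()    # ad id -> events counted so far
        self.discontinued = set()  # ads that already hit the limit

    def on_event(self, ad_id):
        """Count one event; return True only on the event that hits the cap."""
        if ad_id in self.discontinued:
            return False
        self.counts[ad_id] += 1
        if self.counts[ad_id] >= self.limit:
            self.discontinued.add(ad_id)
            return True
        return False


cap = ServeCap(limit=3)
signals = [cap.on_event("ad-1") for _ in range(4)]
```

The point is that the decision is made as the third event arrives, without running a count query against stored events.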
In a very large number of cases, those streams of data are too large to query effectively (read: cheaply or with low enough latency to satisfy people interested in up-to-date results) at rest. With hundreds of thousands or millions of events per second, the "store then query" approach loses fidelity and affordability fast.
I think it can be challenging to get that much data to a single database. For example, you probably don't want to send every "someone moused over this ad" event in Japan to a datacenter in us-east-1. But if you do the aggregation and storage close to the user, you can emit summaries to that central server, backing some web page where you can see your "a 39-year-old white male moused over this ad" count go up in real time.
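A rough sketch of that "aggregate near the user, ship summaries" shape, assuming counts keyed by ad and audience segment. `EdgeAggregator` and `CentralStore` are hypothetical names, and the network hop between them is left out entirely.

```python
from collections import Counter


class EdgeAggregator:
    """Hypothetical sketch: counts events locally, close to the user."""

    def __init__(self):
        self.counts = Counter()  # (ad id, segment) -> events since last flush

    def record(self, ad_id, segment):
        self.counts[(ad_id, segment)] += 1

    def flush(self):
        """Return the counts accumulated since the last flush and reset them."""
        summary, self.counts = self.counts, Counter()
        return summary


class CentralStore:
    """Hypothetical sketch: merges per-region summaries into global totals."""

    def __init__(self):
        self.totals = Counter()

    def merge(self, summary):
        self.totals.update(summary)  # Counter.update adds counts together


tokyo, osaka, central = EdgeAggregator(), EdgeAggregator(), CentralStore()
for _ in range(3):
    tokyo.record("ad-1", "male-39")
osaka.record("ad-1", "male-39")
central.merge(tokyo.flush())
central.merge(osaka.flush())
```

Four mouseover events happened, but only two small summaries crossed the ocean. How often you flush sets the trade-off between how fresh the central count is and how much traffic you send.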
How important ads are is debatable, but if you're an ad company and this is what your customers want, it's an implementation that you might come up with because of the engineering practicality.
I have worked on systems that used Oracle Materialised Views for this. The aggregates get updated in realtime, and you don't need to run a heavy query every time.