We found that as we scaled it up, we couldn't really keep the data in raw form, so we had to build rollup documents that cover 5-minute and 1-day buckets. Do you use the same trick, or is the number of pageview events manageable enough for you that you just keep it all raw?
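Roughly, the trick looks like this (a minimal Python sketch with made-up event and field shapes; the real pipeline is more involved):

    from collections import Counter
    from datetime import datetime, timezone

    BUCKET_SECONDS = 5 * 60  # 5-minute rollup granularity

    def bucket_start(ts):
        """Truncate a Unix timestamp to the start of its 5-minute bucket."""
        return datetime.fromtimestamp(ts - ts % BUCKET_SECONDS, tz=timezone.utc)

    def rollup(events):
        """events: iterable of (unix_ts, page_path) raw pageview events.
        Returns one rollup document per (bucket, page) with a view count."""
        counts = Counter((bucket_start(ts), page) for ts, page in events)
        return [{"bucket": start.isoformat(), "page": page, "views": n}
                for (start, page), n in sorted(counts.items())]

    # Example: three raw events collapse into two rollup documents.
    raw = [(1704067230.0, "/home"), (1704067290.0, "/home"),
           (1704067530.0, "/about")]
    for doc in rollup(raw):
        print(doc)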
We haven't reached that stage yet, fortunately, though at some point someone will want to run a big multi-year aggregation report across all indexes and still expect it to take no more than a few seconds.
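For reference, that report would boil down to something like the query below, assuming a hypothetical pageviews-* set of indexes holding rollup documents with a "bucket" timestamp and a "views" count field. Summing pre-aggregated counts instead of counting raw events is what would keep it fast:

    import json

    query = {
        "size": 0,  # aggregation only, no individual hits
        "aggs": {
            "views_per_month": {
                "date_histogram": {
                    "field": "bucket",
                    "calendar_interval": "month",  # "interval" on older clusters
                },
                "aggs": {
                    # Sum the rollup counts rather than counting raw events.
                    "total_views": {"sum": {"field": "views"}},
                },
            }
        },
    }

    # e.g. POST pageviews-*/_search with this body
    print(json.dumps(query, indent=2))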
My ideal solution would rotate the dataset into historical rollups on a daily basis, so that we only stored raw data for the current day and gradually merged older entries at lower granularities. However, I haven't thought much about how to do that with Elasticsearch. I can see a way of doing it by embedding the value in the field name and using the field value as a count, but Elasticsearch really doesn't like lots of unique fields; you shouldn't use more than a few hundred at most in a single installation (across all indexes).
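To illustrate why that encoding explodes the mapping, here's a toy sketch with hypothetical page paths; every distinct value becomes a brand-new field:

    from collections import Counter

    def rollup_as_fields(day, paths):
        """paths: iterable of page paths seen that day.
        Returns one document whose *field names* encode the paths."""
        doc = {"date": day}
        for path, n in Counter(paths).items():
            doc["views_" + path] = n  # one new mapping field per distinct path
        return doc

    print(rollup_as_fields("2016-03-01", ["/home", "/home", "/about"]))
    # -> {'date': '2016-03-01', 'views_/home': 2, 'views_/about': 1}

A more mapping-friendly shape keeps the path as a field value rather than a field name (one {"path": ..., "views": ...} document per bucket), at the cost of more documents.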
Apologies for the email gate in front of the video, but you can also see my slides here: https://www.elastic.co/elasticon/conf/2016/sf/web-content-an...