He was probably one of the biggest users that day, so that makes sense. The 2,40...

NicoJuicy · on Aug 19, 2021

I don't think I agree. Cache has a cost too.

In theory, you'd want to cache more popular pages and let the rarely visited ones go through the uncached flow.

Crawling isn't user-behavior, so the odds are that a large percentage of the crawled pages were not cached.

cldellow · on Aug 19, 2021

That's true. On the other hand, pages with infoboxes are likely well-linked and will end up in the cache either due to legitimate popularity or due to crawler visits.

Checking a random sample of 50 pages from this guy's dataset, 70% of them were cached.

bawolff · on Aug 19, 2021

Note - there's several levels of caching at wikipedia. Even if those pages aren't in cdn (varnish) cache, they may be in parser cache (an application level cache of most of the page).

This amount of activity really isn't something to worry about, especially when taking the fast path of logged out user viewing a likely to be cached page.