You also need invalidation policies if the number of items you cache might end up too large in practice. That has nothing to do with referential transparency.
tbh i never ran into such an issue with my applications, and it sounds to me like a physical constraint: i don't expect to be able to keep something cached in memory that is larger than the available memory. if i found myself tuning for the best "maxsize" value (assuming lru_cache), i would rather switch to a probabilistic data structure like a bloom filter or cuckoo filter (or whatever the contemporary approach there is).
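for reference, this is the maxsize knob i mean, as a rough sketch (assuming functools.lru_cache; the value 1024 and the `expensive` function are made-up examples):

    from functools import lru_cache

    @lru_cache(maxsize=1024)
    def expensive(n: int) -> int:
        # stand-in for a costly, referentially transparent computation
        return sum(i * i for i in range(n))

    expensive(10)
    expensive(10)  # second call is a cache hit
    # once more than 1024 distinct arguments have been seen, the least
    # recently used entries get evicted, so memory stays bounded
    print(expensive.cache_info())  # hits, misses, maxsize, currsize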
> can you elaborate the problem a bit more please?
Simplest example: take any function you memoized with @functools.cache (i.e. @functools.lru_cache(maxsize=None)). The more distinct inputs it's called with, the bigger the cache grows. Run it on an unbounded stream of inputs and you're practically guaranteed to run out of RAM.
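A rough sketch of how that plays out (assuming Python 3.9+ for functools.cache; `work` is a hypothetical stand-in for the memoized function):

    from functools import cache

    @cache  # unbounded: equivalent to lru_cache(maxsize=None)
    def work(x: int) -> int:
        return x * x  # stand-in for something expensive

    # a stream of distinct inputs: every call adds an entry, nothing is evicted
    for x in range(1_000_000):
        work(x)

    print(work.cache_info().currsize)  # 1000000, and still growing with new inputs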
This obviously has nothing to do with referential transparency.
memoization is a simple optimization to speed up a program, just like GP wrote. you trade the cost of executing a function for the cost of memory utilization.
say you memoize a function that is referentially transparent and needs 10 seconds to compute and return its result (e.g. factorial). the memoization just caches the arguments plus the return value, reducing those 10 seconds to a simple lookup. the worst that can happen is a cache miss, which invokes the expensive function again - exactly what would happen without the optimization. that isn't a big concern, because the function is referentially transparent (without side effects). memoization lives in this perfect mathematical world where memory for maxsize entries will always be available.
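roughly what i mean, as a hand-rolled sketch (the `memoize` helper and `slow_factorial` are just illustrative names, not a specific library):

    import functools

    def memoize(fn):
        cache = {}  # maps argument tuples to previously computed results

        @functools.wraps(fn)
        def wrapper(*args):
            if args not in cache:       # miss: pay the full cost, once
                cache[args] = fn(*args)
            return cache[args]          # hit: just a dict lookup
        return wrapper

    @memoize
    def slow_factorial(n: int) -> int:
        # stand-in for the "10 seconds" computation above
        return 1 if n <= 1 else n * slow_factorial(n - 1)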
but if your expensive function is referentially opaque, yet still pure, then you can't depend only on the arguments of the function: you need to take more context into account, and that additional context requires a form of cache invalidation. the context can be counter-based (only re-evaluate every nth call), time-based (re-evaluate at most once every t time, throttling or debouncing), or both (allow at most n re-evaluations per t time, rate-limiting), etc. in my book that isn't memoization anymore, it's just caching, and you need to define an invalidation policy for it.
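a rough sketch of the time-based variant (names like `ttl_cache` and `get_config` are made up for illustration, not a real library):

    import time
    import functools

    def ttl_cache(ttl_seconds: float):
        def decorator(fn):
            entries = {}  # maps argument tuples to (timestamp, value)

            @functools.wraps(fn)
            def wrapper(*args):
                now = time.monotonic()
                hit = entries.get(args)
                if hit is not None and now - hit[0] < ttl_seconds:
                    return hit[1]          # entry is still fresh
                value = fn(*args)          # stale or missing: re-evaluate
                entries[args] = (now, value)
                return value
            return wrapper
        return decorator

    @ttl_cache(ttl_seconds=5.0)
    def get_config(name: str) -> str:
        # depends on context outside its arguments (the file can change),
        # so it needs invalidation rather than plain memoization
        with open(name) as f:
            return f.read()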
if your stream of inputs exceeds the available memory, it lives outside that perfect mathematical world, and i'd guess you have problems other than execution speed. you'd then need to optimize for space efficiency, and i would argue that memoization is the wrong kind of optimization for that problem.