
Quite agree, this is how I explain it to people. When you think of a cache as another derived dataset, you start to realize that the issues caches bring to architectures are often the result of not having an agreement between the business and engineering on acceptable data consistency tolerances. For example, outside the world of caching, if you email users a report with the data embedded in the email, you are accepting that the user will see a snapshot of the data at a particular time. In many cases this is fine, even preferred. Sometimes it isn't, and you link the user to a realtime dashboard instead.

Pretty much every view the user sees of data should include an understanding of how consistent that data is with the source of truth. Problems with caching (beyond basic bugs) often arise when a performance issue appears and people slap in a cache without renegotiating how the end user should expect the data to look relative to its upstream state.



A cache is an incomplete dataset by definition. It's not a dataset, it's a cache of a dataset. You can never guarantee a clean read of the system state from the cache, because it's never fully in sync and has gaps.


What about materialized views? CPU cache? Only the Sith deal in absolutes :)


A CPU cache guarantees that the same location read twice returns the same value, with some exceptions for NUMA and multiple threads. But two reads of an application cache make no such guarantee.

There are a vast number of undiagnosed race conditions in modern code caused by cache eviction in the middle of 'transactions' under high system load.
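The failure mode is easy to demonstrate: code that reads the same key twice inside one logical 'transaction' and assumes both reads see the same state. A toy sketch with assumed names (`SOURCE`, `cache`, `get`), using a plain dict as a read-through cache, not any real cache library:

```python
# Authoritative store and a read-through cache in front of it.
SOURCE = {"user:1": {"plan": "pro"}}
cache = {}

def get(key):
    # On a miss, reload from the source of truth (read-through).
    if key not in cache:
        cache[key] = dict(SOURCE[key])
    return cache[key]

# A 'transaction' that reads the same key twice, assuming a stable view.
first = get("user:1")["plan"]        # read 1 sees "pro"

# Under load, this can happen between the two reads:
cache.pop("user:1")                  # eviction under memory pressure
SOURCE["user:1"]["plan"] = "free"    # concurrent upstream write

second = get("user:1")["plan"]       # read 2 refetches and sees "free"
print(first, second)                 # prints: pro free
```

The two reads disagree even though neither read is individually wrong, which is exactly why these races tend to go undiagnosed: each access, inspected in isolation, returned a valid value.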



