No, the physical addresses used by the kernel and other devices are scrambled before they're used to access DRAM on modern hardware. This is not visible to the kernel.
It //STILL// sounds like something that happens in the MMU, at the time the virtual address is translated to a physical address via the page tables, which means the kernel still knows the real backing addresses.
dmidecode seems to indicate that the description reported by the actual RAM lacks transparency about its internal geometry.
I can get the ranking for slot-level interleaving from dmidecode on my systems (which means the kernel could get it as well, or already has it).
Thinking about the inside-chip geometry issue, as well as the current on-list proposal in the news item, I've arrived at a different potential solution.
If page faults are tracked /by process/, the top N faulting processes could be put into a mode where the following happens on a page fault:
* Some semi-random process (theoretically not discoverable or predictable by user processes) picks a number between 1 and some limit (say 16).
* The faulting page, and the next pages up to the chosen number, are read in (which should pull them into the cache).
This would help by making it harder to mount a /successive/ attack on different memory locations. Processes that legitimately chew through large arrays of objects in memory shouldn't be impacted much by such a solution; surely less than by a hard pause on the process.
Are you potentially confusing a page fault, which results in a page being pulled into main memory, with a cache miss, which pulls a line from memory into the on-CPU caches? My understanding is that cache misses are handled entirely in hardware, without kernel involvement, and that the proposed mitigation relies on non-task-aware PMU statistics local to a single CPU.
Am I missing some aspect of page mapping/cache management?
Calling that "scrambling" is pretty unhelpful, though. It's interleaving accesses across the whole collection of memory devices in a fairly regular pattern, but the mapping isn't trivial because the different levels of parallelism have different latency overheads.