Properly configured big.LITTLE clusters should be set up so that all CPUs report the same cache line size (which might be smaller than the true cache line size for some of the CPUs), to avoid exactly this kind of problem. The libgcc code assumes the hardware is correctly put together.
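For context, here's a minimal sketch (my own, assuming GCC/Clang on AArch64, not libgcc's actual source) of how that kind of cache-flush code typically learns the line size: it reads CTR_EL0 and decodes the IminLine/DminLine fields, so the value it gets depends entirely on which core it happens to be running on at that instant:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t ctr;

        /* CTR_EL0 is readable from EL0 (unless the kernel traps it);
         * IminLine is bits [3:0], DminLine is bits [19:16], each encoding
         * the smallest line size as log2(words), i.e. bytes = 4 << field. */
        __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));

        unsigned icache_line = 4u << (ctr & 0xf);
        unsigned dcache_line = 4u << ((ctr >> 16) & 0xf);

        printf("icache line: %u bytes, dcache line: %u bytes\n",
               icache_line, dcache_line);
        return 0;
    }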
There is a Linux kernel patchset currently going through review which provides a workaround for this kind of erratum by trapping CTR_EL0 accesses into the kernel so they can be emulated with the safe, correct value:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg...
and it seems to me that that's really the right way to deal with this.
Also, if I'm reading the proposed fix in the mono pull request correctly, it doesn't deal with the problem entirely because there's a race condition where the code might start execution on the core with the larger cache line size, and then get context-switched to the core with the smaller cache line size midway through executing its cache-maintenance loop. The chances of things going wrong are much smaller, but they're still there...
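To make the race concrete, the usual __clear_cache-style loop reads the line size once up front and then strides by it, so a migration to a smaller-line core partway through means every other line gets skipped. A rough sketch of that loop shape (not mono's actual code, just the generic pattern):

    #include <stdint.h>

    static void flush_icache_range(char *start, char *end)
    {
        uint64_t ctr;
        uintptr_t addr;

        /* Line sizes sampled once, on whatever core we start on. */
        __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
        uintptr_t dline = 4u << ((ctr >> 16) & 0xf);
        uintptr_t iline = 4u << (ctr & 0xf);

        /* Clean D-cache to the point of unification, striding by dline. */
        for (addr = (uintptr_t)start & ~(dline - 1); addr < (uintptr_t)end; addr += dline)
            __asm__ volatile("dc cvau, %0" :: "r"(addr) : "memory");
        __asm__ volatile("dsb ish" ::: "memory");

        /* Invalidate I-cache, striding by iline. If we get migrated here
         * to a core with smaller lines, the lines between our strides are
         * never invalidated. */
        for (addr = (uintptr_t)start & ~(iline - 1); addr < (uintptr_t)end; addr += iline)
            __asm__ volatile("ic ivau, %0" :: "r"(addr) : "memory");
        __asm__ volatile("dsb ish; isb" ::: "memory");
    }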
(Edit: rereading the blog post, they say they need to figure out the global minimum, but I can't see how their code actually does that, since there's nothing that guarantees the icache flush code gets run on every CPU before it's needed in anger.)
It should converge to the right value eventually, but it does seem like there's definitely a chance for it to be wrong one or more times before the code has run on the little cores.
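For what it's worth, a "global minimum" scheme in the spirit the blog post describes would presumably look something like the sketch below (hypothetical, not mono's actual code): the stored minimum only shrinks once this has actually run on a small-line core, which is exactly why the first few flushes can still use too large a stride.

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint32_t min_icache_line = UINT32_MAX;

    static uint32_t observe_icache_line(void)
    {
        uint64_t ctr;
        __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
        uint32_t line = 4u << (ctr & 0xf);

        /* Racily fold this core's value into the running minimum; until a
         * thread executes this on a little core, the minimum stays at the
         * big core's (larger) line size. */
        uint32_t cur = atomic_load(&min_icache_line);
        while (line < cur &&
               !atomic_compare_exchange_weak(&min_icache_line, &cur, line))
            ;  /* cur was refreshed by the failed CAS, retry */
        return atomic_load(&min_icache_line);
    }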
It's fine if core migration happens during the invalidation loop - the core migration itself surely must wipe the non-shared cache levels thoroughly, otherwise nothing would work.
EDIT: Actually if the big and little cores are used together, and not exclusively, then this might still be an issue, yeah.
No, in general Linux migrating processes between cores won't nuke the caches. The hardware's cache coherency protocols between CPUs in the cluster ensure that they are all sufficiently in sync that it's not needed.
I understood that the configurations currently in use usually only power up either the big or the little cores at any one time, and that that kind of migration has to wipe the caches, right? But that might be inaccurate, and you are of course right in the general case.
The state of the art in Linux scheduler handling of big.LITTLE hardware has moved through several different models, getting steadily better at extracting the best performance from the hardware (Wikipedia has a good brief rundown: https://en.wikipedia.org/wiki/ARM_big.LITTLE). You're thinking of the in-kernel-scheduler approach, but global task scheduling (where you just tell the scheduler about all the cores and let it move processes around to suit) has been the recommended approach for a few years now, I think.
Core migration doesn't need to reach a global synchronization point, just enough that the two cores in question agree with each other. This can be done without requiring global visibility of all operations of the source core.
That probably preserves correctness, but it also invalidates any attempt to rely on the cache line size to lay out data structures to avoid cache line ping-ponging, which is a major reason a program would care about the cache line size to begin with.
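e.g. the classic false-sharing trick, sketched below with an assumed line size: if the reported value is the minimum across cores (say 64) while the big cores really use 128-byte lines, neighbouring slots can still end up on one line on the big cores and the padding stops doing its job.

    #include <stdint.h>

    /* Assumed line size for illustration; in real code this would come
     * from sysconf/CTR_EL0, which is exactly the value in dispute here. */
    #define ASSUMED_LINE_SIZE 64

    struct padded_counter {
        /* The alignment rounds sizeof(struct padded_counter) up to
         * ASSUMED_LINE_SIZE, so consecutive array slots are meant to land
         * on distinct cache lines and not ping-pong between cores. */
        _Alignas(ASSUMED_LINE_SIZE) uint64_t value;
    };

    static struct padded_counter per_thread_counters[16];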
I think the really correct answer might be to abandon the idea that the system has a single cache line size...
What's the incentive for a CPU manufacturer to make the effort of building extra cache memory into the bigger CPU on ARM64, if there is no sane way to use it?
The difference between cache line size and cache size is like paper. You can make it wider (bigger cache line size), taller (bigger cache size), or both.
The problem is like printing. If you put in an A4 or letter (ANSI A) sized sheet and tell your printer it's A3 or tabloid (ANSI B), you're gonna have problems.