Properly configured big.LITTLE clusters should be set up so that all CPUs report the same cache line size (which might be smaller than the true cache line size for some of the CPUs), to avoid exactly this kind of problem. The libgcc code assumes the hardware is correctly put together.
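For context, here's a minimal sketch (my own, assuming GCC/Clang on AArch64, not libgcc's actual source) of how that kind of cache-flush code typically learns the line size: it reads CTR_EL0 and decodes the IminLine/DminLine fields, so the value it gets depends entirely on which core it happens to be running on at that instant:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint64_t ctr;

        /* CTR_EL0 is readable from EL0 (unless the kernel traps it);
         * IminLine is bits [3:0], DminLine is bits [19:16], each encoding
         * the smallest line size as log2(words), i.e. bytes = 4 << field. */
        __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));

        unsigned icache_line = 4u << (ctr & 0xf);
        unsigned dcache_line = 4u << ((ctr >> 16) & 0xf);

        printf("icache line: %u bytes, dcache line: %u bytes\n",
               icache_line, dcache_line);
        return 0;
    }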
There is a Linux kernel patchset currently going through review which provides a workaround for this kind of erratum by trapping CTR_EL0 accesses into the kernel so they can be emulated with the safe, correct value:
http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg...
and it seems to me that that's really the right way to deal with this.
Also, if I'm reading the proposed fix in the mono pull request correctly, it doesn't deal with the problem entirely because there's a race condition where the code might start execution on the core with the larger cache line size, and then get context-switched to the core with the smaller cache line size midway through executing its cache-maintenance loop. The chances of things going wrong are much smaller, but they're still there...
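To make the race concrete, the usual __clear_cache-style loop reads the line size once up front and then strides by it, so a migration to a smaller-line core partway through means every other line gets skipped. A rough sketch of that loop shape (not mono's actual code, just the generic pattern):

    #include <stdint.h>

    static void flush_icache_range(char *start, char *end)
    {
        uint64_t ctr;
        uintptr_t addr;

        /* Line sizes sampled once, on whatever core we start on. */
        __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
        uintptr_t dline = 4u << ((ctr >> 16) & 0xf);
        uintptr_t iline = 4u << (ctr & 0xf);

        /* Clean D-cache to the point of unification, striding by dline. */
        for (addr = (uintptr_t)start & ~(dline - 1); addr < (uintptr_t)end; addr += dline)
            __asm__ volatile("dc cvau, %0" :: "r"(addr) : "memory");
        __asm__ volatile("dsb ish" ::: "memory");

        /* Invalidate I-cache, striding by iline. If we get migrated here
         * to a core with smaller lines, the lines between our strides are
         * never invalidated. */
        for (addr = (uintptr_t)start & ~(iline - 1); addr < (uintptr_t)end; addr += iline)
            __asm__ volatile("ic ivau, %0" :: "r"(addr) : "memory");
        __asm__ volatile("dsb ish; isb" ::: "memory");
    }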
(Edit: rereading the blog post, they say they need to figure out the global minimum, but I can't see how their code actually does that, since there's nothing that guarantees the icache flush code gets run on every CPU before it's needed in anger.)
It should converge to the right value eventually, but it does seem like there's definitely a chance for it to be wrong one or more times before the code has run on the little cores.
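For what it's worth, a "global minimum" scheme in the spirit the blog post describes would presumably look something like the sketch below (hypothetical, not mono's actual code): the stored minimum only shrinks once this has actually run on a small-line core, which is exactly why the first few flushes can still use too large a stride.

    #include <stdatomic.h>
    #include <stdint.h>

    static _Atomic uint32_t min_icache_line = UINT32_MAX;

    static uint32_t observe_icache_line(void)
    {
        uint64_t ctr;
        __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
        uint32_t line = 4u << (ctr & 0xf);

        /* Racily fold this core's value into the running minimum; until a
         * thread executes this on a little core, the minimum stays at the
         * big core's (larger) line size. */
        uint32_t cur = atomic_load(&min_icache_line);
        while (line < cur &&
               !atomic_compare_exchange_weak(&min_icache_line, &cur, line))
            ;  /* cur was refreshed by the failed CAS, retry */
        return atomic_load(&min_icache_line);
    }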
It's fine if core migration happens during the invalidation loop - the core migration itself surely must wipe the non-shared cache levels thoroughly, otherwise nothing would work.
EDIT: Actually if the big and little cores are used together, and not exclusively, then this might still be an issue, yeah.
No, in general Linux migrating processes between cores won't nuke the caches. The hardware's cache coherency protocols between CPUs in the cluster ensure that they are all sufficiently in sync that it's not needed.
I understood that the configurations currently in use usually only power up either the big or the little cores at any one time, and that that kind of migration has to wipe the caches, right? But that might be inaccurate, and you are of course right in the general case.
The state of the art in Linux scheduler handling of big.LITTLE hardware has moved through several different models, getting steadily better at extracting the best performance from the hardware (Wikipedia has a good brief rundown: https://en.wikipedia.org/wiki/ARM_big.LITTLE). You're thinking of the in-kernel-scheduler approach, but global task scheduling (where you just tell the scheduler about all the cores and let it move processes around to suit) has been the recommended approach for a few years now, I think.
Core migration doesn't need to reach a global synchronization point, just enough that the two cores in question agree with each other. This can be done without requiring global visibility of all operations of the source core.
That probably preserves correctness, but it also invalidates any attempt to rely on the cache line size to lay out data structures to avoid cache line ping-ponging, which is a major reason a program would care about the cache line size to begin with.
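e.g. the classic false-sharing trick, sketched below with an assumed line size: if the reported value is the minimum across cores (say 64) while the big cores really use 128-byte lines, neighbouring slots can still end up on one line on the big cores and the padding stops doing its job.

    #include <stdint.h>

    /* Assumed line size for illustration; in real code this would come
     * from sysconf/CTR_EL0, which is exactly the value in dispute here. */
    #define ASSUMED_LINE_SIZE 64

    struct padded_counter {
        /* The alignment rounds sizeof(struct padded_counter) up to
         * ASSUMED_LINE_SIZE, so consecutive array slots are meant to land
         * on distinct cache lines and not ping-pong between cores. */
        _Alignas(ASSUMED_LINE_SIZE) uint64_t value;
    };

    static struct padded_counter per_thread_counters[16];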
I think the really correct answer might be to abandon the idea that the system has a single cache line size...
What's the incentive for a CPU manufacturer to make the effort of building extra cache memory into the bigger CPU on ARM64, if there is no sane way to use it?
The difference between cache line size and cache size is like paper. You can make it wider (bigger cache line size), taller (bigger cache size), or both.
The problem is like printing. If you put in an A4 or letter (ANSI A) sized sheet and tell your printer it's A3 or tabloid (ANSI B), you're gonna have problems.