
Of course, this simple function cannot be a replacement for proper analysis, but it seems that x86 code is not significantly denser.

Look at the demoscene for an example of dense x86 code, especially in the sub-1K categories. They routinely achieve code densities for x86 that no compiler I know of can get close to, and I have not seen the same happen with ARM, MIPS, or any other well-known RISC.

3-address instructions improve code density only in situations where both source operands are needed later.
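To make that concrete, here is a rough byte-counting sketch (Python is used only as a calculator). The encodings are hand-assembled and the register choices are arbitrary assumptions; the ARM side uses the fixed-width 4-byte A32 encoding, so Thumb is not covered.

```python
# Hand-assembled encodings; register choices are arbitrary, for illustration only.
X86_MOV_EAX_EBX = bytes([0x89, 0xD8])          # mov eax, ebx
X86_ADD_EAX_ECX = bytes([0x01, 0xC8])          # add eax, ecx
ARM_ADD_R0_R1_R2 = bytes.fromhex("020081e0")   # add r0, r1, r2 (A32, little-endian)

def size(*instrs):
    """Total encoded size in bytes of a sequence of instructions."""
    return sum(len(i) for i in instrs)

# Case 1: both sources are needed later, so 2-address x86 must copy one first.
both_live_x86 = size(X86_MOV_EAX_EBX, X86_ADD_EAX_ECX)
both_live_arm = size(ARM_ADD_R0_R1_R2)

# Case 2: one source may be overwritten, so x86 needs no copy.
overwrite_x86 = size(X86_ADD_EAX_ECX)
overwrite_arm = size(ARM_ADD_R0_R1_R2)

print("both sources live:  x86", both_live_x86, "bytes, A32", both_live_arm, "bytes")
print("source overwritten: x86", overwrite_x86, "bytes, A32", overwrite_arm, "bytes")
```

With these particular encodings the 3-address form only draws level when both sources stay live, and costs more when one of them can be clobbered.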



They use a lot of the original 8087 FP instructions (very dense because it's basically a bytecode stack machine). Plus tricks like deriving constants from bytes in the code segment. And you can assume the contents of registers when you enter the code.

Pyrit, a ray tracing demo in 256 bytes, does all of that: https://www.pouet.net/prod.php?which=78045

You probably wouldn't want your general purpose compiler doing this sort of thing! The resulting code would be suboptimal and fragile.


What percentage of code that a CPU will run over its lifetime is demoscene code? Heck, even of just simple hand-optimized assembly a CPU is likely to encounter, what percentage is not vector code? Because x86 vector code typically averages more than 4 bytes per instruction, and I have a suspicion that at least five nines of scalar instructions a CPU executes were generated by a compiler.


I mention that to point out the code density limits of x86 are much higher than what measurements using compiler output will show, while on the other hand I haven't seen the same for ARM and suspect that one can't really get much better than compiler output for it or other RISCs.

Having had to patch binaries on multiple occasions by inserting instructions, it is definitely not hard to do so for x86 as one can easily find "slack" that the compiler left behind[1], but I once had to do it for a MIPS binary, and it was definitely not easy to squeeze in the few extra instructions I needed inline; I ended up having to detour to another area with jumps instead.

Here's an old paper where the authors tried to optimise for code density manually, and you can consistently see x86 beating ARM and MIPS:

https://web.eece.maine.edu/~vweaver/papers/iccd09/iccd09_den...

[1] See https://news.ycombinator.com/item?id=15720923 for an example.


Yeah, if code size is the only metric you care about. The second link is an excellent example of code you do not want a compiler to generate by default. Like, besides all the well-known performance pitfalls of microcoded instructions, jecxz is unfusable on I think all relevant CPUs, so it’s both an additional uop and an additional cycle of latency over a test/jz sequence.
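For reference, the size side of that trade-off can be sketched by counting bytes. This assumes 32-bit mode and short (rel8) branches; the displacement byte is a placeholder, and the snippet says nothing about uops or latency.

```python
# Hand-assembled 32-bit encodings; 0x00 is a placeholder rel8 displacement.
JECXZ = bytes([0xE3, 0x00])            # jecxz target
TEST_ECX_ECX = bytes([0x85, 0xC9])     # test ecx, ecx
JZ = bytes([0x74, 0x00])               # jz target

jecxz_size = len(JECXZ)
test_jz_size = len(TEST_ECX_ECX) + len(JZ)

print("jecxz:  ", jecxz_size, "bytes")
print("test/jz:", test_jz_size, "bytes")
```

So the single instruction saves two bytes per zero check, which is the kind of saving a size-coding contest cares about and a general-purpose compiler usually does not.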


Five nines is really high. I don't think this is true, mostly because language runtimes have hot paths that are typically implemented by hand. If we drop "scalar", then of course you're dropping below even two nines because of the implementations of str* and mem*.



