Mmm, my impression is you can get 90% there with significantly less effort than LLVM or GCC. A good register allocation strategy, some basic inlining, maybe peephole optimization? LLVM is fighting for 5-10% improvements on top of the big things.
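To give a feel for how small a peephole pass can be, here is a rough sketch over a made-up stack bytecode. The instruction names ("push", "add", "mul", "pop") and the peephole() helper are invented for illustration and are not taken from LLVM, GCC, or any real compiler. It folds constant arithmetic and drops a push that is immediately popped.

```python
def peephole(code):
    """Rewrite short instruction windows at the tail of the output list.

    `code` is a list of tuples for a hypothetical stack machine:
    ("push", n), ("add",), ("mul",), ("pop",), plus anything else,
    which is passed through untouched.
    """
    out = []
    for ins in code:
        out.append(ins)
        changed = True
        # Keep re-checking the tail, since one rewrite can expose another.
        while changed:
            changed = False
            # push a; push b; add|mul  ->  push (a op b)
            if (len(out) >= 3
                    and out[-1][0] in ("add", "mul")
                    and out[-2][0] == "push"
                    and out[-3][0] == "push"):
                a, b, op = out[-3][1], out[-2][1], out[-1][0]
                del out[-3:]
                out.append(("push", a + b if op == "add" else a * b))
                changed = True
            # push x; pop  ->  (nothing)
            elif (len(out) >= 2
                    and out[-1][0] == "pop"
                    and out[-2][0] == "push"):
                del out[-2:]
                changed = True
    return out


program = [("push", 2), ("push", 3), ("add",), ("push", 4), ("mul",)]
optimized = peephole(program)
```

Here the five-instruction program collapses to a single push of the constant 20. A real pass works on machine instructions and has many more patterns, but the shape is the same: a sliding window and a table of local rewrites.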
Or an ecosystem-wide shift away from AOT compilation.
I feel like I'm living in that "one guy against huge crowd" meme. Everyone else is making more AOT, and all I can say, usually under my breath, is "Wait! Stop! You're giving up something valuable! JIT and GC are actually good!"
(I know you can have AOT and GC, but people usually lump AOT and !GC together.)
AOT optimizes startup performance. I haven't seen it optimize other metrics. Psychologically, we often gauge overall performance by startup performance, but it ain't necessarily so. I keep expecting the pendulum to swing from "AOT and manual memory management" back to "JIT and garbage collection", because I believe the latter has a much higher performance ceiling once we apply sufficient optimization elbow grease. I've been patient and I'll remain patient.
But still. PGO is just a hack around the lack of a JIT, just like RCU is a hack around the lack of a GC, and I think it's due to well-known psychological biases that we haven't embraced more dynamism.
How do we solve the startup problems with JIT systems? Better image support. Nobody has really cracked this nut, but I think it's important. Every GraalVM language should support, e.g., resuming from a heap dump.
Any time I'm doing Lego-brick engineering, aka Unix-style "small tools piped together", startup time impacts me most while I'm actively developing the new system.
That process is iterative, usually on a subset of the data, and slow starting tools make my day qualitatively worse.
The final performance matters sometimes, but more often than not it doesn't. 30 minutes or 45 for a batch job? That runs from cron? I don't usually care.
I'm never going to favour slow starting tools unless some aspect of their functionality pays for that cost, or I'm choice constrained, but final peak performance is almost never going to be a factor.
So to me it's not psychological bias, it's economics, and the currency is time, when it's me personally paying it. Semantics maybe, but just writing it off as bias isn't going to help me get things done.
Dynamic systems like the ones based on JavaScript or Java or Scheme have received A LOT of attention in the last 10-20 years. In fact, JavaScript does start fairly quickly.
The usual solution to the startup problem is to have many compilers working at once (aka tiered JIT compilation). A fast bytecode interpreter + mid-level JIT + SSA-based heavy stuff, all working in parallel. Obviously, this means a lot of background compute and memory burned.
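A toy sketch of the tiering idea, with heavy simplifications: only two tiers, and the "compile" step runs synchronously on the hot call instead of on a background thread as real engines do. The Tiered class, the tuple expression format, and the threshold are all made up for illustration. Expressions are nested tuples like ("+", lhs, rhs) over the variable "x".

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def interpret(expr, x):
    """Tier 0: walk the expression tree on every call (slow, no warmup)."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, lhs, rhs = expr
    return OPS[op](interpret(lhs, x), interpret(rhs, x))


def to_source(expr):
    """Turn the tree into Python source for the 'compiled' tier."""
    if expr == "x":
        return "x"
    if isinstance(expr, (int, float)):
        return repr(expr)
    op, lhs, rhs = expr
    return f"({to_source(lhs)} {op} {to_source(rhs)})"


class Tiered:
    """Start in the interpreter, switch to compiled code once hot."""

    def __init__(self, expr, threshold=3):
        self.expr = expr
        self.threshold = threshold  # calls before we bother compiling
        self.calls = 0
        self.compiled = None

    def __call__(self, x):
        if self.compiled is not None:
            return self.compiled(x)  # Tier 1: fast path
        self.calls += 1
        if self.calls >= self.threshold:
            # Hot enough: pay the compile cost once, reuse it afterwards.
            self.compiled = eval("lambda x: " + to_source(self.expr))
        return interpret(self.expr, x)


f = Tiered(("+", ("*", "x", 2), 1), threshold=3)
results = [f(n) for n in range(1, 5)]
```

The first three calls go through the interpreter, and the fourth runs the compiled lambda. The call counter and the extra copy of the code are a small-scale version of the compute and memory overhead mentioned above.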
I'm actually glad to see that restriction in their study, because there are many cases where you just can't use PGO, or where it's more difficult to use. One of the most common reasons I ran into is that clients refuse to use it (no, I'm not even joking). Either they don't know how to come up with good training data (in which case they'll point at your nose shouting IT DOESN'T WORK!!) or they think it's a stupid idea (again, not joking). You'll be amazed by some people's stubbornness.
Nice to see it being adopted more broadly.