
> After checking for loop termination, we fall through into the foo function (without a branch!)

Literally just reading this line made me go "well, there's your problem." I thought this was going to be something deep about fancy branch predictor heuristics, but it turns out to be a violation of basic heuristics.

Folks, don't think you're going to get amazing speedups by using mismatched call/ret instructions. The branch predictor keeping a shadow stack of return addresses is something that's been around for decades.
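To make the shadow-stack point concrete, here is a rough toy model of a return-address predictor. Everything in it is an assumption for illustration: the class name, the depth of 16, and the addresses are made up, and real hardware handles overflow, underflow, and recovery differently. It only shows why a ret with no matching call has nothing sensible to pop.

```python
# Toy model of a return-address predictor (a "return stack buffer").
# All names, the depth, and the addresses are hypothetical.

class ReturnStackPredictor:
    def __init__(self, depth=16):
        self.depth = depth
        self.stack = []
        self.hits = 0
        self.misses = 0

    def call(self, return_address):
        # A call pushes the address of the instruction after it.
        if len(self.stack) == self.depth:
            self.stack.pop(0)  # buffer full: the oldest entry is lost
        self.stack.append(return_address)

    def ret(self, actual_target):
        # A ret pops the predicted target and compares it to the real one.
        predicted = self.stack.pop() if self.stack else None
        if predicted == actual_target:
            self.hits += 1
        else:
            self.misses += 1


def run_matched(iterations):
    p = ReturnStackPredictor()
    for _ in range(iterations):
        p.call(0x1004)  # the loop calls foo; 0x1004 is the return address
        p.ret(0x1004)   # foo returns to the loop
    return p


def run_mismatched(iterations):
    p = ReturnStackPredictor()
    p.call(0x2000)      # the outer caller's call into the routine
    for _ in range(iterations):
        # foo is reached by fall-through, so nothing is pushed,
        # yet foo's ret still pops an entry.
        p.ret(0x1004)
    return p


if __name__ == "__main__":
    for name, fn in (("matched", run_matched), ("mismatched", run_mismatched)):
        p = fn(100)
        print(name, "hits:", p.hits, "misses:", p.misses)
```

In this model the matched loop predicts every return, while the mismatched loop first pops the outer caller's entry and then runs on an empty stack, so every return is a miss.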






It's great that you're so knowledgeable about branch predictor behavior, but many people aren't and for them this is new, maybe even valuable, information.

I guess this article's just not for you then, and that's okay.


I don't think the complaint is that the article exists; it's that it perhaps needs more context. The article does not state what level of skill it assumes of the reader, or what level the author has, so comments like these help put it in perspective.

The commentary from "old salts" isn't meant to insult or discourage you but to inform your world view with hard earned experience.

It all has a value that's greater than the sum of its parts and there's no good reason to be critical or reactionary about it.


If you're actually an old salt, it will be immediately obvious when an article is not meant for your level of expertise. Whining that "the article needs more context" is like Michelangelo complaining that a coloring book didn't warn him it was for six-year-old children.

And this will screw up program execution more profoundly (i.e., crash) on systems with architectural shadow call stacks as a security feature.
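As a rough illustration of the difference between "slow" and "crash", here is a toy model of an enforced shadow call stack. The class, the exception name, and the link-register handling are all hypothetical and do not correspond to any specific ISA's mechanism. The assumption is only that a ret whose target does not match the top of the shadow stack faults instead of merely mispredicting.

```python
# Toy model of an architectural (enforced) shadow call stack.
# Names and behavior are hypothetical, not a specific ISA's semantics.

class ControlProtectionFault(Exception):
    """Raised when a return target disagrees with the shadow stack."""


class ShadowStackCPU:
    def __init__(self):
        self.link_register = None
        self.shadow_stack = []

    def call(self, return_address):
        # A real call sets the return address AND records it on the shadow stack.
        self.link_register = return_address
        self.shadow_stack.append(return_address)

    def set_link_register(self, return_address):
        # Writing the return address by hand bypasses the shadow stack.
        self.link_register = return_address

    def ret(self):
        target = self.link_register
        if not self.shadow_stack or self.shadow_stack.pop() != target:
            raise ControlProtectionFault("return to %#x not allowed" % target)
        return target


def matched():
    cpu = ShadowStackCPU()
    cpu.call(0x1004)
    return cpu.ret()


def mismatched():
    cpu = ShadowStackCPU()
    cpu.set_link_register(0x1004)  # no call instruction was executed
    return cpu.ret()               # faults: the shadow stack is empty
```

Here `matched()` returns the target normally, while `mismatched()` raises `ControlProtectionFault`, which is the toy equivalent of the process being killed.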

Speaking of "security feature", this insanely complex behavior (for a CPU) is a recipe for disasters like Spectre and Meltdown.

On the one hand, a design goal of RISC is to improve the performance of compiled code at the expense of most other things. As such, this sort of hazard should be documented, but the designers should be able to assume that anyone directly writing assembler has read the documentation.

On the other hand, Sophie Wilson wrote an implementation of BBC BASIC for the original ARM (but it didn't have a branch predictor). While that is 32-bit and plays by different rules, I wonder how AArch64 slows down code when the architectural assumptions change.


And yet they also showed how they did actually accomplish this and other optimizations. It’s an informative read.


