
I'm pretty sure the pattern of allowing the branch predictor to run ahead is fairly common.

At least, it's common to have multi-level branch predictors that take a variable number of cycles to return a result, and it makes a lot of sense to queue up predictions so they are ready when the decoder gets to that point.
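Roughly, that decoupling looks like a FIFO of predicted fetch addresses sitting between the predictor and fetch/decode. Here's a minimal C sketch; the name fetch_target_queue, the size, and the function names are all made up for illustration, not any real core's design:

    #include <stdint.h>
    #include <stdbool.h>

    /* Hypothetical decoupled front end: the predictor pushes
     * predicted fetch addresses into a FIFO; fetch/decode pops
     * them later. Sizes and names are illustrative only. */
    #define FTQ_SIZE 16

    typedef struct {
        uint64_t fetch_addr[FTQ_SIZE];
        unsigned head, tail, count;
    } fetch_target_queue;

    /* Predictor side: pushes whenever a prediction resolves,
     * even if decode is currently stalled. */
    bool ftq_push(fetch_target_queue *q, uint64_t predicted_addr) {
        if (q->count == FTQ_SIZE) return false;  /* predictor stalls */
        q->fetch_addr[q->tail] = predicted_addr;
        q->tail = (q->tail + 1) % FTQ_SIZE;
        q->count++;
        return true;
    }

    /* Decode side: by the time the decoder reaches this point in
     * the stream, the prediction is usually already waiting. */
    bool ftq_pop(fetch_target_queue *q, uint64_t *next_addr) {
        if (q->count == 0) return false;         /* front end starves */
        *next_addr = q->fetch_addr[q->head];
        q->head = (q->head + 1) % FTQ_SIZE;
        q->count--;
        return true;
    }

The point is that the predictor can keep pushing even while decode is stalled, so a slow multi-level prediction has usually already landed in the queue by the time the decoder needs it.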

But I doubt the idea of parallel decoders makes much sense outside of x86's complex variable-length instructions.

It (probably) makes sense on x86 because x86 cores were already spending a bunch of power on instruction decoding and the uop cache.
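The pain point is that instruction boundaries aren't known up front. A toy C sketch of the serial dependency (insn_length here is a fake stand-in; real x86 length decoding has to chew through prefixes, opcodes, modrm, etc.):

    #include <stdio.h>
    #include <stdint.h>
    #include <stddef.h>

    /* Toy stand-in: pretend the low nibble of the first byte
     * encodes the instruction length. Purely illustrative. */
    static size_t insn_length(const uint8_t *bytes) {
        size_t len = bytes[0] & 0xF;
        return len ? len : 1;
    }

    static void decode_one(const uint8_t *bytes, size_t len) {
        printf("decoded %zu-byte instruction at %p\n", len, (void *)bytes);
    }

    /* The serial dependency: where instruction N+1 starts is
     * unknown until instruction N has been length-decoded. */
    void decode_block(const uint8_t *code, size_t size) {
        size_t off = 0;
        while (off < size) {
            size_t len = insn_length(code + off);
            decode_one(code + off, len);
            off += len;   /* only now do we know the next boundary */
        }
    }

A second decoder can only run in parallel if it starts at an address already known to be an instruction boundary, which is exactly what a predicted branch target from the queue gives you.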

> Plus I would assume it makes the cost of a mispredict even higher.

It shouldn't increase the mispredict cost by much.

The new fetch address will bypass the branch-prediction queue and feed directly into one of the three decoders. Previous implementations already have a uop queue between the decoder and rename/dispatch; it gets flushed, and the first three uops should be able to cross it in a single cycle.
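In sketch form, continuing the earlier fetch_target_queue sketch (the uop_queue and decoder_cluster types here are invented stubs, not anything from a real design):

    typedef struct { unsigned count; } uop_queue;               /* stub */
    typedef struct { uint64_t pc; bool busy; } decoder_cluster; /* stub */

    static void uopq_flush(uop_queue *q) { q->count = 0; }

    /* Hypothetical recovery path: flush the stale predictions and
     * queued uops, then steer the corrected fetch address straight
     * into a decode cluster, skipping the prediction queue. */
    void recover_from_mispredict(fetch_target_queue *ftq, uop_queue *uopq,
                                 decoder_cluster clusters[3],
                                 uint64_t corrected_addr) {
        ftq->head = ftq->tail = ftq->count = 0;  /* drop stale predictions */
        uopq_flush(uopq);                        /* drop stale uops */
        /* Bypass: the only added latency is restarting one decoder. */
        clusters[0].pc = corrected_addr;
        clusters[0].busy = true;
    }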
