
> Shift has far fewer dependencies than ADD/LEA and has better reciprocal throughput.

Nope. Usually it's the reverse: Atom, for example, can only do one shift per cycle, but it can dual-issue adds.
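To make the trade-off concrete, here is a minimal C sketch of the case where the choice comes up: doubling a value can be written as either a shift or an add, and the two give the same result for unsigned integers. The function names are made up for illustration. Which instruction a compiler actually emits depends on the target and the optimization settings, so treat this as showing the equivalence only, not as a benchmark.

```c
#include <stdint.h>

/* Hypothetical helpers, named for illustration only. */

/* Doubling via a shift; a compiler may emit SHL for this. */
uint32_t double_with_shift(uint32_t x) {
    return x << 1;
}

/* Doubling via an add; a compiler may emit ADD (or LEA) for this. */
uint32_t double_with_add(uint32_t x) {
    return x + x;
}
```

Both versions wrap modulo 2^32 on overflow, so an optimizer is free to pick whichever form is cheaper on the core it is tuning for.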




This surprises a lot of programmers. A barrel shifter is a big circuit for an ALU. Smaller than a multiplier, obviously, but much larger than a ripple-carry adder and somewhat larger even than the fancy pipelined adders used in fast CPUs.

It's the kind of thing that got skimped on in Atom. Lots of stuff runs surprisingly slow on Atom. Another one I find striking is that there are two pipes for ADD/SUB instructions, but ADC/SBB (the carry/borrow variants) are not just single-issue, but apparently unpipelined. Intel lists their throughput at 3 instead of 0.5!


> Intel lists their throughput at 3 instead of 0.5!

Agner Fog lists ADC/SBB on Atom as having latency 2 and throughput 1/2 (one instruction every two cycles). I guess Intel could have made them run with ADD/SUB-level performance in cases where there are no intra-pair dependencies on the carry/borrow flag, but the extra interlocks for handling that were probably not worth the cost.
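For anyone wondering where the carry-flag dependency comes from, here is a rough C sketch of multi-word addition, the typical use of ADC. The function name `add_limbs` and the 32-bit limb layout are assumptions for illustration. On x86 a compiler may turn the loop body into an ADD/ADC chain, but that is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: adds two n-limb numbers stored least-significant
 * limb first, writes the n-limb sum to out, and returns the final carry
 * (0 or 1). */
uint32_t add_limbs(uint32_t *out, const uint32_t *a, const uint32_t *b, size_t n) {
    uint32_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Widen to 64 bits so the carry out of the 32-bit add is visible. */
        uint64_t sum = (uint64_t)a[i] + b[i] + carry;
        out[i] = (uint32_t)sum;          /* low 32 bits are the result limb */
        carry = (uint32_t)(sum >> 32);   /* bit 32 feeds the next limb */
    }
    return carry;
}
```

Each iteration needs the carry produced by the previous one, so the adds form a serial chain. That is the dependency that makes back-to-back ADC/SBB hard to pair.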


My big mouth. I stand corrected.



