
Seriously. Permutes get harder as you scale - VBMI on CNL is an indicator that 64-way is pretty good but it's still considerably more expensive than 4 16-way permutes on the same architecture.

There's a reason that gather is hard to do; I think if you rocked up and asked the architecture guys for a gather that was competitive with small-scale permute they would reply with the time-honored Intel putdown ("You are overpaid for whatever it is you do").




> VBMI on CNL is an indicator that 64-way is pretty good

And now I'm distracted looking for 6-bit lookup tables that will enjoy that instruction. DES had 6-bit SBoxes for example.

https://en.wikipedia.org/wiki/DES_supplementary_material#Sub...

Hmmmmm... 6-bit lookup tables. Yum. I wonder what else is out there that would benefit?
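To make the DES idea concrete, here is a rough scalar model in Python, not real intrinsics. It assumes the 64-way byte permute behaves like a 64-entry table lookup that uses only the low 6 bits of each index byte, which is my understanding of VPERMB on a 512-bit register. The names `flatten_sbox` and `permute_bytes_64` are made up for illustration. The S-box is DES S1, flattened so that the raw 6-bit input indexes it directly (outer two bits pick the row, middle four bits pick the column).

```python
# DES S-box S1 in its usual 4 x 16 layout.
S1_ROWS = [
    [14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7],
    [0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8],
    [4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0],
    [15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13],
]


def flatten_sbox(rows):
    """Build a 64-entry table indexed by the raw 6-bit S-box input."""
    table = [0] * 64
    for x in range(64):
        row = (((x >> 5) & 1) << 1) | (x & 1)  # outer bits b5 and b0
        col = (x >> 1) & 0xF                   # middle bits b4..b1
        table[x] = rows[row][col]
    return table


def permute_bytes_64(table, indices):
    """Scalar model of a 64-way byte permute (assumed VPERMB-like):
    each output byte is table[index & 63]; upper index bits are ignored."""
    if len(table) != 64:
        raise ValueError("table must have exactly 64 entries")
    return [table[i & 0x3F] for i in indices]


S1_FLAT = flatten_sbox(S1_ROWS)
```

With the table held in one register, a single permute would do 64 independent S1 lookups at once, e.g. `permute_bytes_64(S1_FLAT, [0b011011])` looks up row 1, column 13.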


Hey, you can have 7-bit lookup tables at the byte level on AVX512VBMI (using the 2-register shuffle forms) and you can already have 6-bit lookups with 2-register 16-bit shuffles if you can play around on Skylake Server.

Mass availability of the VBMI goodies looks to be bottlenecked behind Icelake/Sunny Cove, so you'll have plenty of time to think through the implications of fast 6-bit lookup. :-)
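A scalar sketch of the 7-bit case in Python, again a model rather than intrinsics: it assumes the 2-register byte shuffle (VPERMI2B / VPERMT2B as I understand them) uses index bits 5:0 to pick a byte and bit 6 to pick which of the two 64-byte table registers to read from. The function name `permute2_bytes_128` is hypothetical.

```python
def permute2_bytes_128(table_lo, table_hi, indices):
    """Scalar model of a 2-register byte shuffle (assumed VPERMI2B-like).

    table_lo holds entries 0..63, table_hi holds entries 64..127.
    Bit 6 of each index selects the table, bits 5:0 select the byte,
    and bit 7 is ignored.
    """
    if len(table_lo) != 64 or len(table_hi) != 64:
        raise ValueError("each table register must hold exactly 64 bytes")
    out = []
    for i in indices:
        source = table_hi if (i >> 6) & 1 else table_lo
        out.append(source[i & 0x3F])
    return out


# Example: a 128-entry byte table split across the two "registers".
TABLE = [(x * 3) % 256 for x in range(128)]
LOOKED_UP = permute2_bytes_128(TABLE[:64], TABLE[64:], [0, 63, 64, 127])
```

The 16-bit variant on Skylake Server would work the same way with narrower indices: 32 words per register times two registers gives the 64 entries needed for a 6-bit lookup.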



