clang-802.0.42 -O3 -march=native on a Core i7-4770HQ @ 2.20GHz:
branchless: 490.666725 MB/s, 0 errors 0xe9 small: 380.000045 MB/s, 0 errors Hoehrmann: 337.333374 MB/s, 0 errors
branchless: 436.000052 MB/s, 0 errors 0xe9 small: 405.333382 MB/s, 0 errors Hoehrmann: 412.000049 MB/s, 0 errors
clang-802.0.42 -O3 -march=native on a Core i7-4770HQ @ 2.20GHz:
With gcc 4.8.4 -O3 -march=native on a Xeon E5-1650v3 @ 3.50GHz: https://pastebin.com/j7T3bDwR