https://gist.github.com/monocasa/1d44a03cbd0170bfffc6a4a5c37...
You can do it with 4 shifts, 3 adds, 1 MUL and 4 ANDs.
Your code is simply suboptimal.
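
For reference, the comment does not say which routine it means, and the gist link is truncated. One well-known routine whose cost matches that count exactly (4 shifts, 3 adds, 1 MUL, 4 ANDs) is the classic 32-bit SWAR population count, so here is a sketch assuming that is the operation being discussed. The name `popcount32` is made up for illustration and is not taken from the gist, and the subtraction in the first step is counted as one of the three adds.

```c
#include <stdint.h>

/* Hypothetical helper, NOT from the linked gist: counts set bits in a
 * 32-bit word using the well-known SWAR (parallel bit-sum) technique.
 * Assumption: this is the routine the operation count refers to. */
static inline uint32_t popcount32(uint32_t x)
{
    /* 2-bit sums: 1 shift, 1 AND, 1 sub (counted as an add) */
    x = x - ((x >> 1) & 0x55555555u);
    /* 4-bit sums: 1 shift, 2 ANDs, 1 add */
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    /* 8-bit sums: 1 shift, 1 add, 1 AND */
    x = (x + (x >> 4)) & 0x0F0F0F0Fu;
    /* Sum the four bytes into the top byte: 1 MUL, 1 shift */
    return (x * 0x01010101u) >> 24;
}
```

On targets with a hardware popcount instruction, a compiler builtin will usually beat this; the sketch is only meant to show where the quoted operation count could come from.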