Here is it without those operations: https://i.imgur.com/xhyNnU4.png?1 EDIT: Wit...

dalke · on May 9, 2015

The original posting ended with a second-order polynomial, a few bitwise operations, and an if statement.

All of the ones you just posted are more computationally difficult. The non-trig ones, for example, include a division, and the best fit is a higher-order polynomial. There's no reason to believe it will be as fast to evaluate, so the question would be, is it more accurate and is the accuracy worth the additional time? It would be ill-advised to use one of those if it ended up being both less accurate than log() and slower.

Concerning lookup tables, as tgbrter answered in this thread, they are likely slower. It still needs the scaling, and also needs two value lookups followed by an interpolation. The cost of doing the interpolation will be about the same as evaluating the second order polynomial, but the pipeline will be waiting on the two memory lookups.

Assuming everything in is L2 cache, that might not be a problem. But depending on the problem, it may be in L3 or (shudder) main memory, which is eons away. So as tgbrter also pointed out, you may end up with variable calculation time depending on the cache.

Houshalter · on May 10, 2015

Eureqa lets you choose what functions you use and what cost each has. I need to know what the expense is for each operation and I can just feed it in. I removed division and increased the cost for multiplication: https://i.imgur.com/8C9OBEY.png?1 (I'm using the author's error metric of relative error, which is what the 0 = abs(y-...)/y means.)

Here it is without mod which I think might be expensive: pg1: https://i.imgur.com/SMvj68W.jpg, and pg2: https://i.imgur.com/ACw321W.png?1

And of course the best fit has extra complexity, that's why it gives you many choices along the complexity tradeoff.

I wasn't suggesting a lookup table to approximate the function, but the choose the formula itself. E.g. you might have a function which fits the first half of the function very well, and another which fits the second half very well. I think that branching code is slower, so alternatively you can do a lookup for the constants of the formula. Some constants will be optimized for one part of the function, and another set of constants will be optimized for another part.

I don't think that kind of lookup would be too slow. You only need to fit a few elements in the cache, and interpolation is a simple operation. It's just a subtraction, a multiplication, and an addition.

Houshalter · on May 10, 2015

EDIT: In the last paragraph I mean I don't think a complete lookup table and interpolation, like you mentioned, would be too slow. Not referring to the lookup table of formulas I suggested, but computed log values.

dalke · on May 10, 2015

I am confused about why you need to think that certain operations are/are not expensive, or why you think a table is faster.

I think they are expensive, and it won't be faster. Hence we're at an impasse. That block was predictable, and easily solve by taking the equation you have and testing against the reference code.

If you test it out, you'll know the answer. There's no need to think you know the answer when you can just measure it, and overrule any incorrect beliefs I may have.

Houshalter · on May 12, 2015

As I said, I don't have a table of how fast each operation is. I don't think any of my assumptions were unreasonable, but I did explicitly state they were assumptions. And I don't have any way of testing it.

I am interested in function approximation mainly for finding functions for when there is no closed form solution, not so much tiny optimization.

tgbrter · on May 9, 2015

You need only one table look-up, a couple quick operations and no conditionals or multiplying.

The table size required to beat the accuracy of fastlog2 is 512 elements.

edit: The speed ratio is actually about 0.6 in favor of look-up versus fastlog2. Enabling -O3 changes the ratio to 0,9. This is of course not measured in a real-world program where the cache is shared by other stuff.

edit2: I have removed the if statement from fastlog2 and compiled with -O3. The ratio was 1,2.

dalke · on May 10, 2015

Indeed, you're right. I forgot about step-function solutions, and thought interpolation/corrections were needed.

512 elements for 7 bits of accuracy sounds reasonable. The article mentioned a 2x improvement for 11 bits, but didn't get to that point.