OpenGL is a great example of how painful it is to write apps that keep working when the standard they rely on changes. You basically end up with capability checks or version checks, which complicate your code.
I suppose there could've been a function that returned the implementation version of fsin/fcos/etc, which was incremented whenever Intel changed the implementation. That way benchmarks could test against that particular version. But it'd be hard to come up with a precise, consistent way to implement that. For example, do you increment the version whenever any transcendental function changes? If you do that, then you get benchmark code riddled with special cases for very specific versions, which someone has to constantly look up in documentation somewhere. You'd basically have to memorize the effect of every version increment. On the other hand, you could implement a more elaborate scheme, like having a version number for each of the transcendental functions, but then you'd either need to hardcode the number of version numbers returned by the API, or expose a way to add to the total number of version numbers in case someone wanted to add another function, which is extra API complexity.
I'm not necessarily arguing against having some kind of process by which apps could have the implementation of sin/cos/etc changed, just explaining how it gets complicated pretty quickly.
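As a purely hypothetical sketch (no such CPU interface exists; the names and values here are made up), the two API shapes being weighed look something like:

    # Option 1: one global version, bumped whenever any function changes.
    def transcendental_impl_version():
        return 7  # stand-in value

    # Option 2: a version per function. The mapping can grow over time,
    # so callers check the keys they care about and ignore the rest,
    # instead of hardcoding how many entries come back.
    def transcendental_impl_versions():
        return {"fsin": 5, "fcos": 3, "fptan": 7}  # stand-in values

    if transcendental_impl_versions().get("fsin", 0) >= 5:
        pass  # benchmark against the fsin behavior this app was tuned for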
One thought on the version number: it could be the product of prime powers.
If function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1, we can encode this as 2^5 * 3^3 * 5^7 * 7^1 = 472500000. This lets us unambiguously increment orthogonal values while still keeping them represented as a single number. We could even easily add a function 5 by multiplying the number by 11.
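A minimal sketch of how encode/decode might look, assuming functions are numbered from 1 and primes are assigned in order of introduction (all the helper names here are hypothetical):

    def first_primes(k):
        # First k primes by trial division (fine for small k).
        primes = []
        n = 2
        while len(primes) < k:
            if all(n % p for p in primes):
                primes.append(n)
            n += 1
        return primes

    def encode(versions):
        # versions: dict of function number (1-based) -> version.
        # Function f at version v contributes nth_prime(f)^v.
        primes = first_primes(max(versions))
        n = 1
        for func, ver in versions.items():
            n *= primes[func - 1] ** ver
        return n

    def decode(n, num_functions):
        # Recover versions by trial division over the agreed primes;
        # any leftover factor belongs to functions we don't know about.
        versions = {}
        for func, p in enumerate(first_primes(num_functions), start=1):
            ver = 0
            while n % p == 0:
                n //= p
                ver += 1
            versions[func] = ver
        return versions

    assert encode({1: 5, 2: 3, 3: 7, 4: 1}) == 472500000
    assert decode(472500000, 4) == {1: 5, 2: 3, 3: 7, 4: 1}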
One problem is that it's not super dense (storing those 4 values takes ~29 bits), but a 128-bit or 256-bit number would store a larger value than is likely needed by most applications.
One benefit is that software can check for the functions that it's interested in (as long as we can agree on what prime corresponds to what function - a relatively mild standard, if we just use order of introduction) while ignoring any portion of the result it doesn't understand.
It's easy to come up with much more compact schemes that are still completely generic. For example, to represent a variable-length number use xxx0, 0xxxxxx1, 0xxxxxx1xxxxxx1, etc. Normally you need a pair of these to specify (version, function). But if we assume functions are packed, you can use a run-length scheme: (run-length, starting-function-number).
So "function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1" is four functions starting with 1 = (1000,0010),1010,0110,1110,0010 = 24 bits.
It gets better quickly with larger numbers of functions.
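In case the nibble notation is hard to parse, here's a sketch under one reading of it (my interpretation, not the parent's spec): each value 0-7 is three value bits followed by a 0 stop bit, and a packed run is (run-length, starting-function, versions...). Values of 8 or more would need the continuation form, which this sketch omits:

    def nibble(value):
        # Encode 0-7 as three value bits plus a 0 stop bit ("xxx0").
        assert 0 <= value < 8
        return format(value, "03b") + "0"

    def encode_run(start_func, versions):
        # Header pair (run-length, starting-function), then the versions.
        bits = nibble(len(versions)) + nibble(start_func)
        for v in versions:
            bits += nibble(v)
        return bits

    bits = encode_run(1, [5, 3, 7, 1])
    print(bits, len(bits))  # 100000101010011011100010 24

That reproduces the 24-bit figure above.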
BTW, the size of the prime scheme is log2(nthprime(function)) * version bits for each function. If you don't know ahead of time which primes are in use, then you have to do a general prime factorization, which is a hard problem (if the primes are agreed upon in advance, as above, decoding is just cheap trial division). I guess if you used really large numbers you could have a cryptographically secure way of indicating which versions of which functions are in your library.
You have maximum density for version 0, but when versions get sufficiently high there are sequences of bits with no valid interpretation, because they represent numbers divisible by primes beyond the number of functions. Also, increasing the version of a function assigned a larger prime requires more bits than increasing the version of a function assigned a smaller prime.
I think I would just represent the version of each function with a byte.
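Purely illustrative (nothing standardized here), that could be as simple as:

    # One byte per function, in an agreed function order.
    versions = bytes([5, 3, 7, 1])
    print(versions[2])  # version of function 3 -> 7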
D'oh! Yes, any possible library contains only a finite number of functions to version. Even if we order the functions in decreasing order of update frequency, this scheme will overflow quickly if more than one or two functions update regularly.
NM, that's space efficient for 0s, but only 0 values. You're probably better off with something easy to decode, aka 2 bytes per function = 50,000 functions in 100 KB, which is hardly an issue. And if you really need to have more than 65,000 versions of the same function you can always deprecate it. And if you're really only tracking 10 functions then that's still only 20 bytes, which is hardly an issue.
The 55th prime is 257 > 256, so incrementing that function's version number adds more than one full byte to your number.
I guess that might be useful if the vast majority of things are on version 0. But it quickly gets out of hand.
EX: multiplying by 2 adds a full bit to your number every time you increment it. And it just gets worse from there.
If you really want to store mostly zeros, you're probably better off just using a count of the functions > 0 and then either a bit per function to show it's > 0, or an index if things are sparse.
It's certainly true that it has a nasty growth curve (exponential in version number; probably worse in number of active terms).
I just think it's fun for low-value things, and interesting if you're in a situation where you have a huge feature set relative to the number of active features, since it incurs no penalty for fields that are 0. Any time you have ~1% of features present, it can make a difference. Example: if you have 5 of 500 fields with 16 possible values represented, storing all the 0s is expensive. Using naive 4-bits-per-field, you end up with 2000 bits, whereas using my method you only use ~620. Even using a 500-bit number to flag which fields are present and then listing just those fields in order only saves you about ~100 bits over my method.
Plus, I managed to spark a discussion about math and the problem, rather than just whining about it being hard, which I consider a good thing.
Or just a bit flag for active (500 bits) plus 4 bits * 5 for the nonzero values = 520 bits.
And if you really want to compress things, start with a count so the 500-bit flag is only used if your count is over 50.
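Rough arithmetic for those trade-offs, using the 500-field/4-bit-value numbers from above (my own back-of-the-envelope, and it ignores the count prefix):

    import math

    def naive_bits(fields=500, value_bits=4):
        # Store every field, zero or not.
        return fields * value_bits

    def bitmap_bits(fields=500, active=5, value_bits=4):
        # One presence bit per field, then just the nonzero values.
        return fields + active * value_bits

    def index_bits(fields=500, active=5, value_bits=4):
        # An index per nonzero field (9 bits for 500), then its value.
        idx = math.ceil(math.log2(fields))
        return active * (idx + value_bits)

    print(naive_bits())   # 2000
    print(bitmap_bits())  # 520
    print(index_bits())   # 65

The index list beats the bitmap as long as active * 9 < 500, i.e. up to ~55 nonzero fields, which matches the "over 50" cutoff.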
PS: in your case you also need an index of the number of bits/bytes used.
Edit: If there is a reasonable chance your counts are greater than 1, just appending one bit per count (unary) vs using higher exponents saves space, aka for the three numbers 1, 3, 2 append 100101 = first 1, then 3, then 2.
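Reading that 100101 example as unary (again, my interpretation): each count v becomes v-1 zeros followed by a 1 terminator:

    def unary(values):
        # v -> (v-1) zeros then a terminating 1; counts must be >= 1.
        return "".join("0" * (v - 1) + "1" for v in values)

    print(unary([1, 3, 2]))  # 100101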