OpenGL is a great example of how painful it is to write apps that are capable of...

ObviousScience · on Oct 9, 2014

One thought on version number is that the version number could be the product of prime powers.

If function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1, we can encode this as 2^5 * 3^3 * 5^7 * 7^1 = 472500000. This lets us unambiguously increment orthogonal values while still keeping them represented as a single number. We could even easily add a function 5 by multiplying the number by 11.

One problem is that it's not super dense (storing those 4 values takes ~29 bits), but a 128bit or 256bit number would store a larger value than is likely needed by most applications.

One benefit is that software can check for the functions that it's interested in (as long as we can agree on what prime corresponds to what function - a relatively mild standard, if we just use order of introduction) while ignoring any portion of the result it doesn't understand.

Danieru · on Oct 9, 2014

> Storing those 4 values takes ~29 bits.

Just storing those 4 values in 4 bits is more efficient.

Checking that a bit is set is also a simple AND + CMP. Which beats out a DIV + CMP.

Sorry I just had not considered this isn't common knowledge outside C or C++ land.

Thrymr · on Oct 9, 2014

> Just storing those 4 values in 4 bits is more efficient.

That doesn't really work for version numbers > 1. They're not just flags.

njr123 · on Oct 10, 2014

How do you store a version number in a single bit?

rjaco31 · on Oct 10, 2014

Well the bit n represents the version n, if this bit is at 1, this version is supported, otherwise it's not.

ajuc · on Oct 10, 2014

A few bits per value (depending how big you predict they will get). One bit per value means you can only increment version of each function once :)

eru · on Oct 10, 2014

That's a crazy scheme on a computer.

Just store, say, a byte per value, and concat them into your 128 bit number.

jhallenworld · on Oct 10, 2014

It's easy to come up with much more compact schemes that are still completely generic. For example, to represent a variable length number use xxx0, 0xxxxxx1, 0xxxxxx1xxxxxx1, etc. Normally you need a pair of these to specify (version,function). But if we assume functions are packed, you can have a run-length scheme (run-length,starting-function-number).

So "function 1 is on version 5, function 2 on version 3, function 3 on version 7, and function 4 on version 1" is four functions starting with 1 = (1000,0010),1010,0110,1110,0010 = 24 bits.

It gets better quick with larger numbers of functions.

BTW, the size of the prime scheme is log2(nthprime(function))*version bits for each function. If you don't know ahead of time which functions there might be, then you have to do a prime factorization, which is a hard problem. I guess if you used really large numbers you could have a cryptographically secure way of indicating which versions of which functions are in your library.

abstrakraft · on Oct 10, 2014

This is called Godel numbering (http://en.wikipedia.org/wiki/Godel_numbering).

jessaustin · on Oct 9, 2014

One problem is that it's not super dense...

If you start with version 0 (and why not?), I think the Fundamental Theorem of Arithmetic implies maximum possible density?

simonster · on Oct 10, 2014

You have maximum density for version 0, but when versions get sufficiently high then the numbers of functions there are sequences of bits with no valid interpretation because they represent primes or multiples of primes > the number of functions. Also increasing the versions of functions represented by larger prime numbers requires more bits than increasing the versions of functions represented by smaller numbers.

I think I would just represent the version of each function with a byte.

jessaustin · on Oct 10, 2014

D'oh! Yes, any possible library contains only a finite number of functions to version. Even if we order the functions in decreasing order of update frequency, this scheme will overflow quickly if more than one or two functions update regularly.

Retric · on Oct 9, 2014

NM, That's space efficient for 0's, but only 0 values. Your probably better off with something easy to decode. aka 2 bytes per function for an ID = 50,000 functions for 100kb which is hardly an issue. And if you really need to have more than 65,000 versions of the same function you can always deprecate it. An if your really only tracking 10 functions then that's still only 10 bytes which is hardly an issue.

55th prime > 256. So incrementing the version number adds more than one full byte to your number.

kenneth · on Oct 10, 2014

I think the suggestion was to multiply the powers of primes. e.g.

fn 2 @ v1, fn 3 @ v1 = 2^1 * 3^1 = 6

fn 2 @ v2, fn 3 @ v1, fn 5 @ v3 = 2^2 * 3^1 * 5^3 = 2 * 2 * 3 * 5 * 5 * 5 = 1500

Retric · on Oct 10, 2014

I guess that might be useful if the vast majority of things are on version 0. But it quickly get's out of hand.

EX: Multiplying by 2 adds a full bit to your number every time you increment it. And it just get's worse from there.

If you really want to store mostly zeros your probably better off just using a count of the functions > 0 and then either use a bit for each function to show it's > 0 or an index if things are sparse.

ObviousScience · on Oct 10, 2014

It's certainly true that it has a nasty growth curve (exponential in version number; probably worse in number of active terms).

I just think it's fun for low value things, and interesting if you're in the situation you have a huge feature set relative to the number of active features, since it incurs no penalty for fields that are 0. Any time you have about ~1% features being present, it can make a difference. Example: if you have about 5 of 500 fields with 16 possible values represented, storing all the 0s is expensive. Using a naive 4-bits-per-field, you end up with 2000 bits, whereas, using my method, you only use ~620. Even using a 500bit number to flag which fields are present and then listing just those fields in order only saves you about ~100 bits over my method.

Plus, I manage to spark off a discussion about math and the problem, rather than just whining about it being hard, which I consider to be a good thing.

Retric · on Oct 10, 2014

What's wrong with indexes? 1024 = 10 bits * 5 = 50bits then 4 bits * 5 = 70 bits total.

Or just a bit flag for active (500bits) then 4 bits * 5 for your count = 520 bits.

And if you really want to compress things start with a count so the 500 bit flag is only used if your count is over 50.

PS: in your case you also need an index of number if bits / bytes used.

Edit: If there is a reasonable chance your counts are greater than 1 just appending one bit per count vs using higher exponents saves space aka for 3 numbers append 100101 = first 1 then 3 then 2.