https://wikis.oracle.com/display/HotSpotInternals/Compressed...
Where the pointers are notionally 32+k bits long, but the bottom k bits are always zero, so they can be stored in 32 bits. This would mean that records always have to be aligned to a 2^k-byte boundary. If the data has some natural alignment anyway, this could be arranged so that no space is wasted, and even if it doesn't, for large keys and/or values the wasted space (at most 2^k - 1 bytes of padding per record) would be relatively small.
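To make the arithmetic concrete, here is a minimal sketch of that encoding. The names (K, align_up, compress, decompress) and the choice of k = 3 are my own for illustration, not anything CDB or HotSpot defines.

```python
K = 3  # hypothetical alignment exponent: records start on 2**K = 8-byte boundaries


def align_up(offset, k=K):
    """Round an offset up to the next 2**k boundary (where the record would go)."""
    mask = (1 << k) - 1
    return (offset + mask) & ~mask


def compress(offset, k=K):
    """Drop the k always-zero low bits so the pointer fits in 32 bits."""
    if offset & ((1 << k) - 1):
        raise ValueError("offset is not 2**k-aligned")
    stored = offset >> k
    if stored >= 1 << 32:
        raise ValueError("offset needs more than 32+k bits")
    return stored


def decompress(stored, k=K):
    """Recover the real byte offset from the stored 32-bit value."""
    return stored << k
```

With k = 3 the largest stored value, 2^32 - 1, decodes to an offset just under 32 GiB, at the cost of up to 7 bytes of padding per record.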
Alternatively, if there was a way to reliably locate the start of a record when scanning through data, the records could be packed without padding, and the 2^k-aligned pointers could simply be approximate, pointing to somewhere in the 2^k bytes before the start of the record. Retrieval would involve following the pointer, then scanning forward to find the actual record. This would be a bit sketchy, but something a bit like this is done in ATM:
http://en.wikipedia.org/wiki/CRC-based_framing
CDB doesn't include CRCs. However, you could think up various schemes to identify records based on what you know about the record format, the key you're looking for, and its hash. It's probably not worth the effort!
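For what it's worth, one such scheme could look roughly like the sketch below. It assumes CDB-style records (two 32-bit little-endian lengths, then key, then value) packed back to back, and it identifies a record purely by "the key length matches and the key bytes follow". The function names are made up, the hash isn't used at all, and a false match is possible if some earlier bytes in the window happen to look like a header for the same key, which is exactly the sketchy part.

```python
import struct


def pack_records(pairs):
    """Pack (key, value) byte strings with no padding; return (data, offsets)."""
    buf = bytearray()
    offsets = []
    for key, value in pairs:
        offsets.append(len(buf))
        buf += struct.pack("<II", len(key), len(value)) + key + value
    return bytes(buf), offsets


def approx_pointer(offset, k):
    """Round down to a 2**k boundary and drop the low k bits."""
    return offset >> k


def find_record(data, stored, key, k):
    """Follow an approximate pointer, then scan forward up to 2**k bytes."""
    base = stored << k
    # A header needs 8 bytes, so the last usable position is len(data) - 8.
    for pos in range(base, min(base + (1 << k), len(data) - 7)):
        klen, dlen = struct.unpack_from("<II", data, pos)
        start = pos + 8
        end = start + klen + dlen
        if klen == len(key) and end <= len(data) and data[start:start + klen] == key:
            return data[start + klen:end]
    return None  # no plausible record for this key in the window
```

Usage would be something like `data, offs = pack_records(pairs)`, store `approx_pointer(off, k)` in the hash table instead of the full offset, and call `find_record(data, stored, key, k)` on lookup. Each lookup costs up to 2^k header checks instead of one.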