Hey Piotr/pchm, I'm not sure I follow your argument that Base32 is less popular because it's not a standard (there is a standard - RFC4648 as you mention).

Not implementing the RFC means not implementing Base32: changing the character order, or using 32 emoji, does not make it Base32. Put another way, you can change the order of characters in Base64, or use a different dictionary, and indeed there are several variants of that too (BinHex4, Uuencoding, Base64Url, B64), each with its own implementation-detail concerns.

Base64 won out as a reasonably dense way to encode binary data in 7-bit-safe ASCII for use in email, and later HTTP headers (where spacing and line length may be modified in transit, and some ASCII characters are prohibited, e.g. 0x00/null). Part of the reason is that bit-grouping makes encode/decode simpler (you can use bit shifting). Something like ASCII85/Base85 is a denser encoding, close to the maximum you can get in 7-bit-safe ASCII (94 characters, 33-126, if spaces cannot be relied on; 95 if the number of spaces is preserved), but you have to use multiply/divide instructions. The intersection of bit-shift speed (power of 2) and 7-bit-safe ASCII characters (max 94 values) is: binary, base4, octal, hexadecimal, base32, and base64.
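To make the shift-versus-divide contrast concrete, here is a minimal Python sketch. The function names `b32encode_shift` and `a85_group` are made up for illustration; the first produces unpadded RFC 4648 Base32 using only shifts and masks, the second encodes one 4-byte group the way Ascii85 does, which needs repeated division by 85.

```python
import base64

RFC4648_B32 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32encode_shift(data: bytes) -> str:
    """Unpadded Base32 using only shifts and masks (hypothetical helper)."""
    out = []
    buffer = 0  # bit accumulator
    bits = 0    # number of valid bits currently held in the accumulator
    for byte in data:
        buffer = (buffer << 8) | byte
        bits += 8
        while bits >= 5:
            bits -= 5
            out.append(RFC4648_B32[(buffer >> bits) & 0x1F])
        buffer &= (1 << bits) - 1  # keep only the leftover bits
    if bits:
        # Left-align the final partial group, zero-filling the low bits.
        out.append(RFC4648_B32[(buffer << (5 - bits)) & 0x1F])
    return "".join(out)

def a85_group(chunk: bytes) -> str:
    """Encode exactly 4 bytes as 5 Ascii85 characters (hypothetical helper)."""
    value = int.from_bytes(chunk, "big")
    digits = []
    for _ in range(5):
        value, rem = divmod(value, 85)  # 85 is not a power of two: no shift trick
        digits.append(chr(rem + 33))
    return "".join(reversed(digits))

print(b32encode_shift(b"Hello, world!"))
print(a85_group(b"Hell"))
```

Both helpers can be checked against the standard library's `base64.b32encode` (after stripping `=` padding) and `base64.a85encode`.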

For human readability, especially verbal communication, hexadecimal or base32 are advantageous: they are denser than decimal, can be generated via bit-shifting rather than more complex processor instructions, and, unlike Base64, you needn't also communicate each character's case.
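A quick Python illustration of the case point, using only the standard library: hex and Base32 decode to the same bytes regardless of case (Base32 via the `casefold=True` option), while a Base64 string with one letter's case changed decodes to different bytes.

```python
import base64

hex_upper = bytes.fromhex("666F6F")
hex_lower = bytes.fromhex("666f6f")

b32_upper = base64.b32decode("MZXW6===")
b32_lower = base64.b32decode("mzxw6===", casefold=True)

b64_exact = base64.b64decode("Zm9v")       # b"foo"
b64_wrong_case = base64.b64decode("zm9v")  # first letter read out without its case

print(hex_upper == hex_lower)       # True
print(b32_upper == b32_lower)       # True
print(b64_exact == b64_wrong_case)  # False
```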




You make some good points. What I was trying to say is that even though there is the RFC, it's quite common to modify the alphabet or use other variants like Crockford's (mainly to avoid random profanity, e.g. in the URL identifiers).

When you see a Base64 string, you can be pretty certain that it's the standard version. With Base32, it's not obvious which variant was used.
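A small Python sketch of that ambiguity. It treats Crockford's variant as a plain alphabet swap over the RFC 4648 bit grouping, which is an assumption made for illustration (Crockford's own description is framed around encoding numbers); the helper names are hypothetical.

```python
import base64

RFC4648 = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def decode_rfc(text: str) -> bytes:
    """Decode unpadded RFC 4648 Base32 by restoring the '=' padding first."""
    padded = text + "=" * (-len(text) % 8)
    return base64.b32decode(padded)

def decode_crockford(text: str) -> bytes:
    """Assumed Crockford decoding: map symbols back to the RFC alphabet."""
    return decode_rfc(text.translate(str.maketrans(CROCKFORD, RFC4648)))

# Every character of this string is valid in both alphabets,
# so nothing in the string itself says which variant produced it.
ambiguous = "MZXW6"
print(decode_rfc(ambiguous))        # b"foo" (RFC 4648 test vector)
print(decode_crockford(ambiguous))  # some other three bytes
```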

Many languages don't provide a stdlib Base32 implementation (Ruby doesn't), but Base64 is pretty much always included. Maybe this influenced my perception of the lack of a universal standard.

Anyway, I should work on that section to communicate my point better.


I believe the technical term is “Schelling point”: something that people can decide on without communication.

Base64 is very close to the Schelling point of Base62 i.e. [A-Za-z0-9], requiring only a couple more additional decisions to be made: which two extra characters to add.

Unfortunately the original Base64 inexplicably got this wrong and chose + and / instead of the more sensible choice of - and _
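For reference, Python's standard library exposes both choices; the two bytes below are picked so that the 62nd and 63rd symbols actually appear.

```python
import base64

raw = bytes([0xFB, 0xFF])  # bit pattern that hits alphabet indices 62 and 63

standard = base64.b64encode(raw).decode("ascii")
url_safe = base64.urlsafe_b64encode(raw).decode("ascii")

print(standard)  # +/8=
print(url_safe)  # -_8=
```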


In some cases (luck of the data, but often when encoding ASCII without padding) you won't see the non-alphanumeric characters (62nd and 63rd place) in Base64 either. So you can't always tell the difference between Base64, Base64Url, Xxencode, or B64.
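A rough way to check this in Python is to re-map standard Base64 output onto the other alphabets. The Xxencode and B64 alphabets below are written out from their usual definitions and should be treated as assumptions, as should the helper name `with_alphabet`; real Xxencode files also carry begin/end lines, which this sketch ignores.

```python
import base64

BASE64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
XX = "+-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"   # assumed
B64 = "./0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"  # assumed

def with_alphabet(data: bytes, alphabet: str) -> str:
    """Unpadded Base64 of `data`, re-mapped onto another 64-symbol alphabet."""
    std = base64.b64encode(data).decode("ascii").rstrip("=")
    return std.translate(str.maketrans(BASE64, alphabet))

raw = b"Hello, world!"
plain = with_alphabet(raw, BASE64)
b64 = with_alphabet(raw, B64)
xx_line = XX[len(raw)] + with_alphabet(raw, XX)  # Xxencode prefixes a length symbol

print(plain)    # SGVsbG8sIHdvcmxkIQ
print(b64)      # G4JgP4wg65RjQalY6E
print(xx_line)  # BG4JgP4wg65RjQalY6E

# None of the outputs uses a 62nd or 63rd symbol, so all are purely alphanumeric.
print(all(s.isalnum() for s in (plain, b64, xx_line)))  # True
```

Only the B64 and Xxencode alphabets are listed in ascending ASCII order; the standard Base64 alphabet is not.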

"Hello, world!" = `SGVsbG8sIHdvcmxkIQ` (base64, base64url), `BG4JgP4wg65RjQalY6E` (Xxencode), or `G4JgP4wg65RjQalY6E` (b64). A legitimate reason for choosing B64 over Base64 would be: it maintains ASCII sort-order.

Any language that has to deal with HTTP (or MIME) has to encode/decode Base64 in order to support some headers (eg Basic auth) and features (binary data from a form submission). There is no similar HTTP need for Base32, so perhaps it's less surprising it's not in the standard library?
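As an example of that HTTP dependency, a Basic auth header is just Base64 over `username:password`. A minimal Python sketch (the function name is hypothetical; the credentials are the well-known example pair from the Basic auth RFC):

```python
import base64

def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header line (illustrative helper)."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return f"Authorization: Basic {token}"

print(basic_auth_header("Aladdin", "open sesame"))
# Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```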



