I recently read a book on API design by Google’s head guy in that area. It has a big section on what makes a good identifier, why people reach for UUIDs, and why UUIDs are a problem on multiple levels.
What he ended up recommending, however, was super interesting: I had never seen it mentioned before, but it was basically to use this instead: http://www.crockford.com/base32.html
I use crockford 32 to _represent_ my UUIDs, but they are obviously stored as binary. Is the only problem with UUIDs that sometimes they get stored as strings?
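For anyone curious what "represent as Crockford Base32, store as binary" might look like, here is a minimal Python sketch. The function names are my own, and padding the 128-bit value to 26 characters (130 bits) is an assumption about the layout, not something the comment above specifies.

```python
import uuid

# Crockford's Base32 alphabet: no I, L, O or U.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def uuid_to_crockford(u: uuid.UUID) -> str:
    """Render a UUID's 128-bit integer as 26 Crockford Base32 characters."""
    n = u.int
    chars = []
    for _ in range(26):  # 26 * 5 = 130 bits, enough for 128
        chars.append(ALPHABET[n & 31])
        n >>= 5
    return "".join(reversed(chars))

def crockford_to_uuid(s: str) -> uuid.UUID:
    """Parse the 26-character form back into a UUID (raises ValueError on bad input)."""
    n = 0
    for ch in s.upper():
        n = (n << 5) | ALPHABET.index(ch)
    return uuid.UUID(int=n)

u = uuid.uuid4()
text = uuid_to_crockford(u)      # what users see
stored = u.bytes                 # what the database keeps: 16 raw bytes
```

The text form is 26 characters instead of the usual 36, and the stored value stays at 16 bytes either way.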
The only issue I've had with UUIDs is when they don't sort in increasing chronological order. RDBMSs don't appreciate high insert loads into random points in the index. Take care of that, however, and they're a treat.
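One common way to "take care of that" is to put a timestamp in the high bits so new IDs land at the end of the index. A rough sketch, assuming a ULID-like layout of 48 bits of milliseconds followed by 80 random bits; the function name is hypothetical and the result does not set RFC version/variant bits, so treat it as an illustration rather than a standards-compliant UUID.

```python
import os
import time
import uuid
from typing import Optional

def time_ordered_id(now_ms: Optional[int] = None) -> uuid.UUID:
    """128-bit ID: 48-bit millisecond timestamp (high bits) + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    timestamp = now_ms & ((1 << 48) - 1)
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    return uuid.UUID(int=(timestamp << 80) | randomness)

earlier = time_ordered_id(1_700_000_000_000)
later = time_ordered_id(1_700_000_000_001)
```

Because the timestamp occupies the most significant bits, IDs created in later milliseconds compare greater both as integers and as big-endian bytes. IDs created within the same millisecond are still in random order relative to each other.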
This format was discussed in a HN first page post just this week:
> This alphabet, 0123456789ABCDEFGHJKMNPQRSTVWXYZ, is Douglas Crockford's Base32, chosen for human readability and being able to call it out over a phone if required.
F and S sound similar over the phone, at least on POTS landlines, as they don't carry the higher frequencies (> 4 kHz) that distinguish the S from the F. Note that cat names tend to have S sounds.
POTS (plain old telephone service) is restricted to a narrow frequency range of 300–3,300 Hz, called the voiceband, which is much less than the human hearing range of 20–20,000 Hz [from https://en.wikipedia.org/wiki/Plain_old_telephone_service ]
But if a mistake is made, say you wrote down L instead of 1 and sent it to me in an e-mail, then I, knowing that it is Crockford Base32, would easily deduce what mistake was made.
Yeah when he lays out the arguments for it in the book you can clearly see why it makes a huge amount of sense. The usability, the performance, the value of a checksum etc…
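On the checksum point: Crockford's spec describes an optional check symbol, computed as the value modulo 37 and drawn from the alphabet extended with five extra symbols (`*~$=U`). A minimal sketch of that idea in Python; the helper names are mine, and this is not taken from the book.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
CHECK_ALPHABET = ALPHABET + "*~$=U"  # 37 symbols, used only for the check position

def encode(n: int) -> str:
    """Encode a non-negative integer as Crockford Base32."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, 32)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

def decode(s: str) -> int:
    """Decode Crockford Base32 digits (raises ValueError on unknown symbols)."""
    n = 0
    for ch in s.upper():
        n = n * 32 + ALPHABET.index(ch)
    return n

def with_check(n: int) -> str:
    """Append the mod-37 check symbol to the encoded value."""
    return encode(n) + CHECK_ALPHABET[n % 37]

def verify(token: str) -> bool:
    """True if the last character is the right check symbol for the rest."""
    if len(token) < 2:
        return False
    body, check = token[:-1], token[-1].upper()
    try:
        return CHECK_ALPHABET[decode(body) % 37] == check
    except ValueError:
        return False

print(with_check(1234))   # 16JD
print(verify("16JD"))     # True
print(verify("16KD"))     # False: one wrong character
print(verify("61JD"))     # False: two characters swapped
```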
I'm a big fan of Crockford-flavored Base32, as in $dayjob the folks who did some of the fundamental engineering work for the product I work with decided 10+ years ago to use Crockford-flavored Base32 as the way to expose most user-visible IDs. Things I love about it are all mentioned in Crockford's spec, but I'll restate them here:
- It's case-insensitive, which lets data encoded with it survive going through most random line-of-business applications that may have errant UPPER() or lower() calls somewhere in their depths, and makes it easy for humans to read IDs out loud (no need to specify case)
- Making all the "pillar" shaped characters (I, i, 1, l) equivalent and all the "donut" shaped characters (o, O, 0) equivalent means big swaths of human typing mistakes are avoided. After so much exposure to Crockford Base32, I now loathe having to decipher "is it a capital 'I'? Or a lowercase 'l'?"
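Both points can be sketched in a few lines of Python. This follows the decoding rules in Crockford's spec (upper and lower case are equivalent, I and L read as 1, O reads as 0, hyphens are ignored); the function names are just for illustration.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# After upper-casing: "pillars" (I, L) become 1, "donuts" (O) become 0,
# and hyphens, which the spec allows for readability, are dropped.
ALIASES = str.maketrans({"I": "1", "L": "1", "O": "0", "-": None})

def normalize(token: str) -> str:
    """Map whatever a human typed onto the canonical alphabet."""
    return token.upper().translate(ALIASES)

def decode(token: str) -> int:
    """Decode a possibly sloppy token (raises ValueError on symbols like U)."""
    value = 0
    for ch in normalize(token):
        value = value * 32 + ALPHABET.index(ch)
    return value

print(normalize("a1b-o0l"))                     # A1B001
print(decode("10"), decode("lo"), decode("IO")) # 32 32 32
```

So a stray UPPER()/lower() somewhere, or an L typed in place of a 1, still decodes to the same value.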
Overall, it's a great way to encode any user-facing ID; just understand that the encoding alone won't stop users from realizing that Crockford Base32 encoded IDs are sequential. If you want sequential IDs to appear non-obviously sequential after encoding, you'll need to use additional techniques. Assuming that Crockford Base32 obscures sequential IDs is the one Crockford-specific mistake I've actually seen someone make.
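To make the sequential point concrete, here is a tiny Python sketch (the `encode` helper is my own) showing that consecutive integers stay visibly consecutive after encoding:

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode(n: int) -> str:
    """Encode a non-negative integer as Crockford Base32."""
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, 32)
        out.append(ALPHABET[r])
    return "".join(reversed(out))

ids = [encode(n) for n in range(1000, 1004)]
print(ids)  # ['Z8', 'Z9', 'ZA', 'ZB']
```

Only the last character changes, and it walks straight through the alphabet, so anyone who sees two or three IDs can guess the next one.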
I loved it, think it all made a ton of sense. Lots of code samples, no weird technology choices (normally they would do it all via protobufs and gRPC, but they keep the same principles and just use HTTP instead, with TypeScript for code samples)
We use Crockford tokens heavily where I work, highly recommend them. We typically use a prefix to denote the type of thing being identified, like “U-” for User, followed by a 10-15ish character Crockford token. Works great.
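A rough Python sketch of that kind of prefixed token, assuming random token bodies and a default length of 12. The function names, the length, and the use of `secrets` are my guesses, not a description of the commenter's actual system.

```python
import secrets
from typing import Tuple

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def new_token(prefix: str, length: int = 12) -> str:
    """Build e.g. 'U-' followed by `length` random Crockford Base32 characters."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{body}"

def split_token(token: str) -> Tuple[str, str]:
    """Split 'U-XXXX' into its type prefix and its body."""
    prefix, _, body = token.partition("-")
    return prefix, body

token = new_token("U")
kind, body = split_token(token)
```

The prefix tells you at a glance (and in logs) what kind of thing an ID refers to, and since the body is random rather than sequential, it does not leak ordering.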