
I recently read a book on API design by Google’s head guy on API design. It has a big section on what makes a good identifier, why people reach for UUIDs, and why that is a problem on multiple levels.

What he ended up recommending, however, was super interesting: I had never seen it mentioned before, but it was basically to use this instead http://www.crockford.com/base32.html




I see this around the crypto world a lot lately (notably for Substrate/Polkadot addresses): base58 https://tools.ietf.org/id/draft-msporny-base58-01.html

Seems to have the same idea of "human-read-write without visually identical characters" but with an expanded set for shorter string length.
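
To put a rough number on "shorter string length", here is a small sketch that counts how many symbols each base needs for a 128-bit value such as a UUID. The helper name `chars_needed` is made up for this example.

```python
import math

def chars_needed(bits: int, base: int) -> int:
    """Smallest number of base-`base` symbols that can hold any `bits`-bit value."""
    return math.ceil(bits / math.log2(base))

for name, base in [("decimal", 10), ("hex", 16), ("base32", 32),
                   ("base58", 58), ("base64", 64)]:
    print(f"{name:8} {chars_needed(128, base)} chars")
```

For 128 bits this works out to 26 characters in base32 versus 22 in base58, so the saving is real but modest, and base58 gives up case-insensitivity to get it.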


I use crockford 32 to _represent_ my UUIDs, but they are obviously stored as binary. Is the only problem with UUIDs that sometimes they get stored as strings?

The only issue I've had with UUIDs is when they don't sort in increasing chronological order. RDBMSs don't appreciate high insert loads into random points in the index. Take care of that, however, and they're a treat.
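
Roughly what "store as binary, represent as Crockford 32" could look like, as a sketch only. The function names are hypothetical, and the fixed width of 26 characters is my assumption (26 × 5 = 130 bits, enough for 128).

```python
import uuid

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's 32 symbols

def uuid_to_crockford(u: uuid.UUID) -> str:
    """Render a UUID as a fixed-width, 26-character Crockford Base32 string."""
    n = u.int
    chars = []
    for _ in range(26):
        n, r = divmod(n, 32)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

def crockford_to_uuid(text: str) -> uuid.UUID:
    """Strict inverse; raises ValueError on symbols outside the alphabet
    or on values that do not fit in 128 bits."""
    n = 0
    for ch in text.upper():
        n = n * 32 + ALPHABET.index(ch)
    return uuid.UUID(int=n)

u = uuid.uuid4()
text = uuid_to_crockford(u)
print(len(u.bytes), "bytes stored;", len(text), "chars shown:", text)
```

The database column holds `u.bytes` (16 bytes); the 26-character string only exists at the edges where a human sees it.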


ULIDs or time prefixed UUIDs are other options.
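
A minimal sketch of the time-prefixed idea, following the ULID layout of a 48-bit millisecond timestamp followed by 80 random bits, encoded as 26 Crockford Base32 characters. The name `new_ulid` and its `now_ms` parameter are made up here, and this sketch does not implement the spec's handling of multiple IDs generated within the same millisecond.

```python
import os
import time

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's 32 symbols

def new_ulid(now_ms=None) -> str:
    """Build a ULID-style ID: 48-bit timestamp (ms) + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if not 0 <= now_ms < 1 << 48:
        raise ValueError("timestamp must fit in 48 bits")
    randomness = int.from_bytes(os.urandom(10), "big")  # 80 bits
    value = (now_ms << 80) | randomness
    chars = []
    for _ in range(26):
        value, r = divmod(value, 32)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))

print(new_ulid())
```

Because the timestamp occupies the leading characters and the alphabet is in ASCII order, IDs from later milliseconds sort after earlier ones as plain strings, which is what keeps inserts clustered at the end of the index.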


The linked article discusses the issues - the storage size when encoded as char, as 16byte values, as well as the impact on read/write locality.


Where does it discuss any of this? It compares base 10/16/32/64. There’s no mention at all of UUIDs or GUIDs.

(Or do you mean the original article, rather than the linked Crockford article?)


I mean the original.


Yeah I seem to recall he also mentioned that as an approach.


This format was discussed in a HN first page post just this week:

> This alphabet, 0123456789ABCDEFGHJKMNPQRSTVWXYZ, is Douglas Crockford's Base32, chosen for human readability and being able to call it out over a phone if required.

https://news.ycombinator.com/item?id=29794186
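
For anyone who wants to try that alphabet, here is a minimal sketch of encoding a non-negative integer with it. The function name `encode` is just for illustration.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # no I, L, O, or U

def encode(n: int) -> str:
    """Encode a non-negative integer as Crockford Base32 text."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return "0"
    digits = []
    while n:
        n, r = divmod(n, 32)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

print(encode(1234))  # prints 16J
```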


People in my industry in my country specifically avoid using B and D together, as they sound too similar over the phone.

Also 2 and Z can be similar in writing.

However it is nice to not see 0 and O, 1,I,l in the same string.


F and S sound similar over the phone, at least on POTS landlines, as they don't carry the higher frequencies (> 4 kHz) that distinguish the S from the F. Note that cat names tend to have S sounds.

POTS (plain old telephone service) is restricted to a narrow frequency range of 300–3,300 Hz, called the voiceband, which is much less than the human hearing range of 20–20,000 Hz [from https://en.wikipedia.org/wiki/Plain_old_telephone_service ]


Anybody who has to relay things like API or CD keys over a POTS line on a regular basis quickly learns the NATO phonetic alphabet.


If you're worried about clarity over the phone, you should look into the NATO phonetic alphabet: https://en.wikipedia.org/wiki/NATO_phonetic_alphabet


I prefer to use Aeon, Bdellium, Czar, Djinn, Eye, etc.


The bomb defusal scene in Archer was an absolute classic for this. https://youtu.be/_4jxLxZrMfs


> Djinn

Fun fact: dzs counts as a single letter in Hungarian (e.g. in alphabetical ordering).


Quite a challenge for the non-English crowd.


You still have to know that 0 is 0 and not O, and that 1 is 1 and not I or l.


But if a mistake is made, and you wrote down L instead of 1 and sent it to me in an e-mail, I, knowing that it is crockford 32, would easily deduce what mistake was made.


Right, I didn't realize the decoder is specified to be lenient in that way, so the confounded characters are actually equivalent in the encoding.
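
A sketch of what that lenient decoding could look like: input is case-folded, O maps to 0, I and L map to 1, and hyphens are ignored, as the Crockford page describes. The names `VALUES` and `decode` are made up for this example.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
VALUES = {ch: i for i, ch in enumerate(ALPHABET)}
# Lenient aliases: look-alike letters decode to the digit they resemble.
VALUES.update({"O": 0, "I": 1, "L": 1})

def decode(text: str) -> int:
    """Decode Crockford Base32 text, ignoring case and hyphens."""
    n = 0
    for ch in text.upper().replace("-", ""):
        if ch not in VALUES:
            raise ValueError(f"invalid Crockford Base32 symbol: {ch!r}")
        n = n * 32 + VALUES[ch]
    return n

print(decode("16J"), decode("l6j"), decode("I6-J"))  # prints 1234 1234 1234
```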


Yeah when he lays out the arguments for it in the book you can clearly see why it makes a huge amount of sense. The usability, the performance, the value of a checksum etc…
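
On the checksum: Crockford's page describes an optional trailing check symbol, which is the value modulo 37, with five extra symbols (*, ~, $, =, U) used only in the check position. A minimal sketch, with hypothetical function names and a strict (uppercase-only) validator:

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
CHECK_SYMBOLS = ALPHABET + "*~$=U"  # 37 symbols; the last five are check-only

def encode(n: int) -> str:
    """Encode a non-negative integer as Crockford Base32 text."""
    digits = ""
    while True:
        n, r = divmod(n, 32)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

def with_check(n: int) -> str:
    """Append the mod-37 check symbol to the encoded value."""
    return encode(n) + CHECK_SYMBOLS[n % 37]

def is_valid(token: str) -> bool:
    """True if the last symbol matches the check value of the rest.
    Strict: raises ValueError if the body has symbols outside the alphabet."""
    if len(token) < 2:
        return False
    body, check = token[:-1], token[-1]
    n = 0
    for ch in body:
        n = n * 32 + ALPHABET.index(ch)
    return CHECK_SYMBOLS[n % 37] == check

print(with_check(1234))   # prints 16JD
print(is_valid("61JD"))   # prints False (first two symbols transposed)
```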


Looking for that quote I can't find it on that page.


I'm a big fan of Crockford-flavored Base32, as in $dayjob the folks who did some of the fundamental engineering work for the product I work with decided 10+ years ago to use Crockford-flavored Base32 as the way to expose most user-visible IDs. Things I love about it are all mentioned in Crockford's spec, but I'll restate them here:

- It's case-insensitive allowing data encoded with it to survive going through most random line-of-business applications which may have errant UPPER() or lower() calls somewhere in their depths, as well as making it easy for humans to talk about out-loud (no need to specify case makes it easy)

- Making all the "pillar" shaped characters (I, i, 1, l) equivalent and all the "donut" shaped characters (o, O, 0) equivalent means big swaths of human typing mistakes are avoided. After so much exposure to Crockford Base32, I now loathe having to decipher "is it a capital 'I'? Or a lowercase 'l'?"

Overall, it's a great way to encode any user-facing ID; just make sure you understand that this encoding alone won't stop users from realizing that Crockford Base32 encoded IDs are sequential. If you want sequential IDs to appear non-obviously sequential after encoding, you'll need to use additional techniques. Assuming that Crockford Base32 obscures sequential IDs is the one Crockford Base32-related mistake I have actually seen someone make.
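
A quick illustration of the sequential-ID point, using a throwaway `encode` helper written for this example:

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

def encode(n: int) -> str:
    """Encode a non-negative integer as Crockford Base32 text."""
    digits = ""
    while True:
        n, r = divmod(n, 32)
        digits = ALPHABET[r] + digits
        if n == 0:
            return digits

print([encode(n) for n in (1000, 1001, 1002)])  # prints ['Z8', 'Z9', 'ZA']
```

The last symbol just counts up through the alphabet, so anyone holding two or three consecutive IDs can guess the next one.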


> (...) but it was basically to use this instead

Base32 is a representation format which provides a textual representation of numbers, not a storage format.

I mean, anyone is free to dump Base32, or even base 2, into a string and just run with that, but that would be highly inefficient.


Oh wow thanks! I was familiar with Zimmermann's base 32 (z-base-32, https://philzimmermann.com/docs/human-oriented-base-32-encod...) but not Crockford's.

I just added it to the "Other projects" section of my base converter. https://convert.zamicol.com/


ULIDs are base 32 as well. https://github.com/oklog/ulid


Is the spec not a better link than a Go implementation?

https://github.com/ulid/spec


What was the book?


https://www.bookdepository.com/API-Design-Patterns-JJ-Geewax...

I loved it; I think it all made a ton of sense. Lots of code samples and no weird technology choices (normally they would do it all via protobufs and gRPC, but they keep the same principles and just use HTTP instead, with TypeScript for code samples).


Glad to hear! :-)


https://livebook.manning.com/book/api-design-patterns/chapte...

Section 6.3.3 gets to the point about base32


We use crockford tokens heavily where I work, highly recommend them. We typically use a prefix to denote the type of thing being identified, like “U-” for User, followed by a crockford token roughly 10-15 characters long. Works great.
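
A rough sketch of the prefix idea. How the token body is generated is an assumption on my part (random symbols here), and `new_token` / `split_token` are hypothetical names, not anything from a real library.

```python
import secrets

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford's 32 symbols

def new_token(prefix: str, length: int = 12) -> str:
    """Return e.g. 'U-' followed by `length` random Crockford symbols."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{body}"

def split_token(token: str) -> tuple[str, str]:
    """Split a token into its type prefix and its body."""
    prefix, _, body = token.partition("-")
    return prefix, body

token = new_token("U")
print(token, split_token(token))
```

The prefix makes it obvious in logs and support tickets what kind of thing an ID refers to, without a lookup.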


Would you happen to have a link to the book?




Which book?




