I recently read a book by Google’s head guy on API design that was specifically ...

Nav_Panel · on Jan 8, 2022

I see this around the crypto world a lot lately (notably for Substrate/Polkadot addresses): base58 https://tools.ietf.org/id/draft-msporny-base58-01.html

Seems to have the same idea of "human-read-write without visually identical characters" but with an expanded set for shorter string length.

pkulak · on Jan 8, 2022

I use crockford 32 to _represent_ my UUIDs, but they are obviously stored as binary. Is the only problem with UUIDs that sometimes they get stored as strings?

The only issue I've had with UUIDs is when they don't sort in increasing chronological order. RDBMSs don't appreciate high insert loads into random points in the index. Take care of that, however, and they're a treat.

paulryanrogers · on Jan 8, 2022

ULIDs or time prefixed UUIDs are other options.
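
A minimal sketch of the time-prefix idea, assuming a ULID-like layout of a 48-bit millisecond timestamp followed by 80 random bits. The function name `time_prefixed_id` and the `now_ms` parameter are made up for illustration and are not taken from any library:

```python
import os
import time


def time_prefixed_id(now_ms=None):
    """Return 16 bytes: 48-bit big-endian millisecond timestamp + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # Big-endian timestamp first, so byte-wise ordering follows creation time.
    return now_ms.to_bytes(6, "big") + os.urandom(10)


# Two IDs created one millisecond apart (timestamps fixed for the example).
earlier = time_prefixed_id(now_ms=1_641_600_000_000)
later = time_prefixed_id(now_ms=1_641_600_000_001)
print(earlier < later)  # the earlier ID sorts first
```

Because the timestamp leads, new rows land near the end of an index rather than at random points. IDs created within the same millisecond still sort by their random tail.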

staticassertion · on Jan 8, 2022

The linked article discusses the issues: the storage size when encoded as char or as 16-byte values, as well as the impact on read/write locality.

dpark · on Jan 8, 2022

Where does it discuss any of this? It compares base 10/16/32/64. There’s no mention at all of UUIDs or GUIDs.

(Or do you mean the original article, rather than the linked Crockford article?)

staticassertion · on Jan 8, 2022

I mean the original.

mhoad · on Jan 8, 2022

Yeah I seem to recall he also mentioned that as an approach.

carfacts · on Jan 8, 2022

This format was discussed in a HN first page post just this week:

> This alphabet, 0123456789ABCDEFGHJKMNPQRSTVWXYZ, is Douglas Crockford's Base32, chosen for human readability and being able to call it out over a phone if required.

https://news.ycombinator.com/item?id=29794186
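
For reference, encoding a non-negative integer with that alphabet can be sketched as follows. This is a minimal illustration rather than a full implementation of the spec; the function name `encode` is an arbitrary choice:

```python
# Crockford's Base32 alphabet: digits plus letters, with I, L, O and U left out.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(n: int) -> str:
    """Encode a non-negative integer, most significant symbol first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    if n == 0:
        return ALPHABET[0]
    symbols = []
    while n:
        n, remainder = divmod(n, 32)  # peel off 5 bits at a time
        symbols.append(ALPHABET[remainder])
    return "".join(reversed(symbols))


print(encode(1234))  # 16J
```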

davidjytang · on Jan 8, 2022

People in my industry in my country specifically avoid using B and D together, as they sound too similar over the phone.

Also 2 and Z can be similar in writing.

However, it is nice to not see 0 and O, or 1, I, and l, in the same string.

FabHK · on Jan 8, 2022

F and S sound similar over the phone, at least on POTS landlines, as they don't carry the higher frequencies (> 4 kHz) that distinguish the S from the F. Note that cat names tend to have S sounds.

POTS (plain old telephone service) is restricted to a narrow frequency range of 300–3,300 Hz, called the voiceband, which is much less than the human hearing range of 20–20,000 Hz [from https://en.wikipedia.org/wiki/Plain_old_telephone_service ]

jandrese · on Jan 8, 2022

Anybody who has to relay things like API or CD keys over a POTS line on a regular basis quickly learns the NATO phonetic alphabet.

et-al · on Jan 8, 2022

If you're worried about clarity over the phone, you should look into the NATO phonetic alphabet: https://en.wikipedia.org/wiki/NATO_phonetic_alphabet

cperciva · on Jan 8, 2022

I prefer to use Aeon, Bdellium, Czar, Djinn, Eye, etc.

slavik81 · on Jan 8, 2022

The bomb defusal scene in Archer was an absolute classic for this. https://youtu.be/_4jxLxZrMfs

oldsecondhand · on Jan 8, 2022

> Djinn

Fun fact: dzs counts as a single letter in Hungarian (e.g. in alphabetical ordering).

davidjytang · on Jan 9, 2022

Quite a challenge for the non-English crowd.

layer8 · on Jan 8, 2022

You still have to know that 0 is 0 and not O, and that 1 is 1 and not I or l.

tapas73 · on Jan 8, 2022

But if a mistake is made, say you wrote down L instead of 1 and sent it to me in an e-mail, then I, knowing that it is Crockford 32, could easily deduce what mistake was made.

layer8 · on Jan 8, 2022

Right, I didn't realize the decoder is specified to be lenient in that way, so the confounded characters are actually equivalent in the encoding.
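
A rough sketch of such a lenient decoder, assuming the aliasing described in Crockford's spec (O reads as 0; I and L read as 1; case and hyphens are ignored). The names `VALUES` and `decode` are illustrative only:

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Map every canonical symbol to its value, then add the look-alike aliases.
VALUES = {symbol: value for value, symbol in enumerate(ALPHABET)}
VALUES.update({"O": 0, "I": 1, "L": 1})


def decode(text: str) -> int:
    """Decode a string leniently: any case, optional hyphens, look-alikes accepted."""
    n = 0
    for ch in text.upper().replace("-", ""):
        if ch not in VALUES:
            raise ValueError(f"invalid symbol: {ch!r}")
        n = n * 32 + VALUES[ch]
    return n


print(decode("16J"))   # 1234
print(decode("l6-j"))  # 1234, the lowercase "l" is read as 1
```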

mhoad · on Jan 8, 2022

Yeah when he lays out the arguments for it in the book you can clearly see why it makes a huge amount of sense. The usability, the performance, the value of a checksum etc…
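
On the checksum: Crockford's spec describes an optional trailing check symbol equal to the value modulo 37, using five extra symbols (`*`, `~`, `$`, `=`, `U`) for the values 32 to 36. A minimal sketch, with `check_symbol` and `is_valid` as made-up helper names that operate on the already-decoded integer:

```python
# The 32 regular symbols followed by the five check-only symbols (values 32-36).
CHECK_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ*~$=U"


def check_symbol(n: int) -> str:
    """Return the check symbol for a non-negative integer: its value modulo 37."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    return CHECK_ALPHABET[n % 37]


def is_valid(n: int, symbol: str) -> bool:
    """True if `symbol` is the expected check symbol for the decoded value `n`."""
    return check_symbol(n) == symbol.upper()


print(check_symbol(1234))   # D
print(is_valid(1235, "D"))  # False: an off-by-one value no longer matches
```

So 1234, which encodes as 16J, would be written as 16JD with its check symbol appended.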

Zamicol · on Jan 8, 2022

Looking for that quote I can't find it on that page.

lelandbatey · on Jan 8, 2022

I'm a big fan of Crockford-flavored Base32. At $dayjob, the folks who did some of the fundamental engineering work for the product I work with decided 10+ years ago to use Crockford-flavored Base32 as the way to expose most user-visible IDs. The things I love about it are all mentioned in Crockford's spec, but I'll restate them here:

- It's case-insensitive, which lets data encoded with it survive most random line-of-business applications that may have errant UPPER() or lower() calls somewhere in their depths, and makes it easy for humans to read out loud (no need to specify case).

- Making all the "pillar" shaped characters (I, i, 1, l) equivalent and all the "donut" shaped characters (o, O, 0) equivalent means big swaths of human typing mistakes are avoided. After so much exposure to Crockford Base32, I now loathe having to decipher "is it a capital 'I'? Or a lowercase 'l'?"

Overall, it's a great way to encode any user-facing ID. Just be aware that this encoding alone won't stop users from realizing that Crockford Base32 encoded IDs are sequential. If you want sequential IDs to appear non-obviously sequential after encoding, you'll need to use additional techniques. Assuming that Crockford Base32 obscures sequential IDs is the one mistake I've seen someone make that relates directly to Crockford Base32.
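
To illustrate the sequential-ID point, here is a small sketch (the `encode` helper is a hypothetical minimal encoder, not any particular library's API) showing that consecutive integers encode to visibly consecutive strings:

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode(n: int) -> str:
    """Encode a non-negative integer in Crockford Base32."""
    if n == 0:
        return ALPHABET[0]
    symbols = []
    while n:
        n, remainder = divmod(n, 32)
        symbols.append(ALPHABET[remainder])
    return "".join(reversed(symbols))


# Four consecutive database IDs and their encoded forms.
encoded = [encode(n) for n in range(1000, 1004)]
print(encoded)  # ['Z8', 'Z9', 'ZA', 'ZB']
```

Only the last symbol changes, and it steps through the alphabet in order, so anyone who sees two or three IDs can guess the next one.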

fivea · on Jan 8, 2022

> (...) but it was basically to use this instead

Base32 is a representation format which provides a textual representation of numbers, not a storage format.

I mean, anyone is free to dump Base32, or even base 2, into a string and just run with that, but that would be highly inefficient.
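
A quick way to see the size difference for a 128-bit ID, using only Python's standard `uuid` and `math` modules (the variable names are illustrative):

```python
import math
import uuid

u = uuid.uuid4()

binary_size = len(u.bytes)         # raw storage: 16 bytes
hex_text_size = len(str(u))        # canonical hyphenated hex text: 36 characters
base32_chars = math.ceil(128 / 5)  # 5 bits per Base32 symbol: 26 characters

print(binary_size, hex_text_size, base32_chars)  # 16 36 26
```

Stored as text, the Base32 form is shorter than the usual hex form, but both are larger than the 16 raw bytes.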

Zamicol · on Jan 8, 2022

Oh wow, thanks! I was familiar with Zimmermann's base 32 (z-base-32, https://philzimmermann.com/docs/human-oriented-base-32-encod...) but not Crockford's.

I just added it to the "Other projects" section of my base converter. https://convert.zamicol.com/

hermanradtke · on Jan 8, 2022

ULIDs are base 32 as well. https://github.com/oklog/ulid

gnabgib · on Jan 8, 2022

Is the spec not a better link than a go implementation?

https://github.com/ulid/spec

nostrebored · on Jan 8, 2022

What was the book?

mhoad · on Jan 8, 2022

https://www.bookdepository.com/API-Design-Patterns-JJ-Geewax...

I loved it; I think it all made a ton of sense. Lots of code samples, no weird technology choices (normally they would do it all via protobufs and gRPC, but they keep the same principles and just use HTTP instead, with TypeScript for code samples).

jgeewax · on Jan 10, 2022

Glad to hear! :-)

afhammad · on Jan 8, 2022

https://livebook.manning.com/book/api-design-patterns/chapte...

Section 6.3.3 gets to the point about base32

rco8786 · on Jan 8, 2022

We use Crockford tokens heavily where I work and highly recommend them. We typically use a prefix to denote the type of thing being identified, like “U-” for User, followed by a Crockford token of roughly 10-15 characters. Works great.
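
A minimal sketch of what such a prefixed token could look like. The exact scheme is not described above, so the function name `new_token`, the default length of 12, and the use of random symbols are all assumptions made for illustration:

```python
import secrets

ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_token(prefix: str, length: int = 12) -> str:
    """Return e.g. 'U-' followed by `length` random Crockford Base32 symbols."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return f"{prefix}-{body}"


token = new_token("U")
print(token)  # random each run: "U-" plus 12 symbols
```

The prefix makes it obvious at a glance, in logs or support tickets, what kind of object an ID refers to.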

wbl · on Jan 8, 2022

Would you happen to have a link to the book?

relueeuler · on Jan 8, 2022

https://www.amazon.com/API-Design-Patterns-JJ-Geewax/dp/1617...

pottertheotter · on Jan 8, 2022

https://www.manning.com/books/api-design-patterns

mattacular · on Jan 8, 2022

Which book?

pottertheotter · on Jan 8, 2022

https://www.manning.com/books/api-design-patterns