Understand they removed '0' & 'O' but why did they retain 'o' as it could be con...

sjnu · on Dec 3, 2020

It's safe to leave behind one of each group (1 is still there).

quesera · on Dec 3, 2020

That's true -- the parser can just translate any lookalike errors into the canonical character.

But I wish the standard was to keep all digits and drop the lookalike letters. I can't think of a reason for the extra (human) burden of remembering whether it's 0 over O, or I over 1, etc.

nicky0 · on Dec 3, 2020

The decoder can simply replace 0 and O with o. Human error will thus be corrected and you can relax and use any of them.
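
A rough sketch of that idea in Python, for illustration only. The alphabet is the Bitcoin-style Base58 one; the names `LOOKALIKES`, `normalize`, and `decode_int` are made up for this example, and this is not a claim about how any particular library behaves.

```python
# Bitcoin-style Base58 alphabet: it omits 0, O, I and l.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Hypothetical lenient mapping: each excluded lookalike is translated to
# the one member of its group that the alphabet keeps.
LOOKALIKES = str.maketrans({"0": "o", "O": "o", "I": "1", "l": "1"})


def normalize(text: str) -> str:
    """Translate lookalikes to their canonical character, then validate."""
    cleaned = text.translate(LOOKALIKES)
    invalid = set(cleaned) - set(BASE58_ALPHABET)
    if invalid:
        raise ValueError(f"not Base58 characters: {sorted(invalid)}")
    return cleaned


def decode_int(text: str) -> int:
    """Decode a Base58 string to an integer after lenient normalization."""
    value = 0
    for ch in normalize(text):
        value = value * 58 + BASE58_ALPHABET.index(ch)
    return value
```

Because 0, O, I and l never appear in valid output, translating them cannot change the meaning of a correctly written string; it only rescues mistyped ones.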

quesera · on Dec 3, 2020

I'm thinking more from the parser side.

Being liberal in what you accept is fine, but being strict in what you emit requires extra mental overhead.

It's nothing cataclysmic (and "on the developer" is often a reasonable place to add the extra overhead) but in this case it seems like selecting 0 instead of O or o, and 1 instead of I or l, would have been more straightforward.

Selecting "1" (ASCII 49), and "o" (ASCII 111) is inconsistent and feels odd.

EDIT: OK, I can think of one possible justification: with zero disallowed, a leading zero can never be lost. In a short Base58 string, it's not unusual for all characters to be numeric, and some overzealous readers will interpret the whole value as an integer. See also: hex strings, US ZIP codes, ABA routing numbers, etc, vs Microsoft Excel. :)

EDIT2: Never mind, that's a lame justification. The likelihood of a Base58 string of any useful length containing numeric chars only is very low, since the alphabet would be 17.2% numeric, unlike hex strings (62.5% numeric), or ZIP codes and ABA numbers (100% numeric). Additionally, IIRC Base58 was invented for BTC addresses, which necessarily started (at the time) with a "1" char (now "1", "3", or "bc1").
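
To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes every character is drawn uniformly and independently, which real identifiers need not satisfy, and the helper name `all_numeric_probability` is made up. It uses 10 digits out of 58 to match the 17.2% figure; the actual Bitcoin-style alphabet has only 9 digits, which makes an all-numeric string even less likely.

```python
def all_numeric_probability(digit_count: int, alphabet_size: int, length: int) -> float:
    """Chance that a random string of `length` chars consists only of digits.

    Assumes each character is chosen uniformly and independently.
    """
    return (digit_count / alphabet_size) ** length


# Compare an 8-character string in a 58-symbol alphabet with 10 digits
# against an 8-character hex string (10 digits out of 16 symbols).
base58_like = all_numeric_probability(10, 58, 8)
hex_string = all_numeric_probability(10, 16, 8)
print(f"58-symbol alphabet: {base58_like:.2e}")
print(f"hex:                {hex_string:.2e}")
```

Under these assumptions the hex case comes out several orders of magnitude more likely than the 58-symbol case at the same length.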

nicky0 · on Dec 16, 2020

I think you're missing the fundamental point that the pair 0 O is very easily confused, but o O and o 0 are less easily confused, by virtue of differing height.

quesera · on Dec 16, 2020

I do understand, but that's only true for [0Oo], and does not help with [1Il]. In either case, the risk of human confusion is removed by transliteration in parser code.

But on the parser implementation side, I need to remember that numeric 0 maps to lowercase o, but lowercase l maps to numeric 1.

My thought is that it would be simpler to only remember one rule: numeric chars are canonical, lookalikes are dropped.

Clearly, not a big deal -- you write it once, correctly, and move on (or try to!). That "once" has apparently stuck in my head, and now a few years later I still wonder about the reason for the design choice. :)

tosh · on Dec 3, 2020

good point, I can see value in an alphabet that omits all of them (e.g. for when you encounter an id in the wild and don't know its encoding)