A codepoint is the smallest unit of meaning in unicode. A byte is just a number that might (or might not) have meaning in a specific unicode encoding, depending in part on what other bytes it's next to.
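To make that concrete, here's a minimal sketch (Python, my choice of language purely for illustration) showing that a single byte can be a complete character in one encoding, a meaningless fragment in another, and only gains meaning from its neighbors:

    # 'é' is one codepoint, but two bytes in UTF-8.
    data = "é".encode("utf-8")
    print(data)                    # b'\xc3\xa9'

    # The trailing byte 0xA9 has no meaning on its own in UTF-8 ...
    try:
        bytes([0xA9]).decode("utf-8")
    except UnicodeDecodeError as e:
        print("0xA9 alone is not valid UTF-8:", e)

    # ... yet the very same byte is a complete character in Latin-1.
    print(bytes([0xA9]).decode("latin-1"))   # '©'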
A codepoint is the smallest unit that has a graphical representation you can print on screen.
A codepoint is the smallest unit that allows APIs that are agnostic to encoding and work purely in terms of the semantic content of unicode. If you want to write any kind of algorithm in terms of the actual characters represented, you want a codepoint abstraction. Most unicode algorithms -- like those for collation, normalization, and regexp character classes -- are defined in terms of codepoints.
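For instance (a sketch in Python, where a str is a sequence of codepoints and the standard re module's character classes are codepoint-aware):

    import re

    s = "naïve café מלון"
    # \w is defined over codepoints, so it matches accented Latin letters and
    # Hebrew letters alike -- no knowledge of the underlying encoding needed.
    print(re.findall(r"\w+", s))   # ['naïve', 'café', 'מלון']

    # Iterating the string also yields codepoints, not bytes.
    print([hex(ord(ch)) for ch in "é½"])   # ['0xe9', '0xbd']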
If you split a unicode string on codepoints, the results are always valid unicode strings. If you split a unicode string on bytes, they may not be.
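A quick Python sketch of that difference:

    s = "héllo"
    # Any split on a codepoint boundary yields two valid unicode strings.
    print(s[:2], "|", s[2:])               # hé | llo

    # An arbitrary byte split can land in the middle of a codepoint.
    b = s.encode("utf-8")                  # b'h\xc3\xa9llo'
    try:
        b[:2].decode("utf-8")              # cuts 'é' between its two bytes
    except UnicodeDecodeError as e:
        print("not a valid UTF-8 string:", e)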
Human written language is complicated. Unicode actually does a pretty amazing job of providing an abstraction for dealing with it, but it's still complicated. It is a common misconception to think that a codepoint always represents "one block on the screen", i.e. a "user-perceived character" (a "grapheme cluster"). If you start really getting into it, you realize "a user-perceived character" is a more complex concept than you thought or would like -- not because of unicode, but because of the complexities of global written human language and what software wants to do with it. But most people who have tried writing internationalized text manipulation of any kind with an API that is only in terms of bytes will know that codepoints are definitely superior.
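A small illustration of why "one codepoint" and "one user-perceived character" aren't the same thing (Python, using a decomposed accent and an emoji ZWJ sequence):

    # Two different ways to get something a user perceives as "one character":
    decomposed = "e\u0301"   # 'é' as LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
    family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"   # 👨‍👩‍👧, joined by ZWJs

    print(len(decomposed))   # 2 codepoints
    print(len(family))       # 5 codepoints, one user-perceived character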
If you do need "user-perceived characters", aka "grapheme clusters" -- unicode has an algorithm for that, based on data for each codepoint in the unicode database: https://unicode.org/reports/tr29/ It can be locale-dependent (whereas codepoints are locale-independent). And guess what: the algorithm is defined in terms of codepoints -- if you wanted to implement it, you would usually want an API based on a codepoint abstraction to start with.
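If you want to see that in practice: the third-party regex module (not the standard library's re) implements TR29-style grapheme cluster matching via \X. A sketch, assuming that dependency:

    import regex   # pip install regex; the stdlib "re" module has no \X

    s = "cafe\u0301"                     # 'café' with a decomposed 'é'
    print(len(s))                        # 5 codepoints
    print(regex.findall(r"\X", s))       # ['c', 'a', 'f', 'é'] -- 4 grapheme clusters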
The "grapheme cluster" abstraction is necessarily more expensive to deal with than the "codepoint" abstraction (which is itself necessarily more expensive than "bytes") -- "codepoint" is quite often the right balance. I suppose if computers were or got another couple of magnitudes faster, we might all want/demand more widespread implementation of "grapheme cluster" as the abstraction for many more things -- but it'd still be described and usually implemented in terms of the "codepoint" abstraction, and you'd still need the codepoint abstraction for many things, such as normalization. But yes, it would be nice if more platforms/libraries provided "grapheme cluster" abstraction too. But it turns out you can mostly get by with "codepoint". You can't really even get by with just bytes if you want to do any kind of text manipulation or analysis (such as regexp). And codepoint is the abstraction on which "grapheme cluster" is built, it's the lower level and simpler abstraction, so is the first step -- and some platforms have only barely gotten there. A "grapheme cluster" is made up of codepoints.
I suppose one could imagine some system that isn't unicode, that doesn't use a "codepoint" abstraction but somehow only has "user-perceived characters"... but it would get pretty crazy, for a variety of reasons, including but not limited to the fact that "user-perceived character" can be locale-dependent. "Codepoint" is a very good and useful abstraction, and is the primary building block of unicode, so it makes sense that unicode-aware platform APIs also use it as a fundamental unit. A codepoint is the unit on which you can look up metadata in the unicode database, for normalization, upper/lowercasing, character classes for regexps, collation (sort order), etc. Unicode is designed to let you do an awful lot with codepoints, in as performant a manner as unicode could figure out.
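For example, Python's standard unicodedata module exposes exactly that kind of per-codepoint metadata:

    import unicodedata

    # Per-codepoint lookups against the unicode character database.
    for ch in "Aß①":
        print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))

    print("ß".upper())                               # 'SS'  -- case mapping
    print(unicodedata.normalize("NFC", "e\u0301"))   # 'é'   -- normalization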
The only thing you get by splitting a sequence of codepoints at random is another sequence of codepoints. Because you can end up with codepoint sequences that map to different glyphs, or that get ignored when they wouldn't have been in their proper order, you can end up with nonsense. You can shuffle a sequence of ASCII characters and still end up with a sequence of ASCII characters -- what good is that? I fail to see how it is qualitatively different from splitting a UTF-8 string at arbitrary byte offsets. The latter is supposed to induce an error, but the former doesn't necessarily. The Unicode specification is written to degrade gracefully when text is manipulated or displayed by poorly written software, or by old software dealing with future sequences that have unique semantics. But that's not the same thing as saying that any sequence of codepoints is valid. Rather, it's more akin to undefined behavior in C, except without a license to unleash nasal demons.
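To illustrate the difference (a Python sketch): splitting on codepoints always gives you valid unicode strings, even if the result is nonsense like a stranded combining mark, while splitting the bytes mid-codepoint is an outright error:

    s = "e\u0301!"                   # 'é!' with a decomposed 'é'

    left, right = s[:1], s[1:]       # both halves are valid unicode strings ...
    print(repr(left), repr(right))   # ... but right begins with a stranded combining accent

    b = s.encode("utf-8")            # b'e\xcc\x81!'
    try:
        b[:2].decode("utf-8")        # splitting the bytes mid-codepoint fails outright
    except UnicodeDecodeError as e:
        print("error:", e)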