In CS, most algorithms assume an ASCII character set. I wonder if there are any string-related algorithms that completely break (functionally or complexity-wise) when given UTF-16 or UTF-8 character sets.
Asymptotic complexity can't change based on the character set, since you can just reuse the same algorithm with larger opaque datums. (The exception being algorithms with O(n^8) or O(n^256) complexity, but no one uses those anyway.)
A variable-width encoding can cause issues in principle, but useful algorithms already have to deal with strings whose physical representations vary in length anyway (e.g. "yes" vs "no"), so it tends not to be a problem in practice.
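To make that concrete, here's a minimal sketch (Rust purely as an illustration language, not code from the thread): a naive byte-level substring search written with no knowledge of Unicode still finds the right answer on UTF-8 input, because a valid UTF-8 needle can only match at character boundaries of a valid UTF-8 haystack. (It compares raw bytes, so it deliberately ignores things like Unicode normalization.)

```rust
/// Naive byte-level substring search: returns the byte offset of the first
/// occurrence of `needle` in `haystack`, or None. Knows nothing about Unicode.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    haystack
        .windows(needle.len())
        .position(|window| window == needle)
}

fn main() {
    let haystack = "naïve café"; // UTF-8: 'ï' and 'é' each take two bytes
    let needle = "café";

    // Both strings are treated as opaque byte sequences, yet the match found
    // is the real "café": a valid UTF-8 needle can only line up on character
    // boundaries of a valid UTF-8 haystack.
    let pos = find_bytes(haystack.as_bytes(), needle.as_bytes());
    println!("{:?}", pos); // Some(7): "naïve " is 7 bytes, not 6
}
```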
> In CS, most algorithms assume an ASCII character set.
They most certainly do not. E.g., a Turing machine assumes an alphabet Γ, which is some set of symbols and is defined no further, since any exact definition is meaningless to the theory. (I.e., the algorithm is generic over any alphabet.) The alphabet need not even be text; e.g., for a Turing machine, the set of all octets suffices.
Even for something like Levenshtein distance, the only real requirement the algorithm places on its input is that the abstract "characters" support equality testing. For Unicode text, I'd start with grapheme clusters and then look for counterexamples.
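As a sketch of that idea (again illustrative Rust, nothing from the thread): Levenshtein distance written generically over any slice of equality-comparable items. Whether you feed it bytes, chars, or grapheme clusters (segmented with something like the unicode-segmentation crate) is just a choice of element type; the demo uses chars to stay dependency-free.

```rust
/// Levenshtein distance over any items that support equality testing;
/// the algorithm itself never needs to know what a "character" is.
fn levenshtein<T: PartialEq>(a: &[T], b: &[T]) -> usize {
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ai) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, bj) in b.iter().enumerate() {
            let cost = if ai == bj { 0 } else { 1 };
            curr[j + 1] = (prev[j] + cost)   // substitute
                .min(prev[j + 1] + 1)        // delete
                .min(curr[j] + 1);           // insert
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

fn main() {
    // Unicode scalar values (char) as the element type here; grapheme
    // clusters would work the same way, only the segmentation step differs.
    let a: Vec<char> = "kitten".chars().collect();
    let b: Vec<char> = "sitting".chars().collect();
    println!("{}", levenshtein(&a, &b)); // 3
}
```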
I guess they won't break correctness, but I do remember that many algorithms (e.g. tries) assume constant-time random access to characters, which AFAIK is not possible in UTF-8.
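A small illustration of that cost, assuming Rust's standard UTF-8 `str`: indexing by character position has to decode from the start (O(n)), while byte indexing is O(1) but can land in the middle of a multi-byte character. That's one reason byte-keyed tries, where each node branches on a `u8` rather than a "character", are a common way to sidestep the issue.

```rust
fn main() {
    let s = "κόσμε"; // 5 characters, 10 bytes in UTF-8

    // O(1): fixed-offset access, but it yields a raw byte, not a character,
    // and offset 1 sits in the middle of the two-byte 'κ'.
    let byte = s.as_bytes()[1];
    println!("byte at offset 1: {:#04x}", byte);

    // O(n): to reach the 4th character, the iterator has to decode every
    // preceding character, because their byte widths vary.
    let fourth = s.chars().nth(3);
    println!("4th char: {:?}", fourth); // Some('μ')

    // str::get refuses byte ranges that would split a character.
    println!("{:?}", s.get(0..1)); // None: byte 1 isn't a char boundary
    println!("{:?}", s.get(0..2)); // Some("κ")
}
```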