
I don't get this, actually.

Say a UTF-8 string consists of the bytes ae 31 c1 12.

How do we decide whether it contains the characters "ae", "31", "c1", "12", or "ae 31" and "c1 12", or even "ae", "31 c1", and "12"?

EDIT: Never mind! Found my answer here: http://stackoverflow.com/questions/1543613/how-does-utf-8-va...




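The TL;DR is that UTF-8 is a prefix code: no valid character's byte sequence is a prefix of any other's. On top of that, the first byte of each character tells you how many bytes that character occupies, so a decoder always knows where the next character starts.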

http://en.wikipedia.org/wiki/Prefix_code
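To make that concrete, here's a minimal Python sketch of splitting bytes into characters using only the lead byte. The function name `split_utf8` is just made up for illustration. It checks the lead/continuation structure only and does not reject overlong 3- and 4-byte forms or surrogates the way a full decoder such as `bytes.decode("utf-8")` does. It also shows why the example above can't be split at all: 0xae has the form 10xxxxxx, which is a continuation byte and can never start a character, and 0xc1 never appears in valid UTF-8.

```python
from __future__ import annotations


def split_utf8(data: bytes) -> list[bytes]:
    """Split UTF-8 bytes into per-character byte sequences (simplified sketch)."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:              # 0xxxxxxx: 1-byte character (ASCII)
            length = 1
        elif 0xC2 <= lead <= 0xDF:   # 110xxxxx: starts a 2-byte sequence
            length = 2
        elif 0xE0 <= lead <= 0xEF:   # 1110xxxx: starts a 3-byte sequence
            length = 3
        elif 0xF0 <= lead <= 0xF4:   # 11110xxx: starts a 4-byte sequence
            length = 4
        else:
            # 10xxxxxx is a continuation byte and can't start a character;
            # 0xC0, 0xC1 and 0xF5-0xFF never occur in valid UTF-8.
            raise ValueError(f"invalid lead byte 0x{lead:02x} at offset {i}")
        seq = data[i:i + length]
        # Every byte after the lead must be a continuation byte (10xxxxxx).
        if len(seq) < length or any(b & 0xC0 != 0x80 for b in seq[1:]):
            raise ValueError(f"truncated or malformed sequence at offset {i}")
        chars.append(seq)
        i += length
    return chars


if __name__ == "__main__":
    # "a", e-acute, euro sign, and an emoji: 1, 2, 3 and 4 bytes long.
    print(split_utf8("a\u00e9\u20ac\U0001F600".encode("utf-8")))
    try:
        split_utf8(bytes.fromhex("ae 31 c1 12"))
    except ValueError as e:
        print(e)  # the very first byte is already rejected
```

There is never any ambiguity because the decoder doesn't try alternative groupings. It reads one lead byte, takes exactly that many bytes, and repeats.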



