
That article starts out OK and then suddenly tries to argue that you can use the terms interchangeably. You cannot, and you will drown in confusion if you try. Just imagine that tomorrow the Chinese introduce their own character set alongside Unicode, but reuse the UTF-8 encoding scheme to minimize the number of bytes it takes to represent their language (which makes sense, because character frequency drops off pretty fast: some characters are much more common than others, so you'd like to represent those with one byte). At that point "UTF-8" alone would no longer tell you which characters the bytes stand for; you would have to name the character set as well as the encoding.
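To make the distinction concrete, here is a small Python sketch (my own illustration, not something from the article). The character set assigns a number to the character; the encoding decides which bytes carry that number. The choice of "中" and of these three encodings is arbitrary.

```python
# One character, one code point in the Unicode character set...
ch = "中"
code_point = ord(ch)  # the character set maps the character to a number

# ...but several possible byte sequences, one per character encoding.
encoded = {name: ch.encode(name) for name in ("utf-8", "utf-16-le", "utf-32-le")}

print(f"U+{code_point:04X}")
for name, raw in encoded.items():
    print(f"{name:10} {raw.hex(' ')}")
```

Same set, same code point, three different byte strings. That is the part that gets lost when the two terms are treated as one.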

The fact that the HTTP RFC speaks of 'charset=utf-8' is explained by this part of the spec:

  Note: This use of the term "character set" is more commonly
  referred to as a "character encoding." However, since HTTP and
  MIME share the same registry, it is important that the
  terminology also be shared.

Why does MIME use the 'wrong' terminology? Perhaps because the registry is old and the difference between set and encoding was less obvious and less relevant back then. Perhaps it was simply a mistake, a detail meant to be corrected later. Perhaps the person who drew it up was inept. Who knows. It doesn't matter; it is still wrong. And don't get me started on the use of 'character set' in MySQL...
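For what it's worth, here is roughly how that 'charset' label behaves in practice, sketched in Python with the standard library's email.message.Message. The header value and the body bytes are made-up examples. The label is handed straight to a decoder, so it is functioning as the name of an encoding.

```python
from email.message import Message

# Hypothetical header value, as it might appear in an HTTP response.
header_value = "text/html; charset=utf-8"

msg = Message()
msg["Content-Type"] = header_value
label = msg.get_content_charset()  # the value of the 'charset' parameter

# Made-up body bytes; the 'charset' label tells us how to decode them.
body = b"caf\xc3\xa9"
text = body.decode(label)

print(label, text, [f"U+{ord(c):04X}" for c in text])
```

The code points that come out of the decode belong to the character set (Unicode here); the header never names it.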


