Hacker News
CSS for Internationalisation (chenhuijing.com)
165 points by polm23 on April 25, 2020 | 37 comments



Glad to see vertical writing covered. Wish it covered Ruby styling[0][1] and markup[2] as well.

[0] MDN reference: https://developer.mozilla.org/en-US/docs/Web/CSS/ruby-positi...

[1] W3C in-depth article: https://w3c.github.io/i18n-drafts/articles/ruby/styling.en.h...

[2] https://www.w3.org/International/articles/ruby/markup.en


The author covers it in another article (2016): https://www.chenhuijing.com/blog/html-ruby/#%F0%9F%9A%B2


Thanks for linking! Vertical writing on the web is of huge interest to me.


Sorry to be nitpicky, but this is wrong:

> Have you ever wondered how Chrome knows to ask you if you’d like a web page’s content to be translated? No? Okay, maybe it’s just me then. But it’s because of the lang attribute on the <html> element.

The lang attribute is only one signal to whatever's inside Chrome that detects language. Here's a simple counterexample: https://twitter.com/chewxy/status/1253076770066010112

The image is an SVG generated by graphviz. It has no lang attribute (or even an HTML tag), yet Chrome still thinks it's Luxembourgish.

Nonetheless, I appreciate the tip on the pseudoclass for lang.


Does Chrome ever ignore the lang attribute?


Yes [0]:

> Chrome uses CLD3 to perform language detection on all webpages. This language detection is generally very accurate. Chrome will override the language detection if there is a language attribute on the HTML tag or a language specified in the HTTP content-language header.

> However, both these signals are often incorrectly specified by the site/page, and in particular many non-English sites report the language as English (presumably based on default values from authoring tools, etc.) Currently a whitelist is used to determine if the detected language should always override the language attribute / content-language.

Compact Language Detector v3 (CLD3) is a neural network model for language identification [1]. Sometimes it fails [2].

[0] https://bugs.chromium.org/p/chromium/issues/detail?id=771861

[1] https://github.com/google/cld3

[2] https://bugs.chromium.org/p/chromium/issues/detail?id=105923...


I don't know if it is the case nowadays, but I seem to recall that in the old days they would check, and if their model predicted with high confidence a language different from the one you were claiming, they would go with the language they predicted.

Even if they don't still do this, I think it makes total sense and is what I would do. Programmers make mistakes, and if your page claims to be English but its text has a high chance of being Armenian and a low chance of being English, I would treat it as Armenian.


Google crawler apparently ignores "lang" attribute [0]:

> Google uses the visible content of your page to determine its language. We don’t use any code-level language information such as lang attributes, or the URL.

But what about the Chrome browser?

[0] https://support.google.com/webmasters/answer/182192?hl=en


Yep, there are also some interesting notes on this here https://superuser.com/questions/303706/how-does-chrome-know-...


Umm, so they weren't wrong:

> Chrome first checks the HTML lang attribute and if it's not present it checks the Content-Language HTTP header. Then it gets a prediction from cld3.


You can set the language in the HTTP header on your server for the entire site - if that applies.


I learned a lot from this, in case I need to do more localization/i18n in the future.

Nice easter egg, first time seeing emoji used in the url fragment. It also loads a new one.

Edit: As an idea to share, given the topic, I wonder whether different emoji can be localized too. An emoji can mean something different depending on the country.


One thing that’s been omitted is the :dir() pseudoclass[1], which has been inexplicably ignored by WebKit and Blink. Yeah, it can sort of be replicated by a descendant selector, but it’d be so much more obvious and self-contained to select an element based on its own calculated text direction, something that’s currently only possible in Firefox.

[1] https://www.w3.org/TR/selectors/#the-dir-pseudo
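
A rough sketch of the difference, assuming a hypothetical .callout class and a browser that implements :dir():

```css
/* Descendant-selector workaround: only matches when an ancestor
   carries an explicit dir="rtl" attribute. */
[dir="rtl"] .callout {
  border-right: 4px solid tomato;
}

/* :dir() matches on the element's own computed directionality,
   however that direction was established. */
.callout:dir(rtl) {
  border-right: 4px solid tomato;
}
```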


The article touches on properties like borders and margins that adapt to the language, but all the examples are manually calculated. I recently saw a talk on YouTube that mentioned there is (or will be?) support for a margin-start/end kind of syntax that lets browsers re-orient box properties depending on the language. Sadly, I can't find it. It was by a pair of people from the Chrome and Edge teams giving an update on upcoming features. Browser support and awareness will obviously take time to catch up, but this should remove the need for many of these manual considerations, so support for internationalization should improve in the coming years.


You’re thinking of CSS Logical Properties, and they are super cool: https://developer.mozilla.org/en-US/docs/Web/CSS/CSS_Logical...
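
As a minimal sketch (the .card class is made up for illustration), the logical counterparts of the usual physical properties look like this:

```css
.card {
  /* Follows the start of the inline axis: left in LTR, right in RTL. */
  margin-inline-start: 1em;

  /* In a horizontal, left-to-right writing mode these four map to
     top, right, bottom and left respectively; in other writing modes
     the browser remaps them. */
  border-block-start-color: tomato;
  border-inline-end-color: limegreen;
  border-block-end-color: dodgerblue;
  border-inline-start-color: gold;
}
```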


I’m confused. The feature you describe is logical properties, which is exactly what the section you’re talking about is about. I don’t know what you mean by “manually calculated”.


Indeed! I guess I was confused by the organization of the example[0] for that section. The physical section being first obscures all the examples of the logical block style so all you see is variations of:

  border-top-color: tomato;
  border-right-color: limegreen;
  border-bottom-color: dodgerblue;
  border-left-color: gold;

Looking back, my comment came from my own failure to see how simple the logical properties are as presented, compared with other demonstrations I've seen, and from realizing I had lost track of a resource that would be a helpful reference as this new method becomes commonplace.

[0]: https://codepen.io/huijing/pen/XWmKByZ


I was actually expecting to see i18n translations with lang. Something like:

  .i18n-hello:lang(en) {
    content: "Hello";
  }
  
  .i18n-hello:lang(nl) {
    content: "Hallo";
  }
  
  <span class="i18n-hello">hello label</span>

That doesn't work, so you'd have to use ::after, but that requires some more verbose styling.
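
A minimal sketch of the ::after variant, assuming the span is left empty so that only the generated content shows (.i18n-hello is the same hypothetical class as above):

```css
/* Markup assumed: <span class="i18n-hello"></span> */

/* Fallback when no more specific language rule matches. */
.i18n-hello::after {
  content: "Hello";
}

/* :lang() takes a bare language code, with no leading colon. */
.i18n-hello:lang(nl)::after {
  content: "Hallo";
}
```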


But this mixes content and style, so would be difficult to update (for non technical users etc.). Besides, it's fairly bad practice and won't scale to sentences well.


> Have you ever wondered how Chrome knows to ask you if you’d like a web page’s content to be translated?

How difficult is it to automatically detect the language? Probably naive but how far can you get by counting how many common words on the page are from a particular language (e.g. "the of then because I" for English and "je pour des les" for French)?


Google translate already has an auto-detect language feature.

I suspect it would be reasonably simple to add a "Translate this" button somewhere in Chrome (perhaps it's already there, I don't have Chrome installed on this machine.)

Perhaps automatically doing it for every site would be a little bit intrusive on the privacy front.


> Google translate already has an auto-detect language feature.

Yep, I'm asking how hard it is to build a quick and simple auto-detect feature. Is it harder than it looks?


It kind of depends what you mean by build. From scratch - I have no idea. For Google Chrome? It's just an API call away - https://cloud.google.com/translate/docs/basic/detecting-lang...


As a rule of thumb, everything in NLP is harder than it looks.


https://github.com/google/cld3

build requires Chromium, not so hard


I just wish CSS offered a simple way to fix the aspect ratio of an arbitrary element.


I have used a hack for this for years, and it seems to be little known.

If you want an element to have a 16:9 aspect ratio, give it a parent div with width: 100%; height: 0; padding-bottom: 56.25%; position: relative;

Then absolutely position the child element with width: 100%; height: 100%;

This works everywhere, even in IE8. I am happy I will not have to resort to this anymore in the future, but it will do for now.
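
Written out as a sketch (the .ratio-16-9 class name is just for illustration), it relies on percentage padding being resolved against the container's width:

```css
/* Wrapper: no height of its own, so its box is as tall as its padding. */
.ratio-16-9 {
  position: relative;
  width: 100%;
  height: 0;
  padding-bottom: 56.25%; /* 9 / 16 = 0.5625 of the wrapper's width */
}

/* Child: stretched to fill the wrapper's padding box. */
.ratio-16-9 > * {
  position: absolute;
  top: 0;
  left: 0;
  width: 100%;
  height: 100%;
}
```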


It’s pretty common. For example, the Bulma.io framework uses it for the .image.is-16by9 etc. classes.


Ah good to know. I don't use CSS frameworks (except Tailwind recently), but that sounds handy.


That's coming! It's not implemented in any browsers yet, but it should be soon. It was introduced in "CSS Intrinsic & Extrinsic Sizing Module Level 4".

It will use the property "aspect-ratio" with values like "16/9" or "1/1".

MDN: https://developer.mozilla.org/en-US/docs/Web/CSS/aspect-rati...

Article about it by Rachel Andrew in Smashing Magazine: https://www.smashingmagazine.com/2019/03/aspect-ratio-unit-c...
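
A minimal sketch of how that would look, assuming a browser that implements the property (the .video-frame class is hypothetical):

```css
.video-frame {
  width: 100%;
  /* Height is derived from the width to keep a 16:9 box. */
  aspect-ratio: 16 / 9;
}
```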


You can set the width and height attributes (HTML attributes, not CSS properties) today.

Recent Chrome and Firefox use that to determine aspect ratio for jank-free loading.


For images and iframes, but I don't think that works for any element. In any case, hardcoding that kind of information in the HTML is not ideal.


That’s Grrrreat!


Very good writing, and very informative. I especially enjoyed the gif of traditional vs. simplified Han.


In modern browsers, whether or not lang is provided, the browser is smart enough to pick the correct encoding / glyphs.

In the olden days, browsers were not that smart (i.e. they would still pick the wrong charset). You may have come across, for example, an Asian site (Japanese, Korean, Chinese) showing a lot of text as "?"s and "□"s, like ??? □□□. That was common in the '90s and still common through the early 2000s. I'm glad you no longer have to spend time trying to match the charset yourself, though the option was readily available in the browser's menu.

I think the lang attribute is useful here because it lets you tell the browser explicitly. And if you are designing the page, you have more control over localization by targeting the language code.


I think you are confusing language (English, French, Arabic, etc) with encoding (ASCII, UTF-8, UTF-16, Latin1, etc).

You do sometimes see mojibake in web pages (question marks and □□ in place of the real text). It is caused by an incorrect encoding. The web server tells the browser which encoding to use via the HTTP Content-Type header or the <meta charset="UTF-8"> HTML element. You should always set the encoding rather than relying on the browser guessing.


I was referencing Asian languages as one set of examples for encoding. Of course setting the encoding is important, whether in HTML or within a program dealing with strings (I have had my share of hard bugs that turned out to be just that, but that's another story)!

Here I was just sharing my encounters with the browser rendering ?? and □□ marks because it did not know the encoding. The browser had an "Auto Detect" charset mode, so one always had to toggle it (and remember to revert it when viewing another page). That is the very reason you gave for "should always set", but in those days (and maybe even today) it was not always set.

Again, just a memory sharing.



