How come they don't seem to resolve to anything? I tried both via my browser and curl.
I can see the records, and if I curl the IP address for the .dk records I only get an nginx 301 redirect loop to the HTTPS version, which serves a certificate for https://eksempel.dk.
Similar experience with .ai; curling the IP seems to point to an http://offshore.ai page.
Are the top-level A records used for some other protocol? Do they serve any purpose?
It's a little sad that we ended up with punycode, given that UTF-8 is so elegant as a forward-compatible character set with ASCII.
DNS's concern over backward compatibility is a bit of a pain sometimes. And now we even have two competing standards, where multicast DNS, mDNS, allows UTF-8, but "standard" DNS does not.
An important benefit of punycode is that it provides some protection against homograph attacks [0]. There are so many similar-looking characters in Unicode that it seems reasonable to trim the allowed characters to a subset. Of course it's a compromise and ASCII's not perfect, but it's a lot easier to spot g00gle.com compared to gооgle.com.
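The homograph above can be made concrete with Python's built-in IDNA (2003) codec: the ASCII lookalike needs no conversion, while the version with Cyrillic letters gets a punycode label that no longer resembles "google" at all. (The exact encoded form below is what the stdlib codec produces; it is shown for illustration.)

```python
import encodings.idna  # noqa: F401 -- ensures the stdlib "idna" codec is registered

ascii_lookalike = "g00gle.com"            # zeros instead of "o": still pure ASCII
homograph = "g\u043e\u043egle.com"        # U+043E CYRILLIC SMALL LETTER O twice

# Pure-ASCII labels pass through ToASCII unchanged:
print(ascii_lookalike.encode("idna"))     # b'g00gle.com'

# The Cyrillic version is converted to an ACE (xn--) label:
print(homograph.encode("idna"))           # b'xn--ggle-55da.com'
```

In other words, once rendered as punycode the spoof is obvious, which is exactly the protection being described.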
At the same time there are sites like flüge.de, which is not reachable under any domain except the Unicode one, and while ü could be written as ue, fluege.de is already owned by a competitor.
Over time, punycode is going to cause more phishing problems in non-ASCII countries than it's going to solve, because users aren't going to see a difference between xn--blabla.de and xn--blablu.de if all domains are unreadable to them.
I feel like this is a browser UX problem, right? A browser designed to prevent phishing for readers of both ASCII and non-ASCII languages might display both the punycode and Unicode versions of a website, and if a heuristic detects that a homograph is being used which would otherwise resolve to an Alexa Top 100k site, display a dialog warning against a phishing attack. (Your flüge.de example shouldn't trigger that warning, for instance.)
https://github.com/phishai/phish-protect is an attempt to do this, but I think there's a better middle ground for international users that doesn't simply block-by-default all punycode domains.
Vivaldi shows domains in punycode by default. I believe this is the only reasonable solution, otherwise browser makers will always be playing catch-up with exploiters.
At that point, how much value is there in supporting Unicode? By only using ASCII (punycode), it pretty much eliminates the reason it exists: to allow software to show a domain in someone's native language.
Should we perhaps instead restrict domain characters to the ASCII glyphs (RFC 1035 compliant) plus those glyphs that appear in the user's locale, and otherwise revert to punycode when the glyphs fall outside those ranges?
> At that point, how much value is there in supporting Unicode? By only using ASCII (punycode), it pretty much eliminates the reason it exists: to allow software to show a domain in someone's native language.
Allowing a user to enter a domain in their native language is very much worthwhile, I'd say, even if we revert to ASCII for display.
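That split between native-language input and ASCII display is easy to support, since IDNA conversion round-trips. A minimal sketch with the stdlib codec (flüge.de is the example domain from upthread; the encoded form shown is the codec's output):

```python
typed = "flüge.de"                 # what the user types

wire = typed.encode("idna")        # ACE form: what goes into the DNS query
print(wire)                        # b'xn--flge-1ra.de'

# The browser can still recover the Unicode form whenever it
# decides it is safe to display it:
print(wire.decode("idna"))         # flüge.de
```

So the lookup and any punycode-only display can use the ACE form, while input stays in the user's own language.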
> map characters to the "Simple_Lowercase_Mapping" property (the fourteenth column) in <http://www.unicode.org/Public/UNIDATA/UnicodeData.txt>, if any.
as if that were responsible for turning ℡ into TEL. But the fourteenth column in
> 2121;TELEPHONE SIGN;So;0;ON;<compat> 0054 0045 004C;;;;N;T E L SYMBOL;;;;
is empty!
What we're actually seeing is a Compatibility Decomposition, used when Unicode normalisation form NFKC is applied to the text.
Whether it's appropriate for browsers to be applying NFKC may be questionable. RFC 5895 calls for the use of NFC (which would not apply mappings like this), but it also says that
> These form a minimal set of mappings that an application should strongly consider doing. Of course, there are many others that might be done.
which leaves things rather open.
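The distinction is easy to check with Python's `unicodedata`, which exposes the UnicodeData.txt fields directly: the TEL mapping lives in the decomposition field (tagged `<compat>`), not in the Simple_Lowercase_Mapping column, and only NFKC applies it:

```python
import unicodedata

tel = "\u2121"  # ℡ TELEPHONE SIGN

# The decomposition field carries the compatibility mapping:
print(unicodedata.decomposition(tel))      # <compat> 0054 0045 004C

# NFC leaves the character alone; NFKC applies the mapping:
print(unicodedata.normalize("NFC", tel))   # ℡
print(unicodedata.normalize("NFKC", tel))  # TEL
```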