Internationalized domain name
An internationalized domain name (IDN) is an Internet domain name that
(potentially) contains non-ASCII characters. Such
domain names could contain letters with diacritics, as required by many
European languages, or characters from non-Latin scripts such as Arabic
or Chinese. However, the standard for domain names does not allow such
characters, and much work has gone into finding a way around this,
either by changing the standard, or by agreeing on a way to convert
internationalized domain names into standard ASCII domain names while
preserving the stability of the domain name system.
IDN has, by the standards of the Internet, a long history; it was originally
proposed in 1996 (by M. Duerst) and implemented in 1998 (by T. W. Tan et al.).
After much debate and many competing proposals, a system called
Internationalizing Domain Names in Applications (IDNA) was adopted as
the chosen standard, and is currently, as of 2005, in the process of being
rolled out.
In IDNA, the term internationalized domain name means specifically any
domain name consisting only of labels to which the IDNA ToASCII algorithm can be
successfully applied. (For the meaning of 'label' and 'ToASCII', see the section
ToASCII and ToUnicode below.)
Internationalizing domain names in applications
Internationalizing Domain Names in Applications (IDNA) is a mechanism defined
in 2003 for handling internationalized domain names containing non-ASCII
characters. Such domain names could not be handled by the existing
DNS and name resolver infrastructure. Rather than redesigning the existing
DNS infrastructure, it was decided that non-ASCII domain names should be
converted to a suitable ASCII-based form by web
browsers and other user applications; IDNA specifies how this conversion is
to be done.
IDNA was designed for maximum backward compatibility with the existing DNS,
which was designed for use with names using only a subset of the ASCII
character set.
An IDNA-enabled application is able to convert between the restricted-ASCII
and non-ASCII representations of a domain, using the ASCII form in cases where
it is needed (such as for DNS lookup), but being able to present the more
readable non-ASCII form to users. Applications that do not support IDNA will not
be able to handle domain names with non-ASCII characters, but will still be able
to access such domains if given the (usually rather cryptic) ASCII equivalent.
ICANN issued guidelines for the use of IDNA in June 2003, and it was already
possible to register .jp domains using this system in July 2003. Several other
top-level domain registries started accepting registrations in March 2004.
Mozilla 1.4, Netscape 7.1, Opera 7.11 and Safari were among the first
applications to support IDNA. A browser plugin is available for Internet
Explorer 6 to provide IDN support. Internet Explorer 7.0 and Windows Vista's
URL APIs provide native support for IDN [1].
ToASCII and ToUnicode
The conversions between ASCII and non-ASCII forms of a domain name are
accomplished by algorithms called ToASCII and ToUnicode. These algorithms are
not applied to the domain name as a whole, but rather to individual labels. For
example, if the domain name is www.example.com, then the labels are www, example
and com, and ToASCII or ToUnicode would be applied to each of these three
separately.
The details of these two algorithms are complex, and are specified in the
RFCs linked at the end of this article. The following gives an overview of
their behaviour.
ToASCII leaves unchanged any ASCII label, but will fail if the label is
unsuitable for DNS. If given a label containing at least one non-ASCII
character, ToASCII will apply the Nameprep algorithm (which converts the label
to lowercase and performs other normalization) and will then translate the
result to ASCII using Punycode before prepending the 4-character string "xn--". This 4-character string is
called the ACE prefix, where ACE means ASCII Compatible Encoding, and is
used to distinguish Punycode-encoded labels from ordinary ASCII labels. Note
that the ToASCII algorithm can fail in a number of ways; for example, the final
string could exceed the 63-character limit for the DNS. A label on which ToASCII
fails cannot be used in an internationalized domain name.
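The ToASCII steps described above can be sketched in Python roughly as
follows. This is a simplified illustration rather than the full algorithm
from the RFCs: it assumes Python's standard encodings.idna.nameprep helper and
its built-in "punycode" codec, it omits the optional hostname-syntax checks,
and the function names to_ascii_label and to_ascii_domain are hypothetical.

```python
import encodings.idna  # standard library; provides nameprep()

ACE_PREFIX = "xn--"
MAX_LABEL_LENGTH = 63  # DNS limit on the length of a single label


def to_ascii_label(label: str) -> str:
    """Simplified ToASCII for one label (illustrative sketch only)."""
    if not label.isascii():
        # Nameprep: lowercase the label and apply other normalization.
        label = encodings.idna.nameprep(label)
    if not label.isascii():
        # A non-ASCII label must not already look like an encoded one.
        if label.startswith(ACE_PREFIX):
            raise ValueError("label already starts with the ACE prefix")
        # Translate to ASCII with Punycode, then prepend the ACE prefix.
        label = ACE_PREFIX + label.encode("punycode").decode("ascii")
    if not 1 <= len(label) <= MAX_LABEL_LENGTH:
        raise ValueError("label is empty or longer than 63 characters")
    return label


def to_ascii_domain(name: str) -> str:
    """Apply the sketch to each dot-separated label separately."""
    return ".".join(to_ascii_label(part) for part in name.split("."))


print(to_ascii_domain("www.example.com"))
print(to_ascii_label("Bücher"))
```

ASCII labels pass through untouched, while a label that grows past 63
characters after encoding raises an error, mirroring the failure case
mentioned above.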
ToUnicode reverses the action of ToASCII, stripping off the ACE prefix and
applying the Punycode decode algorithm. It does not reverse the Nameprep
processing, since that is merely a normalization and is by nature irreversible.
Unlike ToASCII, ToUnicode always succeeds, because it simply returns the
original string if decoding would fail. In particular, this means that ToUnicode
has no effect on a string that does not begin with the ACE prefix.
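The "always succeeds" behaviour of ToUnicode can be illustrated with a short
Python sketch. It assumes Python's built-in "punycode" codec; the function
name to_unicode_label is hypothetical, and the sketch leaves out the
verification step in which the full algorithm re-encodes the result and
compares it with the input.

```python
ACE_PREFIX = "xn--"


def to_unicode_label(label: str) -> str:
    """Simplified ToUnicode for one label (illustrative sketch only)."""
    # Labels without the ACE prefix are returned unchanged.
    if not label.lower().startswith(ACE_PREFIX):
        return label
    try:
        # Strip the prefix and apply the Punycode decoder.
        encoded = label[len(ACE_PREFIX):].encode("ascii")
        return encoded.decode("punycode")
    except UnicodeError:
        # Never fail: fall back to the original string.
        return label


print(to_unicode_label("xn--bcher-kva"))
print(to_unicode_label("example"))
```

Because every failure path returns the input, callers can apply such a
function to any label without checking first whether it is encoded.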
Example of IDNA encoding
As an example of how IDNA works, suppose the domain to be encoded is
Bücher.ch (“Bücher” is German for “books”, and .ch is the country domain for
Switzerland). This has two labels, Bücher and ch. The second
label is pure ASCII, and so is left unchanged. The first label is processed by
Nameprep to give bücher, and then by Punycode to give bcher-kva,
and then has xn-- prepended to give xn--bcher-kva. The final
domain suitable for use with the DNS is therefore xn--bcher-kva.ch.
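The same example can be reproduced with Python's standard library, whose
"idna" codec implements the 2003 IDNA conversion and whose "punycode" codec
exposes the intermediate step. The variable names below are illustrative
only.

```python
domain = "Bücher.ch"

# Intermediate step: Punycode applied to the Nameprep-ed first label.
punycode_step = "bücher".encode("punycode").decode("ascii")

# Whole conversion, label by label, using the built-in "idna" codec.
ascii_form = domain.encode("idna").decode("ascii")

# And back again for display purposes.
unicode_form = ascii_form.encode("ascii").decode("idna")

print(punycode_step)  # bcher-kva
print(ascii_form)     # xn--bcher-kva.ch
print(unicode_form)   # bücher.ch
```

Note that the round trip yields bücher.ch in lowercase, since the Nameprep
normalization is not reversed.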
ASCII Spoofing and squatting concerns
Because IDN allows websites to use full Unicode names, it also makes it much
easier to create a spoofed web site that looks exactly like another, including
domain name and security certificate, but in fact is controlled by someone
attempting to steal private information. These spoofing attacks potentially open
users up to phishing attacks.
These attacks are not due to technical deficiencies in either the Unicode or
IDNA specifications, but due to the fact that different characters in different
languages can look the same, depending on the font used. For example, Unicode
character U+0430, Cyrillic small letter a ("а"), can look identical to Unicode
character U+0061, Latin small letter a ("a"), which is the lowercase "a" used in
English. Characters that look alike in this way may be termed homonyms,
homographs, or (less ambiguously) homoglyphs.
Although a computer may display visually identical or very similar
glyphs for two different characters, these differences are still
significant to the computer
when locating web sites or validating certificates. The user assumes a
one-to-one correspondence between the visual appearance of a name and the named
entity, but when two names appear identical, this correspondence breaks down.
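A few lines of Python make the distinction concrete: the two letters may be
drawn identically, yet they are different code points and the strings that
contain them compare as unequal. The helper name describe is hypothetical;
unicodedata is part of the standard library.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


latin_a = "\u0061"
cyrillic_a = "\u0430"

print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False

# Two names that can look the same on screen are different strings.
print("p\u0430ypal" == "paypal")  # False
```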
By contrast, with the old set of a to z, 0 to 9, and the hyphen, there is
little in the way of homographs. l and 1 and 0 and o are the closest, and the
combination "rn" looks similar to "m" in some fonts; however, most fonts make a
noticeable visible distinction between them. Even in the worst
case, a site like Google would therefore need to register only 8 names to
protect against homograph attacks.
In December 2001, two Israeli researchers, Evgeniy Gabrilovich and Alex
Gontmakher, published a paper titled "The Homograph Attack" [1], describing
an attack that used Unicode URLs to spoof a website URL. To prove the
feasibility of this kind of attack, the researchers successfully registered a
variant of the domain name "Microsoft.com" which incorporated Russian-language
characters.
In general, this kind of attack is known as a
homograph spoofing attack. This problem was anticipated before IDN was
introduced, and guidelines were issued to registries to try and avoid or reduce
the problem -- for example, recommending that registries only accept the Latin
alphabet and that of their own country, not all of Unicode. Unfortunately this
advice was not followed by those in control of a number of major TLDs.
On February 7, 2005, Slashdot reported that this exploit was disclosed at the
hacker conference Shmoocon with an example available at
http://www.shmoo.com/idn/. On browsers supporting IDNA, the URL "http://www.pаypal.com/"
(where the first a is replaced by a Cyrillic а) appeared to lead to
paypal.com but instead led to a spoofed PayPal web site
that said "Meeow."
Internet Explorer 7 imposes restrictions on displaying non-ASCII domain
names based on a user-defined list of allowed languages and provides an
anti-phishing filter that checks suspicious Web sites against a remote database
of known phishing sites.
Since Internet Explorer prior to version 7 does not support IDNs, it is not
vulnerable to this kind of attack. However, older versions of Internet Explorer
can be made IDN-compatible by browser plug-ins, some of which are vulnerable to
the spoofing attacks. On July 9, 2005, the IDN-enabling plug-in Quero Toolbar
2.1.0 was released; it implemented several anti-spoofing techniques such as
mixed-script detection and highlighting of characters belonging to different
scripts.
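As a rough idea of what mixed-script detection involves, the following Python
sketch flags labels whose letters come from more than one script. It is not
the method used by any particular plug-in. Python's standard library does not
expose the Unicode script property, so the sketch assumes a crude heuristic:
the first word of each character's Unicode name (LATIN, CYRILLIC, GREEK and so
on) stands in for its script. The function names are hypothetical.

```python
import unicodedata


def scripts_in(label: str) -> set:
    """Return rough script names for the letters in a label.

    Heuristic only: the first word of a character's Unicode name
    is used as a stand-in for its script.
    """
    found = set()
    for ch in label:
        if ch.isalpha():
            found.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return found


def is_mixed_script(label: str) -> bool:
    """True if the label's letters appear to come from several scripts."""
    return len(scripts_in(label)) > 1


print(is_mixed_script("paypal"))        # False
print(is_mixed_script("p\u0430ypal"))   # True
```

A check of this kind does not catch a spoof written entirely in one
non-Latin script, which is one reason it is usually combined with other
measures.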
On February 17, 2005, Mozilla developers announced that they would ship the
next versions of their software with IDN support still enabled, but showing the
Punycode URLs instead, thus thwarting any attacks exploiting similarities between ASCII
and non-ASCII letters (but not necessarily, for example, between Cyrillic and
Greek letters, unless the user knows which Punycode URL corresponds to their
chosen IDN URL) while still allowing people to access websites on an IDN domain.
This is a change from the earlier plans to disable IDN entirely for the time
being. [2]
Since then, both Mozilla and Opera have announced that they will be using
per-domain whitelists to selectively switch on IDN display for domains run by
registries which are taking appropriate anti-spoofing precautions [3].
(See the article on homograph spoofing attacks for more details.) As of
September 9, 2005, the most recent versions of Mozilla Firefox and Internet
Explorer display the spoofed PayPal URL as "http://www.xn--pypal-4ve.com/",
unsightly but clearly different from the actual paypal.com.
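The effect of showing the ASCII form can be reproduced with Python's built-in
"idna" codec. This is only a sketch of the idea, not how any browser
implements it; display_form is a hypothetical helper name.

```python
spoofed_host = "www.p\u0430ypal.com"  # second letter is Cyrillic U+0430
genuine_host = "www.paypal.com"


def display_form(host: str) -> str:
    """Return the ASCII (Punycode) form a cautious client could display."""
    return host.encode("idna").decode("ascii")


print(display_form(spoofed_host))  # www.xn--pypal-4ve.com
print(display_form(genuine_host))  # www.paypal.com
```

The genuine name is unchanged because it is pure ASCII, while the spoofed
name becomes visibly different once encoded.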
Safari's approach is to render problematic character sets as Punycode. This can
be changed by altering the settings in Safari's system preference files.
History of IDN
- 12/96: Martin Duerst's original Internet Draft proposing UTF5 (the first
incarnation of what is known today as ACE) - UTF-5 was first defined by
Martin Duerst at the University of Zürich in
[4][5][6]
- 03/98: Early Research on IDN at National University of Singapore (NUS),
Center for Internet Research (formerly Internet Research and Development
Unit - IRDU) led by Prof. Tan Tin Wee (IDN Project team - Lim Juay Kwang and
Leong Kok Yong) and subsequently continued under a team at Bioinformatrix
Pte. Ltd. (BIX Pte. Ltd.) - a NUS spin-off company led by Prof. S. Subbiah.
- July 98: Geneva INET'98 conference with a BoF discussion on iDNS and
APNG General Meeting and Working Group meeting.
- 07/98: Asia Pacific Networking Group (APNG, still in existence [7] and
distinct from a gathering known as APSTAR [8]) iDNS Working Group formed. [9]
- 10/98: James Seng was recruited to lead further IDN development at BIX Pte.
Ltd. by Prof. S. Subbiah.
- 02/99: iDNS Testbed launched by BIX Pte. Ltd. under the auspices of
APNG with participation from CNNIC, JPNIC, KRNIC, TWNIC, THNIC, HKNIC and
SGNIC led by James Seng [10]
- 02/99: Presentation of Report on IDN at Joint APNG-APTLD meeting, at
APRICOT'99
- 03/99: Endorsement of the IDN Report at APNG General Meeting 1 March
1999.
- 06/99: Grant application by APNG jointly with the Centre for Internet
Research (CIR), National University of Singapore, to the International
Development Research Center (IDRC), a Canadian Government funded
international organisation to work on IDN for IPv6. This APNG Project was
funded under the Pan Asia R&D Grant administered on behalf of IDRC by the
Canadian Committee on Occupational Health and Safety (CCOHS). Principal
Investigator: Tan Tin Wee of National University of Singapore.
[11]
- 07/99: Tout, Walid R. (WALID Inc.) filed IDNA patent application number
US1999000358043, "Method and system for internationalizing domain names".
Published 2001-01-30 [12]
- 07/99: Internet Draft on UTF5 by James Seng, Martin Duerst and Tan Tin
Wee [13]; renewed 2000 [14].
- 08/99: APTLD and APNG form a working group to look into IDN issues,
chaired by Kilnam Chon. [15]
- 10/99: BIX Pte. Ltd. and National University of Singapore, together with
New York venture capital investors General Atlantic Partners, spun off the
IDN effort into two new Singapore companies, i-DNS.net International Inc. and
i-Email.net Pte. Ltd., which created the first commercial implementations of
an IDN solution for domain names and for IDN email addresses respectively.
- 11/99: IETF IDN Birds-of-Feather in Washington was initiated by i-DNS.net
at the request of IETF officials.
- 12/99: i-DNS.net International Pte. Ltd. launched the first commercial
IDN. It was in Taiwan and in Chinese characters under the top-level IDN TLD
".gongsi" (meaning loosely ".com") with endorsement by the Minister of
Communications of Taiwan and some major Taiwanese ISPs, with reports of over
200 000 names sold in a week in Taiwan, Hong Kong, Singapore, Malaysia,
China, Australia and USA. Required use of either a plug-in or special DNS
hacks.
- Late 1999: Kilnam Chon initiates Task Force on IDNS which led to
formation of MINC, the Multilingual Internet Names Consortium.
[16]
- 01/2000: IETF IDN Working Group formed, chaired by James Seng and Marc
Blanchet
- 01/2000: The second ever commercial IDN launch was IDN TLDs in the Tamil
Language, corresponding to .com, .net, .org, and .edu. These were launched
in India with IT Ministry support by i-DNS.net International. Requires use
of either plug-in or special DNS hacks.
- 02/2000: Multilingual Internet Names Consortium (MINC) Proposal BoF at
IETF Adelaide. [17]
- 03/2000: APRICOT 2000 Multilingual DNS session
[18]
- 04/2000: WALID Inc. (with IDNA patent pending application 6182148)
started Registration & Resolving Multilingual Domain Names.
- 05/2000: Interoperability Testing WG, MINC meeting. San Francisco,
chaired by Bill Manning and Y.Yoneya 12 May 2000.
[19]
- 06/2000: Inaugural Launch of the Multilingual Internet Names Consortium
(MINC) in Seoul
[20] to drive the collaborative roll-out of IDN starting from the Asia
Pacific.
[21]
- 07/2000: Joint Engineering TaskForce (JET) initiated in Yokohama to
study technical issues led by JPNIC (K.Konishi)
- 07/2000: Official Formation of CDNC Chinese Domain Name Consortium to
resolve issues related to and to deploy Han Character domain names, founded
by CNNIC, TWNIC, HKNIC and MONIC in May 2000.
[22]
[23]
- 03/01: ICANN Board IDN Working Group formed
- 07/01: Japanese Domain Name Association (JDNA) Launch Ceremony (July 13,
2001) in Tokyo, Japan.
- 07/01: Urdu Internet Names System (July 28, 2001) in Islamabad,
Pakistan, Organised Jointly by SDNP and MINC.
[24]
- 07/01: Presentation on IDN to the Committee Meeting of the Computer
Science and Telecommunications Board, National Academies USA (July 11-13,
2001) at University of California School of Information Management and
Systems, Berkeley, CA. [25]
- 08/01: MINC presentation and outreach at the Asia Pacific Advanced
Network annual conference, Penang, Malaysia 20th August 2001
- 10/01: Joint MINC-CDNC Meeting in Beijing 18-20 October 2001
- 11/01: ICANN IDN Committee formed
- 12/01: Joint ITU-WIPO Symposium on Multilingual Domain Names organised
in association with MINC, 6-7 Dec 2001, International Conference Center,
Geneva.
- 01/03: Free implementation of StringPrep, Punycode, and IDNA released in
GNU Libidn.
- 03/03: Publication of RFC 3454, RFC 3490, RFC 3491 and RFC 3492
- 06/03: Publication of ICANN IDN Guidelines for registries. Adopted by .cn,
.info, .jp, .org, and .tw registries.
- 05/04: Publication of
RFC 3743, Joint Engineering Team (JET) Guidelines for Internationalized
Domain Names (IDN) Registration and Administration for Chinese, Japanese,
and Korean
- 03/05: First Study Group 17 of ITU-T meeting on Internationalized Domain
Names
[26]
- 05/05: .IN ccTLD (India) creates expert IDN Working Group to create
solutions for 22 official languages
- 04/06: ITU Study Group 17 meeting in Korea gave final approval to the
Question on Internationalized Domain Names
[27]
- 06/06: Workshop on IDN at ICANN meeting at Marrakech, Morocco
- 11/06: ICANN GNSO IDN Working Group created to discuss policy
implications of IDN TLDs. Ram Mohan elected Chair of the IDN Working Group.
- 12/06: ICANN meeting at São Paulo discusses status of lab tests of IDNs
within the root.
- 01/07: Tamil and Malayalam variant table work completed by India's C-DAC
and Afilias
- 03/07: ICANN GNSO IDN Working Group completes work, Ram Mohan
presents report at ICANN Lisboa meeting.
[28]
DNS registries known to have adopted IDNA
- .ac: see details
- .ae
- .at: see details
- .biz: NeuLevel/NeuStar supports Chinese, Danish, German, Icelandic, Japanese,
Norwegian, Spanish, Swedish IDN in .biz, see details
- .br: (May 9, 2005) for Portuguese names, see details
- .cat: (February 14, 2006) for Catalan names, see details
- .com: see details
- .ch: (March 1, 2004)
- .cl: (September 21, 2005), see details
- .cn: see CNNIC's web site
- .de: (March 1, 2004), see details
- .dk: (January 1, 2004), (æ, ø, å, ö, ä, ü, & é), see details
- .fi: (September 1, 2005), see details
- .gr: (July 4, 2005) for Greek names, see details
- .hk: (March 8, 2007) for Chinese characters, see details
- .hu
- .info: (March 19, 2004), see details
- .io: see details
- .ir: see more details
- .is: (July 1, 2004), see details
- .jp: (July 2003), for Japanese characters (Kanji, hiragana & katakana)
- .kr: (August 2003), for Korean characters
- .li: (March 1, 2004)
- .lt: (March 30, 2003), (ą, č, ę, ė, į, š, ų, ū, ž), see details
- .lv: (2004), see details
- .museum: (January 20, 2004), see details
- .net: see details
- .no: (February 9, 2004), see details
- .nu: see details
- .org: (January 18, 2005), see details
- .pl: (September 11, 2003), see details
- .pt: (July 1, 2005) for Portuguese characters
- .se: (October 2003), for Swedish characters
- .sh: see details
- .tm: see details
- .tr: (November 14, 2006), see nic.tr website for details
- .tw: Traditional Chinese characters, see TWNIC's site
- .vn: Vietnamese, see character list
Non-IDNA or non-ICANN registries that support non-ASCII domain names
There are other registries that support non-ASCII domain names. A Singapore
company called I-DNS also offers, through its own registrar network, generic
domain name registrations in various languages; in these names the country
codes at the end are also transcribed into the same script as the rest of the
domain name. The company ThaiURL.com in Thailand supports .com registrations
via its own modified DNS, ThaiURL.
Because these companies, and other organizations that offer modified DNS
systems, do not subject themselves to ICANN's control, they must be regarded as
alternate DNS roots. Domains registered with them will therefore not be
supported by most Internet Service Providers, and as a result most users will not be able to
look up such domains without manually configuring their computers to use the
alternate DNS.
At ICANN's December 2006 meeting at São Paulo, IDNs were discussed in depth.
ICANN has continued lab tests of IDNs within the root, with the aim of
implementing true IDN top-level domains (IDN.IDN).
References
1. The Homograph Attack:
http://www.cs.technion.ac.il/~gabr/papers/homograph_full.pdf
2. GNSO IDN Working Group Outcomes Paper:
http://gnso.icann.org/correspondence/gnso-idn-wg-outcomes-ram.pdf
External links
- RFC 3454 (Stringprep)
- RFC 3490 (IDNA)
- RFC 3491 (Nameprep)
- RFC 3492 (Punycode)
- ICANN Guidelines for the Implementation of Internationalized Domain Names
- IANA Repository of TLD IDN Practices
- Internet Mail Consortium IDNA test tool (includes Perl source code)
- Online Punycode/IDN Decoder/Encoder
- Online StringPrep/Punycode/IDNA encoder/decoder (supports Latin-1,
UTF-8, KOI-8, 2022-JP, etc)
- List of all applications which have implemented IDNA along with a list of
open source SDKs
- IANA e-mails explaining the final choice of ACE prefix
- GNU Libidn is an implementation of IDNA
- Unicode Technical Report #36 - Security Considerations for the
Implementation of Unicode and Related Technology
- ICANN Internationalized Domain Names.
- IDN Language Table Registry
- The Homograph Attack, Evgeniy Gabrilovich and Alex Gontmakher,
Communications of the ACM, 45(2):128, February 2002
- MINC - Multilingual Internet Names Consortium - Advocates use of IDNs
worldwide.
- ISC IDN-OSS project Open Source EchIDNA plug-in for IE5/6
- IDNSearch.net IDN Example URLs
- IDNcyclopedia IDN News and Resources
- DNlocal News and discussion on all aspects of IDNs
- idnf.ru IDN news, discussions, appraisals and trading in Russian
This guide is licensed under the GNU
Free Documentation License. It uses material from the Wikipedia.