Talk:Internationalized resource identifier


This page needs serious attention.

I seem to recall that Tim Berners-Lee made a key distinction between Locator and Identifier (in his Weaving the Web). In my opinion an identifier is not "a kind of locator"!

Also, the advantages are described from the wrong cultural viewpoint. —Preceding unsigned comment added by MihalOrel (talk • contribs) 09:05, 30 June 2008 (UTC)


Wikipedia itself offers a good illustration of the current problem. Look at

http://bg.wikipedia.org/wiki/Начална_страница

This is the URL for the start page of Wikipedia in Bulgarian. An IRI for the same page might look something like

хттп://бг.уикипедия.oрг/уики/Начална_страница

In other words, from the point of view of the other (in this case, a Bulgarian speaker), an IRI is simply an identifier written in Bulgarian when it concerns the Bulgarian view of the world.

I guess I should try to find time to modify the text myself. (Михал Орела 09:36, 30 June 2008 (UTC))

ICANN

Supporting news from The Sofia Weekly 28 June 2008:

Bulgaria Tables Request to Register Internet Domain in Cyrillic

Bulgaria became Monday the first nation to request the registration of an Internet domain in Cyrillic.

Bulgaria's representative at the Governmental Advisory Committee of the Internet Corporation for Assigned Names and Numbers has delivered a letter on behalf of the Chair of the State Agency for Information Technologies and Communications Plamen Vachkov to the ICANN President Paul Twomey in Paris requesting the right to register a domain in Cyrillic.

In submitting their letter, the Bulgarian authorities took advantage of the fact that the delegates at the ICANN Conference currently taking place in Paris are expected to make a decision for the setting up of multi-lingual first level domains.

ICANN manages the domains .com, .net, .info, and .org among others. Bulgaria is requesting to register and maintain domain .бг, which is likely the country's present code .bg but in Cyrillic.

The move is actively supported by the Bulgarian Uninet Association, which is working to promote the use of the Cyrillic alphabet on the net.

A little further research on the ICANN web site, http://www.icann.org/en/announcements/announcement-05jun08-en.htm, shows how one might achieve progress. Examples are given at http://idn.icann.org/ (Михал Орела 09:55, 30 June 2008 (UTC))

W3C Internationalization

Another very good account of the major issues is available at the W3C site: http://www.w3.org/International/articles/idn-and-iri/

In my opinion it gives good cultural arguments for switching to IRIs now, along with some very cogent technical reasons why we must. (Михал Орела 14:04, 30 June 2008 (UTC))

Proposed New Text (ericP)

Not to be confused with Uniform resource identifier (URI).
Not to be confused with Uniform resource locator (URL).

In computing, an Internationalized Resource Identifier (IRI) is a string of Unicode characters used to identify a name or a resource. IRIs provide a multi-language, multi-script alternative to URIs. IRIs are defined by RFC 3987.

Relationship to URI

While URIs are limited to a subset of the ASCII character set, IRIs may contain characters from the Universal Character Set (Unicode/ISO 10646), including Chinese characters, Japanese kanji, Korean, Cyrillic characters, and so forth. Many internet protocols such as HTTP and DNS use URIs or portions of them, but publication languages such as AtomPub and RDF use IRIs to identify web resources. RFC 3987 defines a mapping from IRIs to URIs, allowing, for example, IRIs to be dereferenced on the [[World Wide Web]]. The IRI

 http://рнидс.срб/cir/документи

maps to the URI

 http://xn--d1aholi.xn--90a3ac/cir/%D0%B4%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B8

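by converting the host name to Punycode and percent-encoding the UTF-8 bytes of the non-ASCII characters in the path.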

The location bar in most conventional browsers is a compromise between displaying a URL and displaying an IRI. For instance, the Firefox 10 location bar will accept an IRI like http://рнидс.срб/cir/документи, but display it with the Punycode domain name: http://xn--d1aholi.xn--90a3ac/cir/документи.
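The IRI-to-URI mapping illustrated above can be sketched in a few lines of Python. This is only a minimal illustration, not the full RFC 3987 algorithm: the function name `iri_to_uri` is made up for this example, it ignores user information in the authority, and it relies on Python's built-in `idna` codec (which implements the older IDNA 2003 rules) for the host name.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def iri_to_uri(iri):
    """Map an IRI to a URI (simplified sketch, hypothetical helper)."""
    parts = urlsplit(iri)

    # Host name: convert each label to its ASCII (Punycode) form.
    host = parts.hostname or ""
    ascii_host = host.encode("idna").decode("ascii") if host else ""
    netloc = ascii_host if parts.port is None else f"{ascii_host}:{parts.port}"

    # Path, query, fragment: percent-encode the UTF-8 bytes of non-ASCII
    # characters. "%" is kept safe so existing escapes are not re-encoded.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    fragment = quote(parts.fragment, safe="%")

    return urlunsplit((parts.scheme, netloc, path, query, fragment))


print(iri_to_uri("http://рнидс.срб/cir/документи"))
```

Applied to the example IRI, this should reproduce the URI quoted above; a production implementation would also need to handle the normalization and edge cases that RFC 3987 specifies.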

Advantages

There are good reasons to display identifiers in different languages and scripts; chiefly, it makes them easier to read and remember for users who are unfamiliar with the Latin (A-Z) alphabet. Provided that users can readily type the required Unicode characters on their keyboards, this can make the URI system more international and accessible.

Disadvantages

Mixing IRIs and ASCII URIs can make it much easier to carry out phishing attacks that trick someone into believing they are on a site other than the one they are actually visiting. For example, one can replace the "a" in www.ebay.com or www.paypal.com with an internationalized look-alike "a" character, and point that IRI to a malicious site. This is known as an IDN homograph attack.
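To make the look-alike problem concrete, the following Python sketch compares a genuine label with one in which a Latin "a" has been replaced by the Cyrillic "а" (U+0430). The helper `scripts_in` is a hypothetical name, and guessing the script from the first word of a character's Unicode name is only a rough heuristic assumed here for illustration; it is not how browsers or registries actually screen names.

```python
import unicodedata


def scripts_in(label):
    """Rough script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


genuine = "paypal"
spoofed = "p\u0430ypal"  # U+0430 CYRILLIC SMALL LETTER A replaces the first "a"

# The two strings render almost identically but are different strings.
print(genuine == spoofed)

# A label that mixes scripts is a warning sign worth flagging.
print(scripts_in(genuine))
print(scripts_in(spoofed))

# In the DNS the spoofed label becomes a distinct Punycode name.
print(spoofed.encode("idna"))
```

Showing the Punycode form instead of the Unicode form, as some browser location bars do, is one way of exposing such a substitution to the user.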

While a URI does not provide people with a way to specify Web resources using their own alphabets, an IRI does not make clear how Web resources can be accessed with keyboards that are not capable of generating the requisite internationalized characters.

See also

  • XRI (Extensible Resource Identifier)
  • IDN (Internationalized Domain Name)
  • Punycode

External links

[[Category:Internet Standards]] {{compu-stub}}