Talk:Uniform Resource Identifier

WikiProject Internet (Rated GA-class)
This article is within the scope of WikiProject Internet, a collaborative effort to improve the coverage of the Internet on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article has been rated as GA-Class on the project's quality scale.
This article has not yet received a rating on the project's importance scale.

Misleading Venn Diagram in a different way

I spent a while being confused about the three URx's, and I think part of the problem was the Venn diagram, which suggests that there are URIs that are neither URNs nor URLs. Can anyone give an example of one? Or is it just that that Venn diagram is how it's always been taught (see these examples) but no one knows why?

If no one can give an example here, how about we change the image so it actually matches the article text? Only by reading the article could I overcome my mistaken first impression that there are bare URIs. Then I understood that URI is a general name for anything that's a URN or a URL (or both). --Qwerty0 (talk) 15:15, 23 March 2010 (UTC)

Technically that's not even a Venn diagram. It's an Euler diagram. Megakelvin (talk) 09:50, 19 April 2010 (UTC)

Noted. Also, I just came across this document published by the "W3C/IETF URI Planning Interest Group" describing the current de facto state of URI definitions. I'm linking directly to the section that summarizes the situation. Note that this comes directly from the top, and while it's not an official policy it's written by the people in the know (i.e. not even just some rando professor's opinion)
I'll summarize their summary. The idea that there would eventually be URI's that are neither URL's nor URN's was at best a theory put forward when the web was still taking shape. This didn't happen and the only types of URI's that came to be standards were URL's and URN's. I think all those Venn (or Euler) diagrams are from that early period when academics first wrote their chapters on URI's.
--Qwerty0 (talk) 17:50, 20 April 2010 (UTC)
Ok, seeing as there've been no objections, I went ahead and made a candidate replacement diagram (URI Euler Diagram no lone URIs.svg).
I'm no Euler/Venn diagram expert (nor an Inkscape ninja, as you can see) but I think this conveys what we're trying to explain, no? I'll probably wait a while again then replace the diagram if no objections.
--Qwerty0 (talk) 06:11, 22 June 2010 (UTC)

If the description of such a simple diagram takes more space than the diagram itself, the diagram seems to offer no value but only confusion. What does the color mean? Why are some lines dotted? Does the "URI" above the "URL~URN pill" offer any additional information? The diagram is neither Venn, nor Euler. If you cannot categorize URI in URL and URN then we should just remove the diagram instead of contributing to the confusion. As long as it is not clear whether URN stands for the Uniform Resource Name as abstract URI type or for the urn: namespace (also described in Uniform Resource Name), the confusion will remain. -- JakobVoss (talk) 22:20, 2 May 2011 (UTC)

Right. I only took the effort to remove the part where it implied the existence of URIs that were neither URLs nor URNs. But fixing it further shouldn't be too much of a problem. In order to remove the weird "pill" format, why not go with a more traditional Venn diagram to show the overlap of URLs and URNs? This is just a rough draft I made (URI pseudo-Venn basic.svg). I can fancy it up if we decide to use it.
I'm no Venn (or Euler) expert, so I don't know the best way to draw it to show that all URL's, URN's, and both, are in the set of URI's (but there's no URI that is neither).
But I think a visual illustration is key, especially if you notice how confusing it is to read my last sentence. It can be put much more simply in a diagram. And there is so much confusion about this, I think it's important to clearly show the correct situation.
(One last note, I'm linking here the part of the RFC that specifies the relationship. Thanks for the link, Jakob.) — Preceding unsigned comment added by Qwerty0 (talk · contribs) 21:58, 26 September 2011 (UTC)

The Venn diagram conveys the information more clearly, I think. Rousseaua001 (talk) 09:30, 22 August 2013 (UTC)

This is very BAD. The article is not clear about the difference/distinction (sorry, [en] is not my mother language) between URI, URL and URN. However, the unsuspecting reader might think: Oh, maybe the picture might give some insight. However, the diagram does not contribute anything. And, more(over?), the picture is neither a(n) Euler diagram nor a Venn diagram. Paulbe (talk) 22:04, 5 February 2015 (UTC)

First, I think the Venn diagram is much clearer. However, there is an inconsistency here that I am not qualified to resolve: we say that some URIs are both URLs and URNs, viz. http: used in XML namespaces. However, the article on URNs notes that ALL URNs start with the urn: scheme. This is inconsistent with my current understanding, and I would suggest that is a URN if that is how it's being used, and is a URL if you're using it (or presenting it as suitable for use) as a resource locator. This suggests that either the definition of URN is more restricted than it should be, or there are URIs that are neither URL nor URN, in which case the diagrams are incorrect. Tim.spears (talk) 18:51, 12 February 2016 (UTC)

These illustrations have been the subject of literally years of disagreement here on the talk page, so it's questionable whether the current one has any value at all. I've removed it from the article on that basis. If anyone wants to try coming up with a better one, feel free, but please propose it here first rather than just putting it in the article.  — Scott talk 12:21, 25 April 2016 (UTC)

Syntax and semantics of the file path.

The file path can contain text such as "&NR=1" and "&vq=medium". This should be mentioned with a discussion of syntax and semantics. Regards, PeterEasthope (talk) 15:26, 23 September 2011 (UTC)

2. The wiki text says that "The path must begin with a single slash (/) if an authority part was present", yet RFC 3986 actually says that "When authority is present, the path must either be empty or begin with a slash ("/") character". Rickmccl (talk) 15:05, 15 June 2016 (UTC)
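
The RFC 3986 wording quoted above allows an empty path after an authority. A minimal Python sketch of that rule follows, using the standard library's urllib.parse.urlsplit; the helper name authority_and_path and the example.org addresses are assumptions made for illustration, not something taken from the article or the RFC.

```python
from urllib.parse import urlsplit


def authority_and_path(uri):
    """Split a URI and return its (authority, path) pair."""
    parts = urlsplit(uri)
    return parts.netloc, parts.path


# Hypothetical example URIs; each one has an authority component.
examples = [
    "http://example.org",         # authority present, path empty
    "http://example.org?q=1",     # authority present, path still empty
    "http://example.org/a/b",     # authority present, path begins with "/"
]

for uri in examples:
    authority, path = authority_and_path(uri)
    # RFC 3986, section 3.3: with an authority, the path must be empty
    # or begin with a slash.
    satisfies_rule = path == "" or path.startswith("/")
    print(repr(authority), repr(path), satisfies_rule)
```

The first two examples show the case the wiki text missed: the path is empty, not "/", and the URI is still well formed.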

Relative / absolute URLs

The page says "RFC 1738 formally defined relative and absolute URLs" but there is not a single reference to the word "absolute" in , and the only mention of "relative" is in which is far from being a formal definition and in fact says "Relative links are not described in this document." — Preceding unsigned comment added by Mausch11 (talk · contribs) 21:21, 28 October 2013 (UTC)

Uppercase rename

What does your sarcastic edit summary of "Wow, really? Well, now this article's had the wrong title for four years. Good job." mean, User:Scott? That you disagree with a discussed move from 2011 in such a way that it's not worth discussing it back? --McGeddon (talk) 21:48, 14 September 2015 (UTC)

Yes. Amateurs' opinions don't override formal definitions. It's embarrassing that it was allowed to happen in the first place, let alone left unrectified for so long.  — Scott talk 21:53, 14 September 2015 (UTC)
Got it. Does Wikipedia have a defining line anywhere for when to use formal definitions and when to go with WP:COMMONNAME? --McGeddon (talk) 22:02, 14 September 2015 (UTC)
Well, COMMONNAME says "Wikipedia generally prefers the name that is most commonly used (as determined by its prevalence in reliable English-language sources) as such names would usually best fit criteria such as recognizability and naturalness", and as a term of art, the most reliable sources in this case are the technical specifications and publications by the W3C, like this for instance. So I think it kind of covers that. There are scattered sources using the name without the capital letters, but not even remotely enough to justify the contention made in the very poorly-attended 2011 move discussion that "The term 'uniform resource identifier' is a common name not a proper name." Which is exactly what I would have said at the time if I'd noticed it.  — Scott talk 22:17, 14 September 2015 (UTC)

Article censorship by User:Reschke

Reschke is censoring part of the article, deleting a statement and its source. This is the part censored:

RFC 3986 explicitly calls for encoding 19 characters in the ASCII character set and does not mandate percent-encoding characters outside ASCII, such as Extended ASCII or Unicode characters.[1]

Here is a portion of what RFC 3986 says on page 13 (I didn't quote it in whole):

reserved = gen-delims / sub-delims

gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"

sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="

Fleet Command (talk) 13:30, 21 April 2016 (UTC)

You call it "censoring", I call it "removing misleading information". What's misleading is the second part that implies that non-ASCII characters do not need to be escaped; I now have rephrased the text you added (rather than removing it); please see whether you agree or disagree with what it says and then follow up over here. Reschke (talk) 14:21, 21 April 2016 (UTC)
You keep removing the citation! [1] This is extremely dishonest. As long as you keep doing that, we will not have a negotiation.
Whenever you decide to stop removing the citation, please cite a source for "Non-ASCII characters however are disallowed by definition". And if people get misled by what the article does not say, I cannot care any less. Fleet Command (talk) 14:58, 21 April 2016 (UTC)
(a) I did cite Section 1.2.1; you removed that. (b) It also follows from the ABNF productions. (c) If you "cannot care less" if people are misled, why do you actually care about editing this page???? Reschke (talk) 15:53, 21 April 2016 (UTC)
@FleetCommand: "RFC 3986 explicitly calls for encoding 19 characters in the ASCII character set" is not a useful statement, nor does "19 characters" occur in the RFC. Please rectify this with an explicit list that can be directly cited to Section 1.2.1. Also, please don't make inflammatory accusations of "censorship" over a technical content dispute.  — Scott talk 11:42, 22 April 2016 (UTC)
@Reschke: Section 1.2.1 appears in a source whose citation you removed. I see that you have consistently removed this citation in all your edits even after FC has written it in boldface. (See above.)
@FleetCommand: There are 18 characters, not 19.
@Scott: It is section 2.2, not 1.2.1. And for the "inflammatory language", given that "censorship" is supposed to make stuff more respectable than they were, I am not sure if it qualifies as inflammatory. But I think the atmosphere here is tense with or without it.
Best regards,
Codename Lisa (talk) 19:35, 22 April 2016 (UTC)
@Codename Lisa: The whole page is about URIs as defined in RFC 3986, and it is and was already cited a lot. This additional citation did not change anything about that, and FWIW, was completely redundant here. Reschke (talk) 20:32, 22 April 2016 (UTC)
@Reschke: Hello again. The citation must appear as many times as necessary. A trillion times or more, if need be. And in the event that an editor wants to remove a footnote, he does it properly without orphaning its citation. No sir, your conduct regarding {{sfnp|RFC 3986|2005}} is simply inexcusable. Do not remove it again.
Best regards,
Codename Lisa (talk) 09:01, 23 April 2016 (UTC)
P.S. I nearly forgot: Speaking of misleading the reader, your most recent contribution (revision 716673838) is misleading because it is self-contradictory. On one hand it says "Strings of data octets within a URI are represented as printable, non-whitespace, ASCII characters" and then says "Other octets are 'reserved' [...] may need to be percent-encoded depending on the context they appear [...]". Which one is correct? The former or the latter? And where does RFC 3986 say this? Please quote.
Codename Lisa (talk) 09:14, 23 April 2016 (UTC)
It does say so in Sections 2.2 and 2.3. Reschke (talk) 11:26, 23 April 2016 (UTC)
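
For reference, the count disputed above can be checked mechanically. The following Python sketch transcribes the two ABNF productions quoted earlier in this section from RFC 3986, section 2.2; the variable names are illustrative assumptions.

```python
# Character sets transcribed from the gen-delims and sub-delims
# productions of RFC 3986, section 2.2.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

# reserved = gen-delims / sub-delims
RESERVED = GEN_DELIMS | SUB_DELIMS

print(len(GEN_DELIMS), len(SUB_DELIMS), len(RESERVED))  # prints: 7 11 18
```

The two sets do not overlap, so the reserved set has 7 + 11 = 18 members.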

Discussing the URI

Here is a puzzling find. RFC 3986 says these are unreserved:
Alphabet (0x41–0x5A and 0x61–0x7A), digits (0x30–0x39), hyphen (0x2D), period (0x2E), underscore (0x5F), and tilde (0x7E)
And these are reserved:
/ ? # [ ] @ ! $ & ' ( ) * + , ; : =
So, the question is, what about these?
" \ | < > { } ^ `
Also, what about control characters (0x00–0x1F) and the space character?
Best regards,
Codename Lisa (talk) 19:58, 22 April 2016 (UTC)
The answers are in the RFC... That's why it's not helpful when people put misleading text into the Wikipedia page. The URI syntax does not allow any non-ASCII characters; this is why I removed the misleading statement in the first place. Among the ASCII characters, some are never allowed in URIs (such as control characters or the angle brackets), and some need to be escaped depending on where they appear. Any oversimplification won't help here. For instance, evidently "&" does not always need to be escaped in URIs; as such, the current text is just plain wrong. Reschke (talk) 20:38, 22 April 2016 (UTC)
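
The question about the leftover characters can be explored with a short Python sketch. It classifies each printable ASCII character against the unreserved and reserved sets of RFC 3986 (sections 2.3 and 2.2); the function name classify and the category labels are assumptions chosen for illustration.

```python
import string

# unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"   (RFC 3986, section 2.3)
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
# reserved = gen-delims / sub-delims                    (RFC 3986, section 2.2)
RESERVED = set(":/?#[]@" + "!$&'()*+,;=")


def classify(ch):
    """Return an illustrative category label for a single character."""
    if ch in UNRESERVED:
        return "unreserved"
    if ch in RESERVED:
        return "reserved"
    if ch == "%":
        # "%" is neither reserved nor unreserved; it introduces pct-encoded.
        return "percent-encoding marker"
    return "not allowed literally"


# Printable, non-space ASCII runs from 0x21 to 0x7E.
leftover = [chr(c) for c in range(0x21, 0x7F)
            if classify(chr(c)) == "not allowed literally"]
print("".join(leftover))
```

Under these definitions the leftover printable characters are exactly the nine listed in the question above; the space and the control characters fall into the same "not allowed literally" category and can only appear percent-encoded.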
Hello again
You'd excuse me, but the matter of the misleading text is between you and FC, seemingly resolved and would have probably never taken place if you hadn't deleted {{sfnp|RFC 3986|2005}} consistently. Let's focus on the matter at hand.
The useful discussion would be a quotation from the RFC or an explanation of why you changed the "&" in the article to "?" but yet again proceeded to explain that "?" also does not always need to be escaped. (It seems to me, however, that "?" does always need to be escaped.) Examples solve a lot of problems here.
Best regards,
Codename Lisa (talk) 09:22, 23 April 2016 (UTC)
I changed "&" to "?" because "&" in general does not need to be encoded; as such, mentioning it as an example is misleading. "?" does need to be percent-encoded when not introducing the query component. Your recent edits are nonsensical as they claim, for instance, that "?" always needs to be percent-encoded, which is INCORRECT. I will rewrite this once again when I have time; in the meantime you may want to do a fact-check on what you wrote. Reschke (talk) 11:26, 23 April 2016 (UTC)
Quite frankly, it is as if I am talking to a wall. A wall that tries to put words in my mouth too. —Codename Lisa (talk) 11:42, 23 April 2016 (UTC)
So it seems you say that the current text implying that a "?" in a URI needs to be percent-encoded is correct? So this URI: is somehow invalid? Please clarify. Reschke (talk) 13:14, 24 April 2016 (UTC)
"Yes" and "No". The paragraph is talking about "strings of data octets", not the whole URI. The "?" in the example given appears only in the frame area. For example, see the link to How Does Your Garden Grow? article. "?" in the data octet area is percent-encoded.
Best regards,
Codename Lisa (talk) 15:47, 24 April 2016 (UTC)
I agree that "strings of data octets" isn't very clear, but I think you're wrong to assume that this somehow constrains what the following text refers to. (If that's the intent, it's indeed very obscure). WRT the example: what do you mean by "only in the frame area"??? Yes, "?" sometimes needs to be percent-encoded; sometimes not. Thus, what the text currently says is misleading as it pretends it always needs to be encoded. The text that I added and that you removed tried to explain this distinction. Reschke (talk) 17:58, 24 April 2016 (UTC)
Why am I wrong to assume that? —Codename Lisa (talk) 18:07, 24 April 2016 (UTC)
ok, so what do you believe "strings of data octets" refers to? Can you rephrase it in a way so it becomes clearer? Reschke (talk) 19:42, 24 April 2016 (UTC)
In the example below, I have highlighted the non-data area in red. Outside red, the "?" cannot appear. (Outside red, it must be percent-encoded.)
Best regards,
Codename Lisa (talk) 14:13, 25 April 2016 (UTC)
Thanks for trying to explain. I'm afraid however that this is incorrect on at least two counts: a) RFC 3986 does *not* define the structure of a query, so "&" and "=" have no special meaning at all (query string format is an artefact of how HTML form submission works; from the URI RFC's point of view, this is just a convention), b) "?" in fact *is* allowed to appear verbatim in the query (and in the fragment). See Sections 3.4 and 3.5 of the RFC. Reschke (talk) 16:33, 25 April 2016 (UTC)
Granted. However, I badly need an example here. Best regards, Codename Lisa (talk) 17:03, 26 April 2016 (UTC) Reschke (talk) 11:00, 28 April 2016 (UTC)
I see. Glad to see it clarified.
Right now I am looking at the web URL of the Persian language of this article. It reads:یوآرآی Is this not a valid URI? —Codename Lisa (talk) 11:16, 28 April 2016 (UTC)
No, it's not. But it is a Reschke (talk) 12:04, 28 April 2016 (UTC)
Exactly; as a URI it would be .  — Scott talk 15:01, 29 April 2016 (UTC)
Is this your own interpretation or did you get it from somewhere? If the answer is the latter, where? —Codename Lisa (talk) 16:05, 29 April 2016 (UTC)
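
The ABNF in RFC 3986 admits only ASCII characters, so a page title in Persian script cannot appear literally in a URI. The usual mapping, described for Internationalized Resource Identifiers in RFC 3987, percent-encodes the UTF-8 octets of each non-ASCII character. Below is a minimal Python sketch of that mapping using urllib.parse.quote; the base address is a hypothetical placeholder, since the actual page address is not given in this thread.

```python
from urllib.parse import quote, unquote

# Hypothetical base address; the real one is not shown in the discussion.
BASE = "https://example.org/wiki/"

# The page title quoted in the thread above.
title = "یوآرآی"

# quote() encodes the title as UTF-8 and writes each octet as %HH.
# safe="" means no character is exempted apart from the unreserved set.
encoded = quote(title, safe="")
uri = BASE + encoded

print(uri)
print(uri.isascii())                # the result contains only ASCII
print(unquote(encoded) == title)    # decoding recovers the original title
```

The encoded form is what travels on the wire; browsers typically display the decoded title for readability.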

The RFC is simple and unambiguous. If you are unclear, study Section 1.3 ("Syntax Notation") of the RFC, then Appendix A ("Collected ABNF for URI"). Then Section 2.2 ("Reserved Characters"), which states: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed. The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI." The article now states this plainly. I've just nominated this as a Good Article, so please stop arguing over your interpretations of something that simply needs to be quoted from the specification, like most of the rest of the article.  — Scott talk 12:10, 25 April 2016 (UTC)

I do agree that the RFC is simple and unambiguous. I wish the article were too. It still says "Permitted characters within a URI are the ASCII characters for the lowercase and uppercase letters of the modern English alphabet, the Arabic numerals, hyphen, period, underscore, and tilde.[4] Octets represented by any other character must be percent-encoded." -- However, that is clearly not true, because, for instance, "?" can appear verbatim; not only as delimiter starting the "query" component, but also afterwards (inside the query, see -- as such, and in my humble opinion, the paragraph continues to be misleading. Reschke (talk) 16:28, 25 April 2016 (UTC)
Scott, you added "When not intended as delimiters, they must be percent-encoded, for example %26 for an ampersand (&)." -- again, incorrect. For instance, "&" can be used unencoded in the path component. Reschke (talk) 16:37, 25 April 2016 (UTC)
I had missed both those cases - thanks. Fixed.  — Scott talk 16:44, 25 April 2016 (UTC)
There are more of these; pchar allows all sub-delims, ":" and "@". (there was a reason why I said the text is incorrect and misleading) Reschke (talk) 16:51, 25 April 2016 (UTC)
My omissions. You do know that you could spend the time fixing them in the article rather than complaining here? Speaking of which, looking at the RFC again I can no longer see anything to support your statement that "&" can be used unencoded in the path component. I'm not sure if I even did see it anywhere but took your comment at face value. "&" occurs only in the list of reserved characters in §2.2. So I've taken that out again. If there's something you can point to that illustrates I'm misreading, please do so.  — Scott talk 19:42, 25 April 2016 (UTC)
Ahem, I *did* try to edit the text, do you remember? Apparently it took some time for you to realize that the text indeed was borked. Wrt your question, the RFC defines: pchar = unreserved / pct-encoded / sub-delims / ":" / "@" and sub-delims = "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "="... Reschke (talk) 20:20, 25 April 2016 (UTC)
You were involved in an edit war with someone else. I started editing the content in question (as opposed to its formatting) after Codename Lisa; you didn't touch the article after that. Anyway, that's unimportant. Thanks for clarifying. I've modified the paragraph again. Satisfactory?  — Scott talk 22:13, 25 April 2016 (UTC)
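
The pchar production quoted above can be turned into a small checker. The Python sketch below covers only two pieces of the RFC 3986 grammar, a single path segment (*pchar) and the query (*( pchar / "/" / "?" )), not the full URI syntax; the function names is_valid_segment and is_valid_query are assumptions chosen for illustration.

```python
import re

# Building blocks transcribed from RFC 3986, Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# segment = *pchar
SEGMENT_RE = re.compile(rf"{PCHAR}*")
# query = *( pchar / "/" / "?" )
QUERY_RE = re.compile(rf"(?:{PCHAR}|[/?])*")


def is_valid_segment(text):
    """True if text matches the segment production."""
    return SEGMENT_RE.fullmatch(text) is not None


def is_valid_query(text):
    """True if text matches the query production."""
    return QUERY_RE.fullmatch(text) is not None


print(is_valid_segment("a&b"))      # True: "&" is a sub-delim, allowed in pchar
print(is_valid_segment("a?b"))      # False: "?" would start the query
print(is_valid_query("a?b&c=d"))    # True: "?" may appear inside the query
```

This matches the points made in the thread: "&" needs no encoding in a path segment, while "?" must be percent-encoded there but may appear verbatim in the query.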

GA Review

This review is transcluded from Talk:Uniform Resource Identifier/GA1. The edit link for this section can be used to add comments to the review.

Reviewer: GeoffreyT2000 (talk · contribs) 19:44, 16 August 2016 (UTC)

There are no maintenance tags in this article, which is great; and the lead does not have references per WP:LEADCITE. References and footnotes are used in the article, with each numbered footnote consistently using Harvard parenthetical referencing with the year in parentheses. However, the "URI references" section does not have any references in it, and they need to be added like in other sections. GeoffreyT2000 (talk) 19:44, 16 August 2016 (UTC)

@Scott: Can you please take the time to find and add those sources to the "URI references" section? GeoffreyT2000 (talk) 04:03, 22 August 2016 (UTC)
@GeoffreyT2000: Got it, will see what I can do.  — Scott talk 09:43, 24 August 2016 (UTC)
@GeoffreyT2000: How about now?  — Scott talk 09:37, 5 September 2016 (UTC)

Article scope

Is this article about Uniform Resource Identifiers generally, or only some particular type(s) of URI implementation(s) that is(are) in use today?

I'm cool with it either way, but we probably ought to make the distinction or narrow limitation clear in the article prose. I had added some information on a blockchain-based use of URIs, and it was removed by another editor, so starting this discussion here.

I'm new to URIs, so when I found out about a URI not explicated anywhere in the great encyclopedia of human knowledge, I added that URI to this article. If it doesn't fit here, I would like to better understand what is the specific scope that is thought/felt by other editors to be included within the particular subset of all URIs that is described in this particular article. Cheers. N2e (talk) 23:40, 4 November 2016 (UTC)

I had thought there would be some comments in here by now. My own thought on the question is that, in general, a Wikipedia article on the topic of some descriptive noun, say foobar, should cover all verifiably sourced uses of foobar, and not just some particular subset of foobar that might have been explicated in the encyclopedia in earlier days. There are exceptions, of course, as when an article is too large, so multiple articles and disambiguation occurs, etc.
But I'm not seeing anyone make an argument in here to such an effect. So if others don't make such discussion here, then in another week or so, I will just assume that the scope on this article is all uses of Uniform Resource Identifiers, and not any particular subset of all, or any one particular URI standard. N2e (talk) 14:44, 8 November 2016 (UTC)
The material you added to this article was inappropriately located in the section that discusses the development of the standards that define URIs themselves, which is why I took it out.
URIs are a very widely-used technology, and the scope of this article is absolutely not "all uses" of them - that's indiscriminate. To put something in here, it needs to be germane to the topic of URIs on the whole, or particularly notable in the field - non-trivial coverage in reliable sources. Your text about Ethereum was referenced to a slideshow and some YouTube videos by the Ethereum people themselves, which aren't reliable secondary sources.
Also, regarding "I had thought there would be some comments in here by now" - you wrote your comment 4 days ago. This isn't a high-traffic talk page, you need to be more patient.  — Scott talk 18:33, 8 November 2016 (UTC)
This Talk page section is not a discussion about any particular edit, it is about the scope for the article, and about what criteria might be used to determine if particular URIs that are otherwise verifiable etc. might not appropriately fit here. N2e (talk) 11:18, 10 November 2016 (UTC)