
Clean URL: Difference between revisions

From Wikipedia, the free encyclopedia
Revision as of 21:45, 12 May 2014

Clean URLs, RESTful URLs, user-friendly URLs or SEO-friendly URLs are purely structural URLs that do not contain a query string (e.g., action=delete&id=91) and instead contain only the path of the resource, after the scheme (e.g., http) and the authority (e.g., example.org). This is often done for aesthetic, usability, or search engine optimization (SEO) purposes.[1] Other reasons for designing a clean URL structure for a website or web service include ensuring that individual web resources remain under the same URL for years, which makes the World Wide Web a more stable and useful system,[2] and making URLs memorable, logical, easy to type, human-centric, and long-lived.[3]

Examples

Examples of "unclean" versus "clean" URLs follow:

Reasoning and common practices

The most frequently cited reason for using clean URLs is search engine optimization, but clean URLs can also greatly improve usability and accessibility. Removing unnecessary parts simplifies URLs and makes them easier to type and remember.

The general format of an unclean URL involves a query string with implementation details, ids, illegible encodings, long names, etc.:

http://example.com/services/index.jsp?category=2&id=medical%20patents

A clean URL should have all components legible and, in terms of URI syntax, should have no query string but only a hierarchical part, similar to a path with a filename. The hierarchical components should reflect a logical structure, while the last component, called the slug, is analogous to the basename of a filename:

http://example.com/services/legal/medical-patents
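The structural difference between the two forms can be illustrated with Python's standard urllib.parse module, using the two hypothetical example URLs above:

```python
from urllib.parse import urlsplit, parse_qs

unclean = "http://example.com/services/index.jsp?category=2&id=medical%20patents"
clean = "http://example.com/services/legal/medical-patents"

# The unclean URL identifies the resource through query-string parameters,
# including a percent-encoded value and a numeric ID.
parts = urlsplit(unclean)
print(parts.path)             # /services/index.jsp
print(parse_qs(parts.query))  # {'category': ['2'], 'id': ['medical patents']}

# The clean URL carries the same information purely in the path hierarchy,
# with no query string at all.
print(urlsplit(clean).path)   # /services/legal/medical-patents
print(repr(urlsplit(clean).query))  # ''
```

Note that parse_qs decodes the percent-encoding (%20 becomes a space) as a side effect; in the clean form no decoding is needed in the first place.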

A fragment identifier can be included at the end, for references within a page, and need not be user-readable.[4] Readability is subjective.

"Clean" is also subjective, and there can be different levels of cleanliness. For usability and search engine optimization purposes, web developers usually recommend making URLs descriptive; when planning the structure of clean URLs, webmasters therefore often take the opportunity to include relevant keywords in the URL and remove irrelevant words from it. Common words like "the", "and", "an", and "a" are often stripped out to further trim down the URL, while descriptive keywords are added to increase user-friendliness and improve search engine ranking.[1] This includes replacing hard-to-remember numerical IDs with the names of the resources they refer to. Similarly, it is common practice to replace cryptic variable names and parameters with friendly names, or to do away with them altogether. Shorter URLs that avoid abbreviations and syntax unknown to the average user are less intimidating and contribute to overall usability.

Slug

A slug is the part of a URL which identifies a page using human-readable keywords.[5][6] It is usually the end part of the URL, which can be interpreted as the name of the resource, similar to the basename in a filename or the title of a page. The name is based on the use of the word slug in the news media to indicate a short name given to an article for internal use. For example, in

http://www.example.com/services/legal/medical-patents

the slug is medical-patents. This can be generated automatically from a page title or specified manually.

If generated automatically, characters in the original title may be substituted to avoid percent-encoding due to restrictions on web URLs, and common words may be omitted to minimize the final length of the slug. It is common practice to make the slug all lowercase; accented characters are usually replaced by letters from the English alphabet, punctuation marks are generally removed, and long page titles may be truncated to keep the final URL to a reasonable length. For example, "Nuts & Raisins in the News!" URL-encodes to Nuts%20%26%20Raisins%20in%20the%20News%21, but could be simplified automatically to nuts-raisins-news. Automatically creating a slug from a title ("slugging") can be seen as a form of munging or wrangling, and often involves regular expression substitutions. The whitespace separator may be replaced by a dash character or an underscore, as in snake_case or spinal-case.
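The automatic slugging steps described above can be sketched in Python. The function name and the stop-word list here are illustrative assumptions, not a standard API; real systems (blogs, CMSs) each have their own rules:

```python
import re
import unicodedata

# Hypothetical list of common words to strip from slugs.
STOP_WORDS = {"a", "an", "and", "the", "in", "of", "on"}

def slugify(title: str, separator: str = "-") -> str:
    """Turn a page title into a lowercase, ASCII-only, keyword-style slug."""
    # Replace accented characters with their closest ASCII equivalents
    # (NFKD decomposition splits off combining accents, which are dropped).
    ascii_title = (
        unicodedata.normalize("NFKD", title)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Lowercase, then turn punctuation and symbols into spaces,
    # keeping only letters, digits, and whitespace.
    cleaned = re.sub(r"[^a-z0-9\s]", " ", ascii_title.lower())
    # Drop common words and join what remains with the separator.
    words = [w for w in cleaned.split() if w not in STOP_WORDS]
    return separator.join(words)

print(slugify("Nuts & Raisins in the News!"))  # nuts-raisins-news
```

Passing separator="_" instead of the default dash yields snake_case-style slugs from the same titles.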

Instead of automatic slugging, a slug can also be entered or altered manually so that while the page title remains designed for display and human readability, its slug may be optimized for brevity or for consumption by search engines.

Use of the "rel-tag" microformat (for tagging) requires a clean slug.

Implementation-independent

Another aspect of clean URLs is that they do not contain implementation details of the underlying web application. For example, many URLs include the filename of a server-side script, such as "example.php", "example.asp" or "cgi-bin". Such details are irrelevant to the user, do not serve to identify the content, and make it harder to change the implementation of the server at a later date. For example, if a script "example.php" is rewritten in Python, URLs that include the name of the script have to change, while clean URLs that leave out such cruft stay the same: Cool URIs don't change. Typically, clean URLs use rewrite rules to select which script (if any) to run, rather than putting the name of the script in the URL.
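The rewrite-rule idea can be sketched as a small routing table: clean URL patterns are matched against the request path and mapped to internal handlers, so the URL never exposes which script serves it. The patterns and handler names below are hypothetical:

```python
import re

# Hypothetical routing table: each clean-URL pattern maps to the name of an
# internal handler. The implementation behind a handler can change (PHP to
# Python, one script to another) without the public URLs changing.
ROUTES = [
    (re.compile(r"^/services/(?P<category>[\w-]+)/(?P<slug>[\w-]+)$"),
     "show_service"),
    (re.compile(r"^/articles/(?P<slug>[\w-]+)$"),
     "show_article"),
]

def resolve(path: str):
    """Return (handler_name, captured_parameters) for a clean URL path."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None, {}  # no route matched, e.g. a 404

print(resolve("/services/legal/medical-patents"))
# ('show_service', {'category': 'legal', 'slug': 'medical-patents'})
```

Production web servers and frameworks implement the same pattern declaratively, for example Apache's mod_rewrite rules or a web framework's URL routing configuration.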


References

  1. ^ a b Opitz, Pascal (28 February 2006). "Clean URLs for better search engine ranking". Content with Style. Retrieved 9 September 2010.
  2. ^ Berners-Lee, Tim (1998). "Cool URIs don't change". Style Guide for online hypertext. W3C. Retrieved 6 March 2011.
  3. ^ Neath, Kyle (2010). "URL Design". Retrieved 6 March 2011.
  4. ^ "Uniform Resource Identifier (URI): Generic Syntax". RFC 3986. Internet Engineering Task Force. Retrieved 2 May 2014.
  5. ^ Slug in the WordPress glossary
  6. ^ Slug in the Django glossary