File URI scheme

From Wikipedia, the free encyclopedia
TJollans (talk | contribs)
Format: Removed incorrect additional slash.

Revision as of 11:42, 31 March 2016

The file URI scheme is a URI scheme specified in RFC 1630 and RFC 1738, typically used to retrieve files from within one's own computer. The Internet Engineering Task Force (IETF) has published a series of draft documents obsoleting these RFCs. The drafts state that they aim to define "a syntax that is compatible with most extant implementations, while attempting to push towards a stricter subset of 'ideal' constructs." Doing so involves deprecating some less common or outdated constructs, some of which are described below. Such constructs may work on some current systems, but formulations that are inconsistent with the ongoing standardization effort are likely to have a shorter useful lifetime than those that conform. The drafts are not final and should be consulted for up-to-date information.[1]

Format

A file URI takes the form of

file://host/path

where host is the fully qualified domain name of the system on which the path is accessible, and path is a hierarchical directory path of the form directory/directory/.../name. If host is omitted, it is taken to be "localhost", the machine on which the URL is being interpreted. When host is omitted, the slash that follows it is still required: "file:///foo.txt" is valid, whereas "file://foo.txt" is not, although some interpreters manage to handle the latter.
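The host and path components can be separated with a general-purpose URL parser. The following is a minimal sketch using Python's standard urllib.parse module; the function name split_file_uri is hypothetical and not part of any specification. It also illustrates why "file://foo.txt" is problematic: a conforming parser reads "foo.txt" as the host and finds no path.

```python
from urllib.parse import urlsplit, unquote


def split_file_uri(uri):
    """Return (host, path) for a file URI.

    An omitted host is reported as "localhost", as described in the text.
    """
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    host = parts.netloc or "localhost"
    # Percent-escapes in the path are decoded for display purposes.
    return host, unquote(parts.path)


print(split_file_uri("file://localhost/etc/fstab"))
print(split_file_uri("file:///etc/fstab"))
# Here "foo.txt" lands in the host position and the path is empty.
print(split_file_uri("file://foo.txt"))
```
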

RFC 3986 includes additional information about the treatment of ".." and "." segments in URIs.

Meaning of slash character

The slash character (/), depending on its position, has different meanings within a file URL.

  • The // after the file: is part of the general syntax of URLs. (According to the specification, the double slash should always appear in a file URL, but in practice many Web browsers allow it to be omitted.)
  • The single slash between host and path is part of the syntax of URLs.
  • The slashes in path separate directory names in a hierarchical system of directories and subdirectories. In this usage, the slash is a general, system-independent way of separating the parts, and in a particular host system it might be used as such in any pathname (as in Unix systems).

Examples

Unix

Here are two Unix examples pointing to the same /etc/fstab file:

file://localhost/etc/fstab
file:///etc/fstab

Windows

Here are some examples which may be accepted by some applications on Windows systems, referring to the same, local file c:\WINDOWS\clock.avi

file://localhost/c|/WINDOWS/clock.avi
file:///c|/WINDOWS/clock.avi
file://localhost/c:/WINDOWS/clock.avi

Here is the URI as understood by the Windows Shell API:[2]

file:///c:/WINDOWS/clock.avi
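The equivalence of these four forms can be sketched in a few lines of Python. This is an illustration only, not the behaviour of any particular Windows API: the function name local_windows_path is hypothetical, and the sketch assumes, like the lenient applications mentioned above, that both an empty host and "localhost" denote the local machine.

```python
import re
from urllib.parse import urlsplit, unquote


def local_windows_path(uri):
    """Convert a local file URI with a drive letter to a Windows path."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    # Assumption: an empty host and "localhost" are both treated as local.
    if parts.netloc not in ("", "localhost"):
        raise ValueError(f"not a local file URI: {uri!r}")
    path = unquote(parts.path)
    # Accept either "c:" or the older "c|" form after the leading slash.
    match = re.fullmatch(r"/([A-Za-z])[:|](/.*)?", path)
    if match is None:
        raise ValueError(f"no drive letter in path: {path!r}")
    drive, rest = match.group(1), match.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")


for example in (
    "file://localhost/c|/WINDOWS/clock.avi",
    "file:///c|/WINDOWS/clock.avi",
    "file://localhost/c:/WINDOWS/clock.avi",
    "file:///c:/WINDOWS/clock.avi",
):
    print(local_windows_path(example))
```
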

Implementations

Windows

On Microsoft Windows systems, the normal colon (:) after a device letter has sometimes been replaced by a vertical bar (|) in file URLs. This reflected the original URL syntax, which made the colon a reserved character in a path part.

Since Internet Explorer 4, file URIs have been standardized on Windows and should follow the scheme described below. This applies to all applications which use URLMON or SHLWAPI for parsing, fetching or binding to URIs. To convert a path to a URL, use UrlCreateFromPath; to convert a URL to a path, use PathCreateFromUrl.[3]

To access a file "the file.txt", the following might be used.

For a network location:

file://hostname/path/to/the%20file.txt

Or for a local file, the hostname is omitted, but the slash is not (note the third slash):

file:///c:/path/to/the%20file.txt

This is not the same as providing the string "localhost" or the dot "." in place of the hostname. The string "localhost" will attempt to access the file as \\localhost\c:\path\to\the file.txt, which will not work since the colon is not allowed in a share name. The dot "." results in the string being passed as \\.\c:\path\to\the file.txt, which will work for local files, but not shares on the local system. For example file://./sharename/path/to/the%20file.txt will not work, because it will result in sharename being interpreted as part of the DOSDEVICES namespace, not as a network share.
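The mapping described in this paragraph can be made concrete with a short Python sketch. It only reproduces the string transformation stated in the text (a non-empty host becomes the first component of a \\host\... path) and is not a model of how URLMON or SHLWAPI are implemented; the function name windows_target is hypothetical.

```python
from urllib.parse import urlsplit, unquote


def windows_target(uri):
    """Return the Windows path string that a file URI maps to, per the text."""
    parts = urlsplit(uri)
    if parts.scheme != "file":
        raise ValueError(f"not a file URI: {uri!r}")
    path = unquote(parts.path).replace("/", "\\")
    if parts.netloc == "":
        # Host omitted: a plain local path such as c:\path\to\file
        return path.lstrip("\\")
    # Host present (including "localhost" and "."): \\host\path...
    return "\\\\" + parts.netloc + path


for example in (
    "file:///c:/path/to/the%20file.txt",
    "file://hostname/path/to/the%20file.txt",
    "file://localhost/c:/path/to/the%20file.txt",
    "file://./c:/path/to/the%20file.txt",
    "file://./sharename/path/to/the%20file.txt",
):
    print(example, "->", windows_target(example))
```

The third result contains a colon in the share position, and the last one places sharename directly after \\.\, which is why neither form reaches the intended file.
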

The following outline roughly describes the requirements.

  • The colon should be used, and should not be replaced with a vertical bar for Internet Explorer.
  • Forward slashes should be used to delimit paths.
  • Characters such as the hash (#) or question mark (?) which are part of the filename should be percent-encoded.
  • Characters which are not allowed in URIs, but which are allowed in filenames, must also be percent-encoded. For example, any of "{}`^ " and all control characters. In the example above, the space in the filename is encoded as %20.
  • Characters which are allowed in both URIs and filenames must NOT be percent-encoded.
  • Legacy ACP encodings must not be used. (ACP code pages are specified by DOS CHCP or the Windows Control Panel language setting.)
  • Unicode characters outside of the ASCII range must be UTF-8 encoded, and those UTF-8 encodings must be percent-encoded.
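The outline above can be approximated in Python with urllib.parse.quote, which encodes non-ASCII characters as percent-encoded UTF-8 by default. This is a minimal sketch under stated assumptions, not a replacement for UrlCreateFromPath: the function name file_uri_from_windows_path and the exact set of characters left unencoded (the RFC 3986 sub-delimiters plus ":" and "@") are choices made for this illustration.

```python
from urllib.parse import quote

# Assumption: characters left as-is, besides letters, digits and "-._~".
# "/" delimits path segments; the rest are allowed in URI path segments.
SAFE_CHARS = "/:@!$&'()*+,;="


def file_uri_from_windows_path(path):
    """Build a file URI from an absolute Windows path (drive or UNC)."""
    forward = path.replace("\\", "/")  # forward slashes delimit the path
    if forward.startswith("//"):
        # UNC path \\host\share\... : the host follows "file://" directly.
        return "file:" + quote(forward, safe=SAFE_CHARS)
    # Drive path c:\... : the host is omitted, so three slashes appear.
    return "file:///" + quote(forward, safe=SAFE_CHARS)


print(file_uri_from_windows_path(r"c:\path\to\the file.txt"))
print(file_uri_from_windows_path(r"\\hostname\path\to\the file.txt"))
print(file_uri_from_windows_path(r"c:\docs\issue#1.txt"))
print(file_uri_from_windows_path("c:\\docs\\\u00e9.txt"))
```
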

Use the provided functions if possible. If you must create a URL programmatically and cannot access SHLWAPI.dll (for example from a script, or from another programming environment where the equivalent functions are not available), the above outline will help.

Legacy URLs

To aid the installed base of legacy applications, the PathCreateFromUrl function recognizes certain URLs which do not meet these criteria, and treats them uniformly. These are called "legacy" file URLs as opposed to "healthy" file URLs.[4]

In the past, a variety of other applications have used other systems. Some added two additional slashes. For example, \\remotehost\share\dir\file.txt would become file:////remotehost/share/dir/file.txt instead of the "healthy" file://remotehost/share/dir/file.txt.
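As an illustration, the four-slash legacy form can be rewritten into the "healthy" form with a simple prefix check. This sketch covers only the single legacy variant described above and does not reproduce the full set of forms that PathCreateFromUrl recognizes; the function name normalize_legacy_unc_uri is hypothetical.

```python
def normalize_legacy_unc_uri(uri):
    """Rewrite file:////host/share/... as file://host/share/...

    URIs that do not start with exactly four slashes are returned unchanged.
    """
    prefix = "file:////"
    lowered = uri.lower()
    if lowered.startswith(prefix) and not lowered.startswith(prefix + "/"):
        return "file://" + uri[len(prefix):]
    return uri


print(normalize_legacy_unc_uri("file:////remotehost/share/dir/file.txt"))
print(normalize_legacy_unc_uri("file://remotehost/share/dir/file.txt"))
```
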

Web pages

File URLs are rarely used in Web pages on the public Internet, since they imply that a file exists on the designated host. The host specifier can be used to retrieve a file from an external source, although no specific file-retrieval protocol is specified; using it should result in a message informing the user that no mechanism to access that machine is available.

References

  1. ^ "The file URI Scheme: draft-ietf-appsawg-file-scheme-03". Internet Engineering Task Force (IETF). 23 July 2015. Retrieved 21 Aug 2015.
  2. ^ Risney, Dave (2006). "File URIs in Windows". IEBlog. Microsoft Corporation. Retrieved 31 July 2013.
  3. ^ "File URIs in Windows". IEBlog. MSDN Blogs. 6 December 2006. Retrieved 8 March 2014.
  4. ^ "The Bizarre and Unhappy Story of 'file:' URLs". Free Associations. MSDN Blogs. 19 May 2005. Retrieved 8 March 2014.