
8.3 filename

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 210.193.53.1 (talk) at 03:33, 31 October 2007 (Overview). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

An 8.3 filename[1] (also called a short filename or SFN) is a filename convention used by old versions of DOS and versions of Microsoft Windows prior to Windows 95 and Windows NT 3.5. It is also used in modern Microsoft operating systems as an alternate filename to the long filename for compatibility with legacy programs. The filename convention is limited by the FAT file system. Similar 8.3 file naming schemes have also existed on earlier CP/M and on some Data General and Digital Equipment Corporation minicomputer operating systems.

Overview

8.3 filenames have at most eight characters, optionally followed by a "." and a filename extension of at most three characters. For files with no extension, the "." has no significance if present (that is, "myfile" and "myfile." are equivalent). File and directory names are stored in uppercase, although systems that use the 8.3 standard are usually case-insensitive.

VFAT, a variant of FAT with an extended directory format, was introduced in Windows 95 and Windows NT 3.5. It allowed mixed-case Unicode long filenames (LFNs) in addition to classic 8.3 names.

To maintain backward compatibility with legacy applications (on DOS and Windows 3.1), an 8.3 filename is automatically generated for every LFN, through which the file can still be renamed, deleted or opened.

Although there is no compulsory algorithm for creating the 8.3 name from an LFN, Windows uses the following convention:

  1. If the LFN is 8.3 uppercase or lowercase, no LFN will be stored on disk at all.
    • Example: "TEXTFILE.TXT"
  2. If the LFN is 8.3 mixed case, the LFN will store the mixed-case name, while the 8.3 name will be an uppercased version of it.
    • Example: "TextFile.Txt" becomes "TEXTFILE.TXT".
  3. If the filename contains characters not allowed in an 8.3 name (including space, which was disallowed by convention though not by the APIs) or either part is too long, invalid characters such as spaces and extra periods are stripped from the name. Other characters such as "+" are changed to an underscore (_), and the result is uppercased. The stripped name is then truncated to the first 6 characters of its basename, followed by a tilde, a single digit, and the first 3 characters of the extension.
    • Example: "TextFile1.Mine.txt" becomes "TEXTFI~1.TXT" (or "TEXTFI~2.TXT", should "TEXTFI~1.TXT" already exist). "ver +1.2.text" becomes "VER_12~1.TEX".
  4. Beginning with Windows 2000, if at least 4 files or folders already exist with the same initial 6 characters in their short names, the stripped LFN is instead truncated to the first 2 letters of the basename (or 1 if the basename has only 1 letter), followed by 4 hexadecimal digits derived from an undocumented hash of the filename, followed by a tilde, followed by a single digit, followed by the first 3 characters of the extension.
    • Example: "TextFile.Mine.txt" becomes "TE021F~1.TXT".
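Rules 1 to 3 above can be sketched in a few lines of Python. This is only an illustration of the convention as described here, not Windows' actual implementation: the function name short_name, the existing parameter, and the exact set of characters mapped to an underscore are assumptions, and the hash-based rule 4 is not modelled.

```python
# Characters assumed (for illustration) to be replaced by "_" rather than dropped.
TO_UNDERSCORE = set("+,;=[]")


def _clean(part):
    """Drop spaces and periods, map some characters to '_', and uppercase."""
    out = []
    for ch in part:
        if ch in " .":
            continue
        out.append("_" if ch in TO_UNDERSCORE else ch.upper())
    return "".join(out)


def short_name(lfn, existing=frozenset()):
    """Return a hypothetical 8.3 alias for lfn, avoiding names in existing."""
    base, dot, ext = lfn.rpartition(".")
    if not dot:  # no period at all: the whole name is the basename
        base, ext = lfn, ""
    clean_base, clean_ext = _clean(base), _clean(ext)

    # Rules 1 and 2: the name already fits 8.3, so only uppercase it.
    fits = (
        clean_base == base.upper()
        and clean_ext == ext.upper()
        and 1 <= len(base) <= 8
        and len(ext) <= 3
    )
    if fits:
        return clean_base + ("." + clean_ext if clean_ext else "")

    # Rule 3: first 6 characters, a tilde, one digit, first 3 of the extension.
    suffix = "." + clean_ext[:3] if clean_ext else ""
    for n in range(1, 10):
        candidate = f"{clean_base[:6]}~{n}{suffix}"
        if candidate not in existing:
            return candidate
    raise ValueError("no free single-digit alias; rule 4 is not modelled here")
```

With this sketch, short_name("TextFile1.Mine.txt") gives "TEXTFI~1.TXT", and passing existing={"TEXTFI~1.TXT"} gives "TEXTFI~2.TXT", matching the examples in the list.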

The NTFS file system used by the Windows NT family supports LFNs natively, but 8.3 names are still available for legacy applications. Generation of 8.3 names can optionally be disabled to improve performance.

The ISO 9660 file system (mainly used on compact discs) has similar limitations at the most basic Level 1, with the additional restriction that directory names cannot contain extensions and that some characters (notably hyphens) are not allowed in filenames. Level 2 allows filenames of up to 31 characters, more compatible with Mac OS filenames.

During the Microsoft antitrust trials, the names MICROS~1 and MICROS~2 were humorously used to refer to the companies that might exist after a proposed split of Microsoft.

Compatibility

This legacy technology is used in a wide range of products and devices as a standard for interchanging information, for example on the CompactFlash cards used in cameras. The VFAT long filenames introduced by Windows 95/98/ME retained compatibility with it, but the VFAT implementation on NT-based systems (Windows NT/2000/XP) uses a modified 8.3 short name.

If a filename contains only lowercase letters, or combines a lowercase basename with an uppercase extension or vice versa, has no special characters, and fits within the 8.3 limits, no VFAT entry is created on Windows NT and later versions such as XP. Instead, two bits in byte 0x0c of the directory entry indicate that the filename should be treated as entirely or partially lowercase. Specifically, bit 4 means lowercase extension and bit 3 means lowercase basename, which allows combinations such as "example.TXT" or "HELLO.txt" but not "Mixed.txt". Few other operating systems support this. It creates a backward-compatibility problem with older Windows versions (95, 98, ME), which see all-uppercase filenames when this extension has been used and can therefore change the name of a file when it is transported, for example on a USB flash drive. Current 2.6.x versions of Linux recognize this extension when reading (source: kernel 2.6.18, fs/fat/dir.c and fs/vfat/namei.c); the mount option shortname determines whether the feature is used when writing.[1]
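The effect of the two case bits can be illustrated with a short Python sketch. The constant and function names are hypothetical; only the bit positions (bit 3 for the basename, bit 4 for the extension) come from the description above.

```python
LOWER_BASE = 0x08  # bit 3 of byte 0x0c: basename is lowercase
LOWER_EXT = 0x10   # bit 4 of byte 0x0c: extension is lowercase


def display_name(base, ext, case_byte):
    """Apply the NT case bits to an uppercase 8.3 name for display."""
    if case_byte & LOWER_BASE:
        base = base.lower()
    if case_byte & LOWER_EXT:
        ext = ext.lower()
    return f"{base}.{ext}" if ext else base
```

A reader that ignores byte 0x0c, as older Windows versions do, is equivalent to always passing a case byte of 0, which yields the all-uppercase name.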

Directory table

A directory table is a special type of file that represents a directory (nowadays commonly known as a folder). Each file or directory stored within it is represented by a 32-byte entry in the table. Each entry records the name, extension, attributes (archive, directory, hidden, read-only, system and volume), the date and time of creation, the address of the first cluster of the file/directory's data and finally the size of the file/directory.

Legal characters for DOS filenames include the following:

  • Upper case letters A–Z
  • Numbers 0–9
  • Space (though trailing spaces in either the base name or the extension are considered padding and not part of the filename; filenames containing spaces also could not be used on the DOS command line, because it lacked a suitable escaping system)
  • ! # $ % & ' ( ) - @ ^ _ ` { } ~
  • (FAT-32 only) + , . ; = [ ]
  • Values 128–255

This excludes the following ASCII characters:

  • " * / : < > ? \ |
    Windows/MSDOS has no shell escape character
  • Lower case letters a–z
    stored as A–Z on FAT-12/16
  • Control characters 0–31
  • Value 127 (DEL), which causes problems when the Cyrillic KOI-8 encoding is used, because it corresponds to the Cyrillic capital letter "Е". Some operating systems, such as ANDOS, automatically changed the letter to the similar-looking Latin one.

The DOS filenames are in the OEM character set.
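The character rules above translate into a simple byte-level check. The following Python sketch works on raw byte values, since the names are stored in an OEM code page rather than Unicode. The function names are hypothetical, and the characters listed as FAT-32 only are deliberately treated as illegal here.

```python
# Punctuation allowed in classic DOS names, as listed above.
ALLOWED_PUNCT = b"!#$%&'()-@^_`{}~"


def is_legal_dos_byte(value):
    """Return True if a byte value may appear in a DOS name field."""
    if 0x41 <= value <= 0x5A:  # upper case letters A-Z
        return True
    if 0x30 <= value <= 0x39:  # digits 0-9
        return True
    if value == 0x20:          # space (trailing spaces are padding)
        return True
    if value in ALLOWED_PUNCT:
        return True
    return 128 <= value <= 255  # high half of the OEM character set


def is_legal_dos_field(field):
    """Check every byte of a name or extension field (a bytes object)."""
    return all(is_legal_dos_byte(b) for b in field)
```

Lower case letters, control characters, DEL (127) and the reserved characters such as "*" and "?" all fall through to the final range test and are rejected.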

Directory entries, both in the Root Directory Region and in subdirectories, are of the following format:

Byte Offset Length Description
0x00 8 DOS filename (padded with spaces)

The first byte can have the following special values:

0x00 Entry is available and no subsequent entry is in use
0x05 Initial character is actually 0xE5
0x2E 'Dot' entry; either '.' or '..'
0xE5 Entry has been previously erased and is not available. File undelete utilities must replace this character with a regular character as part of the undeletion process.
0x08 3 DOS file extension (padded with spaces)
0x0b 1 File Attributes

The attribute byte is a bit field with the following flags:

Bit Mask Description
0 0x01 Read Only
1 0x02 Hidden
2 0x04 System
3 0x08 Volume Label
4 0x10 Subdirectory
5 0x20 Archive
6 0x40 Device (internal use only, never found on disk)
7 0x80 Unused

An attribute value of 0x0F is used to designate a long filename entry.

0x0c 1 Reserved; two bits are used by NT and later versions to encode case information
0x0d 1 Create time, fine resolution: 10 ms units, values from 0 to 199.
0x0e 2 Create time. The hour, minute and second are encoded according to the following bitmap:
Bits Description
15-11 Hours (0-23)
10-5 Minutes (0-59)
4-0 Seconds/2 (0-29)

Note that the seconds are recorded only to a 2-second resolution. Finer resolution for file creation is found at offset 0x0d.

0x10 2 Create date. The year, month and day are encoded according to the following bitmap:
Bits Description
15-9 Year (0 = 1980, 127 = 2107)
8-5 Month (1 = January, 12 = December)
4-0 Day (1 - 31)
0x12 2 Last access date; see offset 0x10 for description.
0x14 2 EA-Index (used by OS/2 and NT) in FAT12 and FAT16; high 2 bytes of first cluster number in FAT32.
0x16 2 Last modified time; see offset 0x0e for description.
0x18 2 Last modified date; see offset 0x10 for description.
0x1a 2 First cluster in FAT12 and FAT16; low 2 bytes of first cluster in FAT32.
0x1c 4 File size

References

  1. ^ "Naming a File". Microsoft Developer Network.