File Allocation Table: Difference between revisions

Content deleted Content added

Inline

Revision as of 20:21, 12 September 2004

File Allocation Table (FAT) is the main file system developed for MS-DOS and Windows. The FAT file system is considered relatively uncomplicated, and because of that, it is a popular format for floppy disks; moreover, it is supported by virtually all existing operating systems for personal computers, and because of that it is often used to share data between several operating systems booting on the same computer (a multiboot environment). It is also used on solid-state memory sticks and other similar devices.

FAT is a relatively early file system design that was designed in a hurry to replace the CP/M file system - it was in fact the main innovation in the earlies versions of MS-DOS compared to CP/M, otherwise DOS was almost a clone of CP/M. FAT's designers did not even exploit what was already known about file system design, and because of that, it suffered from several problems. First, its simple file layout allows for easy fragmentation, which leads to significant slowdowns during file system operations. Secondly, FAT was not designed for redundancy in the case of system failure. Thirdly, the first versions of FAT allowed file name sizes of only up to 11 characters (file name of 8 and a file extension of 3), although a work-around was developed when Microsoft implemented VFAT, that would allow file names of up to 255 characters. Finally, the large cluster size in many versions of FAT meant that a lot of space was being wasted on padding small files to the cluster size.

However, because IBM designated MS-DOS as its primary operating system for its PC, and MS-DOS used FAT, FAT became widely used and is still used in a number of circumstances. Because of the primitive design, it is easy to create a basic FAT implementation, and because of the ubiquity of MS-DOS and Windows, FAT has become useful in providing a base level of interoperability in some circumstances.

Design and implementation

A FAT file system is composed of four different sections.

The Boot Sector (aka Partition Boot Record, BIOS Parameter Block or Reserved Sector). This is always the first sector of the partition and includes some basic file system information (in particular, its type), pointers to the location of the other sections and the operating system's boot loader code.
The FAT Region. This contains two copies of the File Allocation Table for the sake of redundancy. These are maps of the partition, indicating how the clusters are allocated.
The Root Directory Region. This is a Directory Table that stores information about the files and directories in the root directory. With FAT32 it can be stored anywhere in the partition, however with earlier versions it is always located immediately after the FAT Region.
The Data Region. This is where the actual file and directory data is stored and takes up most of the partition. The size of files and subdirectories can be increased arbitrarily (as long as there are free clusters) by simply adding more links to the file's chain in the FAT. Note however, that each cluster can be taken only by one file, and so if a 1KB file resides in a 32KB cluster, 31KB are wasted.

File Allocation Table

A partition is divided up into identically sized clusters, small blocks of contiguous space. Cluster sizes vary depending on the type of FAT file system being used and the size of the partition, typically cluster sizes lie somewhere between 2KB and 32KB. Each file may occupy one or more of these clusters depending on its size, a file is represented by a chain of these clusters (referred to as a singly linked list). However these chains are not necessarily stored adjacently on the disk's surface but are often instead fragmented throughout the Data Region.

The File Allocation Table (FAT) is a list of entries that map to each cluster on the partition. Each entry records one of five things:

the address of the next cluster in a chain
a special end of file (EOF) character that indicates the end of a chain
a special character to mark a bad cluster
a special character to mark a reserved cluster
a zero to note that that cluster is unused

Each version of the FAT file system uses a different size for FAT entries. The size is indicated by the name, for example the FAT16 file system uses 16 bits for each entry while the FAT32 file system uses 32 bits. This difference means that the File Allocation Table of a FAT32 system can map a greater number of clusters than FAT16, allowing for larger partition sizes with FAT32. This also allows for more efficient use of space than FAT16, because on the same hard drive a FAT32 table can address smaller clusters which means less wasted space.

Directory Table

A Directory Table is a special type of file that represents a directory. Each file or directory stored within it is represented by a 32 byte entry in the table. Each entry records the name, extension, attributes (archive, directory, hidden, read-only, system and volume), the date and time of creation, the address of the first cluster of the file/directory's data and finally the size of the file/directory.

Aside from the Root Directory Table in FAT12 and FAT16 file systems which occupies the special Root Directory Region location, all Directory Tables are stored in the Data Region.

Long File Names (LFN) are stored on a FAT32 file system using a trick - adding phoney entries into the Directory Tables. The entries are marked with a Volume Label attribute which is impossible for a regular file and because of that they're ignored by most old MS-DOS programs. Each phoney entry can contain up to 13 UCS-2 characters (26 bytes), gaining about 15 bytes in addition to the old 8 + 3 by using fields in the record which contained file size or time stamps (but for security versus disk checking tools the starting cluster field is left unused with a 0 value). See 8.3 for additional explanations.

Versions and history

FAT as it is known today made its debut in 1980 with the first version of QDOS, the ancestor of Microsoft's PC-DOS and MS-DOS. Since FAT was mainly intended for use on floppy disks 12-bit cluster numbers were employed for space reasons, resulting in a maximum of approximately 2¹²-16 = 4,080 clusters. Clusters at that time were usually 512 bytes big (a single standard sector), giving a theoretical limit of 2 MiB--far more than a floppy could hold at the time, which was 320-400 KB (usually 360). File names were built according to the 8 + 3 pattern. This initial version of FAT became known as FAT12.

In 1983 MS-DOS version 2 added support for subdirectories and employed 16-bit cluster numbers, for use with hard disk drives. That way, there could be up to approximately 2¹⁶-18 = 65,518 clusters. Even with 512-byte clusters, this could give up to 32 MB - enough for XT hard drives either 10 or 20 MB in size. This version of FAT is known simply as FAT, or FAT16 (in the newer documents).

As hard drives grew in size, however, 512-byte clusters were not enough and bigger sizes were used. 65,518 8,192-byte clusters (512 MiB) were sufficient for the 400 MB disks in general use at the time of Windows 95's release. However, one of the user experience goals for the designers of Windows 95 was the use of long file names in the new operating system. These were implemented using a work-around in the way directory entries are laid out (see above). The new version of the file system became known as VFAT (Virtual FAT), after its Windows 95 VxD device driver. VFAT is supported by Windows 95 and above and Windows NT 4.0 and above.

By 1997, the cluster growth possibility was exhausted. The largest cluster size in Windows FAT was 32,768 bytes, giving a maximum volume size 2 gigabytes (GiB). Microsoft decided to implement a newer generation of FAT, with 32-bit cluster numbers, of which 28 bits are currently used. In theory, this should support a total of approximately 2²⁸-18 = 268,435,438 clusters, allowing for drive sizes in the multi-terabyte range. However, due to limitations in Microsoft's ScanDisk utility, the FAT is not allowed to grow beyond 4,177,920 clusters, placing the volume limit at 124.55 GiB [1]. This new FAT version became known as FAT32. First supported in Windows 95 OSR 2 and Windows 2000, and incorporating several changes to other filesystem structures, this was a major improvement over previous versions, but is no longer sufficient to cope with today's largest drives.

The alternative IBM PC operating systems, for example OS/2, Linux, FreeBSD, and BeOS, have all supported FAT, and most have gained support for VFAT and FAT32 shortly after the appropriate Windows versions were released. Early Linux distributions also supported a format known as UMSDOS, which was nothing more than FAT with the UNIX file properties (e.g. long file name and access permissions) stored in a separate file called --linux-.---. UMSDOS was mostly dropped after VFAT was released, although it still remains in the Linux kernel sources.

Since Microsoft has announced the discontinuation of the DOS line of succession with Windows ME, it remains unlikely that any new versions of FAT will appear. For most purposes, the NTFS file system that was developed for the Windows NT line is superior to FAT from the points of view of efficiency, performance and reliability. However, FAT is likely to stay for a long time as it is an ideal file system for small drives, like the floppies.

FAT licensing

Although technical details of the FAT file system derive from CP/M and have been widely known and widely disseminated in the PC community for many years, and although the file system itself is widely considered to be obsolete, it should be noted that in 2003 Microsoft made a point of asserting intellectual property claims to the system. Citing patents, Microsoft claimed that licensing fees are required for its use in such applications as removable solid state media, and consumer devices using such media.

Controversy over licensing

This claim by Microsoft is controversial, for a number of reasons. Microsoft itself has admitted that it developed its first FAT file system in 1976, so no patents could apply to an implementation of the original version. Copyright law could not prevent a re-implementation of FAT either.

Microsoft has cited 4 patents, dating from 1995 on, for FAT. All of these patents relate to storing both long and short filenames in a single file system. A Slashdot discussion on December 4, 2003, analyzed these patents and participants (particularly Svartalf) reported the following:

Patent 5,579,517 - Common name space for long and short filenames. Filed for on April 24, 1995. This appears to only impact someone using a Common Name Space for long and short filenames. This is the scheme Microsoft deployed for "Chicago" (the codename of Windows 95 while in development). This patent is likely to be invalidated (if challenged) by Microsoft's prior art release of Chicago to the world in December 1993.

Patent 5,745,902 - Method and system for accessing a file using file names having different file name formats. Filed for on July 6, 1992. This patents allowing renaming of just the name and preserving the extension for the purposes of keeping track of the filetype. It is unclear that other implementors would need to implement this patent to implement a FAT system.
Patent 5,758,352 - Common name space for long and short filenames. Filed on September 5, 1996. This is extremely similar to the 5,579,517 Patent; see those comments.

Patent 6,286,013 - Method and system for providing a common name space for long and short file names in an operating system. Filed on January 28, 1997. This is a detailed description of how Windows 95/98/Me handles long filenames on an x86-32 platform. It is unclear this patent would apply to anything other than an exact clone of Windows 95/98/Me. This patent is likely to be invalidated (if challenged) by Microsoft's prior art release of Windows 95, two years before the filing date,

In addition, there also seems to be prior art for at least the first, third, and fourth patents in the Rock Ridge Interchange Protocol standard for UNIX, which was an IEEE draft specification on at least July 13th, 1993. This protocol defined a method to support long and short names on the same media (as well as additional information) to support Unix systems.

And of course, there is the simple question if this is really an innovative idea at all. Patents are supposed to be granted for new ideas; the notion of not removing short names, but adding long name information as well, is an option that is immediately obvious to any practitioner in the field. Indeed, the standard technique for adding new fields (where backward compatibility is important) is to add new fields with the information, placing them wherever they fit, while not removing the old ones. And there's nothing clever about how the new data is stored, according to most practioners who have examined it.

In short, many have concluded that these patents only cover implementations that include support for long filenames, so it's likely that removable solid state media and consumer devices only using short names would be unaffected. Also, there is good reason to believe the patents would be found invalid or unnecessary if challenged. In particular, many or all such patents are believed by many to be invalid because of previous public release, prior art, or because the technique would be obvious to a practitioner.

In addition, there is some reason to believe that none of this licensing could impact operating system or firmware implementations. In the document "Microsoft Extensible Firmware Initiative FAT 32 File System Specification, FAT: General Overview of On-Disk Format" by Microsoft (version 1.03, December 6, 2000), Microsoft specifically grants a number of rights, and many readers of that document have interpreted it as permitting operating system vendors to implement FAT.

Re-opened investigation

Many see the Microsoft patents as wrongly issued and unsound, and believe that Microsoft has applied for many patents only as a move in its fight against open source. As a result, many petitioned the US Patent and Trade Office (PTO) to re-examine the patent. In particular, the Public Patent Foundation submitted a large set of prior art, including information from Xerox and IBM. The US PTO acknowledged that the evidence raised "substantial new question[s] of patentability", and opened an investigation into the validity of Microsoft's FAT patents. Patents are rejected in about 70% of all re-examinations, so it is quite possible that these patents will be eventually rejected.

See also: File system, Drive letter assignment, Software patent, as well as entries for competing file systems such as NTFS, EXT2, EXT3, and ReiserFS.

External link

@@ Line 6: / Line 6: @@
-== FAT file system structure ==
+== Design and implementation ==
-A FAT file system on a [[partition (IBM PC)|partition]] consists of the following parts:
+A FAT file system is composed of four different sections.
-# The ''partition boot record'', consisting of one or several sectors in the beginning of the partition, which include some basic file system information (in particular, its type), and the operating system's boot loader code
-# The ''File Allocation Table'' is a [[linked list]] that stores the logical order of the disk's ''clusters''. Clusters are small blocks of contiguous and related space in the data area. Clusters are tied into chains that represent files. These chains are not necessarily stored adjacently on the disk's surface but are instead often ''fragmented'' throughout the data area. Each file record, or inode, (discussed below) contains the location of the file's first cluster. Each entry in the FAT contains the location of the next logical cluster. The number of bits used to address the clusters determine the ''FAT size''. FAT sizes are 12, 16 and 32 bits. The FAT actually exists as one or more (usually 2) identical copies, kept as redundancies to prevent accidental data loss, although it appears that the OS itself does not use the extra entries.
-# The ''root directory'' for the partition (which has a maximal size, usually 512 entries on hard drives, written inside the boot record) contains records for files and subdirectories inside it. The records, or inodes, for each file include names, first disk cluster, creation dates, file attributes and so on.
-# The ''data area'' is where file data and inodes (subdirectories) are stored and takes up most of the partition. It is divided into small units of space called clusters. The size of files and subdirectories can be increased arbitrarily (as long as there are free clusters) by simply adding more links to the file's chain in the FAT. Note, however, that each cluster can be taken only by one file, and so if a 20-byte file resides in a 32,768-byte cluster, 32,748 bytes are wasted.
+# The [[Boot sector|Boot Sector]] (aka ''Partition Boot Record'', ''BIOS Parameter Block'' or ''Reserved Sector'').  This is always the first sector of the partition and includes some basic file system information (in particular, its type), pointers to the location of the other sections and the operating system's boot loader code.
-In order to store long file names (LFN) on a FAT file system, [[Windows 95]] and above use a trick - adding phoney records into the directory tables. The records are marked with a Volume Label attribute which is impossible for a regular file and because of that they're ignored by most old MS-DOS programs. Each phoney record can contain up to 13 [[UCS-2]] characters (26 bytes), gaining about 15 bytes in addition to the old 8 + 3 by using fields in the record which contained file size or time stamps (but for security versus disk checking tools the starting cluster field is left unused with a 0 value). See [[8.3 (computing)|8.3]] for additional explanations.
+# The '''FAT Region'''.  This contains two copies of the ''File Allocation Table'' for the sake of redundancy.  These are maps of the partition, indicating how the clusters are allocated.
+# The '''Root Directory Region'''.  This is a ''Directory Table'' that stores information about the files and directories in the root directory.  With FAT32 it can be stored anywhere in the partition, however with earlier versions it is always located immediately after the ''FAT Region''.
+# The '''Data Region'''.  This is where the actual file and directory data is stored and takes up most of the partition.   The size of files and subdirectories can be increased arbitrarily (as long as there are free clusters) by simply adding more links to the file's chain in the FAT.  Note however, that each cluster can be taken only by one file, and so if a 1KB file resides in a 32KB cluster, 31KB are wasted.
+=== File Allocation Table ===
+A partition is divided up into identically sized '''clusters''', small blocks of contiguous space.  Cluster sizes vary depending on the type of FAT file system being used and the size of the partition, typically cluster sizes lie somewhere between 2KB and 32KB.  Each file may occupy one or more of these clusters depending on its size, a file is represented by a chain of these clusters (referred to as a [[Linked list#Singly-linked list|singly linked list]]).  However these chains are not necessarily stored adjacently on the disk's surface but are often instead ''fragmented'' throughout the Data Region.
+The '''File Allocation Table''' ('''FAT''') is a list of entries that map to each cluster on the partition.  Each entry records one of five things:
+* the address of the next cluster in a chain
+* a special ''end of file'' (''EOF'') character that indicates the end of a chain
+* a special character to mark a bad cluster
+* a special character to mark a reserved cluster
+* a zero to note that that cluster is unused
+Each version of the FAT file system uses a different size for FAT entries.  The size is indicated by the name, for example the FAT16 file system uses 16 bits for each entry while the FAT32 file system uses 32 bits.  This difference means that the File Allocation Table of a FAT32 system can map a greater number of clusters than FAT16, allowing for larger partition sizes with FAT32.  This also allows for more efficient use of space than FAT16, because on the same hard drive a FAT32 table can address smaller clusters which means less wasted space.
+=== Directory Table ===
+A '''Directory Table''' is a special type of file that represents a directory.  Each file or directory stored within it is represented by a 32 byte entry in the table.  Each entry records the name, extension, attributes (archive, directory, hidden, read-only, system and volume), the date and time of creation, the address of the first cluster of the file/directory's data and finally the size of the file/directory.
+Aside from the Root Directory Table in FAT12 and FAT16 file systems which occupies the special ''Root Directory Region'' location, all Directory Tables are stored in the Data Region.
+Long File Names (LFN) are stored on a FAT32 file system using a trick - adding phoney entries into the Directory Tables. The entries are marked with a Volume Label attribute which is impossible for a regular file and because of that they're ignored by most old MS-DOS programs. Each phoney entry can contain up to 13 [[UCS-2]] characters (26 bytes), gaining about 15 bytes in addition to the old 8 + 3 by using fields in the record which contained file size or time stamps (but for security versus disk checking tools the starting cluster field is left unused with a 0 value). See [[8.3 (computing)|8.3]] for additional explanations.
 == Versions and history ==