Maildir

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 128.89.73.37 (talk) at 17:09, 25 June 2009 (→‎Windows software). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Maildir is a widely used format for storing e-mail that does not require application-level file locking to maintain message integrity as messages are added, moved and deleted. Each message is kept in a separate file with a unique name. All changes are made using atomic filesystem operations, so the filesystem itself handles the concurrency issues that file locking would otherwise address. A Maildir is a directory (often named Maildir) with three subdirectories named tmp, new, and cur.
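As a minimal sketch of the layout just described, the following Python creates a directory with the three subdirectories and checks for them. The function names and the 0o700 permission mode are assumptions made for illustration; they are not part of the Maildir specification.

```python
import os

SUBDIRS = ("tmp", "new", "cur")

def create_maildir(path):
    """Create `path` with the tmp, new and cur subdirectories (hypothetical helper)."""
    for sub in SUBDIRS:
        # The 0o700 mode is an assumption of this sketch: a private mailbox.
        os.makedirs(os.path.join(path, sub), mode=0o700, exist_ok=True)
    return path

def looks_like_maildir(path):
    """Return True if `path` contains all three Maildir subdirectories."""
    return all(os.path.isdir(os.path.join(path, sub)) for sub in SUBDIRS)
```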

Specifications

The Maildir concept is remarkable for being both simple and maintenance-free, despite providing vital functionality to large numbers of users.

Maildir

Daniel J. Bernstein, the author of qmail, djbdns, and various other software, wrote the original and only Maildir specification.[1] Bernstein has published no follow-ups since, and there has been no effort to turn Maildir into a standard. The specification was written specifically for Bernstein's qmail, but it is general enough to be implemented in many programs. Over time, and across many independent implementations, a small number of shortcomings have been discovered.[citation needed]

Maildir++

Sam Varshavchik, the author of the Courier Mail Server and other software, wrote an extension[2] to the Maildir format called Maildir++ to support subfolders and mail quotas. Maildir++ directories contain subdirectories with names that start with a '.' (dot) that are also Maildir++ folders. This extension is therefore a violation of the Maildir specification, which provides an exhaustive list of the possible contents of a Maildir. It is a compatible violation, however, and other Maildir software supports Maildir++.
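The subfolder convention can be illustrated with a short Python sketch that lists dot-prefixed subdirectories which themselves contain tmp, new and cur. The function name is hypothetical, and the check is deliberately loose: it does not cover quotas or any other part of the Maildir++ extension.

```python
import os

def list_maildirpp_folders(root):
    """Return names of dot-prefixed subdirectories of `root` that look like Maildirs."""
    folders = []
    for name in sorted(os.listdir(root)):
        candidate = os.path.join(root, name)
        # Maildir++ subfolders are directories whose names start with a dot.
        if not name.startswith(".") or not os.path.isdir(candidate):
            continue
        # Treat the directory as a folder only if it has the three subdirectories.
        if all(os.path.isdir(os.path.join(candidate, sub))
               for sub in ("tmp", "new", "cur")):
            folders.append(name)
    return folders
```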

Problem space addressed by Maildir

Mail needs to be stored in these circumstances:

  • By an SMTP MTA, after receiving from a remote mail server and while it is waiting to be delivered elsewhere. The storage area used by the MTA is often called a spool.
  • By an IMAP mailstore, which serves email to mail client software (MUAs).
  • In a local user account where the user can read email using an MUA that reads the mail data directly rather than via a network protocol.
  • In other storage and processing situations, such as when filtering spam.

RFC 822 and related standards define email messages as lines of text, with strict rules concerning the first lines (the headers). This matches the idea of a file very well. Maildir, with its one-file-per-message design, matches precisely what can be seen by watching email transit a network over a protocol such as SMTP. An MTA typically processes batches of email in a sequential-access manner, so again message-per-file is a good match.

A directory containing many files, each holding one message, is not sufficient on its own for a mailstore or any other circumstance requiring random access to email. Many implementors use a database because it is designed for indexing and searching. As of 2007, filesystems usually gave much better access times than databases,[citation needed] so the questions facing implementors come down to indexing methods and programming convenience versus speed, efficiency, reuse of existing technology and reliability. The Cyrus IMAP server, the MH Message Handling System, the Dovecot IMAP server and the UW IMAP server all have private, mutually incompatible file-per-message storage formats with associated indexing schemes. (Dovecot and UW IMAP also implement formats that can be accessed by other software.)

Technical operation

The process that delivers an e-mail message writes it to a file in the tmp directory with a unique filename. The current algorithm for generating the unique filename combines the time, the host name, and a number of pseudo-random parameters to ensure uniqueness.[1]
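A name built from these ingredients might be generated as in the Python sketch below. The exact layout of the middle field (the R, M, P and Q prefixes) and the helper name are assumptions chosen for illustration; consult the specification[1] for the authoritative rules.

```python
import os
import socket
import time

_delivery_counter = 0  # deliveries made by this process (illustrative)

def unique_name():
    """Build a name from the time, assorted per-process values, and the host name."""
    global _delivery_counter
    _delivery_counter += 1
    now = time.time()
    seconds = int(now)
    micros = int((now - seconds) * 1_000_000)
    # Keep '/' and ':' out of the name, since both have special meaning
    # in Maildir paths; they are replaced by octal escapes here.
    host = socket.gethostname().replace("/", "\\057").replace(":", "\\072")
    rand = os.urandom(4).hex()
    middle = f"R{rand}M{micros}P{os.getpid()}Q{_delivery_counter}"
    return f"{seconds}.{middle}.{host}"
```

Two calls in the same process differ at least in the counter, so they never return the same name.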

The delivery process stores the message in the maildir by creating and writing to tmp/unique, and then moving this file to new/unique. The moving is commonly done by hard linking the file to new and then unlinking the file from tmp, but some implementations simply rename() it there. This sequence guarantees that a maildir-reading program will not see a partially-written message, as MUAs never look in tmp.
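The write-then-move sequence can be sketched in Python as follows. The function name is hypothetical, and the exclusive-create mode and the fsync() call are choices made for this sketch rather than requirements stated above.

```python
import os

def deliver(maildir, unique, data):
    """Write `data` to tmp/unique, then move it into new/ via link and unlink."""
    tmp_path = os.path.join(maildir, "tmp", unique)
    new_path = os.path.join(maildir, "new", unique)
    # "xb" refuses to overwrite an existing file (an assumption of this sketch).
    with open(tmp_path, "xb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the bytes to disk before exposing the message
    # Readers only ever look in new/ and cur/, so the message becomes
    # visible in one step, fully written.
    os.link(tmp_path, new_path)
    os.unlink(tmp_path)
    return new_path
```

An implementation that prefers rename() would replace the link and unlink pair with a single os.rename(tmp_path, new_path).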

When a mail user agent process finds messages in the new directory, it moves them to cur and appends an informational suffix to the filename before reading them. The move is done with rename(); a link-then-unlink strategy may leave the message duplicated. The informational suffix consists of a colon (to separate the unique part of the filename from the information itself), a '2', a comma, and various flags. The '2' specifies, loosely speaking, the version of the information that follows the comma. '2' is the only officially specified version; '1' is an experimental version, presumably used while the Maildir format was under development. The specification defines the flags "P", "R", "S", "T", "D" and "F",[1] while Dovecot uses lowercase letters to represent up to 26 IMAP keywords,[3] which may include standardised keywords such as $MDNSent as well as user-defined flags.
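The suffix format and the move into cur can be sketched in Python as below. Both helper names are hypothetical, and sorting the flags into ASCII order is an assumption of this sketch rather than something stated in the text above.

```python
import os

def split_info(filename, sep=":"):
    """Split a Maildir filename into (unique, version, flags)."""
    unique, found, info = filename.partition(sep)
    if not found:
        return unique, None, ""  # no suffix yet, e.g. a file still in new/
    version, _, flags = info.partition(",")
    return unique, version, flags

def move_to_cur(maildir, unique, flags=""):
    """Rename new/unique to cur/unique:2,FLAGS and return the new path."""
    info = "2," + "".join(sorted(set(flags)))  # sorted, de-duplicated flags
    src = os.path.join(maildir, "new", unique)
    dst = os.path.join(maildir, "cur", f"{unique}:{info}")
    os.rename(src, dst)  # a single rename, so no duplicate is left behind
    return dst
```

For example, split_info("1245949740.M1P2.host:2,RS") yields the unique part, the version "2", and the flags "RS".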

Issues with lockless operation

Daniel J. Bernstein designed Maildir to be safely writable by multiple concurrent writers without any form of locking, even over NFS. To a large extent this works well, but the design does not take into account the real-world limitations of filesystems. Once a directory listing has started with the initial readdir() call, any file that is renamed before the last readdir() call might not show up in the listing at all. The listing process then believes that the message was deleted, when in reality only its flags were changed. When the process lists the messages again, the "deleted" message reappears. Some mail-accessing programs layer their own locking on top of Maildir in an attempt to prevent these kinds of problems. Dovecot, for example, uses its own non-standard locking with Maildir.

Mac OS X with HFS Plus (but not with ZFS) appears to avoid this problem, for reasons that are unclear.[citation needed] On Linux, the issue can also be avoided by watching the Maildir for changes with inotify and, after the readdir() pass completes, checking whether inotify reported any new files.[citation needed]

Issues noted by the Dovecot project

The Dovecot project, which implements an IMAP/POP3 server with built-in Maildir support, has raised some additional issues with the Maildir specification.[3] This critique portrays the Maildir delivery protocol as involving the following four steps (annotations from the Dovecot critique removed):

  1. Create a unique filename.
  2. Do stat(tmp/<filename>). If the stat() found a file, wait 2 seconds and go back to step 1.
  3. Create and write the message to the tmp/<filename>.
  4. link() it into the new/ directory.
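The four steps listed above can be sketched in Python as follows. The function name, the injectable make_name and sleep parameters, and the max_tries limit are assumptions added so the sketch terminates and can be exercised; the final unlink is cleanup and is not one of the four steps.

```python
import os
import time

def deliver_four_steps(maildir, make_name, data, sleep=time.sleep, max_tries=5):
    """Follow the four delivery steps; `make_name` returns a candidate filename."""
    for _ in range(max_tries):
        name = make_name()                              # step 1: unique filename
        tmp_path = os.path.join(maildir, "tmp", name)
        try:
            os.stat(tmp_path)                           # step 2: does it exist?
        except FileNotFoundError:
            break                                       # name is unused, carry on
        sleep(2)                                        # found a file: wait, retry
    else:
        raise RuntimeError("no unused filename found in tmp/")
    with open(tmp_path, "wb") as f:                     # step 3: write the message
        f.write(data)
    os.link(tmp_path, os.path.join(maildir, "new", name))  # step 4: link into new/
    os.unlink(tmp_path)                                 # cleanup, not a listed step
    return name
```

Nothing prevents another process from creating tmp/<filename> between the stat() and the open(), which is the window the critique below describes.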

The critique then states that:

Only the first step is what really guarantees that the mails won't get overwritten, the rest just sounds nice. Even though they might catch a problem once in a while, they give no guaranteed protection ...

for these technical reasons:

  1. Step 2 is pointless, because there is a race condition between steps 2 and 3; some other thread of execution could create the file right after stat(tmp/<filename>) says the file doesn't exist.
  2. Step 2 is pointless, because the PID/host combination in the unique name should, by itself, already guarantee that it never finds such a file.
  3. In step 4, the link() can succeed even when a message with the same filename has already been delivered to the maildir, since a mail reader might already have moved the original copy to the cur/ directory.

This analysis misconstrues the purpose of step 2. There would indeed be a race condition if step 2 were intended to mitigate a situation where a badly behaved program creates threads of execution that could race with each other, or where a badly behaved operating environment creates concurrent processes with the same PID that could race against each other. The analysis is correct in asserting that step 1 constitutes the primary uniqueness guarantee, but it fails to note that this holds only on a continuously operating host with a monotonic system clock. If the maildir host is rebooted, a recalibrated or misconfigured system clock could cause a filename that already exists within the maildir from a previous uptime interval to be generated again. Step 2 ensures that under this condition (rare on a stable mail system) a new mail item will not clobber an existing mail item in the new/ directory (if the earlier item was already moved to cur/ it is not caught, and if it only exists in tmp/ it does not matter if it gets overwritten).

The criticism about link() possibly succeeding on a duplicated filename where the duplicate has already been moved to cur/ has some validity. In the case of a non-monotonic system clock, it is possible for D.J. Bernstein's Maildir delivery protocol to inject the same filename more than once into different areas within a single Maildir directory tree. However, negative system clock skews are a rare event on a stable mail system, and the randomization of the PID makes this eventuality even less likely.

See the article on the Network Time Protocol for further information on system clock synchronization and issues which can lead to a misbehaved system clock.

Software that supports Maildir directly

Mail servers

Delivery agents

Mail readers

Mail index and search tools

Software that supports Maildir by implication

The list of software that can be used with Maildir is in fact much larger once one considers how this software can be plugged together, and the role of network access protocols.

For example:

  • The Sendmail MTA does not support any mail delivery format (although many assume that it does). Sendmail uses a separate delivery process called mail.local. Procmail (and other programs that support Maildir) can be used in place of mail.local, so Sendmail can rightly be said to support Maildir as much as it supports any other format.
  • Many mail readers do not support Maildir but do support remote access formats such as IMAP. Since there are several IMAP mail stores that support Maildir, any mail reader that supports IMAP such as Microsoft Outlook, Pine, or Mozilla Thunderbird can be used to access Maildir folders.
  • Fetchmail does not support Maildir (or any local delivery format) but since it talks to an SMTP server or local delivery agent, any of those listed above can be used to deliver mail from Fetchmail to Maildirs.

Windows software

The Maildir specification cannot be implemented without modification on systems running Microsoft Windows, which does not permit colons in filenames. There is no technical reason why software on Windows cannot use an alternative character (such as ";" or "-"); however, with no way of updating the specification, there has been no agreement on what that character should be. As a result, one Windows program may write Maildir files that are unreadable to another Windows program. Programs that support Maildir and are written in languages such as Python and Perl, or that have been ported from Unix using Cygwin or similar systems, could function reliably together if this issue were addressed.
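Python's standard mailbox module shows one way a program can sidestep the colon: its Maildir class has a colon attribute that can be set to a substitute character. The sketch below uses ";" purely as an example; the helper name is hypothetical, and another program reading the same directory would need to be configured with the same character.

```python
import mailbox
import os

def add_seen_message(maildir_path, text, separator=";"):
    """Store `text` in cur/ with the Seen flag, using `separator` instead of ':'."""
    box = mailbox.Maildir(maildir_path, create=True)
    box.colon = separator            # substitute for the colon in filenames
    msg = mailbox.MaildirMessage(text)
    msg.set_subdir("cur")            # file it as already processed by a reader
    msg.set_flags("S")               # "S" is one of the flags named earlier
    key = box.add(msg)               # key is the unique part of the filename
    box.close()
    cur = os.path.join(maildir_path, "cur")
    return next(n for n in os.listdir(cur) if n.startswith(key))
```

The returned filename ends in the separator followed by "2,S" rather than ":2,S".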

Windows may also have problems supporting Maildir because rename is not atomic there.[4]

Notes and references

  1. ^ a b c Bernstein, Daniel J. (1995). Using maildir format (the original specification).
  2. ^ Varshavchik, Sam (1998). Maildir++ and Maildir quotas (contains the Maildir++ specification).
  3. ^ a b Dovecot Wiki: maildir format.
  4. ^ Discussion of atomic rename on Stack Overflow.

See also