Maildir: Difference between revisions

Revision as of 10:53, 19 July 2022

The Maildir email format is a common way of storing email messages: each message is kept in a separate file with a unique name, and each mail folder is a file system directory. The local file system handles file locking as messages are added, moved and deleted. A major design goal of Maildir is to eliminate the need for program code to handle file locking and unlocking.^[1]

Specifications

A Maildir directory (often named Maildir) usually has three subdirectories named tmp, new, and cur.

The tmp subdirectory temporarily stores e-mail messages that are in the process of being delivered. This subdirectory may also store other kinds of temporary files. The new subdirectory stores messages that have been delivered, but have not yet been seen by any mail application. The cur subdirectory stores messages that have already been seen by mail applications.^[2]
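
The layout described above can be sketched in a few lines of Python. This is only an illustration: the function name create_maildir and the 0o700 permission mode are assumptions, not part of any specification quoted here.

```python
import os


def create_maildir(path):
    """Create `path` with the three standard subdirectories.

    Returns a dict mapping each subdirectory name to its full path.
    The 0o700 mode is an assumption chosen for this sketch.
    """
    subdirs = {}
    for name in ("tmp", "new", "cur"):
        sub = os.path.join(path, name)
        # exist_ok=True makes the call safe to repeat on an existing maildir.
        os.makedirs(sub, mode=0o700, exist_ok=True)
        subdirs[name] = sub
    return subdirs
```

Calling create_maildir on a path such as a user's Maildir directory would leave tmp, new and cur ready for a delivery agent and a reader to use.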

Maildir++

Sam Varshavchik, the author of the Courier Mail Server and other software, wrote an extension^[2]^[3] to the Maildir format called Maildir++ to support subfolders and mail quotas. A Maildir++ directory contains subdirectories whose names start with a '.' (dot) and which are themselves Maildir++ folders. This extension complies with the Maildir specification, which explicitly allows a maildir to contain entries other than tmp, new and cur.
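
As a rough sketch of how a program might discover Maildir++ subfolders, the following Python function lists dot-prefixed subdirectories. The function name is hypothetical, and requiring tmp, new and cur inside each candidate is an assumption made here to skip unrelated hidden directories; the text above only states that the names start with a dot.

```python
import os


def list_maildirpp_folders(root):
    """Return the names of dot-prefixed subdirectories of `root`
    that themselves look like maildirs (contain tmp, new and cur)."""
    folders = []
    for entry in sorted(os.listdir(root)):
        full = os.path.join(root, entry)
        if not entry.startswith(".") or not os.path.isdir(full):
            continue
        # Assumption for this sketch: a real folder has all three subdirectories.
        if all(os.path.isdir(os.path.join(full, s)) for s in ("tmp", "new", "cur")):
            folders.append(entry)
    return folders
```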

Technical operation

A mail delivery agent is a program that delivers an email message into a Maildir. The mail delivery agent creates a new file with a unique filename in the tmp directory.^[4]^[5]^[2] The original algorithm for generating unique filenames, circa 1995, as implemented by qmail,^[1] was:

read the current Unix time
read the current process identifier (PID)
read the current hostname
concatenate the above three values into a string separated by the period character; this is the new filename
if stat() reports that the filename exists, wait two seconds and generate a new filename from the updated time
repeat the previous step until the filename does not exist
create a file with the unique filename and write the message contents to the new file
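
The name-generation steps above can be sketched in Python roughly as follows. This is not qmail's code: the function name, the max_tries limit and the wait_seconds parameter are assumptions introduced for illustration. The sketch also keeps the weakness of the original scheme, namely that another process could create the same name between the stat() check and the later file creation.

```python
import os
import socket
import time


def original_unique_name(tmp_dir, max_tries=10, wait_seconds=2.0):
    """Return a time.pid.host name that does not yet exist in `tmp_dir`.

    max_tries is an assumption of this sketch; it bounds the retry loop.
    """
    pid = os.getpid()
    host = socket.gethostname()
    for _ in range(max_tries):
        # Concatenate time, PID and hostname, separated by periods.
        name = "{}.{}.{}".format(int(time.time()), pid, host)
        try:
            os.stat(os.path.join(tmp_dir, name))
        except FileNotFoundError:
            return name  # stat() found nothing, so the name is free
        # The name is taken: wait, then re-read the clock and try again.
        time.sleep(wait_seconds)
    raise RuntimeError("no unused filename found after {} tries".format(max_tries))
```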

By 2000, the author of qmail recommended appending to the PID the value of a per-process counter, incremented after each delivery; the rate-limiting suggestion had been dropped.^[4]

By 2003, the recommendations had been further amended to require that instead of the PID and counter, the middle part of the filename should be created by "concatenating enough of the following strings to guarantee uniqueness" even in the face of multiple simultaneous deliveries to the same maildir from one or more processes:^[6]

#n, where n is (in hexadecimal) the output of the operating system's unix_sequencenumber() system call, which returns a number that increases by 1 every time it is called, starting from 0 after reboot.

Xn, where n is (in hexadecimal) the output of the operating system's unix_bootnumber() system call, which reports the number of times that the system has been booted. Together with #, this guarantees uniqueness; unfortunately, most operating systems don't support unix_sequencenumber() and unix_bootnumber().

Rn, where n is (in hexadecimal) the output of the operating system's unix_cryptorandomnumber() system call or an equivalent source, such as /dev/urandom. Unfortunately, some operating systems don't include cryptographic random number generators.

In, where n is (in hexadecimal) the UNIX inode number of this file. Unfortunately, inode numbers aren't always available through NFS.

Vn, where n is (in hexadecimal) the UNIX device number of this file. Unfortunately, device numbers aren't always available through NFS. (Device numbers are also not helpful with the standard UNIX filesystem: a maildir has to be within a single UNIX device for link() and rename() to work.)

Mn, where n is (in decimal) the microsecond counter from the same gettimeofday() used for the left part of the unique name.

Pn, where n is (in decimal) the process ID.

Qn, where n is (in decimal) the number of deliveries made by this process.
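
To make the component list concrete, here is a minimal Python sketch that builds a filename from the R, M, P and Q components only. The function names are hypothetical, the components are emitted in the order in which they are listed above (a choice made for this sketch), and using the hostname as the right-hand part is an assumption carried over from the 1995 algorithm.

```python
import os
import socket
import time


def middle_part(usec, pid, deliveries, random_value=None):
    """Concatenate the R, M, P and Q components of the middle part."""
    parts = []
    if random_value is not None:
        parts.append("R{:x}".format(random_value))  # hexadecimal
    parts.append("M{}".format(usec))        # decimal microseconds
    parts.append("P{}".format(pid))         # decimal process ID
    parts.append("Q{}".format(deliveries))  # decimal delivery count
    return "".join(parts)


def unique_name(deliveries):
    """Build left.middle.right for the current time, process and host."""
    # One clock reading supplies both the seconds and the microseconds.
    seconds, usec = divmod(time.time_ns() // 1000, 1_000_000)
    random_value = int.from_bytes(os.urandom(8), "big")
    middle = middle_part(usec, os.getpid(), deliveries, random_value)
    return "{}.{}.{}".format(seconds, middle, socket.gethostname())
```

A delivery agent following this sketch would pass its own running delivery count, for example unique_name(0) for the first message and unique_name(1) for the second.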

This algorithm was criticised in 2006 by Timo Sirainen, the creator of Dovecot, as being unnecessarily complex.^[7]

As of November 2018, qmail author Daniel Bernstein had made no further changes to these filename generation recommendations.^[8] On modern POSIX systems, temporary files can be safely created with the mkstemp C library function.
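
Python exposes the same facility as tempfile.mkstemp, so the idea can be sketched as below. The helper name write_tmp_file is hypothetical. Note that mkstemp chooses its own random filename, so a delivery agent that wants a name in the form described above would presumably still pick that name when moving the file out of tmp; that step is not shown.

```python
import os
import tempfile


def write_tmp_file(tmp_dir, data):
    """Safely create a new file in `tmp_dir`, write `data`, return its path."""
    # mkstemp creates the file exclusively, so an existing file is never reused.
    fd, path = tempfile.mkstemp(dir=tmp_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # push the contents to disk before returning
    return path
```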

The delivery process stores the message in the maildir by creating and writing to tmp/uniquefilename, and then moving this file to new/uniquefilename. The move can be done using rename, which is atomic in many systems.^[9] Alternatively, it can be done by hard-linking the file into new and then unlinking it from tmp. Any file left behind in tmp will eventually be deleted. This sequence guarantees that a maildir-reading program will not see a partially written message.

Multiple programs can read a maildir at the same time. They range from mail user agents (MUAs) that access the server's file system directly, through Internet Message Access Protocol or Post Office Protocol servers acting on behalf of remote MUAs, to utilities such as biff and rsync, which may or may not be aware of the maildir structure. Readers should never look in tmp.
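
A minimal Python sketch of the write-then-rename delivery sequence follows. The function name deliver and the 0o600 mode are assumptions, and the unique filename is taken as an argument rather than generated. On POSIX systems os.rename silently replaces an existing destination, so the sketch relies entirely on the name being unique.

```python
import os


def deliver(maildir, unique_name, message_bytes):
    """Write a message to tmp/, then move it into new/ in one step."""
    tmp_path = os.path.join(maildir, "tmp", unique_name)
    new_path = os.path.join(maildir, "new", unique_name)
    # O_EXCL makes creation fail instead of overwriting a file in tmp/.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(message_bytes)
        f.flush()
        os.fsync(f.fileno())  # the message is complete on disk before the move
    # Readers only look in new/ and cur/, so they never see a partial file.
    os.rename(tmp_path, new_path)
    return new_path
```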

When a cognizant maildir-reading process (either a POP or IMAP server, or a mail user agent acting locally) finds messages in the new directory, it must move them to cur. The new directory is just a means of notifying the user that "you have X new messages".^[10] This move must be done using rename(), as the non-atomic link-then-unlink technique may result in duplicated messages.

An informational suffix is appended to filenames at this stage. It consists of a colon (to separate the unique part of the filename from the actual information), a "2", a comma and various flags. The "2" specifies the version of the information that follows the comma; "2" is the only officially specified version, "1" being an experimental version. The specification defines flags that show whether the message has been read, deleted and so on: the initial (capital) letter of "Passed", "Replied", "Seen", "Trashed", "Draft", and "Flagged".^[6] Dovecot uses lowercase letters to match 26 IMAP keywords,^[5] which may include standardised keywords, such as $MDNSent, and user-defined flags.
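
The suffix handling can be sketched in Python as follows. All function names are hypothetical, and writing the flag letters in sorted order is a choice made for this sketch. The example assumes a file system that accepts colons in filenames.

```python
import os

# The six flags named by the specification, keyed by their capital letter.
FLAG_NAMES = {
    "P": "Passed", "R": "Replied", "S": "Seen",
    "T": "Trashed", "D": "Draft", "F": "Flagged",
}


def split_info(filename):
    """Split a filename into (unique part, flag letters after ':2,')."""
    unique, found, flags = filename.partition(":2,")
    return unique, (flags if found else "")


def flag_names(filename):
    """Return the names of the standard flags set on `filename`."""
    _, flags = split_info(filename)
    # Letters outside the six standard flags (e.g. lowercase keywords) are skipped.
    return [FLAG_NAMES[c] for c in sorted(flags) if c in FLAG_NAMES]


def move_to_cur(maildir, filename, flags=""):
    """Rename new/filename to cur/unique:2,flags and return the new name."""
    unique, _ = split_info(filename)
    target = unique + ":2," + "".join(sorted(set(flags)))
    os.rename(os.path.join(maildir, "new", filename),
              os.path.join(maildir, "cur", target))
    return target
```

For instance, a reader that has just displayed a message could call move_to_cur(maildir, name, "S") to record it as seen.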

Although Maildir was intended to allow lockless usage, in practice some software that uses Maildirs, such as Dovecot, also uses locks.^[11]

File-system compatibility issues

The Maildir standard can only be implemented on systems that accept colons in filenames.

Systems that don't allow colons in filenames (this includes Microsoft Windows and some configurations of Novell Storage Services) can use an alternative separator, such as ";" or "-". It is often trivial to patch free and open-source software to use a different separator.^[12]

As there is currently no agreement on what character this alternative separator should be, there can be interoperability difficulties between different Maildir-supporting programs on these systems. However, not all Maildir-related software needs to know what the separator character is, because not all Maildir-related software needs to be able to read or modify the flags of a message ("read", "replied to" etc.); software that merely delivers to a Maildir, or archives old messages from it based only on date, should work no matter what separator is in use. If only the MUA needs to read or modify message flags, and only one MUA is used, then non-standard alternative separators may be used without interoperability problems.
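
The interoperability problem can be illustrated with a small Python sketch in which the separator is a parameter. The function names are hypothetical; the point is only that a program reading with one separator finds no flags in a name written with another.

```python
def set_flags(filename, flags, separator=":"):
    """Return `filename` with its info suffix replaced by the given flags."""
    unique = filename.split(separator + "2,", 1)[0]
    return unique + separator + "2," + "".join(sorted(set(flags)))


def get_flags(filename, separator=":"):
    """Return the flag letters of `filename`, or '' if no suffix is found."""
    _, found, info = filename.partition(separator + "2,")
    return info if found else ""
```

With these helpers, set_flags("1.M1P2Q3.example", "SR", separator=";") yields a name that get_flags reads correctly only when it is also told to use ";". Python's standard mailbox.Maildir class exposes a colon attribute that serves the same purpose.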

Software that supports Maildir directly

Mail servers

Dovecot IMAP server
Courier Mail Server SMTP and IMAP server, for which the Maildir++ format was invented
Sendmail, the original SMTP server
Exim SMTP server
Postfix SMTP server
qmail SMTP server, for which the Maildir format was invented
MeTA1 SMTP server
OpenSMTPD SMTP server

Delivery agents

procmail
Dovecot delivery agent
maildrop
getmail, a Maildir-aware mail-retrieval and delivery agent alternative to Fetchmail
fdm
OfflineIMAP
mbsync

Mail readers

aerc^[13] (efficient and extensible email client)
Balsa previously the official GNOME mail reader (prior to Evolution)
Cone a curses-based mail reader
Evolution, official GNOME mail client
GNUMail
Gnus
KMail, KDE mail reader
mailx
Mutt
Notmuch^[14] (fast, global-search and tag-based email system)
Pine/Alpine
Mozilla Thunderbird – experimental and “disabled by default because there are still many bugs” ^[15]

Notes and references

[1] Bernstein, Daniel J. (1995). "maildir(5)". Archived from the original on 1997-10-12. Retrieved 2018-11-23.
[2] Sam Varshavchik (2009). "maildir". Retrieved 24 July 2016.
[3] Sam Varshavchik (2011). "Maildir++". Retrieved 24 July 2016.
[4] Bernstein, Daniel J. (c. 2000) [First published 2000 or earlier]. "Using maildir format". Archived from the original on 2000-09-02. Retrieved 2018-11-23.
[5] Dovecot Wiki: maildir format
[6] Bernstein, Daniel J. (2003) [The earliest version of this document was first published in 2000 or earlier]. "Using maildir format". Archived from the original on 2003-04-01. Retrieved 2018-11-23.
[7] Sirainen, Timo (2006-12-05). "Diff for 'MailboxFormat/Maildir'". Retrieved 2018-11-23. All this trouble is rather pointless. Only the first step is what really guarantees that the mails won't get overwritten, the rest just sounds nice. Even though they might catch a problem once in a while, they give no guaranteed protection and will just as easily pass duplicate filenames through to overwrite existing mails. Step 2 is pointless because there's a race condition between steps 2 and 3. PID/host combination by itself should already guarantee that it never finds such a file. If it does, something's broken and the stat() check won't help since another process might be doing the same thing at the same time, and you end up writing to the same file in tmp/, causing the mail to get corrupted. In step 4 the link() would also fail if identical file was already in the maildir, right? Wrong. The file may already have been moved to cur/ directory, and since it may contain any number of flags by then you can't check with a simple stat() anymore if it exists or not. So really, all that's important in not getting mails overwritten in your maildir is the step 1: Always create filenames that are guaranteed to be unique. Forget about the 2 second waits and such that the Qmail's man page talks about
[8] "Wayback Machine snapshots of cr.yp.to/proto/maildir.html". Internet Archive. 2018. Retrieved 2018-11-23.
[9] "rename". The Open Group. 2013. Retrieved 23 July 2016. That specification requires that the action of the function be atomic.
[10] Sam Varshavchik (25 July 2016). "Management of maildir structures". courier-users (Mailing list). Retrieved 26 July 2016.
[11] Sirainen, Timo (2006-12-05). "Diff for 'MailboxFormat/Maildir'". Retrieved 2018-11-23.
[12] mutt maildir support: workaround for filesystems that don't accept colons
[13] "aerc - the world's best email client homepage". aerc-mail.org.
[14] "Notmuch mail system homepage". notmuchmail.org. Retrieved 2019-06-22.
[15] "Maildir in Thunderbird". mozilla.org. Retrieved 2020-12-06.


@@ Line 41: / Line 41: @@
 </blockquote>
-This algorithm was criticised in 2006 by [[Timo Sirainen]], the creator of [[Dovecot (software)|Dovecot]].<ref name="dovecot-2006-crit"/>
+This algorithm was criticised in 2006 by [[Timo Sirainen]], the creator of [[Dovecot (software)|Dovecot]], as being unnecessarily complex.<ref name="dovecot-2006-crit"/>
 As of November 2018, qmail author [[Daniel J. Bernstein|Daniel Bernstein]] had made no further changes to these filename generation recommendations.<ref name="djbspec-hist"/> On modern POSIX systems, [[temporary file]]s can be safely created with the <code>[[mkstemp]]</code> C library function.