Email address: Difference between revisions

From Wikipedia, the free encyclopedia

Revision as of 12:13, 6 June 2008

An e-mail address identifies a location to which e-mail messages can be delivered. The term "e-mail address" is also used for the formal, pre-registered, authoritative electronic mail delivery site of an individual (for example, an attorney's e-mail address registered for delivery of proof-of-service digital copies of legal pleadings). A modern Internet e-mail address (using SMTP or Usenet) is a string of the form jsmith@example.com, read as "jsmith at example dot com". The part before the @ sign is the local-part of the address, often the username of the recipient. The part after the @ sign is the domain-part, which may be a host name or a domain name that can be looked up in the Domain Name System to find the mail transfer agents or Mail eXchangers (MXs) accepting e-mail for that address. Some hosts allow a catch-all address: mail for any local-part that is not otherwise defined is delivered to a configured, existing e-mail address.

The domain name of an e-mail address is often that of the e-mail service, such as Google's Gmail, Microsoft's Hotmail, etc. The domain name can also be the domain name of the company that the recipient represents, or the domain of the recipient's personal site.

Earlier forms of e-mail addresses included the somewhat verbose notation required by X.400, and the UUCP "bang path" notation, in which the address was given in the form of a sequence of computers through which the message should be relayed. This latter was widely used for several years, but was superseded by the generally more convenient SMTP form.

Addresses found in the header fields of e-mail should not be considered authoritative, because SMTP has no generally required mechanism for authentication. Forged e-mail addresses are often seen in spam, phishing, and many other Internet-based scams; this has led to several initiatives that aim to make such forgeries easier to spot.

To indicate where the message should go, a user normally types the "display name" of the recipient followed by the address specification surrounded by angle brackets, for example: John Smith <ap118@example.com>.
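As an illustration of the display-name form, the following minimal sketch uses `email.utils.parseaddr` from the Python standard library to separate the display name from the address specification. The variable names are illustrative only.

```python
from email.utils import parseaddr

# parseaddr returns a (display name, address specification) pair.
display_name, addr_spec = parseaddr("John Smith <ap118@example.com>")

print(display_name)  # John Smith
print(addr_spec)     # ap118@example.com
```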

Limitations

The syntax of e-mail addresses was originally defined in RFC 822, which has since been made obsolete by RFC 2822.

The specification found in RFC 2822 restricts e-mail addresses to a subset of the ASCII characters.

An e-mail address consists of two parts, a "local-part" and a domain, written as local-part@domain.
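The two-part structure can be sketched as follows. The function name `split_address` is hypothetical. Splitting at the last @ sign is a simplifying assumption that keeps a quoted local-part such as "Abc@def" intact; the function does not check whether either part is valid.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split an address into (local_part, domain) at the last @ sign."""
    local_part, sep, domain = address.rpartition("@")
    if not sep:
        # rpartition returns an empty separator when no @ is present.
        raise ValueError("address contains no @ sign")
    return local_part, domain


print(split_address("jsmith@example.com"))
print(split_address('"Abc@def"@example.com'))
```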

The "local-part" of an e-mail address has a maximum length of 64 characters (although servers are encouraged not to limit themselves to accepting only 64 characters) and the domain name a maximum of 255 characters, as defined in RFC 2821.

According to RFC 2822, the local-part of the e-mail address may use any of these ASCII characters:

  • Uppercase and lowercase letters
  • Digits 0 through 9
  • Characters ! # $ % * / ? | ^ { } ` ~ & ' + - = _
  • Character . (dot), provided that it is neither the first nor the last character and does not appear two or more times consecutively.
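The rules in the list above, together with the 64-character limit, can be sketched as a simple check for unquoted local-parts. The names `ALLOWED` and `is_valid_unquoted_local_part` are hypothetical. The sketch deliberately rejects quoted-strings and enforces the 64-character limit strictly, which a real server may choose not to do.

```python
import string

# Letters, digits, and the special characters listed above (the dot is handled separately).
ALLOWED = set(string.ascii_letters + string.digits + "!#$%*/?|^{}`~&'+-=_")


def is_valid_unquoted_local_part(local_part: str) -> bool:
    """Check an unquoted local-part against the character and dot rules."""
    if not 1 <= len(local_part) <= 64:
        return False
    # A dot may not be first, last, or doubled.
    if local_part.startswith(".") or local_part.endswith(".") or ".." in local_part:
        return False
    return all(ch in ALLOWED or ch == "." for ch in local_part)


for candidate in ["Abc.123", "user+mailbox/department=shipping", "Abc.", "Abc..123"]:
    print(candidate, is_valid_unquoted_local_part(candidate))
```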

Additionally, quoted-strings (e.g. "John Doe"@example.com) are permitted by RFC 2821 and RFC 2822, allowing characters that would otherwise be prohibited. However, they are rarely seen in practice. RFC 2821 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".

Notwithstanding the addresses permitted by these standards, some systems impose more restrictions on email addresses, both in email addresses created on the system and in email addresses to which messages can be sent. Hotmail, for example, only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-), and will not allow sending mail to any email address containing ! # $ % * / ? | ^ { } ` ~. [citation needed]

The domain name is much more restricted. The dot-separated domain labels are limited to "letters, digits, and hyphens drawn from the ASCII character set ... Mailbox domains are not case sensitive."
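A minimal sketch of these domain restrictions follows. The function names are hypothetical. The 255-character limit comes from the length rule given earlier, and the rule that a label may not begin or end with a hyphen is taken from RFC 3696 rather than from the quotation above. The sketch does not check that the domain exists.

```python
import string

LABEL_CHARS = set(string.ascii_letters + string.digits + "-")


def is_valid_domain(domain: str) -> bool:
    """Check that every dot-separated label uses only letters, digits, and hyphens."""
    if not 1 <= len(domain) <= 255:
        return False
    for label in domain.split("."):
        # Empty labels arise from leading, trailing, or doubled dots.
        if not label or not set(label) <= LABEL_CHARS:
            return False
        # Assumption from RFC 3696: no hyphen at the start or end of a label.
        if label.startswith("-") or label.endswith("-"):
            return False
    return True


def same_domain(a: str, b: str) -> bool:
    """Compare two domains without regard to case."""
    return a.lower() == b.lower()


print(is_valid_domain("example-one.com"))
print(same_domain("Example.COM", "example.com"))
```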

The informational RFC 3696, written by the author of RFC 2821, explains the details in a readable way, with a few minor errors noted in the RFC 3696 errata.

Examples

Valid examples

  • Abc@example.com
  • Abc.123@example.com
  • 1234567890@domain.com
  • abcd@example-one.com
  • _______@domain.com
  • user+mailbox/department=shipping@example.com
  • !#$%&'*+-/=?^_`.{|}~@example.com
  • "Abc@def"@example.com
  • "Fred Bloggs"@example.com
  • "Joe.\\Blow"@example.com

Invalid examples

  • Abc.example.com (the @ character is missing)
  • Abc.@example.com (the last character of the local-part is a dot)
  • Abc..123@example.com (the local-part contains two consecutive dots)

Plus (or Minus) addressing

According to RFC 2821, section 2.3.10 (Mailbox and Address), "...the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address." In particular, for some hosts the user "smith" is different from the user "Smith".

Plus addressing takes advantage of this rule. Some mail services allow a user to append +tag to the local-part of their e-mail address (joeuser+tag@example.com); the text of the tag can then be used to filter incoming mail.

Some systems violate RFC 2822, and the recommendations in RFC 3696, by refusing to send mail addressed to a user on another system merely because the local-part of the address contains the plus sign (+). Users of these systems cannot use plus addressing.

On the other hand, most qmail installations support the use of a dash (-) as a separator within the local-part, such as joeuser-tag@example.com or joeuser-tag-sub-anything-else@example.com. This allows qmail, through .qmail-default or .qmail-tag-sub-anything-else files, to sort, filter, forward, or run applications based on the tagging system established. Procmail and SpamAssassin are common applications to use with qmail to help sort out spam or further filter incoming e-mail.

Disposable addresses of this form, using various separators between the base name and the tag, are supported by several e-mail services, including Runbox (plus and minus), Google Mail (plus), Yahoo! Mail Plus (minus)[1], and FastMail (plus)[2].
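The way a receiving host might separate the base name from the tag can be sketched as below. The function name `split_tag` is hypothetical. Splitting at the first occurrence of the separator is an assumption: each host assigns its own semantics to the local-part, so a real service may behave differently.

```python
from typing import Tuple


def split_tag(local_part: str, separator: str = "+") -> Tuple[str, str]:
    """Split a local-part into (base name, tag) at the first separator.

    The tag is an empty string when the separator does not occur.
    """
    base, _, tag = local_part.partition(separator)
    return base, tag


print(split_tag("joeuser+tag"))
print(split_tag("joeuser-tag-sub-anything-else", separator="-"))
```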

A related technique that is often used by mailing lists to reliably detect bounces is called variable envelope return path.

Validation

Determining the validity of an e-mail address is a common and, unfortunately, difficult task.[3] The problem arises for example in online forms, where the user is asked to enter an e-mail address to allow them to be contacted later. As this is often the only contact information available, there are good reasons for trying to ensure that the address given is indeed correct.

In general, two types of validity checking may be desired:

  1. determining whether an address is syntactically valid according to the rules above, or
  2. determining whether e-mail can actually be delivered to the address.

The former may be accomplished by parsing the address according to the syntax rules described above, and possibly subjecting the domain name part to further validity checks. Unfortunately, many widespread approaches, often based on regular expressions, tend to match only a subset of all valid addresses, potentially causing difficulties for users whose address doesn't happen to match the programmer's expectations. Often, the best approach may be to simply check for the few features that can be relied on to be present in any valid address, such as the presence of an @ sign. Such a check will accept many invalid addresses, but will hopefully at least ensure that the user has entered something that might be an e-mail address, rather than, say, their street address.
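The minimal approach described above might look like the following sketch. The function name `looks_like_email` is hypothetical. It only requires an @ sign with something on either side, so it accepts many invalid addresses by design.

```python
def looks_like_email(text: str) -> bool:
    """Return True if the text has an @ sign with non-empty text on both sides."""
    local_part, sep, domain = text.strip().rpartition("@")
    return bool(sep and local_part and domain)


for candidate in ["Abc@example.com", "Abc.example.com", "12 Main Street"]:
    print(candidate, looks_like_email(candidate))
```

Note that this check accepts Abc..123@example.com, which is syntactically invalid; it is meant only to catch input that is clearly not an e-mail address.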

In general, the only way to reliably determine whether an address can actually receive e-mail is to send a test message to it and have the recipient confirm that they have seen it, for example by entering a randomly generated code included in the message or by accessing a URL containing such a code. Apparently successful delivery of a message, without errors or bounce messages, is not sufficient to guarantee validity, since many e-mail servers accept and silently discard messages sent to nonexistent addresses.
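The randomly generated code could be produced as in the sketch below, which uses the `secrets` module from the Python standard library. The function names, the `code` query parameter, and the base URL are hypothetical. Sending the message and storing the expected code are left out.

```python
import secrets


def new_confirmation_code() -> str:
    """Generate an unpredictable, URL-safe confirmation code."""
    return secrets.token_urlsafe(16)


def confirmation_url(base_url: str, code: str) -> str:
    """Build the URL to include in the test message (base_url is a placeholder)."""
    return f"{base_url}?code={code}"


def code_matches(expected: str, submitted: str) -> bool:
    """Compare the stored and submitted codes in constant time."""
    return secrets.compare_digest(expected.encode("utf-8"), submitted.encode("utf-8"))


code = new_confirmation_code()
print(confirmation_url("https://example.com/confirm", code))
print(code_matches(code, code))
```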

Before attempting delivery to an address, some intermediate checks may be performed, such as querying the Domain Name System to ensure that the domain in the address has a valid MX record. Such checks are of limited use in cases where delivery would be attempted anyway if they pass, since the mail transfer agent responsible for delivering the message will perform the same checks in any case. They may, however, be useful additions to pure syntax checks, either in situations where sending test messages is considered too onerous, or as an initial check before the user has committed to having a test message sent to them.

In online applications based on HTML forms, validity checks may be done either on the server receiving the content after submission, or directly at the client end using client-side scripting languages such as JavaScript. Client-side checks have the advantage of providing immediate feedback to the user, but scripting is not supported or enabled in all browsers, so redundant server-side checks are still necessary. It can also be difficult to reliably implement anything more than simple syntax checking on the client side, due to variations in the client environment. A solution that combines some of the advantages of both approaches is to use techniques like AJAX to have the client automatically contact the server to do the checking, allowing the same server-side code used for the final validation to be applied interactively.

References

  • RFC 2821: Simple Mail Transfer Protocol
  • RFC 2822: Internet Message Format
  • RFC 3696: Application Techniques for Checking and Transformation of Names
  • RFC 2142: Mailbox names for common services, roles and functions

Footnotes

  3. http://www.hm2k.com/posts/what-is-a-valid-email-address

See Also