Email address harvesting
The simplest method involves spammers purchasing or trading lists of email addresses from other spammers.
Another common method is the use of special software known as "harvesting bots" or "harvesters", which spider Web pages, postings on Usenet, mailing list archives, internet forums and other online sources to obtain email addresses from public data.
Spammers may also use a form of dictionary attack, known as a directory harvest attack, in which valid email addresses at a specific domain are found by guessing addresses built from common usernames at that domain. For example, the spammer tries firstname.lastname@example.org, email@example.com, and so on; any addresses that the recipient email server accepts for delivery, instead of rejecting, are added to the list of theoretically valid email addresses for that domain.
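Seen from the receiving mail server, a directory harvest attack is a run of recipient checks from one sender, most of which fail. The following is a minimal sketch of how a server might count those failures and stop answering honestly once a sender has guessed more than one nonexistent mailbox. The class name `HarvestGuard`, the `max_invalid` threshold, and the sample mailboxes and IP addresses are illustrative assumptions, not part of any real mail server.

```python
from collections import defaultdict


class HarvestGuard:
    """Toy recipient check that rejects everything from a sender
    once that sender has named too many nonexistent mailboxes."""

    def __init__(self, valid_mailboxes, max_invalid=1):
        self.valid = {m.lower() for m in valid_mailboxes}
        self.max_invalid = max_invalid         # hypothetical threshold
        self.invalid_count = defaultdict(int)  # sender -> failed guesses

    def check_recipient(self, sender, recipient):
        """Return True if the recipient is accepted for delivery."""
        if self.invalid_count[sender] > self.max_invalid:
            # Sender is treated as a harvester: valid addresses are hidden too.
            return False
        if recipient.lower() in self.valid:
            return True
        self.invalid_count[sender] += 1
        return False


guard = HarvestGuard({"alice@example.org", "bob@example.org"})
guesses = [
    "aaron@example.org",   # wrong guess no. 1
    "alice@example.org",   # correct guess, still answered honestly
    "adam@example.org",    # wrong guess no. 2
    "bob@example.org",     # correct guess, but the sender is now blocked
]
results = [guard.check_recipient("203.0.113.7", g) for g in guesses]
print(results)
```

In this sketch the fourth guess is rejected even though the mailbox exists, so the attacker's list stays incomplete. The same rule would also block a legitimate sender who mistyped two addresses, which is the trade-off of any such threshold.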
Another method of email address harvesting is to offer a product or service free of charge as long as the user provides a valid email address, and then use the addresses collected from users as spam targets. Common products and services offered are jokes of the day, daily bible quotes, news or stock alerts, free merchandise, or even registered sex offender alerts for one's area. Another technique was used in late 2007 by the company iDate, which used email harvesting directed at subscribers to the Quechup website to spam the victims' friends and contacts.
Relationship to spam activity
Spam differs from other forms of direct marketing in many ways, one of them being that it costs little more to send to a larger number of recipients than a smaller number. For this reason, there is little pressure upon spammers to limit the number of addresses targeted in a spam run, or to restrict it to persons likely to be interested. One consequence of this fact is that many people receive spam written in languages they cannot read — a good deal of spam sent to English-speaking recipients is in Chinese or Korean, for instance.
Lists of addresses sold for use in spam frequently contain malformed addresses, duplicate addresses, and addresses of role accounts such as postmaster.
Spammers may harvest email addresses from a number of sources. A popular method uses email addresses which their owners have published for other purposes. Usenet posts, especially those in archives such as Google Groups, frequently yield addresses. Simply searching the Web for pages with addresses — such as corporate staff directories or membership lists of professional societies — using spambots can yield thousands of addresses, most of them deliverable. Spammers have also subscribed to discussion mailing lists for the purpose of gathering the addresses of posters. The DNS and WHOIS systems require the publication of technical contact information for all Internet domains; spammers have illegally trawled these resources for email addresses. Spammers have also concluded that generally, for the domain names of businesses, all of the email addresses will follow the same basic pattern and thus are able to accurately guess the email addresses of employees whose addresses they have not harvested. Many spammers use programs called web spiders to find email addresses on web pages. Usenet article message-IDs often look enough like email addresses that they are harvested as well. Spammers have also harvested email addresses directly from Google search results, without actually spidering the websites found in the search.
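To illustrate why Usenet message-IDs end up in harvested lists, here is a small sketch using a deliberately loose pattern of the kind a simple harvester might apply to page text. The pattern `ADDRESS_LIKE` and the sample text are assumptions made for illustration; real harvesters vary.

```python
import re

# Loose "something@host.tld" pattern; an assumption, not any specific tool's rule.
ADDRESS_LIKE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

sample = (
    "From: Jane Doe <jane.doe@example.org>\n"
    "Message-ID: <20070106.123456.7890@news.example.net>\n"
    "Contact the list owner: owner at example dot com\n"
)

found = ADDRESS_LIKE.findall(sample)
for item in found:
    print(item)
```

The pattern picks up the message-ID alongside the real address because both have the same shape, while the spelled-out address on the last line contains no "@" and is missed.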
Spammer viruses may include a function which scans the victimized computer's disk drives (and possibly its network interfaces) for email addresses. These scanners discover email addresses which have never been exposed on the Web or in WHOIS. A compromised computer located on a shared network segment may capture email addresses from traffic addressed to its network neighbors. The harvested addresses are then returned to the spammer through the botnet created by the virus.
A recent, controversial tactic, called "e-pending", involves the appending of email addresses to direct-marketing databases. Direct marketers normally obtain lists of prospects from sources such as magazine subscriptions and customer lists. By searching the Web and other resources for email addresses corresponding to the names and street addresses in their records, direct marketers can send targeted spam email. However, as with most spammer "targeting", this is imprecise; users have reported, for instance, receiving solicitations to mortgage their house at a specific street address — with the address being clearly a business address including mail stop and office number.
Spammers sometimes use various means to confirm addresses as deliverable. For instance, including a hidden Web bug in a spam message written in HTML may cause the recipient's mail client to transmit the recipient's address, or any other unique key, to the spammer's Web site. Users can defend against such abuses by turning off their mail program's option to display images, or by reading email as plain-text rather than formatted.
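As a rough illustration of the defense, a mail client that declines to display remote images first has to find them. The sketch below uses Python's standard `html.parser` module to list remote image URLs in an HTML message body. The class name `RemoteImageFinder`, the sample message, and the tracking URL are hypothetical.

```python
from html.parser import HTMLParser


class RemoteImageFinder(HTMLParser):
    """Collect the URLs of images that would be fetched from a remote server."""

    def __init__(self):
        super().__init__()
        self.remote_images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        src = dict(attrs).get("src") or ""
        # Remote if it names a web scheme or is protocol-relative ("//host/...").
        if src.lower().startswith(("http://", "https://", "//")):
            self.remote_images.append(src)


# Hypothetical spam body: a 1x1 image whose URL carries a per-recipient key.
message_html = (
    "<p>Limited offer!</p>"
    '<img src="http://spammer.example/pixel.gif?id=3f9a1c" width="1" height="1">'
)

finder = RemoteImageFinder()
finder.feed(message_html)
print(finder.remote_images)
```

Fetching the listed URL would tell the spammer's server that the message tied to key `3f9a1c` was opened. A client that skips these requests, or renders the message as plain text, gives no such confirmation.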
Likewise, spammers sometimes operate Web pages which purport to remove submitted addresses from spam lists. In several cases, these have been found to subscribe the entered addresses to receive more spam.
When a person fills out a form, the submitted data is often sold to a spammer, using a web service or HTTP POST to transfer it. The transfer is immediate and drops the email address into various spammer databases. The revenue made from the spammer is shared with the source. For instance, if someone applies online for a mortgage, the owner of the site may have made a deal with a spammer to sell the address. Spammers consider these the best addresses, because they are fresh and the user has just signed up for a product or service of a kind that is often marketed by spam.
In Australia, under the 2003 anti-spam legislation, the creation or use of email-address harvesting programs (address harvesting software) is illegal only if the intent is to use them to send unsolicited commercial email. The legislation is intended to prohibit emails with "an Australian connection": spam originating in Australia and sent elsewhere, and spam sent to an Australian address. New Zealand has similar restrictions in its Unsolicited Electronic Messages Act 2007.
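In the United States, the CAN-SPAM Act of 2003 made it illegal to initiate commercial email to a recipient whose email address was obtained by:
- Using an automated means that generates possible electronic mail addresses by combining names, letters, or numbers into numerous permutations.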
- Using an automated means to extract electronic mail addresses from an Internet website or proprietary online service operated by another person, and such website or online service included, at the time the address was obtained, a notice stating that the operator of such website or online service will not give, sell, or otherwise transfer addresses maintained by such website or online service to any other party for the purposes of initiating, or enabling others to initiate, electronic mail messages.
Furthermore, website operators may not distribute their legitimately collected lists. The CAN-SPAM Act of 2003 requires that operators of web sites and online services should include a notice that the site or service will not give, sell, or otherwise transfer addresses, maintained by such website or online service, to any other party for the purposes of initiating, or enabling others to initiate, electronic mail messages.
- Address munging
- Address munging (e.g., changing "bob@example.com" to "bob at example dot com") is a common technique to make harvesting email addresses more difficult. Though relatively easy to overcome, it is still effective. It is somewhat inconvenient to users, who must examine the address and manually correct it.
- Images
- Using images to display part or all of an email address is a very effective harvesting countermeasure. The processing required to automatically extract text from images is not economically viable for spammers. It is very inconvenient for users, who must manually launch their email client and transcribe the address.
- Contact forms
- Email contact forms which send an email but do not reveal the recipient's address avoid publishing an email address in the first place. Insecure forms, however, may actually aid spammers by effectively serving as an open mail relay. This method prevents users from composing in their preferred client and limits message content to plain text.
- HTML obfuscation
- In HTML, email addresses may be obfuscated in many ways, such as inserting hidden elements within the address or listing parts out of order and using CSS to restore the correct order. Each has the benefit of being transparent to most users, but none support clickable email links and none are accessible to text-based browsers and screen readers.
- CAPTCHA
- Requiring users to complete a CAPTCHA before giving out an email address is an effective harvesting countermeasure. A popular solution is the reCAPTCHA Mailhide service.
- CAN-SPAM Notice
- To enable prosecution of spammers under the CAN-SPAM Act of 2003, a website operator must post a notice that "the site or service will not give, sell, or otherwise transfer addresses maintained by such website or online service to any other party for the purposes of initiating, or enabling others to initiate, electronic mail messages."
- Mail Server Monitoring
- A method that can be implemented at the recipient email server for combatting directory harvesting attacks is to reject all email addresses as invalid from any sender that has specified more than one invalid recipient address; however, this carries a risk of legitimate email being blocked too.
- Spider Traps
- A spider trap is a part of a website which is a honeypot designed to combat email harvesting spiders. Well-behaved spiders are unaffected, as the website's robots.txt file will warn spiders to stay away from that area—a warning that malicious spiders do not heed. Some traps block access from the client's IP as soon as the trap is accessed. Others, like a network tarpit, are designed to waste the time and resources of malicious spiders by slowly and endlessly feeding the spider useless information. The "bait" content may contain large numbers of fake addresses, a technique known as list poisoning, though some consider this practice harmful.
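The spider trap described above can be sketched as a small model: a robots.txt rule warns spiders away from a trap path, and any client that requests that path anyway is blocked. The class name `SpiderTrap`, the `/trap/` path, the status codes, and the client IP addresses are illustrative assumptions. The robots.txt check uses Python's standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that tells every spider to avoid the trap area.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


class SpiderTrap:
    """Ban any client IP that requests a path under the trap prefix."""

    def __init__(self, trap_prefix="/trap/"):
        self.trap_prefix = trap_prefix
        self.banned = set()

    def handle(self, client_ip, path):
        """Return an HTTP-like status code for the request."""
        if client_ip in self.banned:
            return 403
        if path.startswith(self.trap_prefix):
            self.banned.add(client_ip)  # the warning was ignored: block the client
            return 403
        return 200


rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

trap = SpiderTrap()
paths = ["/", "/trap/list.html", "/staff.html"]

# A well-behaved spider consults robots.txt and never requests the trap.
polite_ip = "192.0.2.10"
polite_status = [trap.handle(polite_ip, p) for p in paths
                 if rules.can_fetch("PoliteBot", p)]

# A malicious spider ignores robots.txt and requests everything.
rude_ip = "203.0.113.7"
rude_status = [trap.handle(rude_ip, p) for p in paths]

print(polite_status, rude_status)
```

In this model the polite spider is never affected, while the spider that ignores the warning loses access to every later page, including ordinary ones. A tarpit variant would answer the trap request slowly and endlessly instead of banning the client.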