Page hijacking is a form of search engine index spamming. It is achieved by creating a rogue copy of a popular website that shows content similar to the original to a web crawler but redirects web surfers to separate, unrelated, or malicious websites. Spammers can use this technique to achieve high rankings in result pages for certain keywords.
Page hijacking is a form of cloaking, made possible because some web crawlers detect duplicates while indexing web pages. If two pages have the same content, only one of the URLs will be kept. A spammer will try to ensure that the rogue website is the one shown on the result pages.
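The duplicate handling described above can be illustrated with a minimal sketch. It assumes a crawler that fingerprints normalised page content and keeps, for each fingerprint, the URL with the highest ranking score. The function names, the scoring dictionary, and the example URLs are hypothetical and chosen only for illustration; real search engines use more elaborate near-duplicate detection and do not publish their selection rules.

```python
import hashlib


def content_fingerprint(html: str) -> str:
    """Hash page content after collapsing whitespace and lowercasing."""
    normalised = " ".join(html.split()).lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def collapse_duplicates(pages: dict, score: dict) -> list:
    """Keep one URL per distinct content fingerprint.

    pages: URL -> page content as seen by the crawler
    score: URL -> ranking score (hypothetical; higher wins)
    """
    kept = {}  # fingerprint -> URL currently kept for that content
    for url, html in pages.items():
        fp = content_fingerprint(html)
        current = kept.get(fp)
        if current is None or score[url] > score[current]:
            kept[fp] = url
    return sorted(kept.values())


# Hypothetical index: the rogue copy matches the original's content
# and has managed to obtain a higher score.
pages = {
    "original.example": "<p>Big shirts</p>",
    "copy.example": "<p>Big   SHIRTS</p>",
    "other.example": "<p>Hats</p>",
}
score = {"original.example": 0.4, "copy.example": 0.9, "other.example": 0.1}
surviving = collapse_duplicates(pages, score)
```

In this sketch the original URL is dropped from the surviving list because the rogue copy carries the same fingerprint and a higher score, which is the outcome the spammer is aiming for.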
Suppose that a website offers difficult-to-find sizes of clothes. A common search used to reach this website is "really big t-shirts", which, when entered on popular search engines, shows this website as the first result:
- Offering clothes in sizes you cannot find elsewhere.
A spammer working for a competing company then creates a website that, when visited by a web crawler, looks extremely similar to the one listed. However, it includes a special temporary redirection script that redirects regular web surfers to the competitor's site. After several weeks, a web search for "really big t-shirts" shows the following result:
- Offering clothes in sizes you cannot find elsewhere... at better prices!
- —Show Similar Pages—
Notice that the domain in the result's URL has changed from .com to .net, and that a new "Show Similar Pages" link has appeared.
When web surfers click on this result, they are redirected to the competing website. The original result is now hidden in the "Show Similar Pages" section.
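The behaviour in this example suggests one simple check: request the same URL once as a crawler and once as a regular browser, then compare the two responses. The sketch below works on responses that have already been fetched, so it needs no network access. It assumes the redirection is visible as an HTTP 3xx status code; a redirect performed by client-side script would not be caught this way. The `Response` type, the `looks_cloaked` function, and the example URL are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# HTTP status codes that indicate a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


@dataclass
class Response:
    status: int              # HTTP status code
    location: Optional[str]  # value of the Location header, if any
    body: str                # response body


def looks_cloaked(as_crawler: Response, as_browser: Response) -> bool:
    """Flag a URL that serves content to a crawler but redirects a browser."""
    crawler_redirected = as_crawler.status in REDIRECT_CODES
    browser_redirected = as_browser.status in REDIRECT_CODES
    return browser_redirected and not crawler_redirected


# Hypothetical responses for the rogue page in the example above.
seen_by_crawler = Response(200, None, "Offering clothes in sizes you cannot find elsewhere")
seen_by_browser = Response(302, "https://competitor.example/", "")
suspicious = looks_cloaked(seen_by_crawler, seen_by_browser)
```

A page that redirects every visitor, crawler included, is not flagged, since an ordinary redirect that treats all clients alike is not cloaking.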