Content similarity detection

With the advent of the Internet, it has never been easier for students to plagiarize the work of others. Many teachers are therefore looking for efficient ways to fight plagiarism, and a few solutions exist.

The use of search engines

The most obvious way to detect plagiarism is to enter keywords or key sentences from the suspected text into a search engine, in the hope of finding similar texts on the Internet.

This method may be useful when a student has copied a whole article, but it quickly becomes ineffective when the plagiarist has used only parts of articles or combined several of them. It is also quite time-consuming.
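Choosing what to search for can be partly automated. The Python sketch below assumes that long sentences are the most distinctive and turns the longest ones into exact-phrase queries. The function name `key_sentence_queries`, its thresholds, and the naive sentence splitting are illustrative assumptions, and submitting the queries to an actual search engine is left out.

```python
import re


def key_sentence_queries(text: str, n: int = 3, min_words: int = 8) -> list[str]:
    """Hypothetical helper: pick up to n of the longest sentences as quoted queries.

    Assumption: long sentences are more distinctive, so an exact-phrase
    search for them is more likely to surface a copied source.
    """
    # Naive sentence split on ".", "!" or "?" followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Skip short sentences, which would match too many unrelated pages.
    candidates = [s for s in sentences if len(s.split()) >= min_words]
    # Longest first; sorted() stays stable with reverse=True, so ties keep text order.
    candidates = sorted(candidates, key=lambda s: len(s.split()), reverse=True)[:n]
    # Quotation marks request an exact-phrase match in most search engines.
    return ['"' + s.rstrip(".!?") + '"' for s in candidates]
```

Because the queries are exact phrases, changing a single word defeats them, which is one reason the method struggles when the text has been reworded or stitched together from several sources.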

Plagiarism detection software

With the rise of this phenomenon, many software tools have been designed to make plagiarism detection easier.

These tools range from basic programs that compare two or more documents to more advanced versions that can locate plagiarized sources on the Internet. They handle a number of document formats, mainly Word, PDF, and HTML.
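One simple way to compare documents, not necessarily the method any particular product uses, is to break each text into overlapping word sequences ("shingles") and measure their overlap with the Jaccard index. The Python sketch below assumes a shingle length of three words; `shingles`, `jaccard_similarity`, and `most_similar_pairs` are hypothetical names chosen for illustration.

```python
import re
from itertools import combinations


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping word k-grams ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Shared shingles divided by all distinct shingles: 0.0 (none) to 1.0 (same set)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not (sa or sb):
        return 0.0  # convention: texts too short to shingle are not flagged
    return len(sa & sb) / len(sa | sb)


def most_similar_pairs(docs: dict[str, str], k: int = 3) -> list[tuple[float, str, str]]:
    """Score every pair of documents and return the pairs, most similar first."""
    scores = [(jaccard_similarity(docs[x], docs[y], k), x, y)
              for x, y in combinations(docs, 2)]
    return sorted(scores, reverse=True)
```

A score near 1.0 means the two texts share most of their three-word sequences; a larger `k` makes the measure stricter, since only longer identical runs count.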

These tools fall into two categories:

those running on a remote server
those installed on the user’s computer and running locally

The first kind of plagiarism detection software is arguably better, because it can draw on a large reference database of possible sources of plagiarism. Moreover, each newly submitted document can be added to that database, allowing it to grow over time.

However, some consider this feature a violation of students’ copyright.
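A server-side database that grows with each submission can be sketched as an inverted index from word sequences to the documents containing them. The Python class below is hypothetical: `ReferenceDatabase`, `submit`, `check`, and the default sequence length of five words are illustrative assumptions. A real service would persist the index and would likely store hashed fingerprints rather than raw word sequences.

```python
import re
from collections import defaultdict


class ReferenceDatabase:
    """Hypothetical in-memory index mapping word k-grams to document IDs."""

    def __init__(self, k: int = 5) -> None:
        self.k = k
        self.index: defaultdict[tuple[str, ...], set[str]] = defaultdict(set)

    def _kgrams(self, text: str) -> set[tuple[str, ...]]:
        words = re.findall(r"\w+", text.lower())
        return {tuple(words[i:i + self.k]) for i in range(len(words) - self.k + 1)}

    def check(self, text: str) -> dict[str, int]:
        """Count how many of text's k-grams each stored document shares."""
        counts: defaultdict[str, int] = defaultdict(int)
        for gram in self._kgrams(text):
            # .get() avoids creating empty entries in the defaultdict.
            for doc_id in self.index.get(gram, ()):
                counts[doc_id] += 1
        return dict(counts)

    def submit(self, doc_id: str, text: str) -> dict[str, int]:
        """Check a new submission, then add it so the database grows."""
        matches = self.check(text)
        for gram in self._kgrams(text):
            self.index[gram].add(doc_id)
        return matches
```

Every call to `submit` both checks the document and enlarges the index, so later submissions are compared against earlier ones as well.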

Plagiarism detection algorithms

The Rabin-Karp algorithm searches for a substring within a text by using hashing. Because it can look for many patterns in a single pass over the text, one of its main applications is plagiarism detection, where single-string searching algorithms are impractical.
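The following Python sketch shows the multi-pattern form of Rabin-Karp. All patterns share one length m; the hash of each length-m window of the text is updated in constant time from the previous one, and any window whose hash appears in the pattern table is compared character by character to rule out collisions. The function name `rabin_karp_multi`, the base of 256, and the prime modulus 1,000,000,007 are assumptions made for illustration.

```python
def rabin_karp_multi(text: str, patterns: list[str], base: int = 256,
                     mod: int = 1_000_000_007) -> list[tuple[int, str]]:
    """Return (position, pattern) for every occurrence in text of any pattern.

    All patterns must have the same length m, so a single rolling hash of
    each length-m window of text can be checked against all of them at once.
    """
    if not patterns:
        return []
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    if len(text) < m:
        return []

    def poly_hash(s: str) -> int:
        # Polynomial hash: s[0]*base^(m-1) + ... + s[m-1], reduced mod `mod`.
        value = 0
        for ch in s:
            value = (value * base + ord(ch)) % mod
        return value

    # Map each pattern hash to the set of patterns producing it.
    table: dict[int, set[str]] = {}
    for p in patterns:
        table.setdefault(poly_hash(p), set()).add(p)

    high = pow(base, m - 1, mod)  # weight of the window's leading character
    window = poly_hash(text[:m])
    matches = []
    for i in range(len(text) - m + 1):
        if window in table:
            candidate = text[i:i + m]
            if candidate in table[window]:  # confirm: hashes can collide
                matches.append((i, candidate))
        if i + m < len(text):
            # Roll the hash: drop text[i], shift left, append text[i + m].
            window = ((window - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches
```

To compare a submission with a suspected source, one could pass every m-character substring of the source as a pattern; each reported position then marks a passage of at least m characters that appears verbatim in both texts.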

Commercial software

Free software

CopyTracker: open-source plagiarism detection software (can compare long texts)
http://plagiarism.phys.virginia.edu/Wsoftware.html
http://cise.lsbu.ac.uk: a selection of freely available tools