Talk:Content similarity detection

From Wikipedia, the free encyclopedia

This is an old revision of this page, as edited by 87.194.16.60 (talk) at 16:12, 13 May 2007. The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

Suggested improvements

There is a large body of academic research in this area.

This page needs a major rewrite to include references.

The structure is fine as an outline.

The following improvements should be made:

First, this section is related to plagiarism detection in natural language, e.g. English. It is not related to detection in other areas, e.g. computer source code, sheet music, diagrams.

Search engines - these are ineffective as they cannot find text in a private database, e.g. a protected forum or an electronic archive of research articles.

Detection software - there is terminology in the research literature to describe different categorisations of software, which should be used here.

Detection algorithms - there are many proposed algorithms, and comparative reviews of them exist. There is no reason why one algorithm should be singled out for inclusion.

Source code plagiarism detection - different methods are used for detecting plagiarism in computer source code, and this is again a substantial research area that needs to be referenced. Source code detection engines were developed before those for natural language and informed the development of natural language engines.

Plagiarism can also be detected by methods other than looking for exact text matches, for instance the Glatt detection method, or software that looks for changes in writing style within a document.
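To make the writing-style idea concrete, here is a rough sketch in Python. It is only a toy illustration and not any published algorithm or existing product: the function names, the two features (average sentence length and average word length) and the threshold of 1.5 standard deviations are all assumptions chosen for the example.

```python
import re
from statistics import mean, pstdev


def style_features(text):
    """Return (average sentence length in words, average word length in characters)."""
    # Naive sentence split on runs of '.', '!' or '?'; real tools need better tokenisation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return (0.0, 0.0)
    return (len(words) / len(sentences), mean(len(w) for w in words))


def flag_style_shifts(paragraphs, threshold=1.5):
    """Return indices of paragraphs whose style deviates from the document average.

    A paragraph is flagged if either feature lies more than `threshold`
    population standard deviations from the mean over all paragraphs.
    The threshold is an arbitrary assumption for illustration.
    """
    features = [style_features(p) for p in paragraphs]
    flagged = set()
    for dim in range(2):
        values = [f[dim] for f in features]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # every paragraph is identical on this feature
        for i, value in enumerate(values):
            if abs(value - mu) / sigma > threshold:
                flagged.add(i)
    return sorted(flagged)


if __name__ == "__main__":
    plain = "The cat sat. The dog ran. It was hot."
    ornate = ("Notwithstanding considerable methodological heterogeneity, "
              "contemporary investigations consistently demonstrate "
              "substantial improvements.")
    print(flag_style_shifts([plain, plain, ornate, plain, plain]))
```

A flagged paragraph is only a prompt for a human to look more closely. Quotations, abstracts and technical passages will also trip a crude check like this one.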

There is also a popular research area on plagiarism prevention. This involves designing out opportunities for students to plagiarise by using new or improved methods of assessment (one, perhaps draconian, example is to replace all coursework with examinations).