Duplicate code


Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons.[1] A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as code clones or just clones. The automated process of finding duplications in source code is called clone detection.

The following are some of the ways in which two code sequences can be duplicates of each other:

  • character-for-character identical
  • character-for-character identical with white space characters and comments being ignored
  • token-for-token identical
  • token-for-token identical with occasional variation (i.e., insertion/deletion/modification of tokens)
  • functionally identical
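The second kind of duplication in the list above can be illustrated with a minimal sketch. The helper names same_ignoring_space and skip_space are hypothetical and chosen only for this example. The sketch ignores white space but does not strip comments, so it covers only part of that definition.

```c
#include <ctype.h>
#include <stdbool.h>

/* Advance past any white space characters starting at s. */
static const char *skip_space(const char *s)
{
    while (*s != '\0' && isspace((unsigned char)*s)) {
        s++;
    }
    return s;
}

/* Hypothetical helper: returns true if a and b are character-for-character
   identical once all white space is ignored. Comments are NOT removed. */
bool same_ignoring_space(const char *a, const char *b)
{
    for (;;) {
        a = skip_space(a);
        b = skip_space(b);
        if (*a != *b) {
            return false;   /* first differing non-space character */
        }
        if (*a == '\0') {
            return true;    /* both fragments ended together */
        }
        a++;
        b++;
    }
}
```

Because every white space character is dropped, this comparison would also treat "int x" and "intx" as equal. A token-for-token comparison avoids that problem by splitting the text into tokens before comparing.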

How duplicates are created

There are a number of reasons why duplicate code may be created, including:

  • Copy and paste programming, or scrounging, in which a section of code is copied "because it works". In most cases this operation involves slight modifications in the cloned code such as renaming variables or inserting/deleting code.
  • Independent reimplementation, in which functionality very similar to that in another part of a program is required and a developer independently writes code that is very similar to what exists elsewhere. Studies suggest that such independently rewritten code is typically not syntactically similar.[2]
  • Plagiarism, where code is simply copied without permission or attribution.

Problems associated with duplicate code

Inappropriate code duplication may increase maintenance costs, and may be indicative of a sloppy design. Appropriate code duplication may occur for many reasons, including facilitating the development of a device driver for a device that is similar to some existing device.[3]

Detecting duplicate code

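A number of different algorithms have been proposed to detect duplicate code. For example:

  • Baker's algorithm[4]
  • Using abstract syntax trees[5]
  • Visual clone detection[6]
  • Count matrix clone detection[7][8]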
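The basic idea of clone detection can be sketched with a naive line-based search. This is not one of the published algorithms. It is only an illustration, and the names struct clone and find_longest_clone are hypothetical. It looks for the longest run of consecutive lines that appears verbatim at two non-overlapping positions, and applies a minimum length so that short coincidental matches are not reported.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: two matching, non-overlapping line ranges. */
struct clone {
    size_t first;   /* index of the first line of the first occurrence */
    size_t second;  /* index of the first line of the second occurrence */
    size_t length;  /* number of consecutive identical lines */
};

/* Naive search over every pair of starting lines. Returns 1 and fills *out
   if the longest exact match has at least min_lines lines, otherwise
   returns 0 and leaves *out untouched. */
int find_longest_clone(const char *const lines[], size_t n,
                       size_t min_lines, struct clone *out)
{
    struct clone best = {0, 0, 0};

    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            size_t k = 0;
            /* Extend the match while the lines agree. The check i + k < j
               keeps the first range from running into the second. */
            while (j + k < n && i + k < j &&
                   strcmp(lines[i + k], lines[j + k]) == 0) {
                k++;
            }
            if (k > best.length) {
                best.first = i;
                best.second = j;
                best.length = k;
            }
        }
    }

    if (best.length == 0 || best.length < min_lines) {
        return 0;
    }
    *out = best;
    return 1;
}
```

Since the lines are compared with strcmp, this sketch finds only character-for-character identical runs. A renamed variable is enough to end a match, and the pairwise search becomes slow on large inputs, which is why practical detectors work on tokens or syntax trees instead.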

Example of functionally duplicate code

Consider the following code snippet, which calculates the averages of two four-element arrays of integers:

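extern int array1[];
extern int array2[];

int sum1 = 0;
int sum2 = 0;
int average1 = 0;
int average2 = 0;

/* Sum and average the first four elements of array1. */
for (int i = 0; i < 4; i++)
{
   sum1 += array1[i];
}
average1 = sum1/4;   /* integer division: any remainder is discarded */

/* The same logic repeated for array2: a functional duplicate. */
for (int i = 0; i < 4; i++)
{
   sum2 += array2[i];
}
average2 = sum2/4;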

The two loops can be rewritten as the single function:

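/* Returns the integer average of the first four elements of Array_of_4.
   The array is only read, so it is taken as a pointer to const. */
int calcAverage (const int* Array_of_4)
{
   int sum = 0;
   for (int i = 0; i < 4; i++)
   {
       sum += Array_of_4[i];
   }
   return sum/4;   /* integer division: any remainder is discarded */
}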

Using the above function will give source code that has no loop duplication:

extern int array1[];
extern int array2[];
 
int average1 = calcAverage(array1);
int average2 = calcAverage(array2);
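The function above still hard-codes the array length of 4. A possible generalization, shown here only as a sketch, passes the element count as a parameter so that the same function can also serve arrays of other sizes. The name calcAverageN, the wider accumulator, and the choice to return 0 for an empty array are assumptions made for this example.

```c
#include <stddef.h>

/* Hypothetical variant of calcAverage that takes the element count as a
   parameter. Returns 0 for an empty array to avoid dividing by zero; that
   convention is an assumption of this sketch. */
int calcAverageN(const int *values, size_t count)
{
    if (count == 0) {
        return 0;
    }

    long long sum = 0;   /* wider type to reduce the risk of overflow */
    for (size_t i = 0; i < count; i++) {
        sum += values[i];
    }

    /* Integer division: any remainder is discarded, as in calcAverage. */
    return (int)(sum / (long long)count);
}
```

With this variant, the two calls in the snippet above would become calcAverageN(array1, 4) and calcAverageN(array2, 4).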

References

  1. Spinellis, Diomidis. "The Bad Code Spotter's Guide". InformIT.com. Retrieved 2008-06-06.
  2. Code similarities beyond copy & paste by Elmar Juergens, Florian Deissenboeck, Benjamin Hummel.
  3. Kapser, C.; Godfrey, M.W., "'Cloning Considered Harmful' Considered Harmful," 13th Working Conference on Reverse Engineering (WCRE), pp. 19-28, Oct. 2006.
  4. Brenda S. Baker. A Program for Identifying Duplicated Code. Computing Science and Statistics, 24:49–57, 1992.
  5. Ira D. Baxter, et al. Clone Detection Using Abstract Syntax Trees.
  6. Visual Detection of Duplicated Code by Matthias Rieger, Stephane Ducasse.
  7. Yuan, Y. and Guo, Y. CMCD: Count Matrix Based Code Clone Detection, in 2011 18th Asia-Pacific Software Engineering Conference. IEEE, Dec. 2011, pp. 250–257.
  8. Chen, X., Wang, A. Y., & Tempero, E. D. (2014). A Replication and Reproduction of Code Clone Detection Studies. In ACSC (pp. 105-114).
