User:Vrajadh/sandbox

From Wikipedia, the free encyclopedia


A data clump is a code smell in which a group of related data items is repeatedly passed around together. The items are often primitive values that would be better grouped into an object. Data clumps typically arise from poor program structure or from not following object-oriented principles. They can also result from excessive use of "copy and paste programming".

Description

Code refactoring is the process of improving the internal structure of code without changing its external behavior. An important part of this process is identifying code smells. According to Martin Fowler[1], "a code smell is a surface indication that usually corresponds to a deeper problem in the system". A data clump is a code smell in which two or three data items are passed around together throughout a program (e.g. start and end variables, or the length, breadth and height of an object). Such recurrence leads to duplicated code. It also indicates that an abstraction is missing, which makes the code harder to understand.
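
As a minimal illustration of the smell described above, the following Java sketch shows three dimensions travelling together through several method signatures. The class and method names (ShippingCalculator, volume, surfaceArea, fitsIn) are hypothetical and chosen only for this example.

```java
// Hypothetical example: length, breadth and height always travel together.
class ShippingCalculator {
    // The same three primitives appear in every signature: a data clump.
    static double volume(double length, double breadth, double height) {
        return length * breadth * height;
    }

    static double surfaceArea(double length, double breadth, double height) {
        return 2 * (length * breadth + breadth * height + length * height);
    }

    static boolean fitsIn(double length, double breadth, double height, double maxVolume) {
        return volume(length, breadth, height) <= maxVolume;
    }

    public static void main(String[] args) {
        System.out.println(volume(2, 3, 4));       // 24.0
        System.out.println(surfaceArea(2, 3, 4));  // 52.0
        System.out.println(fitsIn(2, 3, 4, 30));   // true
    }
}
```

Every caller has to supply the three values in the same order, and nothing in the code names the concept they represent together.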

Issues and Identification

The major downside of leaving data clumps in the code is reduced maintainability and reusability. Because a data clump consists of related items that belong together (e.g., the length and breadth of a rectangle), a change made to one item in one place may not be reflected in the other places where it is used. If the value of one of the dimensions needs to change, it has to be changed everywhere it is used. If the dimensions are packed into an object instead, a single call to a mutator method makes the change visible wherever that object is used.
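
The rectangle case above can be sketched as follows. This is only an illustration under assumed names: Rectangle, setLength and RectangleDemo do not come from the cited sources.

```java
// Hypothetical value holder for the rectangle example above.
class Rectangle {
    private double length;
    private double breadth;

    Rectangle(double length, double breadth) {
        this.length = length;
        this.breadth = breadth;
    }

    double getLength() { return length; }
    double getBreadth() { return breadth; }

    // Mutator: a single call updates the value for every holder of this object.
    void setLength(double length) { this.length = length; }

    double area() { return length * breadth; }
    double perimeter() { return 2 * (length + breadth); }
}

class RectangleDemo {
    public static void main(String[] args) {
        Rectangle room = new Rectangle(4, 3);
        Rectangle sameRoom = room;   // a second reference held elsewhere in the program
        room.setLength(5);           // changed once, through the mutator
        System.out.println(sameRoom.area());       // 15.0
        System.out.println(sameRoom.perimeter());  // 16.0
    }
}
```

Had the length and breadth been copied around as two loose variables, each copy would have needed its own update.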

In some cases, 10 to 15 parameters go together, and passing them from one method to another results in code that is hard to read, understand and maintain. A general rule of thumb for identifying a data clump: when a few parameters are repeatedly passed around as a group, delete one of them and check whether the remaining ones still make sense[2]. If they don't, the group is a data clump.

Refactoring Code

Extract Class

In this refactoring technique, a large class is broken down into several smaller classes, each of which adheres more closely to the single responsibility principle. Classes that adhere to this principle tend to be more reliable and more tolerant of change. The data clump is removed because the related data items can be passed around as a single object of the extracted class.

Java Example

Below is an example of a class Person in Java. The instance variables officeAreaCode and officeNumber are currently part of it, and they would travel around the code together as a data clump.

class Person {
    private String name;
    private String officeAreaCode;
    private String officeNumber;

    public String getOfficeAreaCode() {
        return officeAreaCode;
    }

    public void setOfficeAreaCode(String arg) {
        officeAreaCode = arg;
    }

    // other getters and setters
}

Below, the potential data clump is moved into a separate class. The Person class holds an object of this class to link the two together, so the data can now be passed around the system as a single object.

class TelephoneNumber {
    private String areaCode;
    private String number;

    // getters and setters
}

class Person {
    private String name;
    private TelephoneNumber telephone = new TelephoneNumber();

    public String getAreaCode() {
        return telephone.getAreaCode();
    }

    // other getters and setters
    ...
}
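
For readers who want to run the refactored version, here is one possible completion of the two classes. The accessor bodies, the format helper, the getTelephone method and the sample values are assumptions added for illustration and are not part of the original example.

```java
// The extracted class owns the two fields that used to travel together.
class TelephoneNumber {
    private String areaCode;
    private String number;

    String getAreaCode() { return areaCode; }
    void setAreaCode(String areaCode) { this.areaCode = areaCode; }
    String getNumber() { return number; }
    void setNumber(String number) { this.number = number; }

    // Hypothetical helper: behaviour that needs both fields now has one home.
    String format() { return "(" + areaCode + ") " + number; }
}

class Person {
    private String name;
    private final TelephoneNumber telephone = new TelephoneNumber();

    String getName() { return name; }
    void setName(String name) { this.name = name; }

    // Person delegates to the extracted class...
    String getAreaCode() { return telephone.getAreaCode(); }
    void setAreaCode(String areaCode) { telephone.setAreaCode(areaCode); }

    // ...or hands out the whole object instead of two loose strings.
    TelephoneNumber getTelephone() { return telephone; }
}

class ExtractClassDemo {
    public static void main(String[] args) {
        Person person = new Person();
        person.setAreaCode("555");
        person.getTelephone().setNumber("0100");
        System.out.println(person.getTelephone().format()); // (555) 0100
    }
}
```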

Parameter Object

Whenever a group of data items repeatedly appears together in parameter lists, the items can be packed into an object.[3] This helps avoid code duplication. As an example, consider an application that collects information about a person and creates an object representing that person.

Java Example

In the code below, a long list of parameters is passed to the createPerson method.

public Person createPerson(final String lastname, final String firstname, final String middlename,
        final String streetAddress, final String city, final String state)
{
....
}

Instead of passing a long parameter list, two objects are created and passed: one of the Name class and one of the Address class.

public Person createPerson(final Name fullname, final Address fulladdress)
{
....
}

// And the 2 classes can be defined as

public final class Name {
    final String lastname;
    final String firstname;
    final String middlename;
    ....
}

public final class Address {
    final String streetAddress;
    final String city;
    final String state;
    ....
}
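
The elided parts above could be filled in many ways. The following self-contained sketch shows one of them: the constructors, the display helper, the shape of the Person class, the PersonFactory wrapper and the sample values are all assumptions made for this illustration.

```java
// Parameter object for the three name parts.
final class Name {
    final String lastname;
    final String firstname;
    final String middlename;

    Name(String lastname, String firstname, String middlename) {
        this.lastname = lastname;
        this.firstname = firstname;
        this.middlename = middlename;
    }

    // Hypothetical helper that works on the grouped data.
    String display() { return firstname + " " + middlename + " " + lastname; }
}

// Parameter object for the three address parts.
final class Address {
    final String streetAddress;
    final String city;
    final String state;

    Address(String streetAddress, String city, String state) {
        this.streetAddress = streetAddress;
        this.city = city;
        this.state = state;
    }
}

// Assumed shape of Person; the original example does not show it.
class Person {
    final Name name;
    final Address address;

    Person(Name name, Address address) {
        this.name = name;
        this.address = address;
    }
}

class PersonFactory {
    // Two cohesive parameters replace six loose strings.
    static Person createPerson(final Name fullname, final Address fulladdress) {
        return new Person(fullname, fulladdress);
    }

    public static void main(String[] args) {
        Person p = createPerson(new Name("Doe", "Jane", "Q"),
                                new Address("1 Main St", "Springfield", "IL"));
        System.out.println(p.name.display());   // Jane Q Doe
        System.out.println(p.address.city);     // Springfield
    }
}
```

With six String parameters, swapping two arguments compiles silently. With distinct Name and Address types, the compiler rejects arguments passed in the wrong order.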

Preserve Whole Object

Sometimes several values are extracted from an object only to be passed on as parameters to a function[4]. In such cases, the object itself can be passed as a parameter instead.

Java Example

In the code below, the values lowest and highest are obtained from a temp object and passed to a Range method.

int lowest = temp().getLowest();
int highest = temp().getHighest();
boolean range = plan.Range(lowest, highest);

Instead, the entire object can be passed as a parameter to the Range method.

boolean range = plan.Range(temp());
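
A runnable sketch of both variants is given below. The TempRange and HeatingPlan classes and the allowed band are hypothetical, and the method called Range in the text is named withinRange here to follow Java naming conventions.

```java
// Hypothetical holder for a pair of temperature readings.
class TempRange {
    private final int lowest;
    private final int highest;

    TempRange(int lowest, int highest) {
        this.lowest = lowest;
        this.highest = highest;
    }

    int getLowest() { return lowest; }
    int getHighest() { return highest; }
}

// Hypothetical plan that accepts readings inside an allowed band.
class HeatingPlan {
    private final int minAllowed;
    private final int maxAllowed;

    HeatingPlan(int minAllowed, int maxAllowed) {
        this.minAllowed = minAllowed;
        this.maxAllowed = maxAllowed;
    }

    // Before: the caller unpacks the object and passes the pieces.
    boolean withinRange(int lowest, int highest) {
        return lowest >= minAllowed && highest <= maxAllowed;
    }

    // After: the method receives the whole object and asks it for what it needs.
    boolean withinRange(TempRange temp) {
        return withinRange(temp.getLowest(), temp.getHighest());
    }
}

class PreserveWholeObjectDemo {
    public static void main(String[] args) {
        HeatingPlan plan = new HeatingPlan(15, 25);
        System.out.println(plan.withinRange(new TempRange(18, 22))); // true
        System.out.println(plan.withinRange(new TempRange(10, 22))); // false
    }
}
```

If the check later needs a third value from the readings, only the TempRange class and the method body change. The call sites stay the same.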

Detection

Reek[5] is a code smell detector that analyzes Ruby files and reports the code smells it finds. For data clumps, it looks for a group of two or three data elements that are expected as parameters by more than two methods of a class.[6] Below is an example of the warning reported by Reek when it identifies a data clump.

class Dummy
  def x(a1,a2); end
  def y(a1,a2); end
  def z(a1,a2); end
end

test.rb -- 1 warning:
  [2, 3, 4]:Dummy takes parameters [a1, a2] to 3 methods (DataClump)
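
The idea behind this kind of check can be sketched in Java. The code below is not Reek's implementation. It is a simplified illustration that assumes the method signatures are already available as lists of parameter names, and it considers only pairs of parameters. The names DataClumpSketch and findClumps are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class DataClumpSketch {
    /**
     * Returns every pair of parameter names that occurs in at least
     * minMethods signatures, mapped to the methods that take the pair.
     */
    static Map<String, List<String>> findClumps(Map<String, List<String>> signatures,
                                                int minMethods) {
        Map<String, List<String>> methodsByPair = new TreeMap<>();
        for (Map.Entry<String, List<String>> entry : signatures.entrySet()) {
            List<String> params = entry.getValue();
            for (int i = 0; i < params.size(); i++) {
                for (int j = i + 1; j < params.size(); j++) {
                    // Order the pair so (a1, a2) and (a2, a1) count as the same clump.
                    String a = params.get(i);
                    String b = params.get(j);
                    String key = a.compareTo(b) <= 0 ? a + ", " + b : b + ", " + a;
                    methodsByPair.computeIfAbsent(key, k -> new ArrayList<>())
                                 .add(entry.getKey());
                }
            }
        }
        // Keep only the pairs shared by enough methods.
        methodsByPair.values().removeIf(methods -> methods.size() < minMethods);
        return methodsByPair;
    }

    public static void main(String[] args) {
        Map<String, List<String>> signatures = new LinkedHashMap<>();
        signatures.put("x", List.of("a1", "a2"));
        signatures.put("y", List.of("a1", "a2"));
        signatures.put("z", List.of("a1", "a2"));
        signatures.put("w", List.of("a1", "b"));
        System.out.println(findClumps(signatures, 3)); // {a1, a2=[x, y, z]}
    }
}
```

A threshold of 3 corresponds to "more than two methods" in the description above.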

Advantages

  • Improved code understanding and organization.
  • Operations on the same set of data are gathered in a single place instead of being scattered across the code.
  • Shorter parameter lists and less duplicated code, which can reduce code size.
  • Large code bases become easier to maintain.

Disadvantages

When done carefully, removing data clumps has few disadvantages. However, detecting them in legacy code[7] can be a daunting task, and changing how they are handled can in some cases introduce new bugs into the system[8]. When removing data clumps, the developer should test thoroughly (unit tests, integration tests, functional tests, end-to-end tests[9], etc.) before releasing the changes.

References

  1. ^ "Martin Fowler". martinfowler.com. Retrieved 2016-02-08.
  2. ^ Fields, Jay; Beck, Kent; Fowler, Martin; Harvie, Shane (2009-10-13). Refactoring Ruby: Bad Smells in Code.
  3. ^ "Design Patterns and Refactoring". sourcemaking.com. Retrieved 2016-02-09.
  4. ^ "Design Patterns and Refactoring". sourcemaking.com. Retrieved 2016-02-08.
  5. ^ "troessner/reek". GitHub. Retrieved 2016-02-08.
  6. ^ "File: Data-Clump — Documentation for reek (3.10.0)". www.rubydoc.info. Retrieved 2016-02-09.
  7. ^ "Legacy code". Wikipedia, the free encyclopedia.
  8. ^ "Bad Smelling Concepts in Software Refactoring" (PDF).
  9. ^ "Types of Software testing and definitions of testing terms". www.softwaretestinghelp.com. Retrieved 2016-02-16.