Code bloat

From Wikipedia, the free encyclopedia

Code bloat is the production of code that is perceived as unnecessarily long, slow, or otherwise wasteful of resources. It can be caused by inadequacies in the language in which the code is written, by inadequacies in the compiler used to compile that language, or by the developer.

Bloated code often comes from a developer who simply uses more lines of code than an optimal solution to the problem requires. Some examples of developer code bloat are:

  • overuse of object-oriented constructs -- relying too heavily on constructs such as classes and inheritance can lead to messy and confusing designs, often taking many more lines of code than an optimal solution.
  • incorrect usage of design patterns -- developers in object-oriented languages will often attempt to "force" design patterns onto problems that do not need them.
  • overuse of methods/functions/procedures -- breaking an algorithm up into many methods allows developers to reuse those methods to solve other problems. However, breaking an algorithm up into many tiny methods often adds code bloat and makes the code difficult, if not impossible, to read and debug.
  • declarative programming -- implementing a declarative programming style in an imperative or object-oriented language often leads to code bloat.
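
As a rough illustration of the first two items, the sketch below contrasts a deliberately over-engineered way to add two numbers with a plain function. All names in it (Operation, AdditionOperation, OperationFactory, add_bloated, add) are hypothetical and chosen only for this example.

```python
from abc import ABC, abstractmethod


# Bloated version: an abstract base class, a concrete subclass, and a
# factory, all to add two numbers. Every name here is hypothetical.
class Operation(ABC):
    @abstractmethod
    def apply(self, a, b):
        ...


class AdditionOperation(Operation):
    def apply(self, a, b):
        return a + b


class OperationFactory:
    def create(self, name):
        if name == "add":
            return AdditionOperation()
        raise ValueError(f"unknown operation: {name}")


def add_bloated(a, b):
    # Three objects and two indirections for a single addition.
    return OperationFactory().create("add").apply(a, b)


# Lean version: the same behavior as a single function.
def add(a, b):
    return a + b
```

Both functions return the same result; the first simply needs far more code to get there. Whether such scaffolding is justified depends on how much variation the program actually has to support.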

Some naïve implementations of the template system employed in C++ are examples of inadequacies in the compiler used to compile the language. A naïve compiler implementing this feature can emit a separate version of a templated function for every type it is used with. This in turn leads to compiled functions that may never be used, resulting in code bloat. More sophisticated compilers and linkers detect the superfluous copies and discard them, reducing the bloat. With such toolchains, template code can even result in smaller binaries, because the compiler is allowed to discard dead code.[1]

Some examples of code bloat produced by native compilers include:

  • dead code -- code which is executed but whose result is never used.
  • redundant calculations -- re-evaluating expressions that have already been calculated once. Such redundant calculations are often generated when implementing "bounds checking" code to prevent buffer overflow. Sophisticated compilers calculate such values exactly once and eliminate the subsequent redundant calculations, using common subexpression elimination and loop-invariant code motion.
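
The effect of common subexpression elimination and loop-invariant code motion can be sketched by applying both transformations by hand. The functions scale_naive and scale_hoisted below are hypothetical names used only for this illustration; an optimizing compiler would perform the equivalent rewrite on the code it generates rather than on the source.

```python
import math


def scale_naive(values, angle):
    # math.cos(angle) ** 2 does not depend on the loop variable, yet it is
    # evaluated twice on every iteration.
    result = []
    for v in values:
        result.append(v * math.cos(angle) ** 2 + math.cos(angle) ** 2)
    return result


def scale_hoisted(values, angle):
    # The repeated subexpression is computed exactly once, before the loop,
    # and the stored value is reused.
    factor = math.cos(angle) ** 2
    return [v * factor + factor for v in values]
```

The two functions produce the same values; the second evaluates the cosine once instead of twice per element.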

The difference in code density between languages is so great that holding a program written in a "compact" language (such as a domain-specific programming language, Microsoft P-Code, or threaded code) together with an interpreter for that language (written in native code) often takes less memory than holding the same program written directly in native code.

In many cases, when two programs implement the same functionality, the larger program will also run more slowly than the smaller one. In a few cases there is a space-time tradeoff, and the larger program runs faster than the smaller one.
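
A lookup table is one common form of space-time tradeoff. The sketch below counts the set bits of a non-negative integer in two ways; the names popcount_loop, popcount_table, and BITS_IN_BYTE are hypothetical. The table version carries an extra 256-entry list but needs one step per byte rather than one per bit. Whether it is actually faster depends on the language implementation and the hardware, so treat it as an illustration rather than a measurement.

```python
def popcount_loop(n):
    # Small version: examine one bit per iteration.
    count = 0
    while n:
        count += n & 1
        n >>= 1
    return count


# Larger version: precompute the answer for every possible byte value.
BITS_IN_BYTE = [popcount_loop(i) for i in range(256)]


def popcount_table(n):
    # Consume eight bits per iteration by looking each byte up in the table.
    count = 0
    while n:
        count += BITS_IN_BYTE[n & 0xFF]
        n >>= 8
    return count
```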

Some techniques for reducing code bloat include:

  • refactoring a commonly used code sequence into a subroutine and calling that subroutine from several locations, rather than copying and pasting the sequence at each of those locations,
  • re-using subroutines that have already been written, rather than rewriting them from scratch.
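
The first technique can be sketched as follows. The function names (greet_pasted, badge_pasted, clean_name, greet, badge) are hypothetical; the point is only that the pasted clean-up sequence moves into one subroutine that both callers share.

```python
# Before: the same clean-up sequence is copied into each function.
def greet_pasted(name):
    cleaned = " ".join(name.split()).title()
    return f"Hello, {cleaned}!"


def badge_pasted(name):
    cleaned = " ".join(name.split()).title()
    return f"[{cleaned}]"


# After: the sequence lives in one subroutine and is called where needed.
def clean_name(name):
    # Collapse runs of whitespace and capitalize each word.
    return " ".join(name.split()).title()


def greet(name):
    return f"Hello, {clean_name(name)}!"


def badge(name):
    return f"[{clean_name(name)}]"
```

A later change to the clean-up rule then has to be made in one place only.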


References

  1. hopl-may.dvi