Cyclomatic complexity

From Wikipedia, the free encyclopedia
Jump to: navigation, search

Cyclomatic complexity is a software metric (measurement). It was developed by Thomas J. McCabe, Sr. in 1976 and is used to indicate the complexity of a program. It is a quantitative measure of the complexity of programming instructions. It directly measures the number of linearly independent paths through a program's source code. The concept, although not the method, is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test.

Cyclomatic complexity is computed using the control flow graph of the program: the nodes of the graph correspond to indivisible groups of commands of a program, and a directed edge connects two nodes if the second command might be executed immediately after the first command. Cyclomatic complexity may also be applied to individual functions, modules, methods or classes within a program.

One testing strategy, called Basis path testing by McCabe who first proposed it, is to test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.[1]

Description[edit]

Definition[edit]

A control flow graph of a simple program. The program begins executing at the red node, then enters a loop (group of three nodes immediately below the red node). On exiting the loop, there is a conditional statement (group below the loop), and finally the program exits at the blue node. This graph has 9 edges, 8 nodes, and 1 connected component, so the cyclomatic complexity of the program is 9 - 8 + (2*1) = 3.

The cyclomatic complexity of a section of source code is the count of the number of linearly independent paths through the source code. For instance, if the source code contained no decision points such as IF statements or FOR loops, the complexity would be 1, since there is only a single path through the code. If the code had a single IF statement containing a single condition, there would be two paths through the code: one path where the IF statement is evaluated as TRUE and one path where the IF statement is evaluated as FALSE.

Mathematically, the cyclomatic complexity of a structured program[a] is defined with reference to the control flow graph of the program, a directed graph containing the basic blocks of the program, with an edge between two basic blocks if control may pass from the first to the second. The complexity M is then defined as[2]

M = EN + 2P,

where

E = the number of edges of the graph.
N = the number of nodes of the graph.
P = the number of connected components.
The same function as above, shown as a strongly connected control flow graph, for calculation via the alternative method. This graph has 10 edges, 8 nodes, and 1 connected component, so the cyclomatic complexity of the program is 10 - 8 + 1 = 3.

An alternative formulation is to use a graph in which each exit point is connected back to the entry point. In this case, the graph is strongly connected, and the cyclomatic complexity of the program is equal to the cyclomatic number of its graph (also known as the first Betti number), which is defined as[2]

M = EN + P.

This may be seen as calculating the number of linearly independent cycles that exist in the graph, i.e. those cycles that do not contain other cycles within themselves. Note that because each exit point loops back to the entry point, there is at least one such cycle for each exit point.

For a single program (or subroutine or method), P is always equal to 1. So a simpler formula for a single subroutine is

M = EN + 2.[3]

Cyclomatic complexity may, however, be applied to several such programs or subprograms at the same time (e.g., to all of the methods in a class), and in these cases P will be equal to the number of programs in question, as each subprogram will appear as a disconnected subset of the graph.

McCabe showed that the cyclomatic complexity of any structured program with only one entrance point and one exit point is equal to the number of decision points (i.e., "if" statements or conditional loops) contained in that program plus one. However, this is true only for decision points counted at the lowest, machine-level instructions. Decisions involving compound predicates like those found in high-level languages like IF cond1 AND cond2 THEN ... should be counted in terms of predicate variables involved, i.e. in this examples one should count two decision points, because at machine level it is equivalent to IF cond1 THEN IF cond2 THEN ....[2][4]

Cyclomatic complexity may be extended to a program with multiple exit points; in this case it is equal to:

π − s + 2,

where π is the number of decision points in the program, and s is the number of exit points.[4][5]

Explanation in terms of algebraic topology[edit]

An even subgraph of a graph (also known as an Eulerian subgraph) is one where every vertex is incident with an even number of edges; such subgraphs are unions of cycles and isolated vertices. In the following, even subgraphs will be identified with their edge sets, which is equivalent to only considering those even subgraphs which contain all vertices of the full graph.

The set of all even subgraphs of a graph is closed under symmetric difference, and may thus be viewed as a vector space over GF(2); this vector space is called the cycle space of the graph. The cyclomatic number of the graph is defined as the dimension of this space. Since GF(2) has two elements and the cycle space is necessarily finite, the cyclomatic number is also equal to the 2-logarithm of the number of elements in the cycle space.

A basis for the cycle space is easily constructed by first fixing a maximal spanning forest of the graph, and then considering the cycles formed by one edge not in the forest and the path in the forest connecting the endpoints of that edge; these cycles constitute a basis for the cycle space. Hence, the cyclomatic number also equals the number of edges not in a maximal spanning forest of a graph. Since the number of edges in a maximal spanning forest of a graph is equal to the number of vertices minus the number of components, the formula E-N+P above for the cyclomatic number follows.[6]

For the more topologically inclined, cyclomatic complexity can alternatively be defined as a relative Betti number, the size of a relative homology group:

M := b_1(G,t) := \operatorname{rank}H_1(G,t),

which is read as "the first homology of the graph G, relative to the terminal nodes t". This is a technical way of saying "the number of linearly independent paths through the flow graph from an entry to an exit", where:

  • "linearly independent" corresponds to homology, and means one does not double-count backtracking;
  • "paths" corresponds to first homology: a path is a 1-dimensional object;
  • "relative" means the path must begin and end at an entry or exit point.

This corresponds to the intuitive notion of cyclomatic complexity, and can be calculated as above.

Alternatively, one can compute this via absolute Betti number (absolute homology – not relative) by identifying (gluing together) all the terminal nodes on a given component (or equivalently, draw paths connecting the exits to the entrance), in which case (calling the new, augmented graph \tilde G, which is[clarification needed]), one obtains:

M = b_1(\tilde G) = \operatorname{rank}H_1(\tilde G).

It can also be computed via homotopy. If one considers the control flow graph as a 1-dimensional CW complex called X, then the fundamental group of X will be \pi_1(X) = \Z^n. The value of n+1 is the cyclomatic complexity. The fundamental group counts how many loops there are through the graph, up to homotopy, and hence aligns with what we would intuitively expect.

This corresponds to the characterization of cyclomatic complexity as "number of loops plus number of components".

Applications[edit]

Limiting complexity during development[edit]

One of McCabe's original applications was to limit the complexity of routines during program development; he recommended that programmers should count the complexity of the modules they are developing, and split them into smaller modules whenever the cyclomatic complexity of the module exceeded 10.[2] This practice was adopted by the NIST Structured Testing methodology, with an observation that since McCabe's original publication, the figure of 10 had received substantial corroborating evidence, but that in some circumstances it may be appropriate to relax the restriction and permit modules with a complexity as high as 15. As the methodology acknowledged that there were occasional reasons for going beyond the agreed-upon limit, it phrased its recommendation as: "For each module, either limit cyclomatic complexity to [the agreed-upon limit] or provide a written explanation of why the limit was exceeded."[7]

Measuring the "structuredness" of a program[edit]

Main article: essential complexity

Section VI of McCabe's 1976 paper is concerned with determining what the control flow graphs (CFGs) of non-structured programs look like in terms of their subgraphs, which McCabe identifies. (For details on that part see structured program theorem.) McCabe concludes that section by proposing a numerical measure of how close to the structured programming ideal a given program is, i.e. its "structuredness" using McCabe's neologism. McCabe called the measure he devised for this purpose essential complexity.[2]

In order to calculate this measure, the original CFG is iteratively reduced by identifying subgraphs that have a single-entry and a single-exit point, which are then replaced by a single node. This reduction corresponds to what a human would do if he extracted a subroutine from the larger piece of code. (Nowadays such a process would fall under the umbrella term of refactoring.) McCabe's reduction method was later called condensation in some textbooks, because it was seen as a generalization of the condensation to components used in graph theory.[8] If a program is structured, then McCabe's reduction/condensation process reduces it to a single CFG node. In contrast, if the program is not structured, the iterative process will identify the irreducible part. The essential complexity measure defined by McCabe is simply the cyclomatic complexity of this irreducible graph, so it will be precisely 1 for all structured programs, but greater than one for non-structured programs.[7]:80

Implications for software testing[edit]

Another application of cyclomatic complexity is in determining the number of test cases that are necessary to achieve thorough test coverage of a particular module.

It is useful because of two properties of the cyclomatic complexity, M, for a specific module:

  • M is an upper bound for the number of test cases that are necessary to achieve a complete branch coverage.
  • M is a lower bound for the number of paths through the control flow graph (CFG). Assuming each test case takes one path, the number of cases needed to achieve path coverage is equal to the number of paths that can actually be taken. But some paths may be impossible, so although the number of paths through the CFG is clearly an upper bound on the number of test cases needed for path coverage, this latter number (of possible paths) is sometimes less than M.

All three of the above numbers may be equal: branch coverage \leq cyclomatic complexity \leq number of paths.

For example, consider a program that consists of two sequential if-then-else statements.

if( c1() )
   f1();
else
   f2();
 
if( c2() )
   f3();
else
   f4();
The control flow graph of the source code above; the red circle is the entry point of the function, and the blue circle is the exit point. The exit has been connected to the entry to make the graph strongly connected.

In this example, two test cases are sufficient to achieve a complete branch coverage, while four are necessary for complete path coverage. The cyclomatic complexity of the program is 3 (as the strongly connected graph for the program contains 9 edges, 7 nodes and 1 connected component) (9-7+1).

In general, in order to fully test a module all execution paths through the module should be exercised. This implies a module with a high complexity number requires more testing effort than a module with a lower value since the higher complexity number indicates more pathways through the code. This also implies that a module with higher complexity is more difficult for a programmer to understand since the programmer must understand the different pathways and the results of those pathways.

Unfortunately, it is not always practical to test all possible paths through a program. Considering the example above, each time an additional if-then-else statement is added, the number of possible paths doubles. As the program grew in this fashion, it would quickly reach the point where testing all of the paths was impractical.

One common testing strategy, espoused for example by the NIST Structured Testing methodology, is to use the cyclomatic complexity of a module to determine the number of white-box tests that are required to obtain sufficient coverage of the module. In almost all cases, according to such a methodology, a module should have at least as many tests as its cyclomatic complexity; in most cases, this number of tests is adequate to exercise all the relevant paths of the function.[7]

As an example of a function that requires more than simply branch coverage to test accurately, consider again the above function, but assume that to avoid a bug occurring, any code that calls either f1() or f3() must also call the other.[b] Assuming that the results of c1() and c2() are independent, that means that the function as presented above contains a bug. Branch coverage would allow us to test the method with just two tests, and one possible set of tests would be to test the following cases:

  • c1() returns true and c2() returns true
  • c1() returns false and c2() returns false

Neither of these cases exposes the bug. If, however, we use cyclomatic complexity to indicate the number of tests we require, the number increases to 3. We must therefore test one of the following paths:

  • c1() returns true and c2() returns false
  • c1() returns false and c2() returns true

Either of these tests will expose the bug.

Cohesion[edit]

One would also expect that a module with higher complexity would tend to have lower cohesion (less than functional cohesion) than a module with lower complexity. The possible correlation between higher complexity measure with a lower level of cohesion is predicated on a module with more decision points generally implementing more than a single well defined function. A 2005 study showed stronger correlations between complexity metrics and an expert assessment of cohesion in the classes studied than the correlation between the expert's assessment and metrics designed to calculate cohesion.[9]

Correlation to number of defects[edit]

A number of studies have investigated cyclomatic complexity's correlation to the number of defects contained in a function or method.[citation needed] Some[citation needed] studies find a positive correlation between cyclomatic complexity and defects: functions and methods that have the highest complexity tend to also contain the most defects, however the correlation between cyclomatic complexity and program size has been demonstrated many times and since program size is not a controllable feature of commercial software the usefulness of McCabes's number has been called to question. The essence of this observation is that larger programs (more complex programs as defined by McCabe's metric) tend to have more defects. Although this relation is provably true, it isn't commercially useful[10] . As a result the metric has not been accepted by commercial software development organizations.

Studies that controlled for program size (i.e., comparing modules that have different complexities but similar size, typically measured in lines of code) are generally less conclusive, with many finding no significant correlation, while others do find correlation. Some researchers who have studied the area question the validity of the methods used by the studies finding no correlation.[11]

Les Hatton claimed (Keynote at TAIC-PART 2008, Windsor, UK, Sept 2008) that McCabe's Cyclomatic Complexity number has the same predictive ability as lines of code.[12]

See also[edit]

Notes[edit]

  1. ^ Here "structured" means in particular "with a single exit (return statement) per function".
  2. ^ This is a fairly common type of condition; consider the possibility that f1 allocates some resource which f3 releases.

References[edit]

  1. ^ A J Sojev. "Basis Path Testing". 
  2. ^ a b c d e McCabe (December 1976). "A Complexity Measure". IEEE Transactions on Software Engineering: 308–320. 
  3. ^ Philip A. Laplante (2007). What Every Engineer Should Know about Software Engineering. CRC Press. p. 176. ISBN 978-1-4200-0674-2. 
  4. ^ a b Belzer, Kent, Holzman and Williams (1992). Encyclopedia of Computer Science and Technology. CRC Press. pp. 367–368. 
  5. ^ Harrison (October 1984). "Applying Mccabe's complexity measure to multiple-exit programs". Software: Practice and Experience (J Wiley & Sons). 
  6. ^ Diestel, Reinhard (2000). Graph theory. Graduate texts in mathematics 173 (2 ed.). New York: Springer. ISBN 0-387-98976-5. 
  7. ^ a b c Arthur H. Watson and Thomas J. McCabe (1996). "Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric". NIST Special Publication 500-235. 
  8. ^ Paul C. Jorgensen (2002). Software Testing: A Craftsman's Approach, Second Edition (2nd ed.). CRC Press. pp. 150–153. ISBN 978-0-8493-0809-3. 
  9. ^ Stein et al; Cox, Glenn; Etzkorn, Letha (2005). "Exploring the relationship between cohesion and complexity". Journal of Computer Science 1 (2): 137–144. doi:10.3844/jcssp.2005.137.144. 
  10. ^ G.S. Cherf (1992). "An Investigation of the Maintenance and Support Characteristics of Commercial Software". Journal of Software Quality (Springer-Verlag) 1 (3): 147–158. ISSN 1573-1367. 
  11. ^ Kan (2003). Metrics and Models in Software Quality Engineering. Addison-Wesley. pp. 316–317. ISBN 0-201-72915-6. 
  12. ^ Les Hatton (2008). "The role of empiricism in improving the reliability of future software". version 1.1. 

External links[edit]