Cyclomatic complexity

Cyclomatic complexity (or conditional complexity) is a software metric (measurement). It was developed by Thomas J. McCabe and is used to measure the complexity of a program. It directly measures the number of linearly independent paths through a program's source code.

The concept, although not the method, is somewhat similar to that of general text complexity measured by the Flesch-Kincaid Readability Test.

Cyclomatic complexity is computed using the control flow graph of the program: the nodes of the graph correspond to the commands of a program, and a directed edge connects two nodes if the second command might be executed immediately after the first command.

Definition

The cyclomatic complexity of a structured program^[1] is defined as:^[2]

M = E − N + 2P

where

M = cyclomatic complexity

E = the number of edges of the graph

N = the number of nodes of the graph

P = the number of connected components.

The cyclomatic number, also known as the first Betti number, is defined as:^[3]

M = E − N + P

The relation between these is that the cyclomatic complexity is the cyclomatic number of the graph obtained by connecting each exit to the corresponding entrance.

For a program with multiple exits (return statements), the definition is instead:^{[citation needed]}

M = E − N + P + R

where

R = the number of return statements (equivalently, terminal nodes).

This is again obtained as the cyclomatic number of the graph obtained by connecting each exit to the corresponding entrance: this replaces each terminal node with an edge. P can be alternatively thought of as the number of entry points.

"M" is alternatively defined to be one larger than the number of decision points (if/case-statements, while-statements, etc) in a module (function, procedure, chart node, etc.), or more generally a system.

Separate subroutines are treated as being independent, disconnected components of the program's control flow graph.

Alternative way

There is another simple way to determine the cyclomatic number. This is done by counting the number of closed loops in the flow graph, and incrementing the number by the number of exit points.

i.e.

M = Number of closed loops + Number of exit points

For a connected graph with a single exit point, this is simply the number of closed loops + 1.

Formal definition

Formally, cyclomatic complexity can be defined as a relative Betti number, the size of a relative homology group:

M:=b_{1}(G,t):=\operatorname {rank} \,H_{1}(G,t)

which is read as “the first homology of the graph G, relative to the terminal nodes t”. This is a technical way of saying “the number of linearly independent paths through the flow graph from an entry to an exit”, where:

“linearly independent” corresponds to homology, and means one does not double-count backtracking;
“paths” corresponds to first homology: a path is a 1-dimensional object;
“relative” means the path must begin and end at an entry or exit point.

This corresponds to the intuitive notion of cyclomatic complexity, and can be calculated as above.

Alternatively, one can compute this via absolute Betti number (absolute homology – not relative) by identifying (gluing together) all terminal nodes on a given component (or equivalently, draw paths connecting the exits to the entrance), in which case (calling the new, augmented graph ${\tilde {G}}$ , which is ), one obtains:

M=b_{1}({\tilde {G}})=\operatorname {rank} \,H_{1}({\tilde {G}})

This corresponds to the characterization of cyclomatic complexity as “number of loops plus number of components”.

Topological technicalities

The control flow graph of a program is not in general strongly connected. Adding paths looping back from exit points to entry points makes each component strongly connected, simplifying the analysis – this is done in McCabe.

Implications for Software Testing

M is an upper bound for the number of test cases that are necessary to achieve a complete branch coverage.
M is a lower bound for the number of possible paths through the control flow graph. Equivalently, a lower bound for the number of test cases to achieve path coverage.

For example, consider a program that consists of two sequential if-then-else statements.

if( c1 )
   f1();
else
   f2();

if( c2 )
   f3();
else
   f4();

To achieve a complete branch coverage, two test cases are sufficient here.
For a complete path coverage, four test cases are necessary.
The cyclomatic number M is three, falling in the range between these two values, as it does for any program – note that all these numbers can be equal:
branch coverage $\leq$ cyclomatic complexity $\leq$ path coverage

Etymology / Naming

The name Cyclomatic Complexity carries certain ambiguity, as this metric does not only count cycles (loops) in the program. It is motivated by the number of different cycles in the program control flow graph, after having added an imagined branch back from the exit node to the entry node, see [McCabe76], p. 309.

A better name for popular usage would be Conditional Complexity, as "it has been found to be more convenient to count conditions instead of predicates when calculating complexity" [McCabe76], p. 315.

Key Concept

The cyclomatic complexity of a section of source code is the count of the number of linearly independent paths through the source code. For instance, if the source code contained no decision points such as IF statements or FOR loops, the complexity would be 1, since there is only a single path through the code. If the code had a single IF statement containing a single condition there would be two paths through the code, one path where the IF statement is evaluated as TRUE and one path where the IF statement is evaluated as FALSE.

Cyclomatic complexity is normally calculated by creating a graph of the source code with each line of source code being a node on the graph and arrows between the nodes showing the execution pathways. As some programming languages can be quite terse and compact, a source code statement when developing the graph may actually create several nodes in the graph (for instance, when using the "?:" ternary conditional operator in C, C++, C# and Java).

In general, in order to fully test a module all execution paths through the module should be exercised. This implies a module with a high complexity number requires more testing effort than a module with a lower value since the higher complexity number indicates more pathways through the code. This also implies that a module with higher complexity is more difficult for a programmer to understand since the programmer must understand the different pathways and the results of those pathways.

One would also expect that a module with higher complexity would tend to have lower cohesion (less than functional cohesion) than a module with lower complexity. The possible correlation between higher complexity measure with a lower level of cohesion is predicated on a module with more decision points generally implementing more than a single well defined function. However there are certain types of modules that one would expect to have a high complexity number, such as user interface (UI) modules containing source code for data validation and error recovery.

Notes

Les Hatton claimed recently (Key note at TAIC-PART 2008, Windsor, UK, Sept 2008) that McCabe Cyclomatic Complexity has the same prediction ability as lines of code.

Software Testing Tools

The HDL Complexity Tool is a simple tool to provide measurement data. The goal of this tool is to generate straight-forward measurements of complexity for hardware projects such as complex SoCs.
Code Analyzer is a free Java tool to compute LoC and Cyclomatic Complexity for C++, Java and VB.
Python complexity - Journyx provides McCabe-like cyclomatic complexity analysis tools for analyzing Python code (which are written in Perl).
lint_php: PHP cyclomatic complexity - provides McCabe-like cyclomatic complexity free online analysis for PHP code (the tool is written in PHP, as well, and can be downloaded for offline use).
McCabe Software, Inc. - Founded by Cyclomatic Complexity inventor Thomas J. McCabe, McCabe Software, Inc. provides a suite of complexity analysis and test coverage tools with support for several programming languages.
Compuware's Xpediter/DevEnterprise - provides Cyclomatic Complexity for COBOL and PL/I programs among other metrics
Structure101 - Uses Cyclomatic Complexity as a basis for analyzing and rating the quality of your software architecture.
CyVis - A Free Software Cyclomatic Complexity visualiser tool for software built using Java.
C & C++ Code Counter - A Free Software for count various metric of code including Cyclomatic Complexity
Perl::Metrics::Simple - An Open Source library for static analysis of Perl code, providing a number of metrics including an approximation of Cyclomatic Complexity for each subroutine/method.
SourceMonitor Freeware
SciTools Understand Commercial code analysis tool for C/C++, C#, Java, Ada, Fortran, Delphi, Jovial, and PL/M.
Complexian Complexity (cyclomatic and npath) Analyser for Java code.
[1]Parasoft provide tools covering Java, C, C++ & .Net that will provide total quality management during the SDLC.
Project Analyzer Code analysis and metrics tool for Visual Basic. Computes 180 software metrics, including several versions of Cyclomatic Complexity.
Hammurapi Code analysis and metrics tool for Java.

External links

A Complexity Measure - McCabe's original paper (1976).
Article Some thoughts on Cyclomatic Complexity from From The Trench
NIST Special Publication 500-235 Structured Testing: A Testing Methodology Using the Cyclomatic Complexity Metric

^ Here "structured" means in particular "with a single exit (return statement) per function".
^ McCabe, p. 314
^ McCabe, p. 308

[1] Here "structured" means in particular "with a single exit (return statement) per function".

[2] McCabe, p. 314

[3] McCabe, p. 308

[1]

[2]

[3]