Live-variable analysis

In compilers, live variable analysis (or simply liveness analysis) is a classic data-flow analysis to calculate the variables that are live at each point in the program. A variable is live at some point if it holds a value that may be needed in the future, or equivalently if its value may be read before the next time the variable is written to.

Example

Consider the following program:

b = 3
c = 5
a = f(b * c)

The set of live variables between lines 2 and 3 is {b, c} because both are used in the multiplication on line 3. But the set of live variables after line 1 is only {b}, since variable c is updated later, on line 2. The value of variable a is not used in this code.

Note that the assignment to a may be eliminated as a is not used later, but there is insufficient information to justify removing all of line 3 as f may have side effects (printing b * c, perhaps).

Expression in terms of dataflow equations

Liveness analysis is a "backwards may" analysis. The analysis is done in a backwards order, and the dataflow confluence operator is set union. In other words, if applying liveness analysis to a function with a particular number of logical branches within it, the analysis is performed starting from the end of the function working towards the beginning (hence "backwards"), and a variable is considered live if any of the branches moving forward within the function might potentially (hence "may") need the variable's current value. This is in contrast to a "backwards must" analysis which would instead enforce this condition on all branches moving forward.

The dataflow equations used for a given basic block s and exiting block f in live variable analysis are the following:

{\mbox{GEN}}[s]

: The set of variables that are used in s before any assignment in the same basic block.

{\mbox{KILL}}[s]

: The set of variables that are assigned a value in s (in many books that discuss compiler design, KILL (s) is also defined as the set of variables assigned a value in s before any use, but this does not change the solution of the dataflow equation):

{\mbox{LIVE}}_{in}[s]={\mbox{GEN}}[s]\cup ({\mbox{LIVE}}_{out}[s]-{\mbox{KILL}}[s])

{\mbox{LIVE}}_{out}[final]={\emptyset }

{\mbox{LIVE}}_{out}[s]=\bigcup _{p\in succ[s]}{\mbox{LIVE}}_{in}[p]

{\mbox{GEN}}[d:y\leftarrow f(x_{1},\cdots ,x_{n})]=\{x_{1},...,x_{n}\}

{\mbox{KILL}}[d:y\leftarrow f(x_{1},\cdots ,x_{n})]=\{y\}

The in-state of a block is the set of variables that are live at the start of the block. Its out-state is the set of variables that are live at the end of it. The out-state is the union of the in-states of the block's successors. The transfer function of a statement is applied by making the variables that are written dead, then making the variables that are read live.

Second example

// in: {}; predecessor blocks: none
b1: a = 3; 
    b = 5;
    d = 4;
    x = 100; //x is never being used later thus not in the out set {a,b,d}
    if a > b then
// out: {a,b,d}    //union of all (in) successors of b1 => b2: {a,b}, and b3:{b,d}  

// in: {a,b}; predecessor blocks: b1
b2: c = a + b;
    d = 2;
// out: {b,d}

// in: {b,d}; predecessor blocks: b1 and b2
b3: endif
    c = 4;
    return b * d + c;
// out:{}

The in-state of b3 only contains b and d, since c has been written. The out-state of b1 is the union of the in-states of b2 and b3. The definition of c in b2 can be removed, since c is not live immediately after the statement.

Solving the data flow equations starts with initializing all in-states and out-states to the empty set. The work list is initialized by inserting the exit point (b3) in the work list (typical for backward flow). Its computed in-state differs from the previous one, so its predecessors b1 and b2 are inserted and the process continues. The progress is summarized in the table below.

processing	out-state	old in-state	new in-state	work list
b3	{}	{}	{b,d}	(b1,b2)
b1	{b,d}	{}	{}	(b2)
b2	{b,d}	{}	{a,b}	(b1)
b1	{a,b,d}	{}	{}	()

Note that b1 was entered in the list before b2, which forced processing b1 twice (b1 was re-entered as predecessor of b2). Inserting b2 before b1 would have allowed earlier completion.

Initializing with the empty set is an optimistic initialization: all variables start out as dead. Note that the out-states cannot shrink from one iteration to the next, although the out-state can be smaller than the in-state. This can be seen from the fact that after the first iteration the out-state can only change by a change of the in-state. Since the in-state starts as the empty set, it can only grow in further iterations.

References

Aho, Alfred; Lam, Monica; Sethi, Ravi; Ullman, Jeffrey (2007). Compilers: Principles, Techniques, and Tools (2 ed.). p. 608.

v t e Compiler optimizations
Basic block	Peephole optimization Local value numbering
Loop	Automatic parallelization Automatic vectorization Induction variable Loop fusion Loop-invariant code motion Loop inversion Loop interchange Loop nest optimization Loop splitting Loop unrolling Loop unswitching Software pipelining Strength reduction
Data-flow analysis	Available expression Common subexpression elimination Constant folding Dead store elimination Induction variable recognition and elimination Live-variable analysis Use-define chain
SSA-based	Global value numbering Sparse conditional constant propagation
Code generation	Instruction scheduling Instruction selection Register allocation Rematerialization
Functional	Deforestation Tail-call elimination
Global	Interprocedural optimization
Other	Bounds-checking elimination Compile-time function execution Dead-code elimination Expression templates Inline expansion Jump threading Partial evaluation Profile-guided optimization
Static analysis	Alias analysis Array-access analysis Control-flow analysis Data-flow analysis Dependence analysis Escape analysis Pointer analysis Shape analysis Value range analysis