# Register allocation

In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers. Register allocation can happen over a basic block (local register allocation), over a whole function or procedure (global register allocation), or across function boundaries traversed via the call graph (interprocedural register allocation). When done per function or procedure, the calling convention may require inserting save and restore code around each call site.

## Introduction

In many programming languages, the programmer may use any number of variables. The computer can quickly read and write registers in the CPU, so the computer program runs faster when more variables can be in the CPU's registers. Also, sometimes code accessing registers is more compact, so the code is smaller, and can be fetched faster if it uses registers rather than memory. However, the number of registers is limited in most CPUs. Therefore, when the compiler is translating code to machine-language, it must decide how to allocate variables to the limited number of registers in the CPU.

Not all variables are in use (or "live") at the same time, so over the lifetime of a program a given register may be used to hold different variables. However, two variables in use at the same time cannot be assigned to the same register without corrupting one of the variables. If there are not enough registers to hold all the variables, some variables may be moved to and from RAM. This process is called "spilling" the registers. Accessing RAM is significantly slower than accessing registers, so a compiled program that spills runs slower. Therefore, an optimizing compiler aims to assign as many variables to registers as possible. "Register pressure" is the technical term for the number of variables competing for registers at the same time; when register pressure is high, more spills and reloads are needed.
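As a rough illustration of register pressure, the sketch below counts the largest number of variables that are live at once. It assumes each variable has a single half-open live range `(start, end)` measured in instruction indices; the function name `max_register_pressure` and the sample ranges are made up for this example.

```python
def max_register_pressure(intervals):
    """Return the largest number of variables that are live at the same time.

    `intervals` maps a variable name to a half-open (start, end) range of
    instruction indices (an assumption of this sketch).
    """
    events = []
    for start, end in intervals.values():
        events.append((start, 1))   # the variable becomes live
        events.append((end, -1))    # the variable dies
    # At equal positions, deaths (-1) sort before births (+1), so a register
    # freed at instruction i can be reused by a value defined at i.
    events.sort()
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


example = {"a": (0, 4), "b": (1, 3), "c": (2, 6), "d": (4, 7)}
pressure = max_register_pressure(example)
```

If `pressure` exceeds the number of available registers, at least one variable has to be spilled somewhere in the function.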

In addition, some computer designs cache frequently-accessed registers. Programs can also be further optimized by assigning the same register to the source and destination of a `move` instruction whenever possible, because the move then becomes redundant and can be removed. This is especially important if the compiler is using an intermediate representation such as static single-assignment form (SSA). In particular, when SSA is not fully optimized it can artificially generate additional `move` instructions.

## Spilling

In most register allocators, each variable is assigned either to a CPU register or to main memory. The advantage of using a register is speed, but computers have a limited number of registers, so not all variables can be assigned to them. A "spilled variable" is a variable kept in main memory rather than in a CPU register. The operation of moving a variable from a register to memory is called spilling, while the reverse operation of moving a variable from memory to a register is called filling. For example, a 32-bit variable spilled to memory gets 32 bits of stack space allocated, and all references to the variable then go to that memory. Accessing such a variable is much slower than accessing a variable in a register. When deciding which variables to spill, multiple factors are considered: execution time, code space, and data space.

## Isomorphism to graph colorability

Through liveness analysis, compilers can determine which sets of variables are live at the same time, as well as variables which are involved in `move` instructions. Using this information, the compiler can construct a data structure representing a graph such that every vertex represents a unique variable in the program. Interference edges connect pairs of vertices which are live at the same time, and preference edges connect pairs of vertices which are involved in move instructions.
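A minimal sketch of this construction follows. It assumes liveness analysis has already produced one set of live variables per program point, plus a list of `(destination, source)` pairs for the move instructions; `build_graphs` and its argument names are hypothetical.

```python
from itertools import combinations


def build_graphs(live_sets, moves):
    """Return (interference_edges, preference_edges) as sets of sorted pairs.

    live_sets: iterable of sets, each holding the variables live at one
               program point (assumed to come from liveness analysis).
    moves:     iterable of (dst, src) pairs, one per move instruction.
    """
    interference = set()
    for live in live_sets:
        # Every pair of variables live at the same point interferes.
        for u, v in combinations(sorted(live), 2):
            interference.add((u, v))
    preference = set()
    for dst, src in moves:
        edge = tuple(sorted((dst, src)))
        # A move between interfering variables can never be eliminated,
        # so this sketch does not record a preference edge for it.
        if dst != src and edge not in interference:
            preference.add(edge)
    return interference, preference


interference, preference = build_graphs(
    live_sets=[{"a", "b"}, {"b", "c"}, {"c"}],
    moves=[("c", "a")],
)
```

Here `a` and `c` are never live together, so they share only a preference edge and could be placed in the same register.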

Register allocation can then be reduced to the problem of K-coloring this graph, where the K colors correspond to the K registers available on the target architecture. No two vertices sharing an interference edge may be assigned the same color, and vertices sharing a preference edge should be assigned the same color if possible. Some of the vertices may be pre-colored to begin with, representing variables which must be kept in certain registers because some instructions return results in specific registers or because of calling conventions between modules. Graph coloring is NP-complete in general, and because arbitrary graphs can arise as interference graphs, register allocation by graph coloring is NP-complete as well. However, good heuristic algorithms exist which balance allocation time against the quality of the compiled code.

It may be the case that the graph coloring algorithm fails to find a coloring of the interference graph. In this case, some of the variables must be spilled to memory in order to enable the remaining variables to be allocated to registers. This may be accomplished by a recursive search that tries spilling one variable and then recursively either colors the remaining set of variables or continues spilling recursively until all remaining non-spilled variables can be colored and assigned to registers.
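The sketch below shows one common way to combine coloring and spilling. Note that it uses a Chaitin-style simplify-and-select heuristic rather than the recursive search described above: vertices with fewer than K neighbors are removed first, and when none exists the highest-degree vertex is chosen as the spill candidate. The function name, the spill heuristic, and the example graph are assumptions of this sketch; a real allocator would also weigh spill costs and insert spill code.

```python
def color_with_spills(graph, k):
    """Color `graph` with k colors, spilling variables when necessary.

    graph: dict mapping each variable to the set of variables it interferes
           with (assumed symmetric). Returns (assignment, spilled).
    """
    work = {v: set(ns) for v, ns in graph.items()}
    stack, spilled = [], []
    while work:
        # Simplify: a vertex with fewer than k neighbors can always be colored.
        node = next((v for v in sorted(work) if len(work[v]) < k), None)
        if node is None:
            # Nothing is trivially colorable: spill the highest-degree vertex.
            node = max(sorted(work), key=lambda v: len(work[v]))
            spilled.append(node)
        else:
            stack.append(node)
        for n in work.pop(node):
            work[n].discard(node)
    # Select: re-insert vertices in reverse order and pick a free color.
    assignment = {}
    while stack:
        v = stack.pop()
        used = {assignment[n] for n in graph[v] if n in assignment}
        assignment[v] = next(c for c in range(k) if c not in used)
    return assignment, spilled


# Four mutually interfering variables cannot fit in three registers.
clique = {v: set("abcd") - {v} for v in "abcd"}
assignment, spilled = color_with_spills(clique, 3)
```

In this example one variable is spilled and the remaining three receive distinct registers.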

## Iterated Register Coalescing

There are several types of register allocators, with Iterated Register Coalescing[1] (IRC) being one of the more common. IRC was invented by Lal George and Andrew Appel in 1996, building on earlier work by Gregory Chaitin. IRC works based on a few principles. First, if there are any non-move-related vertices in the graph with degree less than K, the graph can be simplified by removing those vertices, since it is guaranteed that a color can be found for them once they are added back in (simplification). Second, two vertices sharing a preference edge whose combined adjacency sets contain fewer than K vertices can be merged into a single vertex, by the same reasoning (coalescing). If neither of the two steps can simplify the graph, the preference edges of a move-related vertex can be dropped so that simplification can be run again on it (freezing). Finally, if nothing else works, vertices can be marked for potential spilling and removed from the graph (spill). Since all of these steps reduce the degrees of vertices in the graph, vertices may change from high-degree (degree ≥ K) to low-degree during the algorithm, enabling them to be simplified or coalesced. Thus, the stages of the algorithm are iterated to ensure aggressive simplification and coalescing. The pseudo-code is thus:

```
function IRC_color g K :
    repeat
        if ∃v s.t. ¬moveRelated(v) ∧ degree(v) < K then simplify v
        else if ∃e s.t. cardinality(neighbors(first e) ∪ neighbors(second e)) < K then coalesce e
        else if ∃v s.t. moveRelated(v) then deletePreferenceEdges v
        else if ∃v s.t. ¬precolored(v) then spill v
        else return
    loop
```

The coalescing done in IRC is conservative, because aggressive coalescing may introduce spills into the graph. However, additional coalescing heuristics such as George coalescing may coalesce more vertices while still ensuring that no additional spills are added. Work-lists are used in the algorithm to ensure that each iteration of IRC requires sub-quadratic time.
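To make the difference between coalescing criteria concrete, here is a small sketch of three tests on a graph stored as a dictionary of neighbor sets. `union_ok` mirrors the cardinality test in the pseudo-code above. `briggs_ok` and `george_ok` follow the commonly stated forms of the Briggs and George criteria; they are written from those textbook definitions, not taken from this article's sources, and all function names are hypothetical. The vertices `u` and `v` are assumed to share a preference edge and not to interfere.

```python
def union_ok(graph, u, v, k):
    """Test from the pseudo-code: the merged vertex has fewer than k neighbors."""
    return len((graph[u] | graph[v]) - {u, v}) < k


def briggs_ok(graph, u, v, k):
    """Briggs test: the merged vertex has fewer than k neighbors of degree >= k."""
    significant = 0
    for n in (graph[u] | graph[v]) - {u, v}:
        degree = len(graph[n])
        # A neighbor of both u and v loses one edge once they are merged.
        if u in graph[n] and v in graph[n]:
            degree -= 1
        if degree >= k:
            significant += 1
    return significant < k


def george_ok(graph, u, v, k):
    """George test: each neighbor of u is low-degree or already interferes with v."""
    return all(len(graph[t]) < k or t in graph[v] for t in graph[u] - {v})


g = {
    "u": {"x", "y"},
    "v": {"x", "z"},
    "x": {"u", "v", "y", "z"},
    "y": {"u", "x"},
    "z": {"v", "x"},
}
results = (union_ok(g, "u", "v", 3), briggs_ok(g, "u", "v", 3), george_ok(g, "u", "v", 3))
```

With K = 3, the merged vertex would have three neighbors, so the plain cardinality test refuses to coalesce, while the Briggs and George tests both accept because only `x` keeps a degree of at least K.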

## Recent developments

Graph coloring allocators produce efficient code, but their allocation time is high. In cases of static compilation, allocation time is not a significant concern. In cases of dynamic compilation, such as just-in-time (JIT) compilers, fast register allocation is important. A technique proposed by Poletto and Sarkar in 1999 is linear scan allocation.[2] This technique requires only a single pass over the list of variable live ranges. Ranges with short lifetimes are assigned to registers, whereas those with long lifetimes tend to be spilled, or reside in memory. The resulting code is on average only 12% less efficient than code produced by graph coloring allocators.

The linear scan algorithm follows:

1. Perform dataflow analysis to gather liveness information. Keep track of every variable’s live interval (the interval during which the variable is live) in a list sorted in order of increasing start point. This ordering is free if the list is built while computing liveness. Variables and their intervals are treated as interchangeable in this algorithm.
2. Iterate through liveness start points and allocate a register from the available register pool to each live variable.
3. At each step maintain a list of active intervals sorted by the end point of the live intervals. (Note that inserting into a balanced binary tree keeps this list sorted at logarithmic cost per insertion.) Remove any expired intervals from the active list and free the expired interval’s register to the available register pool.
4. If the active list already has size R, where R is the number of available registers, no register can be allocated to the current interval. In this case, spill the interval with the furthest end point among the current interval and those in the active list. If an active interval is spilled, remove it from the active list, assign its register to the current interval, and add the current interval to the active list. If the current interval is the one spilled, do not change any register assignments.
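The steps above can be sketched as follows. This is a simplified illustration, not a reference implementation: it assumes one interval per variable with inclusive `(start, end)` positions, uses a sorted Python list instead of a balanced tree for the active set, and the name `linear_scan` and the sample intervals are made up.

```python
import bisect


def linear_scan(intervals, num_regs):
    """Allocate registers by linear scan.

    intervals: dict mapping a variable to its (start, end) live interval,
               with inclusive end points (an assumption of this sketch).
    Returns (assignment, spilled): a dict variable -> register number and
    the set of variables left in memory.
    """
    free = list(range(num_regs))
    active = []                      # (end, name) pairs, sorted by end point
    assignment, spilled = {}, set()
    # Step 2: visit the intervals in order of increasing start point.
    for name, (start, end) in sorted(intervals.items(), key=lambda kv: kv[1][0]):
        # Step 3: expire intervals that ended before this one starts.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if len(active) == num_regs:
            # Step 4: no register is free, so spill the furthest end point.
            if active and active[-1][0] > end:
                _, victim = active.pop()
                assignment[name] = assignment.pop(victim)
                spilled.add(victim)
                bisect.insort(active, (end, name))
            else:
                spilled.add(name)
        else:
            assignment[name] = free.pop(0)
            bisect.insort(active, (end, name))
    return assignment, spilled


assignment, spilled = linear_scan(
    {"a": (0, 9), "b": (1, 3), "c": (2, 5), "d": (4, 6)}, num_regs=2
)
```

With two registers, the long-lived `a` is spilled when `c` starts, and `d` later reuses the register released by `b`. A production allocator would replace `active.pop(0)` and `bisect.insort` on a list with a heap or balanced tree.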

Cooper and Dasgupta developed a "lossy" Chaitin-Briggs graph coloring algorithm suitable for use in a JIT.[3] The "lossy" moniker refers to the imprecision the algorithm introduces into the interference graph. This optimization reduces the cost of the graph-building step of Chaitin-Briggs, making it suitable for runtime compilation. Experiments indicate that this lossy register allocator outperforms linear scan on the majority of tests used.

"Optimal" register allocation algorithms based on Integer Programming have been developed by Goodwin and Wilken for regular architectures. These algorithms have been extended to irregular architectures by Kong and Wilken.

While the worst-case execution time is exponential, the experimental results show that the actual time is typically of order $O(n^{2.5})$ in the number of constraints $n$.[4]

The possibility of doing register allocation on SSA-form programs is a focus of recent research.[5] The interference graphs of SSA-form programs are chordal, and as such, they can be colored in polynomial time. To clarify the sources of NP-completeness, recent research has examined register allocation in a broader context.[6][7]
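To illustrate why chordal graphs are easy to color, the sketch below uses a standard textbook approach: a maximum cardinality search produces a vertex order, and greedy coloring along that order uses the minimum number of colors on a chordal graph. This is not necessarily the method used in the cited research, and the function name and example graph are assumptions of this sketch.

```python
def color_chordal(graph):
    """Color a chordal graph with the minimum number of colors.

    graph: dict mapping each vertex to its set of neighbors (assumed
           symmetric and chordal; the result is still a valid coloring,
           but not necessarily minimal, for non-chordal graphs).
    """
    # Maximum cardinality search: repeatedly visit the unvisited vertex
    # that has the most already-visited neighbors.
    weight = {v: 0 for v in graph}
    order = []
    while weight:
        v = max(sorted(weight), key=weight.get)
        order.append(v)
        del weight[v]
        for n in graph[v]:
            if n in weight:
                weight[n] += 1
    # Greedy coloring in visit order: on a chordal graph the already-colored
    # neighbors of each vertex form a clique, so no color is wasted.
    color = {}
    for v in order:
        used = {color[n] for n in graph[v] if n in color}
        color[v] = next(c for c in range(len(graph)) if c not in used)
    return color


# Two triangles sharing the edge b-c: chordal, largest clique of size 3.
g = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
coloring = color_chordal(g)
```

The example needs exactly three colors, matching the size of its largest clique.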