The Chords The Colors The Registers The SSA Optimality

Introduction
In the compiler world, one of the hardest problems to solve is register allocation, and it's no coincidence! But what is register allocation? Register allocation is the problem of assigning โ variables, temporaries, etc... to a limited ๐พ number of physical registers.
In 1970, W.H.E. Day from IBM published their paper Compiler assignment of data items to registers, which was the paper that identified register allocation as a value assignment problem and solved it as an ILP (Integer Linear Programming) problem, which is an NP-Complete problem. Not long after, in 1971, IBM researcher John Cocke (the father of RISC) found that register allocation can be modeled as a graph coloring problem (an NP-Complete problem), but a real-world or practical solution was not found yet. Then in 1981, another IBM researcher, Gregory Chaitin, and his colleagues (John Cocke, Martin Hopkins & Marc Auslander) released one of the most influential papers in CS history: Register allocation and spilling via graph coloring, giving us the Chaitin register allocator (a.k.a. the Chaitin algorithm), which introduced the quadratic register allocator with linear graph coloring/assignment heuristic that suits real-world compilers.
The Problem
As you can see from the introduction, register allocation is an NP-Complete problem, making it hard, and the result is only as good as the heuristic used. In some modern real-world compilers, a variant of the Chaitin algorithm is used, called the Chaitin-Briggs algorithm. The algorithm yields better code than the original algorithm, and one such example is GCC IRA (integrated register allocator), which uses an algorithm based on Chaitin-Briggs for register allocation. Anyway, Chaitin-Briggs algorithm is still solving an NP-Complete problem, with the only difference being that total heuristic efficiency is significantly better.
There are a number of other algorithms used for register allocation other than graph coloring which are NOT NP-Complete, like linear-scan, which is primarily used in JIT compilers (CLR, V8, JVM, etc.) due to how fast it is. Unlike general graph coloring, linear scan operates over a graph called an interval graph, more specifically a linear interval graph, which you can think of as a single-dimension or linear graph where variables are represented as lines/intervals. Theoretically, coloring an pre-ordered interval graph can be done in O(N), but generated code in most cases is worse than that of coloring-based allocators like the Chaitin algorithm, meaning that it spills to memory more often.
As we know, solving an NP-Complete problem optimally is widely believed to be impossible in polynomial time. Therefore, an optimal algorithm typically requires exponential time **(**e.g. ๐(2^๐) ) to find an optimal solution, as itโs the case with graph coloring register allocators, which is infeasible for real-world compilers. Now the problem originates from the fact that most graph coloring solvers treat the interference graph as a general graph! But in the real world, most compilers represent programs in SSA form. You may be asking, "how does representing a program in SSA form make a difference?" Actually, in 2006, a computer scientist called Sebastien Hack (the founder of libFirm compiler backend) and his colleagues released a revolutionary paper Optimal register allocation for SSA-form programs in polynomial time, which proves that an interference graph built from a strict SSA program is a chordal graph and can be colored optimally in polynomial time with complexity of ๐(๐+๐).
Interference Graph
In compilers, an interference graph is an undirected graph that represents variables that are alive at the same time, where each vertex represents a variable and each edge represents a mutual interval point between variables (i.e., both variables are alive at the same time).
Building the interference graph is primarily done using liveness analysis on the whole graph, but there are smarter, more efficient ways to do so when the program is represented in SSA form (Sparse Liveness & Dominators).
Before comparing SSA chordal interference graphs to general arbitrary interference graphs, how about a code sample to work on for better visualization and understanding of the problem:
void example() {
int a = 0;
int b = 0;
int c = 0;
int d = 0;
while (...) {
// the initial value of 'a' is used for the last time here. 'c' is born.
// They do NOT overlap
c = a + 1;
d = b + 4;
// a is rewritten since it's mutable
// last use of 'c' here so there is NO interference between 'a' and 'c'
a = c + 5;
b = d + 2;
}
}
General
Here we'll talk about a program that is NOT in SSA form, so the original program above will remain as-is.

Figure 1: Non-chordal interference graph
[a]interferes with[b]and[d]:
If you look at the loop in the code, based on liveness analysis, both definitions ofaandbfroma = c + 5andb = d + 2are alive at the next loop iteration, so bothaandbare alive at the start of the loop and accordingly we say that they interfere.Also notice that at one point in the program's life,
afroma = c + 5is alive at the same time asdfromd = b + 4, soaanddare also alive at the same time and they interfere. Notice howaandcdo not interfere with each other because they are never alive at the same timeโwheneverais used,cis also defined, so there is no point in the program where both variables are used at the same time by any instruction or instructions.[c]interferes with[b]and[d]:
Same logic here. Because the definition ofbfromb = d + 2is alive/used after wherecis defined (atc = a + 1, exactly atd = b + 4), thuscinterferes withb. Also,cfrom the same pointc = a + 1is alive at the same time asdfromd = b + 4, sincec's new definition was used later ata = c + 5;andd's new definition was used later atb = d + 2. There was a point where two variables were alive at the same time, so bothcanddinterfere.

Figure 2: Sequence diagram representing interval timeline of interference graph
A careful reader would notice something crucial from the analysis above: for interference graph computation, all we care about is which definition of a variable reaches which point, which is a property that can be deduced sparsely and easily when the program is represented in SSA form due to one of its core properties (single reaching definition), which I am going to demonstrate next.
SSA-Chordal
I am going to use LLVM IR and LLVM optimizer to get the SSA representation of the previous C++ program.
The output of running opt -passes=mem2reg:
define dso_local void @example() {
entry:
br label %while.cond
while.cond:
%b.0 = phi i32 [ 0, %entry ], [ %b.1, %while.body ]
%a.0 = phi i32 [ 0, %entry ], [ %a.1, %while.body ]
%cmp = icmp slt i32 ......
br i1 %cmp, label %while.body, label %while.end
while.body:
%c.1 = add nsw i32 %a.0, 1
%d.1 = add nsw i32 %b.0, 4
%a.1 = add nsw i32 %c.1, 5
%b.1 = add nsw i32 %d.1, 2
br label %while.cond
while.end:
ret void
}
Before heading to explaining what is going on with the interference graph of the SSA code, let me first explain what a chordal graph is!
A chordal graph, also known as triangulated, is an undirected graph where any subgraph that forms cycles of 4 vertices or more has a chord. In other words, only cycles of three vertices (triangles) can exist! The chord in the graph is what connects non-adjacent nodes, forming a triangle.
The graph below is a chordal graph! Notice that there are no induced cycles longer than three (i.e., โฅ 4). The chords here are the edges between n5 and n4, n5 and n2.
Induced cycles are cycles where no two non-adjacent vertices have an edge between them (imagine an empty square as an example).
Notice how the graph looks like triangles connected to each other.

Figure 3: Arbitrary Chordal Graph
An interference graph built from a strict SSA program is always a chordal graph. If you look at the interference graph of the SSA code built from the C++ sample code above, you can see how the interference graph here is chordal since it's linear and doesn't have any cycles of 4 or more.

Figure 4: Interference graph of SSA code
The interference graph here is simple and explains itself, but one thing I want to clarify about Phi nodes/instructions to avoid any confusion! In the SSA code, there is the loop header which naturally contains phi nodes since data is flowing from code falling through to the loop header and from the loop latch (the back edge):
while.cond:
%b.0 = phi i32 [ 0, %entry ], [ %b.1, %while.body ]
%a.0 = phi i32 [ 0, %entry ], [ %a.1, %while.body ]
Some readers might ask: "The definition of %b.1 is alive at the same time as %b.0 (since one flows into the other), so shouldn't they interfere?"
This is where the magic of SSA Dominance comes in.
In SSA form, a value created by a Phi node represents a definition that dominates future usages until it hits its dominance frontier. In our example, %b.0 dominates the loop body. This means whenever b is used inside the loop (before re-assignment), the compiler uses %b.0 because it is the single reachable definition.
When b is reassigned in the loop (%b.1 = ...), this creates a totally new definition. From that point on, %b.1 becomes the dominating definition.
However, at the loop header, we have a merge point. We have two reaching definitions: 0 (from entry) and %b.1 (from the back-edge). The Phi node inserts a new definition, %b.0, to unify them. Because %b.0 is the new dominator for the loop, it effectively supersedes %b.1 at the loop entry. The live range of %b.1 effectively ends where %b.0 begins, preventing the kind of overlapping "interference cycles" we see in non-SSA graphs.

Figure 5: Sequence diagram representing interval timeline of SSA interference graph
Register Allocation And Graph Coloring
As mentioned in the introduction, register allocation is the problem of assigning variables or expressions โ to a limited number of registers ๐พ.
Note: from now on I am going to use RA as shortcut to register allocation
The definition above is the formal definition of the problem, but many compiler developers like to see or define it as "the problem of reducing spilling data to memory" since advanced register allocators are not just about assigning N to K but rather about optimizing or minimizing the cost of spilling and keeping hot data in registers with number of other parameters to consider
And what about graph coloring?
Graph coloring is the problem of assigning โ colors to ๐พ vertices in graph ๐บ where no two or more connected vertices have the same color โ.
If we map graph coloring to RA, you can see how RA just fits the problem:
Colors are equivalent to registers
Vertices are equal to variables and/or expressions
Vertices connected to each other cannot have the same colorโi.e., they are alive at the same time and cannot be assigned to the same register
Canonical Stages of RA
RA operates over the interference graph and, as mentioned above in the quote, real-world register allocators don't just assign variables to registersโthey go through multiple stages (either independently or iteratively) to achieve the best possible practical result (optimality is constrained by NP-Completeness for non-SSA/Chordal graphs).
The following stages are common to most RA architectures, regardless of whether they use graph coloring:
Build Interference Graph: The allocator will build the interference graph based on liveness analysis
Live Range Splitting: This one specifically is not really mandatory but rather a critical optimization for generating the best possible code. The allocator will split the live interval of a single variable into multiples, which gives the allocator the chance to allocate different registers to the same variable at multiple points in the program. For example, a variable called
Vcan be split intov1andv2at different points of the program, allowing the allocator to assign different registers to each version.Coalescing: The compiler will remove unnecessary register copies like
mov R1,R2or similar. For example, the compiler can check that a copy fromR2toR1is redundant sinceR2holds the same value asR1throughout the program's life, and accordingly we can removemov R1,R2and replace all usages ofR2withR1.
There is multiple coalescing strategies like (Aggressive, Conservative, Iterative, Optimistic, Biased Coloring) which decides what/when/how to do the coalescing some like Biased Coloring donโt merge nodes evenNote: In order for Coalescing to work with the previous example,
R2must not interfere withR1โi.e.,R2must not be alive at the same time asR1, because if that's the case, then the definition of both registers is used by other instructions.Spilling: is the act of loading and storing from/to memory instead of registers. This happens when the number of simultaneously alive variables is greater than the number of available registers.
Assignment: is when RA assigns register ๐พ to variable โ
Mind you that some compilers like LLVM have more optimizations involved, like Dead Code Elimination and Rematerialization.
One register allocator that LLVM has is a custom priority queue based register allocator (Greedy Allocator) that operates over OUT-OF SSA MIR
Iterative RA
In such register allocators, the steps or stages above are part of a single pass, since in iterative RA the allocator may rebuild the graph after spilling and restart all over again, since a spilled node/variable is no longer colorable and accordingly we have a new interference graph.
One example of such RA is Chaitin-Briggs.
Pipelined RA
Formally, in academia there is no such classification as "pipelined RA," but the distinction is legitimate. Such register allocators execute the steps above one after another in a pipelined or standalone fashion. Such allocators don't backtrack like iterative RAโonce a step is executed, it's doneโbut one step may iteratively and without backtracking try to preprocess the graph, like the Hack SSA allocator fixing Local Pressure Spike in the program. Allocators of such design or style operate over SSA programs, like the Hack SSA allocator.
Chaitin-Briggs Allocator
The de facto graph coloring register allocator. The allocator can work against arbitrary graphs (chordal or not) and has a heuristic of ๐(๐ยฒ), but in practice the allocator rarely hits the worst case. And as I have mentioned already, the allocator solves an NP-Complete problem even if the graph is chordal, since it's not aware of its propertiesโagain, the allocator is a general allocator.
I will go through the steps of the allocator and show an example of how it works. In this example, we will be dealing with a CPU with two registers {R1,R2}:
int x = 1;
int y = 2;
int t = 0;
while(true) {
t = x + y; // t uses x and y
x = y; // x is updated
y = t; // y is updated
}
Renumbering: A.k.a. liveness analysisโit will analyze and collect live intervals of variables and temporaries.
Build: Using liveness analysis info, the allocator will build the interference graph. In our example, all variables/nodes are connected to each other, forming a triangle.

Coalesce (Optimization): If there are any unnecessary copies like (
x=y), nodesxandyin the interference graph get merged and the copy instruction will be eliminated. The allocator will do so to help optimize code performance by removing redundant or unnecessary copies and reducing register pressure.
In our example, there are no such copies.Simplify: This step builds the coloring stack.
First it checks all nodes with degrees < K where K is the number of registers or colors. If a node's degree is less than K, then it will get pushed to the stack, which means that it's colorable because such a variable/node interferes with fewer variables than available registers. When a node gets pushed to the stack, it also gets removed from the graph, and accordingly the degree of nodes connected to that node will get reduced. The simplify step repeats this until the graph is empty and the stack is full.In our example, we have two registers
{R1,R2}so K = 2, and all variables/nodes (x,y,t) have degree of 2, which means K == node degree (2 == 2) for all nodes, so no node will get pushed to the stack and the graph remains as is.Spill: In Chaitin-Briggs, spilling is optimistic, which means when a candidate for spilling is found, we push it to the coloring stack instead of spilling it directly, in hope that when coloring happens later there will be an available color/register to assign to such variable. Candidate priority is determined by the following formula:
$$\text{Priority} = \frac{\text{Spill Cost}}{\text{Degree}}$$
where spill cost depends on the depth of the loop that the variable is being used in, and candidates with lower priority will get spilled first.
In our example, all nodes are theoretically equal, so I am going to choose
t. Because of that, variabletwill get removed from the graph and pushed to the stack, and variablesxandywill have degree 1 sincetis no longer connected toxandy. Now the spiller has variables with degree less than K, so degree < K (1 < 2). Let's assumexwill get pushed to the stack as being colorable, and now we are left withywith degree 0, so (0 < 2), and it will also get pushed to the stack. Now the graph is empty and the stack is full.The stack (Top-Bottom) looks like: [y, x, t]
Select: A.k.a. coloring. Here the allocator will pop variables from the stack and assign registers or colors to them. In case of variables annotated as possible spill, either the allocator successfully colors them, or otherwise a spill needs to take place by inserting loads and stores in place of defs and uses of that variable.
In our example, we have stack [y, x, t(possible spill)]:
Pop (y):
yinterferes withxandt, but both neighbors have no colors, so we assignytoR1Pop (x):
xinterferes withyandt, but hereyis already assigned to registerR1, so the allocator chooses another available color/register (R2)Pop (t):
tis adjacent to bothxandy, and unfortunately the allocator doesn't have any further registers to assign to it, so a spill is neededโLOAD tandSTORE twill get inserted for each use/def oft

If step Select spilled any variable (like t), then the whole allocation process starts all over again.
To avoid any sort of bias to any approach, a smart Chaitin-Briggs will involve a live range splitting step, which can reduce register pressure and accordingly improve coloring.
Hack SSA Based Allocator
Hack SSA allocator is a global register allocator that tries to solve register allocation for a special kind of graph (as you may have guessed): chordal graphs. Unlike the Chaitin-Briggs allocator, which handles any arbitrary graph, the power of the Hack SSA allocator comes from the fact that it utilizes chordal graph properties (the graph's chromatic number equals the maximum clique size) and SSA form properties (Single Reachable Dominance).
A real-world implementation of such a register allocator can be found under the libFirm project, which was founded by Sebastien Hack, the main author of Optimal register allocation for SSA-form programs in polynomial time.
So how does it work? I will use the same sample from Chaitin-Briggs and see how it looks in SSA form, then how the allocator works against it:
loop_header:
x0 = phi(init_x, x1)
y0 = phi(init_y, y1)
t0 = x0 + y0 // x0 dies here. t0 is born.
x1 = y0 // y0 dies here. x1 is born.
y1 = t0 // t0 dies here. y1 is born.
jump loop_header // Back-edge carries x1, y1

Now, before starting with the main steps, I need to clarify some things about liveness analysis and interference graphs when dealing with SSA form programs:
Liveness: The allocator can deduce liveness of a variable from the dominance tree. The dominance tree is one of the core data structures for one of the most classic and foundational SSA build algorithms (Cytron algorithm), and even if you didn't use it for building an SSA program, it can easily be deduced from an SSA program. The crucial property of the dominator tree and dominance here is that when the allocator wants to check if variables ๐จ and ๐ต interfere, the allocator checks their definitions' dominance relationship (specifically, if Def(๐จ) dominates Def(๐ต) and Use(๐จ) is still live at Def(๐ต), then ๐จ and ๐ต interfere). These two lemmas are called Budimliฤ lemmas.
The answer to those two queries (classically) can be done using:Dominance: by Dominator tree traversal
Reachability: by Iterative Dataflow analysis
But because the program is in SSA form, we can do Sparse Liveness Analysis and do Fast Liveness Analysis, which was released in the paper Fast Liveness Checking for SSA-Form Programs. I will not go through paper details, but to compute dominance and liveness the following preprocessing happens:
DFS (pre and post) Numbering: The compiler will give pre/enter and post/exit numbers for each node in depth-first fashion, so when the first query "Def(A) dominates Def(B)" needs to be answered, we use the formula below:
$$\text{pre}[A] \le \text{pre}[B] \quad \text{AND} \quad \text{post}[B] \le \text{post}[A]$$
Loop Forests: Detect loops in the CFG and mark such loops. This is important to know where the loops start (header) and flow back (latch) to know if variable ๐จ is used in a loop and that it flows back to the header. If that's the case, then we say ๐จ is alive for the whole loop. This helps answer "does Def(๐ต) reach Use(๐จ)" in loops easily.
Reduced Graph Reachability: The reduced graph will be the full CFG excluding back edges (a DAG basically):
$$G_{red} = \text{CFG} - \text{BackEdges}.$$
It answers "does Def(๐ต) reach Use(๐จ)" without considering back edges.
Numbering is used to answer the query "Def(๐จ) dominates Def(๐ต)" and (2 & 3) to answer "Use(๐จ) is still live at Def(๐ต)".
Interference Graph: With SSA, you don't really have to build an interference graph explicitly since it's implicit! Now you may have the following questions:
Q: Why liveness analysis in the first place?
A: As mentioned early in this post, liveness analysis is mainly for checking interference and later building an interference graph. But unlike allocators like Chaitin-Briggs which builds a whole new graph, the liveness info from this pass is to answer future queries when Spilling and Coloring passes run, because at each instruction the allocator will need to ask which variables interfere and which are deadโboth of which can be answered in O(1) due to the preprocessed data structures built before allocation (The Reduced Graph, Loop Forests, etc.).Q: What do you mean by implicit?
A: If you take the dominator tree of any SSA program, some nodeNhas a definition ofVwhich resembles the birth ofV. Any later node that is dominated byNand usesVrepresents an extension of the interval. The subtree starting fromNas root and all nodes that useVrepresent the interval ofV. The same concept applies to any other variable likeX. If you take the intersection ofX's interval andV's interval, which are subtrees in the dominator tree, then the resulting graph is a chordal graph (will explain the proof later).Q: The chordal properties of SSA are kind of clear, but how is the interference graph really computed here?
A: By visiting the dominator tree in pre-order fashion, because when doing so we get PEO (perfect elimination ordering) of the graph, and because it's PEO the graph is chordal, and that graph is the interference graph.On a side note, if you want to compute the PEO of graph G to check that it's chordal (since by definition a graph is chordal if it has a PEO), the algorithm is called maximum cardinality search, which does exactly that.
Now the official steps:
Spilling: Because we know the SSA graph is optimally colorable (because chordal graphs' chromatic number equals maximum clique), the allocator can walk the CFG knowing that at each point/instruction, if the number of simultaneously alive variables is greater than chromatic number K, the allocator needs to spill. So it measures register pressure at each instruction to know if it should spill or not by building a MaxLive set (register pressure set) at each point in the program. In the Hack allocator, the spilling candidate is determined by heuristics (like furthest-next-use or Belady's MIN), which accordingly cause
loadandstoreinstructions to be inserted.In our example: We start with an empty MaxLive set (let's call it MLS) which represents register pressure, where the size of the set represents register pressure at any given point.
at
x0 = phi(init_x,x1), since only one variable was defined starting from this point, MLS={x0}, which also means that register pressure equals 1 here.at
y0 = phi(init_y,y1),y0is born here, so MLS={x0,y0}at
t0 = x0 + y0, variablet0is born and it'sx0's last use, sot0can takex0's place in the set: MLS={t0,y0}same logic applies to the rest of the instructions; here register pressure never exceeds the threshold of K=2 at any point
Coloring: Now that the graph is guaranteed to be K-colorable (thanks to the Spilling pass), this pass assigns registers/colors to each variable. It does so by walking the DOM tree in pre-order fashion, which yields a PEO! Now simply, the allocator, for each instruction, will assign each variable a register or color. The allocated register set starts full, and whenever a variable is defined, it gets assigned a color. The allocator will check if any of the current instruction's operands/variables end here so it frees a register. The set propagates in parent-child fashion obviously, and that's it. Also notice how the heuristic is linear ๐(|๐| + |๐ธ|).
In our example, based on data built by the Spilling pass:
at
x0 = phi(init_x,x1)where MLS={x0}, the allocator will arbitrarily assignx0toR1at
y0 = phi(init_y,y1)where MLS={x0,y0}, the allocator will assigny0toR2since it's the last available registerat
t0 = x0 + y0where MLS={t0,y0}, the allocator will assignt0toR1since it'sx0's last usesame logic applies to the rest of the program

Coalescing: It will remove any unnecessary copies like (
x = y). This is often done by Biased Coloring during the Coloring pass instead of node merging before coloring (with biased coloring,xandywill be assigned the same color if and only if they have affinity/bias edge like phi or copy between them e.g.mov a b). Coalescing is really hard with SSA programs if done the classical way since it may break chordality, and also coalescing is an NP-Complete problem even with SSA.In our example, there's no coalescing needed.
SSA Destruction And Phi Elimination: Traditional allocators destroy SSA before allocation, which destroys the "chordal" property and makes coloring hard. Hack destroys SSA after allocation. Phi nodes either get replaced by parallel copies inserted at the end of phi node predecessor basic blocks, or get replaced by a swap/exchange instruction.
In our example, if you look at the colored graph above from the Coloring pass, you can see that:
x0is alive inR1x1is alive inR2y0is alive inR2y1is alive inR1
For sake of simplicity, let's assume that init_x and init_y are literals. We can see that x0 has an incoming value from x1, so R1 depends on R2, meaning R1=R2. The opposite dependency is happening with y0 and y1, so R2=R1. What the allocator can do is one of two options:
- The backend has a swap/exchange instruction (X86): the allocator will insert an
XCHGinstruction before jumping backward (atjump loop_headera.k.a.JMP loop_headerin x86) when recognizing such a case, for example:
SWAP R1 R2
JMP loop_header
2. Parallel Copy: swapping of registers will happen by spilling one of the registers to memory/stack and then swapping the values:
STORE R1 -> [temp_stack] // Save y1 (currently in R1)
MOV R1, R2 // Move x1 (from R2) into R1.
LOAD R2 <- [temp_stack] // Load y1 into R2.
JUMP loop_header
the stack slot here is called a scratch some architectures like RISC-V has dedicated scratch registers for such case but X86 doesnโt.
Notice how because the program got out of SSA and is no longer chordal, all previous hypotheses are broken and the allocator had to spill to memory.
One can notice a couple of things:
In the Hack SSA sample, the chromatic number K is equal to 2 since the largest clique consists of two vertices at most (no vertex is connected to more than two other vertices).
Hack SSA allocator didn't spill (unless parallel copy with no scratch register was used) while doing spill checks and assigning registers, unlike Chaitin-Briggs. But here I want to clarify one thing so there is no bias! If Chaitin-Briggs ran against the same SSA graph, it would not have spilled, but this is not always the case! Hack SSA allocator is aware of different SSA and chordal graph properties, unlike Chaitin-Briggs which handles any graph as a general graph.
Hack SSA allocator solved the most challenging aspects of register allocation (spilling and reg assignment) in SSA (optimally) and left the handling of phi node destruction and lowering till the end which by then is a non-SSA, non-chordal graphโa small sacrifice relative to doing full register allocation out of SSA.
That being said it doesnโt mean the SSA based register allocators are always better than general ones ! actually let me briefly mention some of the challenges with SSA based register allocators
SSA RAโs sometimes tends to spill more due to late phi elimination.
Pre Coloring: ABI enforces certain register usage like caller/callee saved registers and function argument passing , so pre coloring is required for that which can break the chordal property because it introduces fixed nodes that cannot be trivially colored in PEO order.
Why does it work?
We have talked about how SSA interference graphs are chordal, but why is it like that? Sebastien Hack proved this in his paper, which I am going to talk about as a developer and in developer fashion, not mathematically (which isn't my strong suit).
The SSA Dominator Proof
Hack proved that the SSA interference graph is chordal by contradiction and used the empty square or diamond graph for that.
Let's assume that the SSA interference graph is not chordal and a square can be shaped:

For the sake of proof, let's create the dominator tree of CFG ๐ฎ, where A, B, C are basic blocks:

In Strict SSA form, the live range of a variable is contiguous in the dominator tree, so the live range of a variable ๐ is a subtree in the dominator tree. In order for the SSA interference graph to not be chordal, there must be a way to connect/intersect the live ranges like this: ๐โโ๐โโ๐โโ๐โโ๐โ in the DOM tree in order for the SSA interference graph to be not chordal.
Let's reconstruct the square interference graph now with SSA dominance properties in mind. We start by adding ๐โ and ๐โ live ranges:
๐โ โ {A} : ๐โ is only alive in basic block A
๐โ โ {C} : ๐โ is only alive in basic block C

Now based on the original square interference graph, ๐โ interferes with ๐โ and ๐โ, which means that the live range of ๐โ extends the whole DOM tree from A to C:
- ๐โ โ {A, C}

Again based on the original square interference graph, ๐โ interferes with ๐โ and ๐โ, which means that the live range of ๐โ also extends the whole DOM tree from A to C, so now we have ๐โ and ๐โ having similar live intervals:
๐โ โ {A, C} : we have calculated this above
๐โ โ {A, C}
Based on all of the above, we know the following:
๐โ interferes with ๐โ and ๐โ
๐โ interferes with ๐โ and ๐โ
๐โ has the same live interval as ๐โ
Because ๐โ live interval overlap with ๐โ since they are the same , that means ๐โ interferes with ๐โ, which creates a chord (the blue edge) ๐๐

End of Story
I hope the article gave you a deep look into SSA register allocation and its power. I know the article was dense, but I think it's worth it (hopefully). Thank you for reading this and please if you noticed any technical error in the post let me know that in comments.



