8. Analysis & optimization
This chapter is entirely optional. It has no effect on your grade, but it’s hopefully an interesting glimpse into advanced program analysis and optimization.
Prerequisites: this chapter uses syntax and ideas from functions and pointers, but you don’t need to have them implemented to do the exercises here.
Program analysis means computing or verifying some properties of a program, or part of a program. Program analyses can help find opportunities for program optimization, but they can also be used to check safety and correctness properties, similarly to type systems.
In this chapter, we deviate a little from our earlier theory-light approach and briefly survey the rich landscape of possible program analyses, before seeing how to build one ourselves.
Examples of program analyses:
- Type checking and type narrowing.
- Detecting unreachable code.
- Detecting use of uninitialized variables (which are not possible in our language).
- Live variables analysis: which variables may still be read after a given point in the program?
- Reaching definitions: which code locations might have last written to a given variable.
- Alias analysis: when might two pointers point to the same object?
- Data dependence analysis: which parts of the program might affect a variable’s value?
Examples of optimizations that require program analysis:
- Constant propagation: translating
x = 3; ...; f(x);
into...; f(3);
requires knowing thatx
can only have the value3
inf(x)
. Reaching definitions provides this. - Register allocation: freeing registers early, as soon as live variables analysis says they’re no longer used.
- Copy elision: copies of large objects can sometimes be turned into references through pointers. Requires proving that the copy is never written to.
- Allocation elimination: heap-allocation can sometimes be turned into stack allocation. Requires alias analysis.
- Instruction-level parallelism: instructions that the CPU can likely execute in parallel with other instructions can sometimes be moved closer to each other. Requires data dependence analysis.
Many program safety checks rely on program analysis techniques. Examples include the aforementioned type checks and uninitialized variable checks, as well as Rust’s borrow checker.
Flowgraphs
While some analyses, such as type checking, can be performed naturally on an AST, there are advantages to performing analyses on IR:
- IR represents details such as internal temporary variables, which are important to account for when optimizing (e.g. when allocating registers).
- IR is simpler, so writing a correct analysis is easier.
We will start by making explicit how program execution can proceed through IR code.
A flowgraph is a directed graph whose nodes are IR instructions and whose
edges indicate which IR instruction(s) may be executed next.
A simple instruction like Copy
has an edge to the next instruction.
A Jump(L)
has an edge to the label L
.
A CondJump(..., L1, L2)
has an edge to labels L1
and L2
i.e. both possible jump targets.
(Production compilers typically group instructions into ”basic blocks”, but that’s not essential, so we skip it here.)
You are now ready to do exercise 1.
The dataflow framework
The dataflow framework is a traditional way to organize program analyses. While some newer compilers formulate analyses using logic programming, the underlying ideas remain the same and the traditional method remains powerful and useful, and learning it first is likely easiest anyway.
Example: reaching definitions
We’ll sketch the reaching definitions analysis as an example before formalizing the dataflow framework.
Recall that reaching definitions finds
which instructions may have been the last to write to a given variable.
More formally, reaching definitions finds,
for each IR instruction i
and for each variable x
, the set of
IR instructions where x
might last have been written to before i
.
Reaching definitions is useful for optimizations like constant propagation:
if we know that x
can only be set to a single constant, we can replace
a use of x
with a load of a constant near where it’s used.
This frees up the storage location of x
in the code preceding the use site.
We’ll use this program as an example whose reaching definitions we calculate:
Label(start)
LoadIntConst(1, x) # <-- writes x
Call(some_condition, [x], c)
LoadIntConst(2, x) # <-- writes x
CondJump(c, then_clause, end)
Label(then_clause)
LoadIntConst(3, y) # <-- writes y
Call(f, [x, y], x) # <-- writes x
Jump(end)
Label(end)
Return(x)
The correct answer to reaching definitions at instruction Return(x)
is that:
x
might have been last set byLoadIntConst(2, x)
orCall(f, [x, y], x)
, (but certainly notLoadIntConst(1, x)
).y
might have been last set byLoadIntConst(3, y)
or it might be uninitialized (ifthen_clause
was not executed).
Let’s see how to write an analysis that reaches this conclusion. The idea is that at each instruction, for each variable, we keep track of all possible places where the variable might have last been set. We propagate this information from one instruction to the next, and across jumps. We repeat this until nothing changes any more.
Here is pseudocode for this approach. Afterwards, we’ll generalize much of this into the dataflow framework, of which reaching definitions will be just one instantiation.
- Define the type
State
to be a dictionary of IR variable names to sets of (original) instruction indices. - Define two dictionaries
in
andout
, both mapping instruction indices toState
s. These define the input and output states of each instruction. - Set
out[0]
to a state where each variable used by the function is set to the special value-1
, signifying ”uninitialized”, and each parameter and global is set to-2
, signifying ”predefined”. - For all instruction indices
i > 0
, setin[i]
andout[i]
to empty states. - Define a transfer function
transfer(s, i)
that takes an input states
for instructioni
and produces an output state. The transfer function should return an output state that is the same as the input state, except all variablesx
that the instruction ati
writes to should be mapped to the set{i}
. This means: if the instruction ati
writes to variablex
, then right after the instruction ati
,x
can only have been last set ati
. - For each instruction index
i
:- Define set
P[i]
(”predecessors ofi
”) to be the indices of the instructions that might be executed immediately beforei
:- If
i
refers to a label instruction, thenP[i]
is the set of jump instructions that might jump toi
. - Otherwise,
P[i] = { i - 1 }
.
- If
- Define set
S[i]
(”successors ofi
”) be the indices of the instructions that can be executed immediately afteri
:S[i] = { j | P[j] contains i }
- Define set
- Create a queue
Q
of instruction indices, and enqueue all instruction indices except0
to it. This is our work queue. WhileQ
is not empty:- Dequeue
i
fromQ
. - Set
in[i] = merge(out[j] for each j in P)
wheremerge
returns the union of the given sets. - Set
out[i] = transfer(in[i], i)
. - If
out[i]
changed from its previous value,
then enqueue every element ofS[i]
toQ
.
- Dequeue
You are now ready to do exercise 2.
Generalization
The general dataflow framework can be abstracted from the way we just implemented reaching definitions. We just leave the following things abstract:
- the type
State
- the initial value for
out[0]
- the initial values for all other
in
andout
elements - the function
transfer
, which maps input states to output states - the function
merge
, which merges jump instruction outputs
Here is the pseudocode again, without the specifics of ”reaching definitions”.
- Define a type
State
. - Define two dictionaries
in
andout
, both mapping instruction indices toState
s. These define the input and output states of each instruction. - Set
out[0]
to a suitable start state. - For all instruction indices
i > 0
, setin[i]
andout[i]
to a suitable state. - Define a transfer function
transfer(s, i)
that takes an input states
for instructioni
and produces an output state. - For each instruction index
i
:- Define set
P[i]
(”predecessors ofi
”) to be the indices of the instructions that might be executed immediately beforei
:- If
i
refers to a label instruction, thenP[i]
is the set of jump instructions that might jump toi
. - Otherwise,
P[i] = { i - 1 }
.
- If
- Define set
S[i]
(”successors ofi
”) be the indices of the instructions that can be executed immediately afteri
:S[i] = { j | P[j] contains i }
- Define set
- Create a queue
Q
of instruction indices, and enqueue all instruction indices except0
to it. This is our work queue.WhileQ
is not empty:- Dequeue
i
fromQ
. - Set
in[i] = merge(out[j] for each j in P)
. - Set
out[i] = transfer(in[i], i)
. - If
out[i]
changed from its previous value,
then enqueue every element ofS[i]
toQ
.
- Dequeue
This is the abstract forwards dataflow framework.
You are now ready to do exercise 3.
There is also a very similar backwards dataflow framework
(exercise 4 & exercise 5),
where data flows in reverse: from out
to in
and from successors to predecessors.
The backwards dataflow framework is used by e.g. live variables analysis i.e. ”which variables may still be read after a given point in the program?” The idea in live variables analysis is similar to reaching definitions, except in reverse: it looks backwards through the program, and lets a read of a variable introduce the variable into the state while a write removes it.
Exercises
Reminder: these exercises are optional. Test Gadget will not test your program analyses.
Exercise 1
Write a function that takes a list of IR instructions and returns a flowgraph. Make it possible to retrieve each instruction’s original index too.
Bonus exercise
For later debugging, it may be convenient to write a function that renders a flowgraph using Graphviz.
Exercise 2
Implement the reaching definitions analysis as outlined above, and test it.
Exercise 3
Extract the abstract dataflow framework from your reaching definitions code, and implement reaching definitions using it.
You can make the framework a superclass with abstract methods transfer
and merge
,
or you can make it a function that takes transfer
and merge
functions as parameters.
Exercise 4
Implement the backwards dataflow framework.
In my experience, it is cleaner to tolerate some code duplication here instead of trying to abstract the direction of dataflow.
Exercise 5
Implement live variables analysis using the backwards dataflow framework.
Properties of program analyses
When designing program analyses, it’s good to be aware of some hard trade-offs that almost all program analyses will have to contend with.
Two program analyses that describe the same property of a program can have different:
- precision: how often does the analysis manage to detect the property that it looks for?
- conservatism: when the analysis makes a claim, does it have proof or merely evidence?
Precision
Program analyses can be thought of as proving facts about the program, but they can sometimes fail to find proofs for some true facts.
Consider this example:
{
var x = 3;
if 1 == 2 then {
f(x);
}
}
It is true that x
is never accessed, but for live variables analysis to prove it,
it must know that 1 == 2
is always false, and therefore the then
-clause is
never executed etc. Some live variables analyses can do that, but we can
always invent more complicated scenarios.
There are more detailed ways to characterize precision, and we’ll look at some of them below.
Adding precision can be very costly. The most precise alias analysis methods, for instance, have time complexities proportional to the number of possible program execution paths, which typically grows roughly exponentially with program size.
Conservatism
It is important for a program analysis to be conservative: if it outputs a claim, it must be sure that the claim holds. Otherwise optimizations that rely on its claims could break the program’s behaviour.
For instance, live variables analysis had better be sure that a variable x
really is
never read after a given point for a register allocator to safely reuse x
’s register.
Conversely, if a conservative analysis fails to claim something, we cannot take that as proof that the opposite is true.
For instance, if live variables analysis does not output a claim that variable x
is never read after a given point, then we cannot conclude that x
is surely read after that point. The correct negation is
”x
might be read after that point”.
Non-conservative analyses can still be useful in program style checkers (”linters”)
to warn about likely errors. One such analysis might, for instance,
warn about a potential read from deallocated memory in the following
program, because it can see a potential execution path from delete p
to *p
:
{
var p = new Int(3);
if a {
delete p;
}
if b {
f(*p);
}
}
Such an analysis might be imperfect (non-conservative) because it fails to see
potential deletions of p
in more complicated code, such as this:
{
var p = new Int(3);
var q = f1(p); # maybe `f1` returns a copy of `p`
f2(q); # maybe `f2` deletes `q`
f3(*p); # hard to say if `p` has been deleted
}
We could imagine a conservative version of this analysis that defaults to claiming every pointer dereference is a potential read from deallocated memory, unless the analysis can find an iron-clad proof that it’s not. Such an analysis would be very difficult to write with a low enough false positive rate to be useful, except in a sufficiently constrained language like Rust.
Example: improving live variables analysis with alias analysis
The results of one program analysis can be used to improve the precision of another. For instance, alias analysis can inform live variables analysis like this:
Say we have an integer variable x
and a pointer variable p
.
Ordinarily live variables analysis has to conservatively consider expression
*p
as possibly reading the value of x
, unless it can prove
that p
cannot point to x
. Alias analysis may be able to give that proof:
- If the address of
x
is never taken, then it’s trivial thatp
cannot point tox
- If the address of
x
is sometimes taken, then an alias analysis might still be able to prove that the address ofx
can never be assigned top
.
Intraprocedural vs interprocedural
One measure of precision is how an analysis treats function calls.
An intraprocedural analysis analyzes each function separately, ignoring where and how the function gets called. An intraprocedural analysis essentially sees function calls as black boxes that it can assume very little about.
An interprocedural analysis analyzes many functions, or even all functions, together. It typically creates some compact summary of what each function does (e.g. which aliasing relations a function may create). It then passes these state summaries around in a function-level analogue of the dataflow framework.
Consider this program:
fun f(x: Int) { if x == 0 then ... else ...; }
{
f(0);
}
An interprocedural unreachable code detector could conclude that the
else
-clause is never executed, because x
is always zero.
An intraprocedural unreachable code detector couldn’t make this conclusion,
because it wouldn’t know anything about the value of x
.
Sensitivities
The precision of an analysis can further be classified by whether it’s ”sensitive” to certian facets of the program’s structure, or whether it ignores them. Here are the most common sensitivities:
A flow-sensitive analysis declares its facts relative to some positions in the program.
For instance, a flow-sensitive alias analysis might say that p1
and p2
might point
to the same object at IR instruction 7, whereas a flow-insensitive alias analysis can merely
say that p1
and p2
might point to the same object at some point.
A path-sensitive analysis looks at if
-expression conditions and incorporates
them into its judgements. For instance, a path-sensitive alias analysis might see
if p1 != p2 then ...
and conclude that p1
and p2
cannot point to
the same object in the then
-clause (but might elsewhere), whereas a path-insensitive
alias analysis would not take the if
-expression’s condition into account.
A context-sensitive analysis analyzes each function again for each different calling context. A calling context is typically just a place in the program where the function might be called.
Consider this program:
fun f(x: Int) { if x == 0 then ... else ...; }
{
f(0);
f(1);
}
A context-sensitive unreachable code detector could conclude that the
else
-clause is never executed in the first function call,
and the then
-clause is never executed in the second function call.
In contrast a context-insensitive unreachable code detector would just
not know anything about the parameter x
and think that both clauses
might be reached in both calls.
There is nothing inherently wrong with an analysis lacking some sensitivity. Adding a sensitivity can have significant costs in execution speed and implementation complexity, and the corresponding gains in precision are not necessarily huge. Exploring this space remains an area of research.