8. Analysis & optimization

This chapter is optional, but hopefully interesting and useful for some optional project features.

Prerequisites: this chapter uses syntax and ideas from the project features functions and pointers, but you don’t need to have them implemented to do the tasks here.

Program analysis means computing or verifying some properties of a program, or part of a program. Program analyses can help find opportunities for program optimization, but they can also be used to check safety and correctness properties, similarly to type systems.

In this optional chapter, we deviate a little from our earlier theory-light approach and briefly survey the rich landscape of possible program analyses, before seeing how to build one ourselves.

Examples of program analyses:

Type checking and type narrowing.

In polymorphic type systems, a variable can have a general type such as ”any object” or ”any subclass of T”. A type narrowing analysis can deduce that in certain contexts, the type is actually known more precisely.

This can enable optimizations such as skipping run-time method lookup and instead calling a known method directly.
Detecting unreachable code.
Detecting use of uninitialized variables (which are not possible in our language).
Live variables analysis: which variables may still be read after a given point in the program?
Reaching definitions: which code locations might have last written to a given variable.
Alias analysis: when might two pointers point to the same object?
Data dependence analysis: which parts of the program might affect a variable’s value?

Examples of optimizations that require program analysis:

Constant propagation: translating x = 3; ...; f(x); into ...; f(3); requires knowing that x can only have the value 3 in f(x). Reaching definitions provides this.
Register allocation: freeing registers early, as soon as live variables analysis says they’re no longer used.
Copy elision: copies of large objects can sometimes be turned into references through pointers. Requires proving that the copy is never written to.
Allocation elimination: heap-allocation can sometimes be turned into stack allocation. Requires alias analysis.
Instruction-level parallelism: instructions that the CPU can likely execute in parallel with other instructions can sometimes be moved closer to each other. Requires data dependence analysis.

Many program safety checks rely on program analysis techniques. Examples include the aforementioned type checks and uninitialized variable checks, as well as Rust’s borrow checker.

Flowgraphs

While some analyses, such as type checking, can be performed naturally on an AST, there are several advantages to performing analyses on IR by default:

IR represents details such as internal temporary variables, which are important to account for when optimizing (e.g. when allocating registers).
IR is simpler, so writing a correct analysis is easier.

Often a large set of language features map to a few IR features. For instance, various control flow constructs like if-then, if-then-else, while-do, do-while, for, continue, break, return etc. all map to jumps and labels.

An IR-based analysis would support all of these by supporting just jumps and labels, whereas an AST-based analysis would have to handle each construct separately.
Similarly it’s often easier to optimize IR than it is to optimize an AST.

We will start by defining some datastructures that make IR code easier to analyze by making explicit how program execution can proceed through the code.

A basic block is a sequence of IR instructions that is always executed from start to end. That is, it has no jump or return instructions, except possibly as the very last one, and it has no label instructions, except the very first one. We require that each basic block starts with a unique label instruction, since these make for convenient dictionary keys.

You are now ready to do task 1.

A flowgraph is a graph whose nodes are basic blocks and whose edges are possible jumps between those blocks. That is, if basic block labeled A ends in a Jump to label B, then there shall be an edge from A to B. If it’s a CondJump then there shall be an edge to both possible target labels. If the basic block doesn’t end in a jump or a return, then there shall be an edge to the following basic block (and if one doesn’t exist, it’s an error).

You are now ready to do task 2.

The dataflow framework

The dataflow framework is a traditional way to organize program analyses. While some newer compilers formulate analyses using logic programming, the underlying ideas remain the same and the traditional method remains powerful and useful, and learning it first is likely easiest anyway.

Logic programming is a way to define a set of base facts, and recursive rules for deducing new facts. The system can then answer queries about whether a given fact is provable from the given base facts and rules.

Datalog is a logic programming language with restrictions that make it amenable to very efficient execution. Variants of Datalog, such as Souffle, have become popular for program analysis, because it turns out that program analysis questions are quite natural to formulate as logic programs.

This arguably makes it easier to write and compose different analyses, and it may improve the run-time efficiency and concurrency of the analyses.

We don’t take the logic programming approach here, because learning logic programming right now would be impractical and distracting, and because the dataflow framework teaches the same intuitions that one would use with the logic programming approach also.

Example: reaching definitions

We’ll sketch the reaching definitions analysis as an example before formalizing the dataflow framework.

Recall that reaching definitions finds which instructions may have been the last to write to a given variable. More formally, reaching definitions finds, for each IR instruction i and for each variable x, the set of IR instructions where x might last have been written to before i.

Reaching definitions is useful for optimizations like constant propagation: if we know that x can only be set to a single constant, we can replace a use of x with a load of a constant near where it’s used. This frees up the storage location of x in the code preceding the use site.

We’ll use this program as an example whose reaching definitions we calculate:

Label(start)
LoadIntConst(1, x)  # <-- writes x
Call(some_condition, [x], c)
LoadIntConst(2, x)  # <-- writes x
CondJump(c, then_clause, end)

Label(then_clause)
LoadIntConst(3, y)     # <-- writes y
Call(f, [x, y], x)     # <-- writes x
Jump(end)

Label(end)
Return(x)

The correct answer to reaching definitions at instruction Return(x) is that:

x might have been last set by LoadIntConst(2, x) or Call(f, [x, y], x), (but certainly not LoadIntConst(1, x)).
y might have been last set by LoadIntConst(3, y) or it might be uninitialized (if then_clause was not executed).

Let’s see how to write an analysis that reaches this conclusion. The idea is that at each instruction, for each variable, we keep track of all possible places where the variable might have last been set. We propagate this information from one instruction to the next, and across jumps. We repeat this until nothing changes any more.

Here is pseudocode for this approach. Afterwards, we’ll generalize much of this into the dataflow framework, of which reaching definitions will be just one instantiation.

Define the type State to be a dictionary of IR variable names to sets of (original) instruction indices.
Define two dictionaries in and out, both mapping instruction indices to States. These define the input and output states of each instruction.
Set out[0] to a state where each variable used by the function is set to the special value -1, signifying ”uninitialized”, and each parameter and global is set to -2, signifying ”predefined”.
For all instruction indices i > 0, set in[i] and out[i] to empty states.
Define a transfer function transfer(s, i) that takes an input state s for instruction i and produces an output state. The transfer function should return an output state that is the same as the input state, except all variables x that the instruction at i writes to should be mapped to the set {i}. This means: if the instruction at i writes to variable x, then right after the instruction at i, x can only have been last set at i.
For each instruction index i:
- Define set P[i] (”predecessors of i”) to be the indices of the instructions that might be executed immediately before i:
  - If i refers to a label instruction, then P[i] is the set of jump instructions that might jump to i.
  - Otherwise, P[i] = { i - 1 }.
- Define set S[i] (”successors of i”) be the indices of the instructions that can be executed immediately after i: S[i] = { j | P[j] contains i }
Create a queue Q of instruction indices, and enqueue all instruction indices except 0 to it. This is our work queue. While Q is not empty:
- Dequeue i from Q.
- Set in[i] = merge(out[j] for each j in P)
  where merge returns the union of the given sets.
- Set out[i] = transfer(in[i], i).
- If out[i] changed from its previous value,
  then enqueue every element of S[i] to Q.

You are now ready to do task 3.

Generalization

The general dataflow framework can be abstracted from the way we just implemented reaching definitions. We just leave the following things abstract:

the type State
the initial value for out[0]
the initial values for all other in and out elements
the function transfer, which maps input states to output states
the function merge, which merges jump instruction outputs

Here is the pseudocode again, without the specifics of ”reaching definitions”.

Define a type State.
Define two dictionaries in and out, both mapping instruction indices to States. These define the input and output states of each instruction.
Set out[0] to a suitable start state.
For all instruction indices i > 0, set in[i] and out[i] to a suitable state.
Define a transfer function transfer(s, i) that takes an input state s for instruction i and produces an output state.
For each instruction index i:
- Define set P[i] (”predecessors of i”) to be the indices of the instructions that might be executed immediately before i:
  - If i refers to a label instruction, then P[i] is the set of jump instructions that might jump to i.
  - Otherwise, P[i] = { i - 1 }.
- Define set S[i] (”successors of i”) be the indices of the instructions that can be executed immediately after i: S[i] = { j | P[j] contains i }
Create a queue Q of instruction indices, and enqueue all instruction indices except 0 to it. This is our work queue.While Q is not empty:
- Dequeue i from Q.
- Set in[i] = merge(out[j] for each j in P).
- Set out[i] = transfer(in[i], i).
- If out[i] changed from its previous value,
  then enqueue every element of S[i] to Q.

This is the abstract forwards dataflow framework.

You are now ready to do task 4.

There is also a very similar backwards dataflow framework (task 5 & task 6), where data flows in reverse: from out to in and from successors to predecessors.

This is used by e.g. live variables analysis i.e. ”which variables may still be read after a given point in the program?” The idea in live variables analysis similar to reaching definitions, except in reverse: it looks backwards through the program, and lets a read of a variable introduce the variable into the state while a write removes it.

Tasks

Task 1

Write a function that splits a list of IR instructions into a list of basic blocks.

Task 2

Write a function that takes a list of basic blocks and returns a flowgraph. Make it possible to retrieve each instruction’s original (unique) index too.

Bonus task

For later debugging, it may be convenient to write a function that renders a flowgraph using Graphviz.

Task 3

Implement the reaching definitions analysis as outlined above, and test it.

Task 4

Extract the abstract dataflow framework from your reaching definitions code, and implement reaching definitions using it.

You can make the framework a superclass with abstract methods transfer and merge, or a function that takes transfer and merge functions as parameters.

Task 5

Implement the backwards dataflow framework.

In my experience, it is cleaner to tolerate some code duplication here instead of trying to abstract the direction of dataflow.

Task 6

Implement live variables analysis using the backwards dataflow framework.

Properties of program analyses

When using the dataflow framework, it’s good to be aware of the main ways in which analyses are classified and evaluated.

Two program analyses that describe the same property of a program can have different:

precision: how often does the analysis find proofs for its claims?
conservatism: when the analysis makes a claim, does it have proof or merely evidence?

There are more detailed ways to characterize precision, and we’ll look at some of them below.

Space and time complexity is also important, and often in painful tension with precision. The most precise alias analysis methods, for instance, can have time complexities proportional to the number of possible program execution paths, which typically grows roughly exponentially with program size.

Precision

Program analyses can be thought of as proving facts about the program, but they can sometimes fail to find proofs for some true facts.

Consider this example:

{
    var x = 3;
    if 1 == 2 then {
        f(x);
    }
}

It is true that x is never accessed, but for live variables analysis to prove it, it must know that 1 == 2 is always false, and therefore the then-clause is never executed etc. Some live variables analyses can do that, but we can always invent more complicated scenarios.

Conservatism

It is important for a program analysis to be conservative: if it outputs a claim, it must be sure that the claim holds. Otherwise optimizations that rely on its claims could break the program’s behaviour.

For instance, live variables analysis had better be sure that a variable x really is never read after a given point for a register allocator to safely reuse x’s register.

Conversely, if a conservative analysis fails to claim something, we cannot take that as proof that the opposite is true.

For instance, if live variables analysis does not output a claim that variable x is never read after a given point, then we cannot conclude that x is surely read after that point. The correct negation is ”x might be read after that point”.

Non-conservative analyses can still be useful in program style checkers (”linters”) to warn about likely errors. One such an analysis might, for instance, warn about a potential read from deallocated memory in the following program, because it can see a potential execution path from delete p to *p:

{
    var p = new Int(3);
    if 1 == 2 {
        delete p;
    }
    f(*p);
}

Such an analysis might fail to find more complicated code paths, such as when the deletion of p happens in some function, perhaps through some copy of p:

{
    var p = new Int(3);
    var q = f1(p);  # maybe `f1` returns a copy of `p`
    f2(q);          # maybe `f2` deletes `q`
    f3(*p);         # hard to say if `p` has been deleted
}

We could imagine a conservative version of this analysis that defaults to claiming every pointer dereference is a potential read from deallocated memory, unless it can prove otherwise. Such an analysis would be very difficult to write with a low enough false positive rate to be useful, except in a sufficiently constrained language, such as Rust.

Example: improving live variables analysis with alias analysis

The results of one program analysis can be used to improve the precision of another. For instance, alias analysis can inform live variables analysis like this:

Say we have an integer variable x and a pointer variable p.

Ordinarily live variables analysis has to conservatively consider expression *p as possibly reading the value of x, unless it can prove that p cannot point to x. Alias analysis may be able to give that proof:

If the address of x is never taken, then it’s trivial that p cannot point to x
If the address of x is sometimes taken, then an alias analysis might still be able to prove that the address of x can never be assigned to p.

Intraprocedural vs interprocedural

One measure of precision is how an analysis treats function calls.

An intraprocedural analysis analyzes each function separately, ignoring where and how the function gets called. An intraprocedural analysis essentially sees function calls as black boxes that it can assume very little about.

An interprocedural analysis analyzes many functions, or even all functions, together. It typically creates some compact summary of what each function does (e.g. which aliasing relations a function may create). It then passes these state summaries around in a function-level analogue of the dataflow framework.

Consider this program:

fun f(x: Int) { if x == 0 then ... else ...; }

{
    f(0);
}

An interprocedural unreachable code detector could conclude that the else-clause is never executed, because x is always zero. An intraprocedural unreachable code detector couldn’t make this conclusion, because it wouldn’t know anything about the value of x.

Sensitivities

The precision of an analysis can further be classified by whether it’s ”sensitive” to certian facets of the program’s structure, or whether it ignores them. Here are the most common sensitivities:

A flow-sensitive analysis declares its facts relative to some positions in the program. For instance, a flow-sensitive alias analysis might say that p1 and p2 might point to the same object at IR instruction 7, whereas a flow-insensitive alias analysis can merely say that p1 and p2 might point to the same object at some point.

A path-sensitive analysis looks at if-expression conditions and incorporates them into its judgements. For instance, a path-sensitive alias analysis might see if p1 != p2 then ... and conclude that p1 and p2 cannot point to the same object in the then-clause (but might elsewhere), whereas a path-insensitive alias analysis would not take the if-expression’s condition into account.

A context-sensitive analysis analyzes each function again for each different calling context. A calling context is typically just a place in the program where the function might be called.

Consider this program:

fun f(x: Int) { if x == 0 then ... else ...; }

{
    f(0);
    f(1);
}

A context-sensitive unreachable code detector could conclude that the else-clause is never executed in the first function call, and the then-clause is never executed in the second function call.

In contrast a context-insensitive unreachable code detector would just not know anything about the parameter x and think that both clauses might be reached in both calls.

Calling contexts can be considerably more complicated.

Consider this program:

fun f(x: Int) { g(x + 100); }
fun g(y: Int) { ... }

{
    f(1);
    f(2);
    g(3);
}

There are two call sites for g: g(3) an g(x + 100), but there are actually three possible call paths to g: f(1) -> g(x + 100), f(2) -> g(x + 100) and g(3).

A less precise context-sensitive analysis might reanalyze g for all call sites, whereas a more precise context-sensitive analysis might reanalyze g for all call paths.

In this example, an analysis using call paths could figure out all possible values for g’s parameter y (3, 101 and 102) whereas just looking at the call site g(x + 100) we would not know the value of x (absent some other mechanism that summarizes for the analysis the possible values of x).

Context-sensitivity using call sites is a bit more expensive than context-insensitivity, but context-sensitivity using call paths can be exponentially more expensive. For this reason, analyses opting to use call paths often just look at the last k steps in the call path.

More exotic choices for calling contexts are also possible:

possible (combinations of) parameter values
possible (combinations of) parameter object creation sites (called ”object-sensitivity”)
possible (combinations of) parameter types (for polymorphic functions)

There is nothing inherently wrong with an analysis lacking some sensitivity. Adding a sensitivity can have significant costs in execution speed and implementation complexity, and the corresponding gains in precision are not necessarily huge. Exploring this space remains an area of research.

Compilers

spring 2024

8. Analysis & optimization

Flowgraphs

The dataflow framework

Example: reaching definitions

Generalization

Tasks

Properties of program analyses

Precision

Conservatism

Intraprocedural vs interprocedural

Sensitivities