Compilers

spring 2024

THIS IS AN OLD VERSION - Find the 2025 course here

4. Interpreter

An interpreter is a program that executes the source code directly, without turning it into machine code first.

An interpreter is not a mandatory part of the course project, and indeed compilers don’t usually come with interpreters. That said, unless you’ve found everything so far easy and feel very comfortable with recursion, it’s not recommended to skip this chapter, because the same techniques will be applied to more difficult problems in the next two chapters.

For most languages, it’s easiest to write an interpreter that operates recursively on an AST. Production-quality interpreters like Python’s tend to first translate the AST to a flat sequence of commands, sometimes called ”bytecode”, and interpret that, because that’s more efficient.

We’ll write an AST-based interpreter, because it’s more valuable to practice recursing over the AST at this stage, and because it’s more fun to get your project to run programs sooner rather than later.

Recursion over an AST

Let’s look at the AST for the following expression

if x < 10 then print_int(x + 7) else print_int(0)

The AST has the following types of nodes:

We need to define how to compute the result for each of these.

The code for a very simple interpreter, without support for variables and with very poor error checking, might look something like this:

# src/compiler/interpreter.py
from typing import Any
from compiler import ast

type Value = int | bool | None

def interpret(node: ast.Expr) -> Value:
    match node:
        case ast.Literal():
            return node.value

        case ast.BinaryOp():
            a: Any = interpret(node.left)
            b: Any = interpret(node.right)
            if node.op == '+':
                return a + b
            elif node.op == '<':
                return a < b
            else:
                raise ...

        case ast.IfThenElse():
            if interpret(node.condition):
                return interpret(node.then_branch)
            else:
                return interpret(node.else_branch)
        ...

You are now ready to do task 1.

Symbol tables

An interpreter needs some place to look up the values for variables. Such a place is usually called a symbol table. A symbol table is almost always needed when recursing over the AST – the next two chapters (type checker and IR generator) will use one too.

Our first thought might be that a symbol table is just a dictionary that maps variable names to values, but that is not enough.

Consider nested blocks that may define new variables. These variables must only be visible inside that block, i.e. the block defines a scope. Variables defined in a scope may hide i.e. shadow variables of the same name in outer scopes.

Example:

{
    var x = 1;
    {  # <----------- Inner scope begins
        var x = 2;  # Shadows the 'x' in the outer scope
        var y = 3;
        print_int(x);   # Prints 2
        print_int(y);   # Prints 3
    }  # <----------- Inner scope ends
    print_int(x);  # Prints 1
    print_int(y);  # Error: 'y' does not exist in this outer scope
}

We could just ban shadowing (e.g. C# does so, mostly), but shadowing is sometimes useful (e.g. Rust even encourages it).

We’ll definitely want to support shadowing later when we add functions to the language. We at least want to be able to add a global variable even if some function happens to use the same variable name internally.

The solution is a hierarchical symbol table. It contains two things:

In the example above, at the first print_int(x), the hierarchical symbol table would look like this:

The top-level symbol table is where global things like our built-in functions and operators are defined. We’ll put them there in task 3.

Tasks

Task 1

Get the interpreter code example working in your project and test that interpreting 2 + 3 works.

You don’t need to improve error checking in cases where the type checker, which we will write in chapter 5, would reject the program. For example, 1 + (2 < 3) tries to add a number and a boolean, which the type checker won’t allow. It’s OK for the interpreter to fail with an ugly error, or do something unexpected in these cases (in Python, 1 + (2 < 3) happens to be 2 because True is converted to 1).

Task 2

Write a class for a hierarchical symbol table (call it e.g. SymTab) and make interpret take a SymTab as a parameter.

Implement variable declarations and blocks in your interpreter.

Task 3

Refactor your interpreter so that built-in functions and operators are defined as symbols in the top-level symbol table.

This makes the code easier to read as library code is no longer mixed with the interpreter code. This is also the structure we want when we add user-defined functions to the language later.

To distinguish between unary and binary operators, you can prefix the unary operators with unary_ in the symbol table.

Finally, implement all operators listed in the language spec.

Hint: you’ll need to extend the type definiton of Value to include callable functions.

Hint: you can keep the assignment operator = as a special case.

Task 4

Consider the logical operators and and or.

In almost all languages, these are short-circuiting: if the left operand is enough to know the result (false for and or true for or) then the right operand is not evaluated at all. E.g. a and f(b) never calls f if a is false.

Implement short-circuiting for these operators.

Hint: you can either make these operators special cases in the interpreter (recommended), or you can extend the definition of Value to allow macros: special kinds of operators/functions that can choose when and if to evaluate each of their arguments.

Hint: to unit test this, you can use test programs like this:

var evaluated_right_hand_side = false;
true or { evaluated_right_hand_side = true; true };
evaluated_right_hand_side  # Should be false

Task 5

Complete the interpreter by implementing all constructs mentioned in the language spec.

As mentioned in task 1, the interpreter is allowed to fail or do silly things if the program would not pass type checking.

Hint: you can represent the value unit with Python’s None.