4. Interpreter
An interpreter is a program that executes the source code directly, without turning it into machine code first.
An interpreter is not a mandatory part of the course project, and indeed compilers don’t usually come with interpreters. That said, unless you’ve found everything so far easy and feel very comfortable with recursion, it’s not recommended to skip this chapter, because the same techniques will be applied to more difficult problems in the next two chapters.
For most languages, it’s easiest to write an interpreter that operates recursively on an AST. Production-quality interpreters like Python’s tend to first translate the AST to a flat sequence of commands, sometimes called ”bytecode”, and interpret that, because that’s more efficient.
We’ll write an AST-based interpreter, because it’s more valuable to practice recursing over the AST at this stage, and because it’s more fun to get your project to run programs sooner rather than later.
Recursion over an AST
Let’s look at the AST for the following expression
if x < 10 then print_int(x + 7) else print_int(0)
The AST has the following types of nodes:
- Constant literals:
10
,7
and0
. - Variables:
x
andprint_int
- Operators:
<
and+
- Function calls:
name(arguments)
- An if-then-else expression
We need to define how to compute the result for each of these.
- Constants are super easy: just return the constant value.
- Variables need to look up the value from somewhere - we’ll get back this.
- Operators and function calls need to recursively evaluate their arguments and then apply the operator or function on the results.
- If-then-else expressions need to recursively evaluate the condition and then decide which branch to recursively evaluate.
The code for a very simple interpreter, without support for variables and with very poor error checking, might look something like this:
# src/compiler/interpreter.py
from typing import Any
from compiler import ast
type Value = int | bool | None
def interpret(node: ast.Expr) -> Value:
match node:
case ast.Literal():
return node.value
case ast.BinaryOp():
a: Any = interpret(node.left)
b: Any = interpret(node.right)
if node.op == '+':
return a + b
elif node.op == '<':
return a < b
else:
raise ...
case ast.IfThenElse():
if interpret(node.condition):
return interpret(node.then_branch)
else:
return interpret(node.else_branch)
...
You are now ready to do task 1.
Symbol tables
An interpreter needs some place to look up the values for variables. Such a place is usually called a symbol table. A symbol table is almost always needed when recursing over the AST – the next two chapters (type checker and IR generator) will use one too.
Our first thought might be that a symbol table is just a dictionary that maps variable names to values, but that is not enough.
Consider nested blocks that may define new variables. These variables must only be visible inside that block, i.e. the block defines a scope. Variables defined in a scope may hide i.e. shadow variables of the same name in outer scopes.
Example:
{
var x = 1;
{ # <----------- Inner scope begins
var x = 2; # Shadows the 'x' in the outer scope
var y = 3;
print_int(x); # Prints 2
print_int(y); # Prints 3
} # <----------- Inner scope ends
print_int(x); # Prints 1
print_int(y); # Error: 'y' does not exist in this outer scope
}
We could just ban shadowing (e.g. C# does so, mostly), but shadowing is sometimes useful (e.g. Rust even encourages it).
We’ll definitely want to support shadowing later when we add functions to the language. We at least want to be able to add a global variable even if some function happens to use the same variable name internally.
The solution is a hierarchical symbol table. It contains two things:
locals
: a dictionary of variable names to values – contains variables defined in the current scope,parent
: an optional reference to the outer scope’s symbol table – we consult this recursively when a variable is not found inlocals
.
In the example above, at the first print_int(x)
, the hierarchical symbol table would
look like this:
The top-level symbol table is where global things like our built-in functions and operators are defined. We’ll put them there in task 3.
Tasks
Task 1
Get the interpreter code example working in your project
and test that interpreting 2 + 3
works.
You don’t need to improve error checking in cases where the type checker,
which we will write in chapter 5, would reject the program.
For example, 1 + (2 < 3)
tries to add a number and a boolean,
which the type checker won’t allow.
It’s OK for the interpreter to fail with an ugly error,
or do something unexpected in these cases
(in Python, 1 + (2 < 3)
happens to be 2
because True
is converted to 1
).
Task 2
Write a class for a hierarchical symbol table (call it e.g. SymTab
)
and make interpret
take a SymTab
as a parameter.
Implement variable declarations and blocks in your interpreter.
Task 3
Refactor your interpreter so that built-in functions and operators are defined as symbols in the top-level symbol table.
This makes the code easier to read as library code is no longer mixed with the interpreter code. This is also the structure we want when we add user-defined functions to the language later.
To distinguish between unary and binary operators,
you can prefix the unary operators with unary_
in the symbol table.
Finally, implement all operators listed in the language spec.
Hint: you’ll need to extend the type definiton of Value
to include callable functions.
Hint: you can keep the assignment operator =
as a special case.
Task 4
Consider the logical operators and
and or
.
In almost all languages,
these are short-circuiting: if the left operand
is enough to know the result (false for and
or true for or
) then
the right operand is not evaluated at all. E.g. a and f(b)
never calls f
if a
is false.
Implement short-circuiting for these operators.
Hint: you can either make these operators special cases in the interpreter (recommended),
or you can extend the definition of Value
to allow macros: special kinds of
operators/functions that can choose when and if to evaluate each of their arguments.
Hint: to unit test this, you can use test programs like this:
var evaluated_right_hand_side = false;
true or { evaluated_right_hand_side = true; true };
evaluated_right_hand_side # Should be false
Task 5
Complete the interpreter by implementing all constructs mentioned in the language spec.
As mentioned in task 1, the interpreter is allowed to fail or do silly things if the program would not pass type checking.
Hint: you can represent the value unit
with Python’s None
.