5. Type checker
A type checker verifies that the types of arguments are suitable for the functions and operators that they are passed to.
Our interpreter already checks the types of operations when they are executed, making the language dynamically typed. By adding a type checker, we make our language statically typed i.e. we check its types at compile-time rather than at run-time. This is usually a good trade-off.
The type checker may annotate the AST with types for use by later compiler stages.
A basic type checker
Type checking works a lot like an interpreter: it recurses over an AST,
but instead of returning a value, it returns a type, such as Int or Bool.
Say we’re looking to type an AST node for the operator +.
In our language, the operands must always be integers, and the result is
always an integer. Therefore the type checking logic for a + AST node is:
- Recursively get the types for the left and right nodes.
- Check that the left and right operators are of type
Int. - Return
Intas the type of the+node.
The code might look like this:
# src/compiler/type_checker.py
import compiler.ast as ast
from compiler.types import Int, Type
def typecheck(node: ast.Expr) -> Type:
match node:
case ast.BinaryOp():
t1 = typecheck(node.left)
t2 = typecheck(node.right)
if node.op == '+':
if t1 is not Int or t2 is not Int:
raise ...
return Int
else:
...
...
Now let’s consider and if-then-else expression.
For it, the type checking logic goes like this:
- Check that the type of the condition is a
Bool. - Get the types of both branches and check that they are the same.
- Return the type of both branches.
Code for type-checking if-then-else might look like this:
def typecheck(node: ast.Expr) -> Type:
match node:
...
case ast.IfThenElse():
t1 = typecheck(node.condition)
if t1 is not Bool:
raise ...
t2 = typecheck(node.then_branch)
t3 = typecheck(node.else_branch)
if t2 != t3:
raise ...
return t2
...
How do we type-check variables? Just like the interpreter had to look up a variable’s value from a symbol table, a type checker has to look up a variable’s type from a symbol table. The type checker’s symbol table should be hierarchical just like with the interpreter’s.
The unit type
The unit type Unit is a type that permits exactly one value:
the unit value. In our interpreter, we used None as the unit value.
Unit is used as the result type of expressions like while loops,
which don’t have a natural result value.
Other than it being hard-coded as the result of certain constructs,
Unit behaves just like any other type in our type checker.
More about Unit and related types.
Exercise (optional)
Sketch type checking code for:
- while loops
if-thenwith noelse
Function types
So far we’ve talked about the ”primitive” types Int, Bool and Unit,
and we’ve hard-coded how to type-check built-in operators.
Let’s generalize so that we can assign types to functions and operators
instead of hard-coding them.
A function type is of the form (P1, P2, ...) => R
where P1, P2, … are parameter types and R is the return type.
Examples:
- operator
+has type(Int, Int) => Int - operator
<has type(Int, Int) => Bool - operator
orhas type(Bool, Bool) => Bool - function
print_inthas type(Int) => Unit
This way we can define the types of most of our built-in functions and operators in the top-level symbol table. This is again very similar to the interpreter, where we defined the built-in functions and operators in the symbol table in Task 3 of chapter 4.
With the types for functions and operators in the symbol table, you don’t need to write special case typechecking code for each function and operator. Instead you can look up the type of the function or operator, check that it’s a function type, and check that the given arguments match have the expected types.
About polymorphism
We currently have a separate print function for every supported
type: print_int and print_bool.
Suppose we wanted to have a single print function like in Python.
What should be the type such a function be?
We’d want print to be able to take many types of value as its parameter
and work differently depending on the type.
In other words, we’d want the function to be polymorphic.
Our type system doesn’t yet have a way to express this.
Operators == and != are similarly problematic:
they should work for any pair of values with the same type:
we want to be able to compare two integers or two booleans,
but not an integer with a boolean.
We have a few options for
how to add ==, != and the hypothetical print to our type system,
but for now, we won’t implement any of them.
We’ll leave == and != as special cases in the type checker code.
That is, we hard-code that == and != require the same type on both sides.
Tasks
Task 1
Define Python classes and objects for the types in the program.
There should be a single object for each of Int, Bool and Unit,
and there should be a class FunType for building function types.
(We use upper case names for our types for two reasons:
because it avoids conflicts with Python types like int,
and because it’s the prevailing convention in literature and most real-world languages.)
Task 2
Create a class to represent a hierarchical symbol table
that maps variable names to types.
You can either generalize the interpreter’s existing SymTab class to be
generic in the type of values it holds, or you can copy-paste most of it.
Task 3
Starting from the typecheck function that was partially sketched above,
complete support for the following:
- literals
- variables
- untyped variable declarations
var x = .... (The next task will handle typed variable declarationsvar x: Type = ...) - assignment
x = ...(the assigned value must have the same type as the variable) - unary and binary operators (with
==and!=as special cases as discussed above) - function calls
- blocks
ifandwhileexpressions
To implement typechecking for variables, you’ll need to pass around a symbol table. It should work similarly to the interpreter’s symbol table, with the same scoping logic, name prefix for unary operators, etc.
You can either write special-case code for typechecking each function and operator, or you can do it generically with function types in the symbol table as described above.
Writing unit tests as you go is a good idea.
Task 4
Our language can currently infer the types of variables var x = ... from the initializer code.
This is great, but sometimes a programmer might want to specify the type of the variable
for readability reasons, and to get a potential type error earlier.
Extend the syntax of var declarations to allow an optional type declaration
that looks like this: var name: Type = ....
Example: var x: Int = 1 + 1.
Implement type checking for these type declarations.
Unless you already did so in chapter 3, you’ll need to extend the parser to support types first.
- You don’t necessarily need to define separate AST nodes for types. The parser can place the type objects defined in Task 1 in the AST.
- Variables whose type is a function type are possible.
The following should work:
{ var f: (Int) => Unit = print_int; f(123) }
Task 5
The following compiler stage will need type information, so let’s add type annotations to the AST.
Add the field type to your ast.Expr base class like this:
from dataclasses import dataclass, field
...
@dataclass
class Expr:
...
type: Type = field(kw_only=True, default=Unit)
(Ensure you don’t already use the field name type in
any of your existing AST node classes.)
By saying field(kw_only=True, default=None), we instruct @dataclass
to not require this field in the constructor and to initialize the field to Unit by default.
This way you won’t have to change existing code to explicitly initialize this field.
(If you get an error from this line, see this.)
Modify typecheck so that whenever it visits a node,
it also assigns the return value to the visited node’s type field.
Hint: Try to avoid having to remember to set node.type in every
case. It may be convenient to put the bulk of typecheck into
a separate function.