Compilers

spring 2024

THIS IS AN OLD VERSION - Find the 2025 course here

5. Type checker

A type checker verifies that the types of arguments are suitable for the functions and operators that they are passed to.

Our interpreter already checks the types of operations when they are executed, making the language dynamically typed. By adding a type checker, we make our language statically typed i.e. we check its types at compile-time rather than at run-time. This is usually a good trade-off.

The type checker may annotate the AST with types for use by later compiler stages.

A basic type checker

Type checking works a lot like an interpreter: it recurses over an AST, but instead of returning a value, it returns a type, such as Int or Bool.

Say we’re looking to type an AST node for the operator +. In our language, the operands must always be integers, and the result is always an integer. Therefore the type checking logic for a + AST node is:

The code might look like this:

# src/compiler/type_checker.py

import compiler.ast as ast
from compiler.types import Int, Type

def typecheck(node: ast.Expr) -> Type:
    match node:
        case ast.BinaryOp():
            t1 = typecheck(node.left)
            t2 = typecheck(node.right)
            if node.op == '+':
                if t1 is not Int or t2 is not Int:
                    raise ...
                return Int
            else:
                ...
        ...

Now let’s consider and if-then-else expression. For it, the type checking logic goes like this:

Code for type-checking if-then-else might look like this:

def typecheck(node: ast.Expr) -> Type:
    match node:
        ...
        case ast.IfThenElse():
            t1 = typecheck(node.condition)
            if t1 is not Bool:
                raise ...
            t2 = typecheck(node.then_branch)
            t3 = typecheck(node.else_branch)
            if t2 != t3:
                raise ...
            return t2
        ...

How do we type-check variables? Just like the interpreter had to look up a variable’s value from a symbol table, a type checker has to look up a variable’s type from a symbol table. The type checker’s symbol table should be hierarchical just like with the interpreter’s.

The unit type

The unit type Unit is a type that permits exactly one value: the unit value. In our interpreter, we used None as the unit value. Unit is used as the result type of expressions like while loops, which don’t have a natural result value.

Other than it being hard-coded as the result of certain constructs, Unit behaves just like any other type in our type checker.

More about Unit and related types.

Exercise (optional)

Sketch type checking code for:

  • while loops
  • if-then with no else

Function types

So far we’ve talked about the ”primitive” types Int, Bool and Unit, and we’ve hard-coded how to type-check built-in operators. Let’s generalize so that we can assign types to functions and operators instead of hard-coding them.

A function type is of the form (P1, P2, ...) => R where P1, P2, … are parameter types and R is the return type.

Examples:

This way we can define the types of most of our built-in functions and operators in the top-level symbol table. This is again very similar to the interpreter, where we defined the built-in functions and operators in the symbol table in Task 3 of chapter 4.

About polymorphism

We currently have a separate print function for every supported type: print_int and print_bool. Suppose we wanted to have a single print function like in Python. What should be the type such a function be?

We’d want print to be able to take many types of value as its parameter and work differently depending on the type. In other words, we’d want the function to be polymorphic. Our type system doesn’t yet have a way to express this.

Operators == and != are similarly problematic: they should work for any pair of values with the same type: we want to be able to compare two integers or two booleans, but not an integer with a boolean.

We have a few options for how to add ==, != and the hypothetical print to our type system, but for now, we won’t implement any of them. We’ll leave == and != as special cases in the type checker code. That is, we hard-code that == and != require the same type on both sides.

Tasks

Task 1

Define Python classes and objects for the types in the program. There should be a single object for each of Int, Bool and Unit, and there should be a class FunType for building function types.

(We use upper case names for our types for two reasons: because it avoids conflicts with Python types like int, and because it’s the prevailing convention in literature and most real-world languages.)

Task 2

Create a class to represent a hierarchical symbol table that maps variable names to types. You can either generalize the interpreter’s existing SymTab class to be generic in the type of values it holds, or you can copy-paste most of it.

Task 3

Starting from the typecheck function that was partially sketched above, complete support for the following:

  • literals
  • variables
  • untyped variable declarations var x = .... (The next task will handle typed variable declarations var x: Type = ...)
  • assignment x = ... (the assigned value must have the same type as the variable)
  • unary and binary operators (with == and != as special cases as discussed above)
  • function calls
  • blocks
  • if and while expressions

To implement typechecking for variables, you’ll need to pass around a symbol table. It should work similarly to the interpreter’s symbol table, with the same scoping logic, name prefix for unary operators, etc.

Task 4

Our language can currently infer the types of variables var x = ... from the initializer code. This is great, but sometimes a programmer might want to specify the type of the variable for readability reasons, and to get a potential type error earlier.

Extend the syntax of var declarations to allow an optional type declaration that looks like this: var name: Type = ....

Example: var x: Int = 1 + 1.

Implement type checking for these type declarations.

Unless you already did so in chapter 3, you’ll need to extend the parser to support type expressions first.

Hint

You may wonder whether to reuse your type Python classes as AST nodes. It is more sustainable to make a separate set of AST nodes ast.TypeExpr and subclasses (analagous to ast.Expr).

This is the way to go if you end up doing type-related optional tasks in the project, because the type expression and the type it evaluates to will no longer always be one-to-one.

Task 5

The following compiler stage will need type information, so let’s add type annotations to the AST.

Add the field type to your ast.Expr base class like this:

from dataclasses import dataclass, field
...

@dataclass
class Expr:
    ...

    type: Type = field(kw_only=True, default=Unit)

(First ensure you don’t already use the field name type in any of your existing AST node classes.)

By saying field(kw_only=True, default=None), we instruct @dataclass to not require this field in the constructor and to initialize the field to Unit by default. This way you won’t have to change existing code to explicitly initialize this field. (If you get an error from this line, see this.)

Modify typecheck so that whenever it visits a node, it also assigns the return value to the visited node’s type field.

Hint: Try to avoid having to remember to set node.type in every case. It may be convenient to put the bulk of typecheck into a separate function.