A type checker verifies that the types of arguments
are suitable for the functions and operators that they are passed to.
Our interpreter already checks the types of operations when they are executed,
making the language dynamically typed. By adding a type checker,
we make our language statically typed
i.e. we check its types at compile-time rather than at run-time.
This is usually a good trade-off.
The type checker may annotate the AST with types for use by later compiler stages.
A basic type checker
Type checking works a lot like an interpreter: it recurses over an AST,
but instead of returning a value, it returns a type, such as Int or Bool.
Say we’re looking to type an AST node for the operator +.
In our language, the operands must always be integers, and the result is
always an integer. Therefore the type checking logic for a + AST node is:
Recursively get the types for the left and right nodes.
Check that the left and right operators are of type Int.
How do we type-check variables? Just like the interpreter had to look up a variable’s
value from a symbol table, a type checker has to look up a variable’s type from a symbol table.
The type checker’s symbol table should be hierarchical just like with the interpreter’s.
The unit type
The unit typeUnit is a type that permits exactly one value:
the unit value. In our interpreter, we used None as the unit value.
Unit is used as the result type of expressions like while loops,
which don’t have a natural result value.
Other than it being hard-coded as the result of certain constructs,
Unit behaves just like any other type in our type checker.
More about Unit and related types.
Exercise (optional)
Sketch type checking code for:
while loops
if-then with no else
Function types
So far we’ve talked about the ”primitive” types Int, Bool and Unit,
and we’ve hard-coded how to type-check built-in operators.
Let’s generalize so that we can assign types to functions and operators
instead of hard-coding them.
A function type is of the form (P1, P2, ...) => R
where P1, P2, … are parameter types and R is the return type.
Examples:
operator + has type (Int, Int) => Int
operator < has type (Int, Int) => Bool
operator or has type (Bool, Bool) => Bool
function print_int has type (Int) => Unit
This way we can define the types of most of our built-in functions and operators in
the top-level symbol table. This is again very similar to the interpreter,
where we defined the built-in functions and operators in the symbol table
in Task 3 of chapter 4.
About polymorphism
We currently have a separate print function for every supported
type: print_int and print_bool.
Suppose we wanted to have a single print function like in Python.
What should be the type such a function be?
We’d want print to be able to take many types of value as its parameter
and work differently depending on the type.
In other words, we’d want the function to be polymorphic.
Our type system doesn’t yet have a way to express this.
Operators == and != are similarly problematic:
they should work for any pair of values with the same type:
we want to be able to compare two integers or two booleans,
but not an integer with a boolean.
We have a few options for
how to add ==, != and the hypothetical print to our type system,
but for now, we won’t implement any of them.
We’ll leave == and != as special cases in the type checker code.
That is, we hard-code that == and != require the same type on both sides.
Tasks
Task 1
Define Python classes and objects for the types in the program.
There should be a single object for each of Int, Bool and Unit,
and there should be a class FunType for building function types.
(We use upper case names for our types for two reasons:
because it avoids conflicts with Python types like int,
and because it’s the prevailing convention in literature and most real-world languages.)
Task 2
Create a class to represent a hierarchical symbol table
that maps variable names to types.
You can either generalize the interpreter’s existing SymTab class to be
generic in the type of values it holds, or you can copy-paste most of it.
Task 3
Starting from the typecheck function that was partially sketched above,
complete support for the following:
literals
variables
untyped variable declarations var x = ....
(The next task will handle typed variable declarations var x: Type = ...)
assignment x = ... (the assigned value must have the same type as the variable)
unary and binary operators (with == and != as special cases as discussed above)
function calls
blocks
if and while expressions
To implement typechecking for variables, you’ll need to pass around a symbol table.
It should work similarly to the interpreter’s symbol table, with the same scoping logic,
name prefix for unary operators, etc.
Writing unit tests as you go is a good idea.
Task 4
Our language can currently infer the types of variables var x = ... from the initializer code.
This is great, but sometimes a programmer might want to specify the type of the variable
for readability reasons, and to get a potential type error earlier.
Extend the syntax of var declarations to allow an optional type declaration
that looks like this: var name: Type = ....
Example: var x: Int = 1 + 1.
Implement type checking for these type declarations.
Unless you already did so in chapter 3, you’ll need to extend the parser to support
type expressions first.
It’s recommended to make a separate hierarchy of AST nodes:
ast.TypeExpr and subclasses, analagous to ast.Expr
Task 5
The following compiler stage will need
type information, so let’s add type annotations to the AST.
Add the field type to your ast.Expr base class like this:
(Ensure you don’t already use the field name type in
any of your existing AST node classes.)
By saying field(kw_only=True, default=None), we instruct @dataclass
to not require this field in the constructor and to initialize the field to Unit by default.
This way you won’t have to change existing code to explicitly initialize this field.
(If you get an error from this line, see this.)
Modify typecheck so that whenever it visits a node,
it also assigns the return value to the visited node’s type field.
Hint: Try to avoid having to remember to set node.type in every
case. It may be convenient to put the bulk of typecheck into
a separate function.