Project

The course project has you build all the parts of a compiler for a simple language.

The tasks in chapters 2-7 guide you in building up most of the compiler. Completing these tasks correctly gives grade 4.

Grade 5 requires a bit of independent work: you need to add break and continue expressions and functions to your language as described below.

Rules

The project’s deadline is March 17, 23:59:59.

You must submit your project to the Test Gadget system. The setup instructions explain how to do this.

You may use your language’s general-purpose standard library features, including the regular expression library. No other libraries or tools meant for building compilers are allowed, e.g. no parser combinators. General-purpose libraries like collection extensions and unit testing libraries are OK. When in doubt, ask. Your compiler may optionally link its outputs with the C standard library.

Your compiler may have reasonable limitations, but you may not directly special-case your code so that it passes specific tests, nor ”game the system” in other ways.

‼️ It is not allowed to use ChatGPT, Copilot and similar AI systems for code generation. You may ask them theory questions if you want, but don’t give them your code to debug or explain, and don’t take any code from them.

The reason is that AI tools would rob you of valuable practice in writing, debugging and understanding code. They also frequently make mistakes, lead you down undesirable paths, and have trouble understanding the big picture.

Grade requirements

Passing the course requires completing all stages of the compiler, i.e. something from each of the chapters 2-7 (chapter 4’s interpreter is optional).

Test Gadget’s tests are split into the following sets:

Test set	Features tested
Test set A	Basics: math expressions; variables; integers; booleans
Test set B	Conditionals and loops (`if` and `while`); scopes
Test set C	Type checking; larger and more detailed tests for other features
Test set D	Function definitions; function calls; `break` and `continue`
Extra	Optional tests for advanced features and details. No effect on grade.

Your grade is determined by how much of each test set passes.

Grade	Required tests
1	80% of test set A
2	80% of test sets A and B
3	90% of test sets A, B and C
4	100% of test sets A, B and C
5	100% of test sets A, B, C and D

Note that Test Gadget only has end-to-end tests that test all compiler stages at once. You have to be quite far along before any of these tests pass. It’s highly recommended to write your own unit tests for the early compiler stages (tokenizer, parser, type checker). This is very likely to save you time overall.

Base language

You must implement a tokenizer, parser, type checker, IR generator and Assembly generator for the following language constructs, as instructed by the tasks in chapters 2-7.

Integer literals
Boolean literals
Binary operators +, -, *, /, %, ==, !=, <, <=, >, >=, and, or with precedence and left-associativity.
Unary operators - and not
Library functions print_int, read_int and print_bool
Blocks of statements
Local variables with initializers
Assignment statements (right-associative)
if-then and if-then-else
while loops

Break and continue

This feature is only required for grade 5.

Add expressions break and continue to the language. Just like in Python, break must exit the innermost active loop, and continue must go back to its beginning.

Hints:

In the interpreter, it’s probably easiest to implement these by abusing Python exceptions.
The types of these expressions can be Unit.
In IR generation, you need to keep track of the current innermost loop’s start and end labels.
Extra task (no effect on grade): allow break to take an optional result value, and make that the result value of the loop (instead of Unit). The type checker must check that every break of the same loop has a result value of the same type.

Functions

This feature is only required for grade 5.

Initially our compiler only compiles a single top-level expression. Make it possible to define new functions as well.

Example:

fun square(x: Int): Int {
    return x * x;
}

fun vec_len_squared(x: Int, y: Int): Int {
    return square(x) + square(y);
}

fun print_int_twice(x: Int): Unit {
    print_int(x);
    print_int(x);
}

print_int_twice(vec_len_squared(3, 4));

Guide:

Currently our language’s top-level syntactic element is one or more top-level expressions. Let’s change it to be a module instead. A module can contain function definitions in addition to the top-level expressions. (In this design, you should not consider modules and function definitions themselves to be expressions).

Make the following structural changes, but don’t write code to actually do anything with function definitions just yet:
- Define AST node types for function definitions and modules.
- Change the parser to emit a module AST node instead of an expression AST node. (For now it can emit a module containing just a top-level expression and no functions.)
- Change the interpreter, type-checker and IR generator to take a module AST node as input (For now they can ignore any functions and just work on the top-level expression.)
- Change the IR generator to return a dictionary of function names to instruction lists. You can hard-code ”main” as the name of the top-level expression.
- Change the Assembly generator to take this dictionary and emit Assembly code for each function. You’ll need to generate unique labels within each function, because the scope of the labels is the whole Assembly file.
In the parser, implement the parsing functions for modules and for function definitions.
In the stages that use a symbol table, put all user-defined functions in the symbol table and treat them similarly to how you’ve treated built-in functions so far.
- In the interpreter, map each user-defined function to a Python function that runs the interpreter on the user-defined function’s body.
- In the type-checker, map each user-defined function to a function type based on the type annotations.
- In the IR and Assembly generators, do the same as with built-in functions.
Make incoming parameters available as variables in all stages that use a symbol table.
Change the type-checker, IR generator and Assembly generator to process function definitions in addition to the top-level expression.
Implement return ... expressions in all compiler stages.
- A function whose return type is not Unit must execute a return expression. It is unspecified what happens if a non-Unit function fails to execute a return.
- Test Gadget does not have test cases with such missing return expressions, but it does have cases where the only return occurs within a while true loop.

Also note:

Function calls and function return values must be correctly type-checked.
Recursion and mutual recursion (f calls g, g calls f) must work.
It is not required to make variables declared in top-level expressions available to functions.