Compilers

spring 2025

Project

The course project has you build all the parts of a compiler for a simple language.

The tasks in chapters 2-7 guide you in building up most of the compiler. Completing these tasks correctly gives grade 4.

Grade 5 requires a bit of independent work: you need to add break and continue expressions and functions to your language as described below.

Rules

The project’s deadline is March 17, 23:59:59.

You must submit your project to the Test Gadget system. The setup instructions explain how to do this.

You may use your language’s general-purpose standard library features, including the regular expression library. No other libraries or tools meant for building compilers are allowed, e.g. no parser combinators. General-purpose libraries like collection extensions and unit testing libraries are OK. When in doubt, ask. Your compiler may optionally link its outputs with the C standard library.

Your compiler may have reasonable limitations, but you may not directly special-case your code so that it passes specific tests, nor ”game the system” in other ways.

‼️ It is not allowed to use ChatGPT, Copilot and similar AI systems for code generation. You may ask them theory questions if you want, but don’t give them your code to debug or explain, and don’t take any code from them.

The reason is that AI tools would rob you of valuable practice in writing, debugging and understanding code. They also frequently make mistakes, lead you down undesirable paths, and have trouble understanding the big picture.

Grade requirements

Passing the course requires completing all stages of the compiler, i.e. something from each of the chapters 2-7 (chapter 4’s interpreter is optional).

Test Gadget’s tests are split into the following sets:

Test set Features tested
Test set A Basics: math expressions; variables; integers; booleans
Test set B Conditionals and loops (if and while); scopes
Test set C Type checking; larger and more detailed tests for other features
Test set D Function definitions; function calls; break and continue
Extra Optional tests for advanced features and details. No effect on grade.

Your grade is determined by how much of each test set passes.

Grade Required tests
1 80% of test set A
2 80% of test sets A and B
3 90% of test sets A, B and C
4 100% of test sets A, B and C
5 100% of test sets A, B, C and D

Note that Test Gadget only has end-to-end tests that test all compiler stages at once. You have to be quite far along before any of these tests pass. It’s highly recommended to write your own unit tests for the early compiler stages (tokenizer, parser, IR generator). This is very likely to save you time overall.

Base language

You must implement a tokenizer, parser, type checker, IR generator and Assembly generator for the following language constructs, as instructed by the tasks in chapters 2-7.

See also: detailed spec of the base language.

Break and continue

This feature is only required for grade 5.

Add expressions break and continue to the language. Just like in Python, break must exit the innermost active loop, and continue must go back to its beginning.

Hints:

Functions

This feature is only required for grade 5.

Initially our compiler only compiles a single top-level expression. Make it possible to define new functions as well.

Example:

fun square(x: Int): Int {
    return x * x;
}

fun vec_len_squared(x: Int, y: Int): Int {
    return square(x) + square(y);
}

fun print_int_twice(x: Int) {
    print_int(x);
    print_int(x);
}

print_int_twice(vec_len_squared(3, 4));

Guide:

  1. Currently our language’s top-level syntactic element is one or more top-level expressions. Let’s change it to be a module instead. A module can contain function definitions in addition to the top-level expressions. (In this design, you should not consider modules and function definitions themselves to be expressions).

    Make the following structural changes, but don’t write code to actually do anything with function definitions just yet:

    • Define AST node types for function definitions and modules.
    • Change the parser to emit a module AST node instead of an expression AST node. (For now it can emit a module containing just a top-level expression and no functions.)
    • Change the interpreter, type-checker and IR generator to take a module AST node as input (For now they can ignore any functions and just work on the top-level expression.)
    • Change the IR generator to return a dictionary of function names to instruction lists. You can hard-code ”main” as the name of the top-level expression.
    • Change the Assembly generator to take this dictionary and emit Assembly code for each function. You’ll need to generate unique labels within each function, because the scope of the labels is the whole Assembly file.
  2. In the parser, implement the parsing functions for modules and for function definitions.
  3. In the stages that use a symbol table, put all user-defined functions in the symbol table and treat them similarly to how you’ve treated built-in functions so far.
    • In the interpreter, map each user-defined function to a Python function that runs the interpreter on the user-defined function’s body.
    • In the type-checker, map each user-defined function to a function type based on the type annotations.
    • In the IR and Assembly generators, do the same as with built-in functions.
  4. Make incoming parameters available as variables in all stages that use a symbol table.
  5. Change the type-checker, IR generator and Assembly generator to process function definitions in addition to the top-level expression.
  6. Implement return ... expressions in all compiler stages.
    • A function whose return type is not Unit must execute a return expression. It is unspecified what happens if a non-Unit function fails to execute a return.
    • Test Gadget does not have test cases with such missing return expressions, but it does have cases where the only return occurs within a while true loop.

Also note: