Project
The course project has you build all the parts of a compiler for a simple language.
The tasks in chapters 2-7 guide you in building up most of the compiler. Completing these tasks correctly gives grade 4.
Grade 5 requires a bit of independent work: you need to add break and continue expressions and functions to your language as described below.
Rules
The project’s deadline is March 17, 23:59:59.
You must submit your project to the Test Gadget system. The setup instructions explain how to do this.
You may use your language’s general-purpose standard library features, including the regular expression library. No other libraries or tools meant for building compilers are allowed, e.g. no parser combinators. General-purpose libraries like collection extensions and unit testing libraries are OK. When in doubt, ask. Your compiler may optionally link its outputs with the C standard library.
Your compiler may have reasonable limitations, but you may not directly special-case your code so that it passes specific tests, nor ”game the system” in other ways.
‼️ It is not allowed to use ChatGPT, Copilot and similar AI systems for code generation. You may ask them theory questions if you want, but don’t give them your code to debug or explain, and don’t take any code from them.
The reason is that AI tools would rob you of valuable practice in writing, debugging and understanding code. They also frequently make mistakes, lead you down undesirable paths, and have trouble understanding the big picture.
Grade requirements
Passing the course requires completing all stages of the compiler, i.e. something from each of the chapters 2-7 (chapter 4’s interpreter is optional).
Test Gadget’s tests are split into the following sets:
Test set | Features tested |
---|---|
Test set A | Basics: math expressions; variables; integers; booleans |
Test set B | Conditionals and loops (if and while ); scopes |
Test set C | Type checking; larger and more detailed tests for other features |
Test set D | Function definitions; function calls; break and continue |
Extra | Optional tests for advanced features and details. No effect on grade. |
Your grade is determined by how much of each test set passes.
Grade | Required tests |
---|---|
1 | 80% of test set A |
2 | 80% of test sets A and B |
3 | 90% of test sets A, B and C |
4 | 100% of test sets A, B and C |
5 | 100% of test sets A, B, C and D |
Note that Test Gadget only has end-to-end tests that test all compiler stages at once. You have to be quite far along before any of these tests pass. It’s highly recommended to write your own unit tests for the early compiler stages (tokenizer, parser, IR generator). This is very likely to save you time overall.
Base language
You must implement a tokenizer, parser, type checker, IR generator and Assembly generator for the following language constructs, as instructed by the tasks in chapters 2-7.
- Integer literals
- Boolean literals
- Binary operators
+
,-
,*
,/
,%
,==
,!=
,<
,<=
,>
,>=
,and
,or
with precedence and left-associativity. - Unary operators
-
andnot
- Library functions
print_int
,read_int
andprint_bool
- Blocks of statements
- Local variables with initializers
- Assignment statements (right-associative)
- if-then and if-then-else
- while loops
See also: detailed spec of the base language.
Break and continue
This feature is only required for grade 5.
Add expressions break
and continue
to the language.
Just like in Python, break
must exit the innermost active loop,
and continue
must go back to its beginning.
Hints:
- In the interpreter, it’s probably easiest to implement these by abusing Python exceptions.
- The types of these expressions can be
Unit
. - In IR generation, you need to keep track of the current innermost loop’s start and end labels.
- Extra task (no effect on grade): allow
break
to take an optional result value, and make that the result value of the loop (instead ofUnit
). The type checker must check that everybreak
of the same loop has a result value of the same type.
Functions
This feature is only required for grade 5.
Initially our compiler only compiles a single top-level expression. Make it possible to define new functions as well.
Example:
fun square(x: Int): Int {
return x * x;
}
fun vec_len_squared(x: Int, y: Int): Int {
return square(x) + square(y);
}
fun print_int_twice(x: Int) {
print_int(x);
print_int(x);
}
print_int_twice(vec_len_squared(3, 4));
Guide:
-
Currently our language’s top-level syntactic element is one or more top-level expressions. Let’s change it to be a module instead. A module can contain function definitions in addition to the top-level expressions. (In this design, you should not consider modules and function definitions themselves to be expressions).
Make the following structural changes, but don’t write code to actually do anything with function definitions just yet:
- Define AST node types for function definitions and modules.
- Change the parser to emit a module AST node instead of an expression AST node. (For now it can emit a module containing just a top-level expression and no functions.)
- Change the interpreter, type-checker and IR generator to take a module AST node as input (For now they can ignore any functions and just work on the top-level expression.)
- Change the IR generator to return a dictionary of function names to instruction lists. You can hard-code ”main” as the name of the top-level expression.
- Change the Assembly generator to take this dictionary and emit Assembly code for each function. You’ll need to generate unique labels within each function, because the scope of the labels is the whole Assembly file.
- In the parser, implement the parsing functions for modules and for function definitions.
- In the stages that use a symbol table, put all user-defined functions in the symbol table and
treat them similarly to how you’ve treated built-in functions so far.
- In the interpreter, map each user-defined function to a Python function that runs the interpreter on the user-defined function’s body.
- In the type-checker, map each user-defined function to a function type based on the type annotations.
- In the IR and Assembly generators, do the same as with built-in functions.
- Make incoming parameters available as variables in all stages that use a symbol table.
- Change the type-checker, IR generator and Assembly generator to process function definitions in addition to the top-level expression.
- Implement
return ...
expressions in all compiler stages.- A function whose return type is not
Unit
must execute areturn
expression. It is unspecified what happens if a non-Unit
function fails to execute areturn
. - Test Gadget does not have test cases with such missing
return
expressions, but it does have cases where the onlyreturn
occurs within awhile true
loop.
- A function whose return type is not
Also note:
- Function calls and function return values must be correctly type-checked.
- Recursion and mutual recursion (
f
callsg
,g
callsf
) must work. - It is not required to make variables declared in top-level expressions available to functions.