Compilers

spring 2026

Language spec

This is a specification for the base language that we build in chapters 2-7. This page is meant as compact reference, not as an introduction.

This specification only covers the project’s base language.

Example

The language we’re going to build looks like this:

var n: Int = read_int();
print_int(n);
while n > 1 do {
    if n % 2 == 0 then {
        n = n / 2;
    } else {
        n = 3*n + 1;
    }
    print_int(n);
}

Syntax

This specifies the structure of a syntactically valid program in our language. In other words, this specifies how the parser should work.

An expression is defined recursively as follows, where E, E1, E2, … En represent some other arbitrary expression.

Variable declarations (var ...) are allowed only directly inside blocks ({ ... }) and in top-level expressions.

Type expressions can be:

Precedences:

  1. =
  2. or
  3. and,
  4. ==, !=
  5. <, <=, >, >=
  6. +, -
  7. *, /, %
  8. Unary - and not
  9. All other constructs: literals, identifiers, if, while, var, blocks, parentheses, function calls.

All non-operator expressions such as if, while and function calls must be (syntactically) allowed to be part of other expressions, so e.g. 1 + if true then 2 else 3 must be allowed.

The program consists of a single top-level expression. If the program text has multiple expressions separated by semicolons, they are treated like the contents of a block, and that block becomes the top-level expression. The last expression may be optionally followed by a semicolon.

Arbitrary amounts of whitespace are allowed between tokens. One-line comments starting with # or // are supported.

Semantics

This specifies specify how a program should work.

This works directly as a specification for an interpreter. A compiler should produce a program that behaves the same way.

Type-checking is part of the language, but not formalized here.

A value may be one of the following:

* You don’t need to support any use of function values other than calling them directly.

A context consists of:

A lookup of identifier ID in context C proceeds as follows:

An expression is evaluated with a given context. The result of expression evaluation is a value and optionally a modification of the context’s locals.

An expression in a context C is evaluated as follows.

When evaluating the top-level expression, the initial context has no parent context and has the built-in functions and operators as its locals.

The built-in functions are:

The output of a program whose evaluation succeeds consists of the outputs of any built-in print functions that were evaluated, followed by the result of its top-level expression, if it was an integer or a boolean. (If the top-level expression ends in a semicolon, then the result is unit, which is not printed.)