Compilers

spring 2025

Language spec

This is a specification for the base language that we build in chapters 2-7. This page is meant as compact reference, not as an introduction.

This specification does not include any of the additional features specified in the project.

Example

The language we’re going to build looks like this:

var n: Int = read_int();
print_int(n);
while n > 1 do {
    if n % 2 == 0 then {
        n = n / 2;
    } else {
        n = 3*n + 1;
    }
    print_int(n);
}

Syntax

This specifies the structure of a syntactically valid program in our language. In other words, this specifies how the parser should work.

An expression is defined recursively as follows, where E, E1, E2, … En represent some other arbitrary expression.

Variable declarations (var ...) are allowed only directly inside blocks ({ ... }) and in top-level expressions.

Precedences:

  1. =
  2. or
  3. and,
  4. ==, !=
  5. <, <=, >, >=
  6. +, -
  7. *, /, %
  8. Unary - and not
  9. All other constructs: literals, identifiers, if, while, var, blocks, parentheses, function calls.

All non-operator expressions such as if, while and function calls must be (syntactically) allowed to be part of other expressions, so e.g. 1 + if true then 2 else 3 must be allowed.

A type expression is defined recursively as follows, where T, T1, T2, … Tn represent some other arbitrary type expression.

The program consists of a single top-level expression. If the program text has multiple expressions separated by semicolons, they are treated like the contents of a block, and that block becomes the top-level expression. The last expression may be optionally followed by a semicolon.

Arbitrary amounts of whitespace are allowed between tokens. One-line comments starting with # or // are supported.

Semantics

This specifies specify how a program should work.

This works directly as a specification for an interpreter. A compiler should produce a program that behaves the same way.

We don’t specify type-checking here.

A value may be one of the following:

* You don’t need to support any use of function values other than calling them directly.

A context consists of:

A lookup of identifier ID in context C proceeds as follows:

An expression is evaluated with a given context. The result of expression evaluation is a value and optionally a modification of the context’s locals.

An expression in a context C is evaluated as follows.

When evaluating the top-level expression, the initial context has no parent context and has the built-in functions and operators as its locals.

The built-in functions are:

The output of a program whose evaluation succeeds consists of the outputs of any built-in print functions that were evaluated, followed by the result of its top-level expression, if it was an integer or a boolean. (If the top-level expression ends in a semicolon, then the result is unit, which is not printed.)