Language spec
This is a specification for the base language that we build in chapters 2-7. This page is meant as compact reference, not as an introduction.
This specification does not include any of the additional features specified in the project.
Example
The language we’re going to build looks like this:
var n: Int = read_int();
print_int(n);
while n > 1 do {
if n % 2 == 0 then {
n = n / 2;
} else {
n = 3*n + 1;
}
print_int(n);
}
Syntax
(Most relevant for chapter 3)
This specifies the structure of a valid program in our language.
An expression is defined recursively as follows,
where E, E1, E2, … En represent some other arbitrary expression.
- Integer literal: a whole number between -264 and 263 - 1.
- Boolean literal: either
trueorfalse. - Identifier: a word consisting of letters, underscores or digits, but the first character must not be a digit.
- Unary operator: either
-Eornot E. - Binary operator:
E1 op E2whereopis one of the following:+,-,*,/,%,==,!=,<,<=,>,>=,and,or,=.- Operator
=is right-associative. - All other operators are left-associative.
- Precedences are defined below.
- Operator
- Parentheses:
(E), used to override precedence. - Block:
{ E1; E2; ...; En }or{ E1; E2; ...; En; }(may be empty, last semicolon optional).- Semicolons after subexpressions that end in
}are optional.
- Semicolons after subexpressions that end in
- Untyped variable declaration:
var ID = EwhereIDis an identifier. - Typed variable declaration:
var ID: T = EwhereIDis an identifier andTis a type expression (defined below). - If-then conditional:
if E1 then E2 - If-then-else conditional:
if E1 then E2 else E3 - While-loop:
while E1 do E2 - Function call:
ID(E1, E2, ..., En)whereIDis an identifier
Variable declarations (var ...) are allowed only directly inside blocks ({ ... })
and in top-level expressions.
Precedences:
=orand,==,!=<,<=,>,>=+,-*,/,%- Unary
-andnot - All other constructs: literals, identifiers,
if,while,var, blocks, parentheses, function calls.
All non-operator expressions such as if, while and function calls must be (syntactically)
allowed to be part of other expressions, so e.g. 1 + if true then 2 else 3 must be allowed.
A type expression is defined recursively as follows,
where T, T1, T2, … Tn represent some other arbitrary type expression.
- Type variable: an identifier
- Function type expression:
(T1, T2, ..., Tn) => TwhereT1, T2, ..., Tnare the parameter types andTis the result type.
The program consists of a single expression, called the top-level expression.
If multiple expressions are given, they are treated like a block without { and }.
Arbitrary amounts of whitespace are allowed between tokens.
One-line comments starting with # or // are supported.
Semantics
(Most relevant for chapter 4)
This specifies specify how a program should work.
This works directly as a specification for an interpreter. A compiler should produce a program that behaves the same way.
We don’t specify type-checking here.
A value may be one of the following:
- a 64-bit signed integer (between -264 and 263 - 1)
- a boolean
trueorfalse - a built-in function*
- the special value
unit, which means ”no meaningful value”
* You don’t need to support any use of function values other than calling them directly.
A context consists of:
- locals: a partial map of identifiers to values
- an optional parent context
A lookup of identifier ID in context C proceeds as follows:
- if
IDis defined inC’s locals, return the corresponding value - otherwise if a lookup of
IDinC’s parent succeeds, return that value - otherwise the lookup fails
An expression is evaluated with a given context. The result of expression evaluation is a value and optionally a modification of the context’s locals.
An expression in a context C is evaluated as follows.
- Literal: return the constant value indicated.
- Identifier
ID: look upIDfromC, fail if not found. - Unary operator
-E:- if the value of
Eis an integer, return its negation - otherwise fail
- if the value of
- Unary operator
not E:- if the value of
Eis a boolean, return its negation - otherwise fail
- if the value of
- Binary operator
E1 = E2:- if
E1is an identifierID, then:- define context
C'like this:- if
IDis a local ofC, defineC'asC - otherwise define
C'as the closest parent context ofCthat hasIDas a local - if
C'is still not defined, fail
- if
- set local
IDinC'to the value ofE2
- define context
- if
E1is not an identifier, fail - return the value of
E2
- if
- Binary operator
E1 and E2:- if
E1evaluates to false, return false and do not evaluateE2 - otherwise return the value of
E2
- if
- Binary operator
E1 or E2:- if
E1evaluates to true, return true and do not evaluateE2 - otherwise return the value of
E2
- if
- Other binary operators
E1 OP E2:- look up the value
fofOPinC - if
fis not a function, fail - evaluate
E1and thenE2 - call
fwith the values ofE1andE2and return the result
- look up the value
- Block
{ E1; E2; ...; En }or{ E1; E2; ...; En; }:- create context
C'with no locals and parent contextC - evaluate each expression
E1,E2, … in contextC' - if
Enis not followed by a semicolon, return the value ofEn - if
Enis followed by a semicolon, returnunit - an empty block (
{}) is allowed – it returnsunit
- create context
- Variable declaration
var ID = E:- if
Calready has a localID, fail - set local
IDinCto the value ofE - return value
unit
- if
- If-then conditional
if E1 then E2:- if the value of
E1istrue, evaluateE2and returnunit - if the value of
E1isfalse, returnunit - if the value of
E1is something else, fail - return
unit
- if the value of
- If-then-else conditional
if E1 then E2 else E3:- if the value of
E1istrue, return the value ofE2 - if the value of
E1isfalse, return the value ofE3 - if the value of
E1is something else, fail
- if the value of
- While-loop
while E1 do E2:- if the value of
E1is true, evaluateE2, discard its return value, and then start evaluating the while-loop again - if the value of
E1is false, returnunit - if the value of
E1is something else, fail
- if the value of
- Function call
E(E1, E2, ..., En):- let
fbe the value ofE - if
fis not a function, fail - evaluate each expression
E1,E2, …, callfwith their values, and return the resutl - the implementation is allowed to limit the number of allowed arguments to 6
- let
When evaluating the top-level expression, the initial context has no parent context and has the built-in functions and operators as its locals.
The built-in functions are:
print_int: prints an integer and a newline to standard output.print_bool: prints eithertrueorfalseand a newline to standard output.read_int: reads a single line, including the newline, from standard input, and interprets it as an integer. If the input before the newline contains characters other than digits and a prefix minus, thenread_intis allowed to fail or have arbitrary behavior.
The output of a program whose evaluation succeeds consists of the outputs of any built-in print functions that were evaluated, followed by the result of its top-level expression, if it was an integer or a boolean.