Language spec
This is a specification for the base language that we build in chapters 2-7. This page is meant as compact reference, not as an introduction.
This specification only covers the project’s base language.
Example
The language we’re going to build looks like this:
var n: Int = read_int();
print_int(n);
while n > 1 do {
if n % 2 == 0 then {
n = n / 2;
} else {
n = 3*n + 1;
}
print_int(n);
}
Syntax
This specifies the structure of a syntactically valid program in our language. In other words, this specifies how the parser should work.
An expression is defined recursively as follows,
where E, E1, E2, … En represent some other arbitrary expression.
- Integer literal: a positive whole number.
- Negative numbers should be composed of token
-followed by an integer literal token.
- Negative numbers should be composed of token
- Boolean literal: either
trueorfalse. - Identifier: a word consisting of letters, underscores or digits, but the first character must not be a digit.
- Unary operator: either
-Eornot E. - Binary operator:
E1 op E2whereopis one of the following:+,-,*,/,%,==,!=,<,<=,>,>=,and,or,=.- Operator
=is right-associative. - All other operators are left-associative.
- Precedences are defined below.
- Operator
- Parentheses:
(E), used to override precedence. - Block:
{ E1; E2; ...; En }or{ E1; E2; ...; En; }(may be empty, last semicolon optional).- Semicolons after subexpressions that end in
}are optional.
- Semicolons after subexpressions that end in
- Untyped variable declaration:
var ID = EwhereIDis an identifier. - Typed variable declaration:
var ID: T = EwhereIDis an identifier andTis a type expression (defined below). - Conditional:
if E1 then E2orif E1 then E2 else E3.- Nested conditionals take
E2to be as long as possible, meaningif a then if b then c else dis parsed asif a then (if b then c else d).
- Nested conditionals take
- While-loop:
while E1 do E2. - Function call:
ID(E1, E2, ..., En)whereIDis an identifier.
Variable declarations (var ...) are allowed only directly inside blocks ({ ... })
and in top-level expressions.
Type expressions can be:
- Primitive types:
Int,BoolorUnit. - Function types:
(T1, T2, ...) => TwhereT1, T2, ...(0 or more) andTare other type expressions.
Precedences:
=orand,==,!=<,<=,>,>=+,-*,/,%- Unary
-andnot - All other constructs: literals, identifiers,
if,while,var, blocks, parentheses, function calls.
All non-operator expressions such as if, while and function calls must be (syntactically)
allowed to be part of other expressions, so e.g. 1 + if true then 2 else 3 must be allowed.
The program consists of a single top-level expression. If the program text has multiple expressions separated by semicolons, they are treated like the contents of a block, and that block becomes the top-level expression. The last expression may be optionally followed by a semicolon.
Arbitrary amounts of whitespace are allowed between tokens.
One-line comments starting with # or // are supported.
Semantics
This specifies specify how a program should work.
This works directly as a specification for an interpreter. A compiler should produce a program that behaves the same way.
Type-checking is part of the language, but not formalized here.
A value may be one of the following:
- a 64-bit signed integer (between -263 and 263 - 1)
- a boolean
trueorfalse - a built-in function*
- the special value
unit, which means ”no meaningful value”
* You don’t need to support any use of function values other than calling them directly.
A context consists of:
- locals: a partial map of identifiers to values
- an optional parent context
A lookup of identifier ID in context C proceeds as follows:
- if
IDis defined inC’s locals, return the corresponding value - otherwise if a lookup of
IDinC’s parent succeeds, return that value - otherwise the lookup fails
An expression is evaluated with a given context. The result of expression evaluation is a value and optionally a modification of the context’s locals.
An expression in a context C is evaluated as follows.
- Literal: return the constant value indicated.
- Identifier
ID: look upIDfromC, fail if not found. - Unary operator
-E:- if the value of
Eis an integer, return its negation - otherwise fail
- if the value of
- Unary operator
not E:- if the value of
Eis a boolean, return its negation - otherwise fail
- if the value of
- Binary operator
E1 = E2:- if
E1is an identifierID, then:- define context
C'like this:- if
IDis a local ofC, defineC'asC - otherwise define
C'as the closest parent context ofCthat hasIDas a local - if
C'is still not defined, fail
- if
- set local
IDinC'to the value ofE2
- define context
- if
E1is not an identifier, fail - return the value of
E2
- if
- Binary operator
E1 and E2:- if
E1evaluates to false, return false and do not evaluateE2 - otherwise return the value of
E2
- if
- Binary operator
E1 or E2:- if
E1evaluates to true, return true and do not evaluateE2 - otherwise return the value of
E2
- if
- Other binary operators
E1 OP E2:- look up the value
fofOPinC - if
fis not a function, fail - evaluate
E1and thenE2 - call
fwith the values ofE1andE2and return the result
- look up the value
- Block
{ E1; E2; ...; En }or{ E1; E2; ...; En; }:- create context
C'with no locals and parent contextC - evaluate each expression
E1,E2, … in contextC' - if
Enis not followed by a semicolon, return the value ofEn - if
Enis followed by a semicolon, returnunit - an empty block (
{}) is allowed – it returnsunit
- create context
- Variable declaration
var ID = E:- if
Calready has a localID, fail - set local
IDinCto the value ofE - return value
unit
- if
- If-then conditional
if E1 then E2:- if the value of
E1istrue, evaluateE2 - if the value of
E1is neithertruenorfalse, fail - return
unit
- if the value of
- If-then-else conditional
if E1 then E2 else E3:- if the value of
E1istrue, return the value ofE2 - if the value of
E1isfalse, return the value ofE3 - if the value of
E1is something else, fail
- if the value of
- While-loop
while E1 do E2:- if the value of
E1is true, evaluateE2, discard its return value, and then start evaluating the while-loop again - if the value of
E1is false, returnunit - if the value of
E1is something else, fail
- if the value of
- Function call
E(E1, E2, ..., En):- let
fbe the value ofE - if
fis not a function, fail - evaluate each expression
E1,E2, …, callfwith their values, and return the resutl - the implementation is allowed to limit the number of allowed arguments to 6
- let
When evaluating the top-level expression, the initial context has no parent context and has the built-in functions and operators as its locals.
The built-in functions are:
print_int: prints an integer and a newline to standard output.print_bool: prints eithertrueorfalseand a newline to standard output.read_int: reads a single line, including the newline, from standard input, and interprets it as an integer. If the input before the newline contains characters other than digits and a prefix minus, thenread_intis allowed to fail or have arbitrary behavior.
The output of a program whose evaluation succeeds
consists of the outputs of any built-in print functions that were evaluated,
followed by the result of its top-level expression, if it was an integer or a boolean.
(If the top-level expression ends in a semicolon, then the result is unit, which is not printed.)