Language spec
This is a specification for the base language that we build in chapters 2-7. This page is meant as compact reference, not as an introduction.
This specification does not include any of the additional features specified in the project.
Example
The language we’re going to build looks like this:
var n: Int = read_int();
print_int(n);
while n > 1 do {
if n % 2 == 0 then {
n = n / 2;
} else {
n = 3*n + 1;
}
print_int(n);
}
Syntax
This specifies the structure of a syntactically valid program in our language. In other words, this specifies how the parser should work.
An expression is defined recursively as follows,
where E
, E1
, E2
, … En
represent some other arbitrary expression.
- Integer literal: a whole number between -263 and 263 - 1.
- Boolean literal: either
true
orfalse
. - Identifier: a word consisting of letters, underscores or digits, but the first character must not be a digit.
- Unary operator: either
-E
ornot E
. - Binary operator:
E1 op E2
whereop
is one of the following:+
,-
,*
,/
,%
,==
,!=
,<
,<=
,>
,>=
,and
,or
,=
.- Operator
=
is right-associative. - All other operators are left-associative.
- Precedences are defined below.
- Operator
- Parentheses:
(E)
, used to override precedence. - Block:
{ E1; E2; ...; En }
or{ E1; E2; ...; En; }
(may be empty, last semicolon optional).- Semicolons after subexpressions that end in
}
are optional.
- Semicolons after subexpressions that end in
- Untyped variable declaration:
var ID = E
whereID
is an identifier. - Typed variable declaration:
var ID: T = E
whereID
is an identifier andT
is a type expression (defined below). - If-then conditional:
if E1 then E2
- If-then-else conditional:
if E1 then E2 else E3
- While-loop:
while E1 do E2
- Function call:
ID(E1, E2, ..., En)
whereID
is an identifier
Variable declarations (var ...
) are allowed only directly inside blocks ({ ... }
)
and in top-level expressions.
Precedences:
=
or
and
,==
,!=
<
,<=
,>
,>=
+
,-
*
,/
,%
- Unary
-
andnot
- All other constructs: literals, identifiers,
if
,while
,var
, blocks, parentheses, function calls.
All non-operator expressions such as if
, while
and function calls must be (syntactically)
allowed to be part of other expressions, so e.g. 1 + if true then 2 else 3
must be allowed.
A type expression is defined recursively as follows,
where T
, T1
, T2
, … Tn
represent some other arbitrary type expression.
- Type variable: an identifier
- Function type expression:
(T1, T2, ..., Tn) => T
whereT1, T2, ..., Tn
are the parameter types andT
is the result type.
The program consists of a single top-level expression. If the program text has multiple expressions separated by semicolons, they are treated like the contents of a block, and that block becomes the top-level expression. The last expression may be optionally followed by a semicolon.
Arbitrary amounts of whitespace are allowed between tokens.
One-line comments starting with #
or //
are supported.
Semantics
This specifies specify how a program should work.
This works directly as a specification for an interpreter. A compiler should produce a program that behaves the same way.
We don’t specify type-checking here.
A value may be one of the following:
- a 64-bit signed integer (between -263 and 263 - 1)
- a boolean
true
orfalse
- a built-in function*
- the special value
unit
, which means ”no meaningful value”
* You don’t need to support any use of function values other than calling them directly.
A context consists of:
- locals: a partial map of identifiers to values
- an optional parent context
A lookup of identifier ID
in context C
proceeds as follows:
- if
ID
is defined inC
’s locals, return the corresponding value - otherwise if a lookup of
ID
inC
’s parent succeeds, return that value - otherwise the lookup fails
An expression is evaluated with a given context. The result of expression evaluation is a value and optionally a modification of the context’s locals.
An expression in a context C
is evaluated as follows.
- Literal: return the constant value indicated.
- Identifier
ID
: look upID
fromC
, fail if not found. - Unary operator
-E
:- if the value of
E
is an integer, return its negation - otherwise fail
- if the value of
- Unary operator
not E
:- if the value of
E
is a boolean, return its negation - otherwise fail
- if the value of
- Binary operator
E1 = E2
:- if
E1
is an identifierID
, then:- define context
C'
like this:- if
ID
is a local ofC
, defineC'
asC
- otherwise define
C'
as the closest parent context ofC
that hasID
as a local - if
C'
is still not defined, fail
- if
- set local
ID
inC'
to the value ofE2
- define context
- if
E1
is not an identifier, fail - return the value of
E2
- if
- Binary operator
E1 and E2
:- if
E1
evaluates to false, return false and do not evaluateE2
- otherwise return the value of
E2
- if
- Binary operator
E1 or E2
:- if
E1
evaluates to true, return true and do not evaluateE2
- otherwise return the value of
E2
- if
- Other binary operators
E1 OP E2
:- look up the value
f
ofOP
inC
- if
f
is not a function, fail - evaluate
E1
and thenE2
- call
f
with the values ofE1
andE2
and return the result
- look up the value
- Block
{ E1; E2; ...; En }
or{ E1; E2; ...; En; }
:- create context
C'
with no locals and parent contextC
- evaluate each expression
E1
,E2
, … in contextC'
- if
En
is not followed by a semicolon, return the value ofEn
- if
En
is followed by a semicolon, returnunit
- an empty block (
{}
) is allowed – it returnsunit
- create context
- Variable declaration
var ID = E
:- if
C
already has a localID
, fail - set local
ID
inC
to the value ofE
- return value
unit
- if
- If-then conditional
if E1 then E2
:- if the value of
E1
istrue
, evaluateE2
and returnunit
- if the value of
E1
isfalse
, returnunit
- if the value of
E1
is something else, fail - return
unit
- if the value of
- If-then-else conditional
if E1 then E2 else E3
:- if the value of
E1
istrue
, return the value ofE2
- if the value of
E1
isfalse
, return the value ofE3
- if the value of
E1
is something else, fail
- if the value of
- While-loop
while E1 do E2
:- if the value of
E1
is true, evaluateE2
, discard its return value, and then start evaluating the while-loop again - if the value of
E1
is false, returnunit
- if the value of
E1
is something else, fail
- if the value of
- Function call
E(E1, E2, ..., En)
:- let
f
be the value ofE
- if
f
is not a function, fail - evaluate each expression
E1
,E2
, …, callf
with their values, and return the resutl - the implementation is allowed to limit the number of allowed arguments to 6
- let
When evaluating the top-level expression, the initial context has no parent context and has the built-in functions and operators as its locals.
The built-in functions are:
print_int
: prints an integer and a newline to standard output.print_bool
: prints eithertrue
orfalse
and a newline to standard output.read_int
: reads a single line, including the newline, from standard input, and interprets it as an integer. If the input before the newline contains characters other than digits and a prefix minus, thenread_int
is allowed to fail or have arbitrary behavior.
The output of a program whose evaluation succeeds
consists of the outputs of any built-in print functions that were evaluated,
followed by the result of its top-level expression, if it was an integer or a boolean.
(If the top-level expression ends in a semicolon, then the result is unit
, which is not printed.)