7. Assembly generator

The final step in our compiler is to turn IR code into Assembly language, which an assembler can translate to machine code that the computer can execute.

For serious production languages, it’s usually a bad idea to write an Assembly generator ourselves, because it’s difficult to do that well and there exist many high-quality off-the-shelf compiler backends. We will write a simple Assembly generator here, because it’s valuable to have a clear picture of the task and the problems involved.

The compiler stages that translate source code to IR are sometimes called the compiler’s frontend, and the stages that translate IR to machine code are the backend. Optimization stages can reside in either, or both.

Writing a good compiler backend is a ton of work. Fortunately, many good open-source backends are available.

LLVM is a very popular compiler backend, used by the Clang C/C++ compiler, Rust, and many others. Its input is a low-level IR that is easier to generate than Assembly, and it produces well-optimized machine code for a variety of platforms.

WebAssembly is similar, except the IR is simpler and more restricted, and the resulting program is typically compiled and run by a web browser. Its use with command-line sandboxes is also gaining popularity.

Other options include compiling to the IRs of other languages, e.g. to Java byte code or to .NET CIL.

It’s also possible to make your compiler generate another source language, typically C, C++ or JavaScript. Such compilers are sometimes called transpilers.

Despite all these good options, sometimes writing your own Assembly generator can be reasonable. This can be the case e.g. if compilation speed is a very high priority, as it is with Go and many just-in-time (JIT) compilers, such as your browser’s JavaScript runtime.

While existing compiler backends can be very useful, they can also hold back the potential of some novel language features. A famous example of this is that Rust could not enable certain LLVM optimizations for a long time due to subtle LLVM bugs that were never exposed when compiling other languages like C/C++, which couldn’t guarantee LLVM the same freedom to optimize as Rust could.

First we’ll learn the basics of Assembly code and then we’ll see how to structure the Assembly generator. Since this is a compiler course and not an Assembly language course, you don’t need to become proficient with Assembly language.

Introduction to Assembly code

Different computers understand different machine instructions (they have different ”instruction sets”). We will be compiling for the x86-64 family of processors, since they are currently most common on Windows and Linux PCs and servers. Almost all mobile devices, as well as recent Apple computers and some cloud servers, use ARM processors instead. We’d have to write a separate Assembly generator for them if we wanted to target them too, but the principles remain the same.

A processor has a fixed number of fixed-size extremely fast memory locations called registers. In x86-64, the so-called ”general-purpose registers” are (for historical reasons) named: %rax, %rbx, %rcx, %rdx, %rsi, %rdi, %r8, %r9, …, %r15. These are sized 8 bytes i.e. 64-bits each.

General purpose registers can (mostly) be used freely by the program. There are other registers with special behaviour or usage conventions too. We’ll look at some of them later.

Machine instructions perform computations on values held in registers. We will write machine instructions in a language called Assembly code. Each line of Assembly code (usually) corresponds to one machine instruction.

Examples of x86-64 Assembly code lines:

The instruction addq %rbx, %rax adds two 8-byte values in registers %rbx and %rax, and stores the result in register %rax.
The instruction negq %rcx negates the 8-byte value in register %rcx and stores the result in register %rcx.

Computers can’t read Assembly code directly. They require machine instructions to be in a binary form called machine code. Assembly code is translated into machine code by a program called the assembler.

Main memory

Data that does not fit in a computer’s physical registers can be stored in main memory. Main memory can be thought of as an enormous list of 1-byte (8-bit) memory slots, each having a sequence number (0, 1, 2, …) called an address. Variables or registers that hold memory addresses are often called pointers.

For instance, if we store some 8-byte (64-bit) integer value at memory address 200, it would occupy 8 memory slots with addresses 200 to 207. Then a pointer holding the value 200 could be used to read the value from memory, or to overwrite it with a new value.

This is a simplification, but more than good enough for our purposes.

This simple ”contiguous list of bytes” model of memory is a convenient and solid abstraction for application programmers, but there is more going on under the hood.

Physical vs virtual memory

If we overlook certain details in how the electronics of a computer are built, a modern computer’s main memory really is addressed as a contiguous list of 8-bit memory slots. This is called physical memory, but this alone is not enough to safely run programs on multi-process operating systems.

Two programs might both be written to use e.g. memory location 200. We’d like the operating system to be able to run both of them at the same time without them overwriting each others’ memory. The (usual) solution is that each program gets its own virtual memory space. The same slice of virtual memory, e.g. the range 200..400, can be mapped to separate slices of physical memory by the processor. That is, one program’s virtual address range 200..400 might map to e.g. physical address range 900..1100 while the other program’s virtual address 200..400 might map to physical address 1500..1700.

There are alternatives. You could e.g. imagine an operating system that requires programs to be ”position-independent” and take their allotted slice of memory as a startup parameter. This is possible, but virtual memory has other benefits and so it’s usually considered a worthwhile feature, except in very simple single-program computers (usually called ”microcontrollers”).

Memory hierarchy

Another way in which the ”contiguous list of bytes” model of memory is not exactly true is that a computers almost always have a memory hierarchy: some slices of memory are faster to access than others.

Some amount of memory (typically recently used memory) can be cached in the processor’s L1, L2 and/or L3 cache, making it orders of magnitude faster to access than main memory. Caching happens mostly automatically, though good code optimization sometimes requires understanding the details. (Modern graphics processors (GPUs) give programmers direct access to small but fast per-processor and per-processor-group memory, but CPUs do not.)

Even ignoring caches, all main memory is not always equally fast to access. Large multiprocessor systems may implement non-uniform memory access (NUMA) where one processor has direct (fast) access to one slice of memory, while access to the rest must go through other processors. This is a complex topic that becomes relevant only when writing multi-threaded programs for large multi-processor systems.

Lastly, programs may use more virtual memory than there is physical memory on the machine. The operating system can then free some memory by moving data from physical memory to a storage device. The data is reloaded only when the program that owned it next accesses the corresponding virtual memory. This is called swapping, and it can be thought of as the last (and slowest) layer of a computer’s memory hierarchy.

In short, the memory hierarchy on most modern systems consists of:

Per-CPU core: registers, then L1 cache, then L2 cache
Per-CPU: L3 cache
Per-machine: main memory, then storage device

Each level tends to be very roughly ~10-100x larger than the previous one. E.g. a AMD Ryzen 5900X CPU has a few kB of various registers, 384kB of L1, 6MB of L2, 64MB of L3, and it’s typically paired with many GB of main memory and many TB of storage.

Acccessing a higher level is roughly ~2-100x slower than the previous level, but higher level memory tends to cost significantly less per byte than lower levels. There are also physical limits, such as how much fast L1 cache physically fits close enough to the CPU’s computation units to remain fast.

The movq machine instruction copies 8 bytes (64 bits) of data between registers and main memory. Putting a register in parentheses, e.g. (%rax), refers to the memory location pointed to by the register. Examples:

movq %rax, %rbx copies 8 bytes from register %rax to register %rbx.
movq (%rax), %rbx copies 8 bytes from the memory slots starting from the address held in register %rax, into register %rbx.
movq %rax, (%rbx) copies 8 bytes from the register %rax into the memory slots starting from the address held in register %rbx.
movq %rax, 128(%rbx) copies 8 bytes from the register %rax into the memory slots starting from: the address held in register %rbx plus 128.
(movq cannot move directly from one memory location to another)

Exercise (optional)

Write machine instructions that compute the sum of the values in registers %rax, %rbx and %rcx, and store the result into a memory location 48 bytes ahead of where %rdx points.

Running Assembly code

Let’s see how to run Assembly code on an x86-64 Linux computer (or virtual machine).

Put the following code into a file asmprogram.s. Don’t worry about understanding the details just yet.

        # Metadata for debuggers and other tools
        .global main
        .type main, @function
        .extern printf
        .extern scanf

        .section .text  # Begins code and data
# Label that marks beginning of main function
main:
        # Function stack setup
        pushq %rbp
        movq %rsp, %rbp
        subq $128, %rsp  # Reserve 128 bytes of stack space

        # Read an integer into -8(%rbp)
        movq $scan_format, %rdi  # 1st param: "%ld"
        leaq -8(%rbp), %rsi      # 2nd param: (%rbp - 8)
        call scanf
        # If the input was invalid, jump to end
        cmpq $1, %rax
        jne .Lend

        # *** REPLACE THESE TWO LINES WITH YOUR CODE ***
        movq -8(%rbp), %rsi  # Copy input number to `%rsi`
        imulq $2, %rsi  # Double the input

        # Call function 'printf("%ld\n", %rsi)'
        # to print the number in %rsi.
        movq $print_format, %rdi
        call printf

# Labels starting with ".L" are local to this function,
# i.e. another function than "main" could have its own ".Lend".
.Lend:
        # Return from main with status code 0
        movq $0, %rax
        movq %rbp, %rsp
        popq %rbp
        ret

# String data that we pass to functions 'scanf' and 'printf'
scan_format:
        .asciz "%ld"
print_format:
        .asciz "%ld\n"

Now use the following commands to compile and run the Assembly program:

gcc -g -no-pie -o asmprogram asmprogram.s

./asmprogram

This command uses the GCC compiler collection to compile the source file asmprogram.s into the final program asmprogram, with debug information (-g) and without position-independence (-no-pie, not important to understand yet).

GCC (by default) links our program with the C standard library. That’s good, since we use the C function printf in our program. Later we’ll make our programs independent of the C standard library and run the Assembler directly without needing GCC.

Exercise (optional)

Modify asmprogram.s to read two integers instead of one and to compute the sum of their squares.

Functions and the stack

When we run out of register space, we need to store data in main memory. Our language so far is simple enough that we could give each IR variable a fixed spot in memory, effectively implementing all IR variables as global variables. That would work, but it would break as soon as we add potentially recursive functions to our language: a second instantiation of a function would overwrite the data of the previous instantiation.

The conventional solution is to place a function’s local variables on the stack, which is an area of memory reserved for function locals. When a function starts, it grows the stack enough to fit its local variables. Before a function returns, it restores the stack to its original size. The stack is also sometimes used to store function parameters when calling other functions.

A calling convention specifies how functions should use registers and stack space. To anticipate adding functions to our language, and to make our machine code easier to analyze with a debugger, we will now see how to generate Assembly that is compliant with the System V 64-bit calling convention, which is what Linux uses.

A function call’s stack frame means the range of stack addresses used by that instantiation of the function. When a function is called, the stack pointer register %rsp points to the beginning of the stack frame.

The following are stored just before the stack frame:

any function parameters that didn’t fit into registers
the address of the code to return to when the function is done

Beyond these, a function is free to move %rsp to grow its stack frame in order to store any local variables that don’t fit in registers, or to store function parameters and return addresses when calling other functions.

On most systems, including x86-64 processors, the convention is for the stack to grow downwards, i.e. growing the stack means subtracting from the stack pointer %rsp.

Defining functions

Let’s now look at an example of the conventional way to write functions on Linux. We’ll go through the instructions one by one.

        pushq  %rbp
        movq   %rsp, %rbp
        subq   $32, %rsp    # reserve 32 bytes for locals

        # ... function implementation goes here ...

        movq   %rbp, %rsp
        popq   %rbp
        ret

When the function is entered, before the first instruction is executed, the stack looks like this:

( ... caller's stuff ... )  <-- %rbp points somewhere here
[ arguments that don't fit in registers (if any) ]
[ return address ]          <-- %rsp points here

Register %rbp is conventionally the stack base pointer. It’s used to point to the start of the current function’s stack. On x86-64, at the start of a function call, %rbp still points to the caller’s stack base, so the first thing to do is to fix this.

Our first instruction pushq %rbp stores the value of the stack base pointer on the stack, because we intend to write to %rbp next, and the calling convention says we must restore %rbp to its original value before returning from a function.

The stack now looks like this:

( ... caller's stuff ... )  <-- %rbp still points somewhere here
[ arguments ]
[ return address ]
[ caller's %rbp ]           <-- %rsp points here

The 2nd instruction movq %rsp, %rbp copies %rsp to %rbp. Now %rbp points to the base of our stack, and we’re free to change %rsp as needed (e.g. to push function parameters before calling another function).

Having %rbp always point to the base of the stack frame even as %rsp may move around is useful for two reasons:

We can now conveniently refer to locals and arguments using addresses relative to (%rbp), e.g. the first local will be at -8(%rbp) and the first argument that didn’t fit in registers will be at 16(%rbp).
Debuggers can display the call stack correctly when %rbp always points to the return address.

( ... caller's stuff ... )
[ arguments ]
[ return address ]
[ caller's %rbp ]          <-- %rsp and %rbp both point here

The 3rd instruction subq $32, %rsp reserves space on the stack by moving %rsp downwards by that amount. The required space (32 bytes in this example) is determined by the compiler based on the number of local variables.

( ... caller's stuff ... )
[ arguments ]
[ return address ]
[ caller's %rbp ]        <-- %rbp points here
[ space for variables ]  <-- %rsp points to lowest byte of this

Now the function implementation is free to:

push and pop more things on the stack, e.g. in order to call other functions,
use expressions like -8(%rbp) and -16(%rbp) to access local variables,
use expressions like 16(%rbp) and 24(%rbp) to access arguments that didn’t fit in registers.

When we want to return from the function, the instructions movq %rbp, %rsp and popq %rbp undo what we did at the beginning: they restore %rsp and %rbp to how they were initially.

( ... caller's stuff ... )  <-- %rbp points somewhere here again
[ arguments ]
[ return address ]          <-- %rsp points here again

Finally, the instruction ret returns to the calling function by popping the return address from the stack and continuing execution from there. The function must place its return value, if any, in register %rax before invoking ret.

Calling functions

Let’s now look at what this looks like on the caller’s side.

The Linux x86-64 calling convention specifies that 64-bit integer parameters are passed in registers %rdi, %rsi, %rdx, %rcx, %r8 and %r9. If there are more parameters, they must be pushed on the stack in last-to-first order, and later popped.

Here’s an example:

movq ..., %rdi
movq ..., %rsi
...
movq ..., %r9
pushq ...
pushq ...

call function_to_call

addq $16, %rsp

The initial movq instructions set the parameters that fit in registers, and the following pushq instructions push two more parameters that don’t fit in registers.

The call instruction pushes the address of the next instruction to the stack and continues execution from the called function’s first instruction.

The final addq $16, %rsp moves the stack pointer upwards by 16 bytes, which effectively undoes the two pushes that we did.

The calling convention has a requirement that the stack is aligned to 16 bytes (why?). This means that when executing call (i.e. after stack arguments are pushed), %rsp must be a multiple of 16. If it is not, a subq $8, %rsp (subtract 8 from %rsp) instruction must be added before pushing the arguments to the stack.

There’s one more important point in the calling convention: a function is allowed to modify any register it wants, except for %rbp, %rbx, %r12, %r13, %r14 and %r15 (why?). These are named callee-saved registers because the called function must restore their previous values before returning. The rest of the registers are caller-saved – the caller must back them up before calling a function if it wants to keep their values.

Exercise (optional)

Write Assembly code for a function that:

takes two parameters,
calls another function f2 with the second parameter plus 5,
returns the sum of first parameter plus 10.

Exercise (optional)

Add the function you wrote to the program template under a new label f, and change function main to

read two input integers instead of one,
call f with these two integers as parameters,
print the return value.

Resources for x86-64 Assembly

Warning

When looking at Assembly code examples on the internet, be aware that there are two widespread syntaxes: Intel syntax and AT&T syntax. On this course, we’ve made the arbitrary choice to use AT&T syntax, which you’ll recognize by register names starting with %.

Crucially, the syntaxes have different argument order: in Intel syntax, the output register is written first, in AT&T it’s written last.

Much online material is also old and written for 32-bit x86 computers. You can recognize this from its use of registers starting with the letter ’e’ instead of ’r’, e.g. %eax instead of %rax. The instructions and ideas are very similar, but some instructions need a suffix (like the ’q’ in movq) to operate in 64-bit mode, and the calling conventions are different. The calling conventions are also different in code written for other operating systems.

Stanford CS107 slides provide an excellent introduction: lecture 15, lecture 16, lecture 17, lecture 18
Introduction to the GNU Assembler is a compact tutorial for writing and running Assembly code on a Linux machine, and interfacing with C code.
Compiler Explorer turns your C/C++ code into Assembly code. Excellent for answering ”how do I write this in Assembly?” questions.
- To see assembly code in AT&T syntax, click on the gear icon and deselect ”Intel syntax”.
- To see a more compact but sometimes more confusing / ”clever” translation, add -O2 to ”Compiler options” to enable optimizations.
Short summary of registers (instruction table uses Intel argument order)
x86-64 reference (detailed and technical, uses Intel argument order)
System V ABI (Linux calling convention)
- More detailed specification
Linux system call reference

Assembly generator

A simple non-optimizing Assembly generator is at its core pretty straightforward: it takes a list of IR instructions and translates each of them into one or more Assembly instructions.

The only wrinkle is how to map the unbounded set of IR variables to the very limited set of available registers to minimize the need to access the stack. This is the register allocation problem.

A good register allocator requires first analyzing the IR in ways that we’ve not gotten to yet, so we’ll write our initial assembly generator without a register allocator: we’ll just store all our variables on the stack. Since modern processors are quite well optimized for reading and writing stack values, we can expect only a 2-5x slowdown compared to using registers efficiently.

Tracking the stack

Let’s create a class that calculates the required stack space and maps each IR variable to a stack location.

We call this class Locals, because global variables will not use stack space. (For now, our only global variables are functions like print_int.)

# src/compilers/assembly_generator.py

class Locals:
    """Knows the memory location of every local variable."""
    _var_to_location: dict[ir.IRVar, str]
    _stack_used: int

    def __init__(self, variables: list[ir.IRVar]) -> None:
        ...  # Completed in task 1

    def get_ref(self, v: ir.IRVar) -> str:
        """Returns an Assembly reference like `-24(%rbp)`
        for the memory location that stores the given variable"""
        return self._var_to_location[v]

    def stack_used(self) -> int:
        """Returns the number of bytes of stack space needed for the local variables."""
        return self._stack_used

You are now ready to do task 1.

To create an instance of Locals, we will need to find all IR variables used by our IR instructions. You can use this implementation of get_all_ir_variables, or implement it yourself.

def get_all_ir_variables(instructions: list[ir.Instruction]) -> list[ir.IRVar]:
    result_list: list[ir.IRVar] = []
    result_set: set[ir.IRVar] = set()

    def add(v: ir.IRVar) -> None:
        if v not in result_set:
            result_list.append(v)
            result_set.add(v)

    for insn in instructions:
        for field in dataclasses.fields(insn):
            value = getattr(insn, field.name)
            if isinstance(value, ir.IRVar):
                add(value)
            elif isinstance(value, list):
                for v in value:
                    if isinstance(v, ir.IRVar):
                        add(v)
    return result_list

Generating Assembly code

We’ll now sketch the function that iterates over IR instructions and emits Assembly code. You’ll get to complete this function yourself.

First we need to make some declarations and set up the stack correctly, as described above.

    .extern print_int
    .extern print_bool
    .extern read_int
    .global main
    .type main, @function

    .section .text

main:
    pushq %rbp
    movq %rsp, %rbp
    subq $40, %rsp  # ($40 is just an example)

.extern print_int means that we expect function print_int to be defined in another file.
.global main means that the symbol main is to be made available to other files.
.type main, @function marks main as a function. Not essential, but some debugging tools use this information.
.section .text means that the following code goes into the ”text” section of the file. That’s the correct place for program code.
main: is a label, similar to IR labels. It sets the symbol ”main” to the address of the next instruction.
The Assembly instructions that set up the function stack as we saw earlier.

The first parameter of subq must be the amount of stack space reserved by Locals.

The rest of the output consists of instructions generated from each IR instruction. Here is a partial implementation:

def generate_assembly(instructions: list[ir.Instruction]) -> str:
    lines = []
    def emit(line: str) -> None: lines.append(line)

    locals = Locals(
        variables=get_all_ir_variables(instructions)
    )

    # ... Emit initial declarations and stack setup here ...

    for insn in instructions:
        emit('# ' + str(insn))
        match insn:
            case ir.Label():
                emit("")
                # ".L" prefix marks the symbol as "private".
                # This makes GDB backtraces look nicer too:
                # https://stackoverflow.com/a/26065570/965979
                emit(f'.L{insn.name}:')
            case ir.LoadIntConst():
                if -2**31 <= insn.value < 2**31:
                    emit(f'movq ${insn.value}, {locals.get_ref(insn.dest)}')
                else:
                    # Due to a quirk of x86-64, we must use
                    # a different instruction for large integers.
                    # It can only write to a register,
                    # not a memory location, so we use %rax
                    # as a temporary.
                    emit(f'movabsq ${insn.value}, %rax')
                    emit(f'movq %rax, {locals.get_ref(insn.dest)}')
            case ir.Jump():
                emit(f'jmp .L{insn.label.name}')
            ...  # Completed in task 2

You are now ready to do task 2.

Intrinsics

When we encounter an instruction like Call(print_int, x1, x2), we will translate it into a call to function print_int. However it would be silly (sort of) to define basic operators like + as callable functions. Instead, we define + as an intrinsic: a special case that emits the appropriate Assembly code for each call to +.

We want + to compile to assembly code that reads arguments from any arbitrary memory locations A and B, and leaves the sum in %rax:

movq A, %rax  # Copy from A to %rax
addq B, %rax  # Write (B + %rax) to %rax

We can define other intrinsics similarly.

Sample code

If you have a good grasp of x86-64 or a desire to learn more of it, you are welcome to write all the necessary intrinsics in your code yourself, but you may also download and use this intrinsics.py.

This module defines a dictionary all_intrinsics, which maps names like + to functions that emit the given Assembly code. These functions are called like this:

all_intrinsics['+'](IntrinsicArgs(
    arg_refs=...,
    result_register=...,
    emit=...
))

arg_refs must be a list of registers or memory locations references of the arguments. You can get them from Locals.
result_register must be a register name. For now you can always use %rax and copy the result from there to the appropriate stack location.
emit must be a function that adds a given line of Assembly code to the output.

You are now ready to do task 3.

Running the assembler

The final step to complete your compiler is to pass the generated Assembly code to an assembler, which translates the Assembly code to a binary form that the computer can execute.

You can download assembler.py and call its function assembler(code, filename). This does two things:

It includes a definition for print_int, print_bool and read_int.
It invokes the GNU Assembler to produce the finished program.

It does these things slightly differently than how we did it above so as not to depend on the C standard library.

The sample assembler.py does the following things, some of which gcc quietly did for us eariler.

First, it defines a separate Assembly file stdlib.s, which contains the functions _start, print_int, print_bool and read_int.

_start is where program execution actually starts. It’s the job of this function to call main. The C standard library includes its own _start – that’s why we didn’t need to define it ourselves earlier. In C, _start has various setup and cleanup responsibilities, but our _start is very simple: it calls main and then makes operating system call number 60 (”sys_exit”) to cleanly stop the program.

print_int function is more complicated, but heavily commented. It constructs the string representation of its parameter on the stack and passes a pointer to it to operating system call 1 (”sys_write”), asking it to write to standard output (”file 1”).

print_bool is similar, but without the complicated string construction.

read_int uses operating system call 0 (”sys_read”) to read one byte of input at a time. On digit characters, it updates the input. On a minus character, it negates the result. On a newline or end of input, it stops reading. It also has some very rudimentary error handling.

Next, assembler.py writes your Assembly code to program.s and runs the GNU assembler to produce two object files: stdlib.o and program.o. Object files contain machine code snippets, optional initialization data, and labels like _start and main.

Object files must be linked to form the final executable file. assembler.py calls the linker program ld which on Linux is usually the GNU linker. Since we didn’t ask the linker to link to any external libraries, such as the C standard library, we’ve produced a pretty minimal executable.

You can optionally give assembler.py’s assemble function parameter extra_libraries=['c'] to link the C library. This may be useful later in the project.

Tasks

Task 1

Complete Locals.__init__. It must:

initialize _var_to_location to map each IR var to stack locations -8(%rbp), -16(%rbp), …,
initialize _stack_used to the number of bytes used.

Task 2

Add the generate_assembly example function to your project, and implement cases for the following IR instructions:

LoadBoolConst: represent true as 1 and false as 0
Copy: copy via %rax because movq can’t have two memory arguments
CondJump: use cmpq $0, _condition_ to set comparison flags, then jne _label_ to ”jump if not equal” (i.e. ”condition != 0”), then jmp _label_ to jump in the other case.

Finally, restore the stack and return from the function as described above.

Check manually that the expression { var x = true; if x then 1 else 2; } yields sensible-looking Assembly code. We’ll test more thoroughly once we’ve implemented Call.

Task 3

Implement the Assembly generation for IR instruction Call. Make it use an intrinsic if available, and generate function calling code otherwise.

Pass arguments according to the calling convention.

It’s OK to not implement stack-based arguments, i.e. you may limit the number of arguments to 6, so they all fit in registers.

Task 4

Add assembler.py to your codebase and make a command compile in __main__.py that runs all compiler stages, including the assembler.

Now try out your finished compiler! 🎉

Task 5

Write an end-to-end test framework where you can easily write programs in your language, along with inputs and expected outputs for the compiled programs.

Here is one possible design:

Read all files from directory test_programs. Let’s call these test files.
Let each such test file contain a sequence of test cases separated by lines containing only ---.
Handle each line of a test case like this:
- If the line is input ... then it defines an input to give the program.
- If the line is prints ... then it defines an expected output that the compiled program should print.
- All other lines are considered part of the test program.
For each test case, define a Python test function like this:
```
import sys
...
  # for each test case found:
      sys.modules[__name__].__setattr__(
          f'test_{test_case.name}',
          run_test_case
      )
```
where run_test_case is a Python function that compiles and runs the test program, giving the required inputs and checking that the expected outputs are printed in the expected order, and that no other outputs are printed.

You can define a new Python function for each test case you’ve found like this:

for test_case in find_test_cases():
    def run_test_case():
        ...
    sys.modules[__name__].__setattr__(
        f'test_{test_case.name}',
        run_test_case
    )

However, if you use the variable test_case in the body of run_test_case, then all these run_test_case functions will see the last value assigned to that variable. In Python, the idiomatic way to work around this seems to be to abuse default parameters like this:

for test_case in find_test_cases():
    def run_test_case(test_case = test_case):
        ... # (do stuff with 'test_case')
    sys.modules[__name__].__setattr__(
        f'test_{test_case.name}',
        run_test_case
    )

Use this system to write end-to-end test cases for your compiler until you’re reasonably confident that it works correctly.

Compilers

spring 2024

7. Assembly generator

Introduction to Assembly code

Main memory

Running Assembly code

Functions and the stack

Defining functions

Calling functions

Resources for x86-64 Assembly

Assembly generator

Tracking the stack

Generating Assembly code

Intrinsics

Running the assembler

Tasks