This is a compiler for the MinCaml language implemented using Rust and cranelift.
I wanted to experiment with compiler design in Rust. Originally I wanted to implement register allocator too, but that turned out to be a challenge in Rust (borrow checking issues everywhere), so I later decided to use cranelift for code generation, which was also something I wanted to learn.
- A subset of OCaml (but see integer size below)
- 64-bit integers (different from OCaml integers which are 63-bit), 64-bit floats, arrays and tuples
- No user defined types
- No polymorphism, all types inferred
- Tail-call elimination (currently unimplemented, see cranelift issue)
- No garbage collection (not possible to implement with cranelift anyway, as object code backend currently doesn't support stack maps)
Simply cargo build
should work.
Build step above generates two executables:
- mc: main compiler executable.
- test: test runner.
To compile a file simply run mc
with the file path as argument. Example:
$ ./target/release/mc programs/fib.ml
mc
should dump some intermediate code, some stats, and finally generate two
files: fib.o
and fib
. The latter is the executable for this program.
mc
uses gcc
for building the runtime system (just a few built-in functions
implemented in C) and linking.
To run the tests simply run the test
executable. Note that the test runner
uses ocamlc
as the reference compiler so make sure it is installed.
Currently the test programs/bench/harmonic.ml
fails with stack overflow as we
don't do tail-call elimination, see cranelift issue.
The code does not follow the original MinCaml compiler, so here is some notes for readers:
-
Hand-written lexer and parser are in
src/lexer.rs
andsrc/parser.rs
. Parser implements recursive-descent with something like "precedence climbing" or Pratt (not sure which, or if it's exactly one of them, but the idea should be similar to one or both of them) for expression parsing. -
To avoid excessive heap allocation and cloning, we intern variables and types. The intern table type is implemented in
src/interner.rs
. Variables, types and intern tables are maintained byCtx
(for "context"), which is implemented insrc/ctx.rs
. -
After parsing we type check (
src/type_check.rs
). Type checker does naive unification (no union-find) and does not implement type schemes or generalization (the language doesn't support polymorphism), so it's fairly simple.One interesting thing type checker does is it replaces uses of variables with their binders. So for example when we parse
let x = 1 in x
the parser generates unique numbers for every variable and generates something likelet x{0} = 1 in x{1}
. Here two uses ofx
look the same but they're different to the compiler. Type checker, when seeingx{1}
, looks at the scope for its binder, and replaces it withx{0}
. So after type checking we getlet x{0} = 1 in x{0}
.This means we don't need a renaming step to eliminate shadowing and make it easy to find binders of variables. Shadowing may happen in user-written names of variables, but no actual shadowing happens after type checking.
-
Next pass is
anormal
(src/anormal.rs
), which implement A-normalization. Nothing interesting here. -
Next pass is closure conversion (
src/closure_convert.rs
). -
Next pass is lowering (
src/lower/mod.rs
). This pass turns the program into a CFG with functions, basic blocks, assignment statements, and branching. -
Final pass is code generator, which generates native code using cranelift. Only tested on x86_64 Linux.
-
After that we link the generated object code using
gcc
.
This project was for learning purposes and it's mostly done. Maybe one day I may want to implement instruction selection and register allocation with a calling convention that supports tail-call optimization and garbage collection.
If you're looking for small compilers to study, you might be interested in another (and older) project of mine, which implements a compiler in Racket for a subset of Racket. Some notable differences from this project: it doesn't implement a parser (uses Racket's reader), but implements register allocator, and has a garbage collector.