Implementing a Hindley-Milner Type Inference System

About a year ago, I took Stanford’s CS242 course (fall 2019) and did the assignment to complete an interpreter for a simple ML-like language (it doesn’t have a name, so I’ll call it “Lam” since the file extension is .lam). I was a bit disappointed that the assignment didn’t mention the Hindley-Milner type inference algorithm, so I implemented it in my Rust version of the interpreter.

The implementation is based on chapter 10.6 of the Cornell CS3110 textbook. I recommend reading that chapter first because:

It explains the HM algorithm in detail, with step-by-step examples.
You don’t need to read other chapters of the book to understand everything (if you are familiar with OCaml).
I may not explain things that are already described in the book.

I’ll also assume that you have taken the CS242 course and done the assignment.

Type Inference Rules

The CS3110 book introduces some symbols to represent the constraints. I’ll translate them in a style that is more consistent with CS242’s type checking rules. For example, the “if” part of the rule is written below the “then” part in CS3110 while in CS242 we prefer to write “if” above the “then”.

The basic idea of the HM algorithm is: instead of checking every type in an expression eagerly, we generate a set of type constraints, which is similar to a set of equations, and solve them for the unknown types (a process called unification).

A constraint is just a claim that two types are equal: $\begin{aligned} \mathsf{Constraint}\ c ::=\ & \tau = \tau \\ \mathsf{Type}\ \tau ::=\ & \mathsf{num} \\ |\ & \mathsf{bool} \\ |\ & \dots \\ \end{aligned}$
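In Rust, these definitions might look like the sketch below. The `Constraint` field names `type_l` and `type_r` follow the code excerpts later in this post; the `Var`, `Fun` and `Product` variants and the use of `usize` ids for type variables are assumptions made for illustration, not necessarily what the real interpreter uses.

```rust
/// Types of the language. `Var` stands for a type that is not known yet.
#[derive(Clone, Debug, PartialEq)]
pub enum Type {
    Num,
    Bool,
    /// Assumption: type variables are identified by a number.
    Var(usize),
    /// tau_arg -> tau_ret
    Fun(Box<Type>, Box<Type>),
    /// tau_l * tau_r
    Product(Box<Type>, Box<Type>),
}

/// A claim that two types are equal: type_l = type_r.
#[derive(Clone, Debug, PartialEq)]
pub struct Constraint {
    pub type_l: Type,
    pub type_r: Type,
}
```

Deriving `PartialEq` makes it cheap to ask whether two types are syntactically identical, which is the first thing unification checks.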

The following are some examples of rules, in the format type checking rule => type inference rule. I’ll leave the other rules as exercises for the reader.

Basic Types

Constants, variables and lambdas add no new constraints of their own: $\frac{}{\varnothing\vdash n:\mathsf{num}}\Rightarrow\frac{}{\varnothing\vdash n:\mathsf{num}\dashv\varnothing},\tag{T-Num}$ $\frac{x:\tau\in\Gamma}{\Gamma\vdash x:\tau}\Rightarrow\frac{x:\tau\in\Gamma}{\Gamma\vdash x:\tau\dashv\varnothing}\tag{T-Var}$ $\frac{\Gamma, x : \tau_{\mathsf{arg}} \vdash e : \tau_{\mathsf{ret}}}{\Gamma \vdash (\lambda\,(x : \tau_{\mathsf{arg}})\, .\, e) : \tau_{\mathsf{arg}} \to \tau_{\mathsf{ret}}}\Rightarrow\frac{\text{fresh}\ \tau\quad\Gamma, x : \tau \vdash e : \tau_{\mathsf{ret}}\dashv C}{\Gamma \vdash (\lambda\,x\, .\, e) : \tau \to \tau_{\mathsf{ret}}\dashv C},\tag{T-Lam}$ where $\text{fresh}\ \tau$ means $\tau$ is a fresh type variable that hasn’t been used before. ¹

Binary operations add two constraints: both operands must be num. The result is num as well. $\begin{aligned} & \frac{\Gamma\vdash e_L:\mathsf{num}\quad\Gamma\vdash e_R:\mathsf{num}}{\Gamma\vdash e_L\oplus e_R:\mathsf{num}} \\ \Rightarrow\ & \frac{\quad\Gamma\vdash e_L:\tau_L\dashv C_L\quad\Gamma\vdash e_R:\tau_R\dashv C_R}{\Gamma\vdash e_L\oplus e_R:\mathsf{num}\dashv C_L,C_R,\tau_L=\mathsf{num},\tau_R=\mathsf{num}} \end{aligned}\tag{T-Binom}$

Fixpoints require that $x$, $e$ and the entire expression all have the same type: $\frac{\Gamma, x : \tau \vdash e : \tau}{\Gamma \vdash \mathsf{fix}\ (x : \tau)\, .\, e : \tau}\Rightarrow\frac{\text{fresh}\ \tau'\quad\Gamma, x : \tau' \vdash e : \tau\dashv C}{\Gamma \vdash \mathsf{fix}\ x\, .\, e : \tau'\dashv C,\tau'=\tau},\tag{T-Fix}$

If-statements add two constraints: the condition must be bool and both arms must have the same type. $\begin{aligned} & \frac{\Gamma\vdash e_{\mathsf{cond}}:\mathsf{bool}\quad\Gamma\vdash e_{\mathsf{then}}:\tau\quad\Gamma\vdash e_{\mathsf{else}}:\tau}{\Gamma\vdash\mathsf{if}\ e_{\mathsf{cond}}\ \mathsf{then}\ e_{\mathsf{then}}\ \mathsf{else}\ e_{\mathsf{else}}:\tau} \\ \Rightarrow\ & \frac{\Gamma\vdash e_{\mathsf{cond}}:\tau_1\dashv C_1\quad\Gamma\vdash e_{\mathsf{then}}:\tau_2\dashv C_2\quad\Gamma\vdash e_{\mathsf{else}}:\tau_3\dashv C_3}{\Gamma\vdash\mathsf{if}\ e_{\mathsf{cond}}\ \mathsf{then}\ e_{\mathsf{then}}\ \mathsf{else}\ e_{\mathsf{else}}:\tau_2\dashv C_1,C_2,C_3,\tau_1=\mathsf{bool},\tau_2=\tau_3} \end{aligned}\tag{T-If}$

For function applications, we define a new type $\tau$ for the entire expression and add a constraint that the function must have type $\tau_2 \to \tau$ : $\begin{aligned} & \frac{\Gamma \vdash e_{\mathsf{lam}} : \tau_2 \to \tau\quad \Gamma \vdash e_{\mathsf{arg}} : \tau_2}{\Gamma \vdash (e_{\mathsf{lam}}\ e_{\mathsf{arg}}) : \tau} \\ \Rightarrow\ & \frac{\text{fresh}\ \tau\quad\Gamma \vdash e_{\mathsf{lam}} : \tau_1\dashv C_1\quad \Gamma \vdash e_{\mathsf{arg}} : \tau_2\dashv C_2}{\Gamma \vdash (e_{\mathsf{lam}}\ e_{\mathsf{arg}}) : \tau\dashv C_1, C_2, \tau_1 = \tau_2 \to \tau} \end{aligned}\tag{T-App}$
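As a rough illustration of how T-App could turn into code, here is a minimal sketch. The `Fresh` counter and the `app_rule` helper are hypothetical names introduced for this example; the `Type` and `Constraint` shapes are the same assumptions as elsewhere in this post.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Num,
    Bool,
    Var(usize),
    Fun(Box<Type>, Box<Type>),
}

#[derive(Clone, Debug, PartialEq)]
struct Constraint {
    type_l: Type,
    type_r: Type,
}

/// Hypothetical supply of fresh type variables: each call returns a new id.
struct Fresh(usize);

impl Fresh {
    fn next(&mut self) -> Type {
        let var = Type::Var(self.0);
        self.0 += 1;
        var
    }
}

/// T-App: takes the already inferred (type, constraints) of the function
/// and of the argument, and returns those of the whole application.
fn app_rule(
    fresh: &mut Fresh,
    (tau_1, c_1): (Type, Vec<Constraint>),
    (tau_2, c_2): (Type, Vec<Constraint>),
) -> (Type, Vec<Constraint>) {
    // fresh tau for the entire expression
    let tau = fresh.next();
    // C_1, C_2, tau_1 = tau_2 -> tau
    let mut constraints = c_1;
    constraints.extend(c_2);
    constraints.push(Constraint {
        type_l: tau_1,
        type_r: Type::Fun(Box::new(tau_2), Box::new(tau.clone())),
    });
    (tau, constraints)
}
```

Note that the rule never inspects `tau_1`: whether it really is a function type is left to unification.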

De Bruijn Indices

These rules work well until you reuse the same variable name in different scopes:

let f = fun x -> fun y -> ((fun x -> x == y) 2) || x
in f true 1

If we use a set to represent the context $\Gamma$ , then the type of the inner x (which is num) will overwrite the outer x (which is bool). This isn’t a problem when we can check the user-annotated type without moving out of the scope, but now we have to infer the type of the inner and outer x after visiting the whole expression.

If you have done the interpreter assignment, you should know that the solution is to convert the expression to nameless form using de Bruijn indices:

let f = fun _ -> fun _ -> ((fun _ -> <0> == <1>) 2) || <0>
in f true 1
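The conversion itself can be sketched as follows, restricted to variables, lambdas and applications. `Named`, `Nameless` and `to_nameless` are hypothetical names for this example and do not come from the assignment code.

```rust
/// Surface syntax with variable names (lambda-calculus core only).
#[derive(Debug, PartialEq)]
enum Named {
    Var(String),
    Lam(String, Box<Named>),
    App(Box<Named>, Box<Named>),
}

/// Nameless form: a variable is the number of binders between
/// its use and the binder that introduced it.
#[derive(Debug, PartialEq)]
enum Nameless {
    Var(usize),
    Lam(Box<Nameless>),
    App(Box<Nameless>, Box<Nameless>),
}

/// `scope` holds the binder names currently in scope, innermost last.
fn to_nameless(e: &Named, scope: &mut Vec<String>) -> Result<Nameless, String> {
    match e {
        // Search from the innermost binder outwards, so shadowing works.
        Named::Var(x) => scope
            .iter()
            .rev()
            .position(|name| name == x)
            .map(Nameless::Var)
            .ok_or_else(|| format!("unbound variable {}", x)),
        Named::Lam(x, body) => {
            scope.push(x.clone());
            let body = to_nameless(body, scope);
            scope.pop(); // restore the scope even if the body failed
            Ok(Nameless::Lam(Box::new(body?)))
        }
        Named::App(f, arg) => Ok(Nameless::App(
            Box::new(to_nameless(f, scope)?),
            Box::new(to_nameless(arg, scope)?),
        )),
    }
}
```

Because the search starts from the innermost binder, an inner `x` and an outer `x` end up with different indices and can no longer be confused.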

We also need to change the rules to represent the context as a stack. The type of the variable with index n is the n-th element from the top of the stack, counting from 0: $\frac{}{\Gamma,\tau\vdash \left<0\right>:\tau\dashv\varnothing}\ (\text{T-Var-Term}),\quad\frac{\Gamma=\tau^*,\tau\quad\tau^*\vdash \left<n\right>:\tau'\dashv\varnothing}{\Gamma\vdash \left<n+1\right>:\tau'\dashv\varnothing}\ (\text{T-Var-Next}),$ $\frac{\text{fresh}\ \tau\quad\Gamma,\tau \vdash e : \tau_{\mathsf{ret}}\dashv C}{\Gamma \vdash (\lambda\,\_\, .\, e) : \tau \to \tau_{\mathsf{ret}}\dashv C}\ (\text{T-Lam}),\quad\frac{\text{fresh}\ \tau'\quad\Gamma,\tau' \vdash e : \tau\dashv C}{\Gamma \vdash \mathsf{fix}\ \_\, .\, e : \tau'\dashv C,\tau'=\tau}\ (\text{T-Fix}).$
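With a `Vec<Type>` as the context stack, the variable and lambda rules become a lookup from the end of the vector plus a push and a pop. The sketch below covers only numbers, variables and lambdas, which generate no constraints of their own, so it returns just a type. `lookup`, `infer` and the `next_var` counter are names assumed for this example.

```rust
#[derive(Clone, Debug, PartialEq)]
enum Type {
    Num,
    Var(usize),
    Fun(Box<Type>, Box<Type>),
}

/// Nameless expressions; `Var(n)` is the de Bruijn index <n>.
enum Expr {
    Num(i64),
    Var(usize),
    Lam(Box<Expr>),
}

/// T-Var-Term / T-Var-Next: index 0 is the most recently pushed type.
fn lookup(ctx: &[Type], index: usize) -> Option<&Type> {
    ctx.len().checked_sub(index + 1).map(|i| &ctx[i])
}

fn infer(e: &Expr, ctx: &mut Vec<Type>, next_var: &mut usize) -> Result<Type, String> {
    match e {
        Expr::Num(_) => Ok(Type::Num),
        Expr::Var(n) => lookup(ctx, *n)
            .cloned()
            .ok_or_else(|| format!("unbound index <{}>", n)),
        Expr::Lam(body) => {
            // fresh tau for the parameter
            let tau = Type::Var(*next_var);
            *next_var += 1;
            // Gamma, tau |- body : tau_ret
            ctx.push(tau.clone());
            let tau_ret = infer(body, ctx, next_var);
            ctx.pop();
            Ok(Type::Fun(Box::new(tau), Box::new(tau_ret?)))
        }
    }
}
```

Popping the context before propagating an error keeps the stack balanced, so the same `ctx` can be reused by the caller.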

ADT

Product type constructors are similar to binary operations: $\begin{aligned} & \frac{\Gamma \vdash e_{L} : \tau_{L} \quad \Gamma \vdash e_{R} : \tau_{R}}{\Gamma \vdash (e_{L}, e_{R}) : \tau_{L} \times \tau_{R}} \\ \Rightarrow\ & \frac{\Gamma\vdash e_L:\tau_L\dashv C_L\quad\Gamma\vdash e_R:\tau_R\dashv C_R}{\Gamma \vdash (e_{L}, e_{R}) : \tau_{L} \times \tau_{R}\dashv C_L,C_R}, \end{aligned}$

We can use the same trick as function applications in product type destructors: $\frac{\Gamma \vdash e : \tau_{L} \times \tau_{R}}{\Gamma \vdash e.L : \tau_{L}}\Rightarrow\frac{\text{fresh}\ \tau_L,\tau_R\quad\Gamma \vdash e : \tau\dashv C}{\Gamma \vdash e.L : \tau_{L} \dashv C,\tau=\tau_{L} \times \tau_{R}}$

The type annotation of sum types can also be omitted. The type becomes known when the value is used: $\frac{\Gamma \vdash e : \tau_{L}}{\Gamma \vdash \mathsf{inj}\ e = L\ \mathsf{as}\ \tau_{L} + \tau_{R} : \tau_{L} + \tau_{R}}\Rightarrow\frac{\text{fresh}\ \tau_R\quad\Gamma \vdash e : \tau_{L}\dashv C}{\Gamma \vdash \mathsf{inj}\ e = L : \tau_{L} + \tau_{R}\dashv C},\tag{T-Inject-L}$ $\begin{aligned} & \frac{\Gamma \vdash e : \tau_{L} + \tau_{R} \quad \Gamma, x_{L} : \tau_{L} \vdash e_{L} : \tau \quad \Gamma, x_{R} : \tau_{R} \vdash e_{R} : \tau}{\Gamma \vdash \mathsf{case}\ e\,\{L(x_{L}) \to e_{L} \mid R(x_{R}) \to e_{R}\} : \tau} \\ \Rightarrow &\ \frac{\text{fresh}\ \tau_L,\tau_R,\quad\Gamma \vdash e : \tau_{\mathsf{sum}} \dashv C_{\mathsf{sum}} \quad \Gamma,\tau_{L} \vdash e_{L} : \tau_L'\dashv C_L \quad \Gamma,\tau_{R} \vdash e_{R} : \tau_R'\dashv C_R}{\Gamma \vdash \mathsf{case}\ e\,\{L(\_) \to e_{L} \mid R(\_) \to e_{R}\} : \tau_L'\dashv C_{\mathsf{sum}},C_L,C_R,\tau_{\mathsf{sum}}=\tau_{L} + \tau_{R},\tau_L'=\tau_R'}. \end{aligned}\tag{T-Case}$

How to Use the Rules

We should know how these rules are used before learning polymorphism.

Collecting Constraints and Type

Remember how we used the type checking rules: for an expression $e$, we first find a rule whose “then” part matches $e$, check whether the conditions in its “if” part are satisfied (these are also type checks), and then conclude that $\Gamma\vdash e:\tau$.

fn type_check_expr(ast: &Expr, ctx: HashMap<Variable, Type>) -> Result<Type, String> {
    match ast {
        Expr::Num(_) => Ok(Type::Num),
        Expr::Addop { binop, left, right } => {
            let tau_left = type_check_expr(left, ctx.clone())?;
            let tau_right = type_check_expr(right, ctx.clone())?;
            match (tau_left.clone(), tau_right.clone()) {
                (Type::Num, Type::Num) => Ok(Type::Num),
                _ => type_mismatch!(tau_left, tau_right, binop),
            }
        }
        // and other rules
    }
}

The first part of type inference works essentially the same way, except that we don’t actually check the types; we only collect a set of constraints and a type:

pub fn get_constraints(&self, ctx: &mut Vec<Type>) -> Result<(Type, Vec<Constraint>), String> {
    let (tau_result, c_result) = match self {
        Expr::Num(_) => Ok((Type::Num, vec![])),
        Expr::Addop { left, right, .. } | Expr::Mulop { left, right, .. } => {
            let (tau_left, c_left) = left.get_constraints(ctx)?;
            let (tau_right, c_right) = right.get_constraints(ctx)?;
            let constraints = flat!(vec![
                c_left,
                c_right,
                vec![
                    Constraint {
                        type_l: tau_left,
                        type_r: Type::Num,
                    },
                    Constraint {
                        type_l: tau_right,
                        type_r: Type::Num,
                    },
                ]
            ]);
            Ok((Type::Num, constraints))
        }
        // and other rules
    }?;
    Ok((tau_result, c_result))
}

Unification

Now we have a set of constraints that looks like $C=\{\tau_1=\mathsf{num},\tau_2\times\tau_1=(\mathsf{num}\to\mathsf{bool})\times\tau_3,\dots\}$ and a type that probably contains some unknown variables, like $\forall\alpha.\tau_1\to(\alpha\times\mathsf{bool})$. The next step is to unify these constraints to find out what the types $\tau_1,\tau_2,\dots$ are.

The unification algorithm is straightforward since constraints and types have a simple, tree-like structure. Unsurprisingly, a union-find set is involved in the algorithm:

For every “forall” type $\forall\alpha.\tau$ in $C$, introduce a fresh variable $\tau'$ and remove the quantifier so it becomes $[\alpha\to\tau']\tau$.
Initialize the union-find set $S$ with every variable in $C$.
Initialize a map $T=\{\}$. $T$ stores the types of the variables that are already known. Its keys are the roots in $S$ and its values are types.
While $C\neq\varnothing$:
1. Pop the first constraint $\tau_1=\tau_2$ from $C$.
2. If $\tau_1$ and $\tau_2$ are the same atomic type, like bool or num, then continue.
3. If $\tau_1$ or $\tau_2$ is a variable:
  1. If the variable is not a root in $S$, find and use its root in $S$.
  2. If both $\tau_1$ and $\tau_2$ are variables, we union these two variables in $S$. If $(\tau_1,\tau_1')\in T$ or $(\tau_2,\tau_2')\in T$, we move the item to the key of the new root and, if both items exist in $T$, add $\tau_1'=\tau_2'$ to $C$.
  3. If $\tau_1$ is a variable and $\tau_2$ does not contain $\tau_1$, we add $(\tau_1,\tau_2)$ to the map $T$.
  4. If $\tau_2$ is a variable and $\tau_1$ does not contain $\tau_2$, we add $(\tau_2,\tau_1)$ to the map $T$.
4. If $\tau_1$ is $\tau_{11}\to\tau_{12}$ and $\tau_2$ is $\tau_{21}\to\tau_{22}$, then add $\tau_{11}=\tau_{21}$ and $\tau_{12}=\tau_{22}$ to $C$.
5. Similar to step 4, if $\tau_1$ and $\tau_2$ fall in the same category of the type’s BNF, then we decompose them into smaller types and add the resulting constraints to $C$.
6. Otherwise, the expression doesn’t type check.
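To make the loop concrete, here is a minimal sketch of unification. Note that it is a simpler substitution-based variant and not the union-find formulation described above: a single map from variable ids to types plays the role of both $S$ and $T$. The `Type` enum, the `Subst` alias and the function names are assumptions for this example.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Type {
    Num,
    Bool,
    Var(usize),
    Fun(Box<Type>, Box<Type>),
}

/// Assumption: the solution is a map from variable ids to types.
type Subst = HashMap<usize, Type>;

/// Replace every solved variable in `t`, following chains of bindings.
fn resolve(s: &Subst, t: &Type) -> Type {
    match t {
        Type::Var(v) => match s.get(v) {
            Some(bound) => resolve(s, bound),
            None => t.clone(),
        },
        Type::Fun(a, r) => Type::Fun(Box::new(resolve(s, a)), Box::new(resolve(s, r))),
        _ => t.clone(),
    }
}

/// Occurs check: does variable `v` appear inside `t`?
fn occurs(v: usize, t: &Type) -> bool {
    match t {
        Type::Var(x) => *x == v,
        Type::Fun(a, r) => occurs(v, a) || occurs(v, r),
        _ => false,
    }
}

/// Each pair (l, r) in `work` is one constraint l = r.
fn unify(mut work: Vec<(Type, Type)>) -> Result<Subst, String> {
    let mut s = Subst::new();
    while let Some((l, r)) = work.pop() {
        let (l, r) = (resolve(&s, &l), resolve(&s, &r));
        match (l, r) {
            // Identical types (including equal atoms): nothing to do.
            (l, r) if l == r => {}
            // A variable on either side: bind it unless that creates a cycle.
            (Type::Var(v), t) | (t, Type::Var(v)) => {
                if occurs(v, &t) {
                    return Err(format!("infinite type: variable {} occurs in {:?}", v, t));
                }
                s.insert(v, t);
            }
            // Two function types: decompose into smaller constraints.
            (Type::Fun(a1, r1), Type::Fun(a2, r2)) => {
                work.push((*a1, *a2));
                work.push((*r1, *r2));
            }
            (l, r) => return Err(format!("cannot unify {:?} with {:?}", l, r)),
        }
    }
    Ok(s)
}
```

Resolving both sides before matching is what replaces the "find the root" step: a variable that already has a known type is never bound twice. A union-find structure avoids the repeated walks that `resolve` performs, at the cost of more bookkeeping.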

Get the Type

You don’t need to know the type of the expression in general: if the unification succeeds, then the program type checks. In some cases (like the polymorphism below), we still need to get the type.

We can’t directly use the type $\tau$ generated in the first step because it may contain free type variables (variables without a quantifier), and we don’t yet know whether these variables can be resolved to concrete types. For example, every fun x -> x in the expressions below generates the type a -> a, but their final types are different:

(fun x -> x) 1 (* int -> int *)
fun x -> x (* forall a . a -> a *)

The following algorithm removes every free variable in $\tau$:

While there is a free type variable in $\tau$:
1. Get the first free variable $x$ .
2. If $S$ doesn’t contain $x$ , then $\tau\gets\forall x.\tau$ and continue. Otherwise, let $r$ be the root of $x$ .
3. If $r\neq x$ , then $\tau\gets[x\to r]\tau$ and continue.
4. If $T$ doesn’t contain $r$, then $\tau\gets\forall x.\tau$ and continue. Otherwise, let $\tau_r$ be the value of $r$.
5. $\tau\gets[x\to\tau_r]\tau$ .
Return $\tau$ .
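The steps above can be sketched as a two-pass function: first replace every solved variable, then quantify whatever is left. This sketch assumes the result of unification is available as a plain map from variable ids to types (standing in for $S$ and $T$ together); `Scheme`, `resolve`, `free_vars` and `generalize` are names chosen for this example.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Type {
    Num,
    Bool,
    Var(usize),
    Fun(Box<Type>, Box<Type>),
}

/// forall vars . body
#[derive(Debug, PartialEq)]
struct Scheme {
    vars: Vec<usize>,
    body: Type,
}

/// Replace every variable that has a known type, following chains.
fn resolve(solution: &HashMap<usize, Type>, t: &Type) -> Type {
    match t {
        Type::Var(v) => match solution.get(v) {
            Some(known) => resolve(solution, known),
            None => t.clone(),
        },
        Type::Fun(a, r) => Type::Fun(
            Box::new(resolve(solution, a)),
            Box::new(resolve(solution, r)),
        ),
        _ => t.clone(),
    }
}

/// Collect the variables of `t` in order of first appearance.
fn free_vars(t: &Type, out: &mut Vec<usize>) {
    match t {
        Type::Var(v) => {
            if !out.contains(v) {
                out.push(*v);
            }
        }
        Type::Fun(a, r) => {
            free_vars(a, out);
            free_vars(r, out);
        }
        _ => {}
    }
}

/// Like the steps above, this quantifies every variable that is still
/// unknown; it does not look at variables that are free in the context.
fn generalize(solution: &HashMap<usize, Type>, t: &Type) -> Scheme {
    let body = resolve(solution, t);
    let mut vars = Vec::new();
    free_vars(&body, &mut vars);
    Scheme { vars, body }
}
```

With an empty solution, the type of `fun x -> x` keeps its variable and becomes a forall type; once that variable is known to be num, nothing is left to quantify.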

Polymorphism

In previous versions of Lam, the user had to explicitly say that there is polymorphism using the $\Lambda\alpha.e$ syntax, and annotate the type of $\alpha$. It’s impossible to keep this syntax: once type annotations are gone, the user never mentions $\alpha$ in the program, so we can’t tell what the $\alpha$ in $\Lambda\alpha.e$ refers to.

In OCaml, polymorphism is obtained through let expressions: when binding a name to an expression with a forall type, the variable can be instantiated to different types in different contexts. For example, the id in the following expression has type bool -> bool in (id false) and num -> num in (id 1):

let id = fun x -> x
in if (id false) then (id 1) else 1

As a result, let is no longer syntactic sugar for function application when inferring types (you can still treat it as function application when evaluating it). If you “desugar” the expression above, you will get the constraint num = bool and the unification will fail.

The inference rule for let is a bit involved: $\frac{\Gamma\vdash e_{\mathsf{var}}:\tau_{\mathsf{var}}\dashv C_{\mathsf{var}}\quad\Gamma,\mathrm{generalize}(\tau_{\mathsf{var}},C_{\mathsf{var}})\vdash e_{\mathsf{in}}:\tau_{\mathsf{in}}\dashv C_{\mathsf{in}}}{\Gamma\vdash\mathsf{let}\ \_=e_{\mathsf{var}}\ \mathsf{in}\ e_{\mathsf{in}}:\tau_{\mathsf{in}}\dashv C_{\mathsf{in}},C_{\mathsf{var}}},\tag{T-Let}$ where $\mathrm{generalize}(\tau,C)$ unifies $C$ and removes the free variables in $\tau$ using the algorithm in the previous section.

The instantiation type annotations and the T-Poly-App rule are replaced by instantiating arguments in forall types as fresh variables: $\frac{\text{fresh}\ \tau_1',\dots,\tau_n'}{\Gamma,\forall\,\alpha_1\dots\forall\,\alpha_n\, .\, \tau\vdash\left<0\right> : [\alpha_1\to\tau_1',\dots\alpha_n\to\tau_n']\tau\dashv\varnothing}.\tag{T-Poly-Inst}$
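A minimal sketch of T-Poly-Inst: every quantified variable of a forall type is replaced by a fresh variable each time the binding is looked up. `Scheme`, `substitute`, `instantiate` and the `next_var` counter are hypothetical names for this example.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
enum Type {
    Num,
    Bool,
    Var(usize),
    Fun(Box<Type>, Box<Type>),
}

/// forall vars . body
struct Scheme {
    vars: Vec<usize>,
    body: Type,
}

/// Replace the variables listed in `map`; everything else is kept as is.
fn substitute(map: &HashMap<usize, Type>, t: &Type) -> Type {
    match t {
        Type::Var(v) => map.get(v).cloned().unwrap_or_else(|| t.clone()),
        Type::Fun(a, r) => Type::Fun(
            Box::new(substitute(map, a)),
            Box::new(substitute(map, r)),
        ),
        _ => t.clone(),
    }
}

/// T-Poly-Inst: [alpha_1 -> tau_1', ..., alpha_n -> tau_n'] tau
/// with every tau_i' fresh.
fn instantiate(scheme: &Scheme, next_var: &mut usize) -> Type {
    let map: HashMap<usize, Type> = scheme
        .vars
        .iter()
        .map(|&alpha| {
            let fresh = Type::Var(*next_var);
            *next_var += 1;
            (alpha, fresh)
        })
        .collect();
    substitute(&map, &scheme.body)
}
```

Two lookups of the same binding therefore produce types that share no variables, which is what lets one use be unified with bool and another with num.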

For example, let’s get the type of <0> (fun _ -> <0>) in the program

let _ = fun _ -> fun _ -> <1> <0> in
  let _ = <0> (fun _ -> <0>) in <0> 1

The context is $\forall\alpha.\forall\beta.(\alpha\to\beta)\to\alpha\to\beta$ (which is the type of fun _ -> fun _ -> <1> <0>) and the expression is <0> (fun _ -> <0>). We can use the rules to get the type $\tau$ and constraints $C$ : $\frac{\text{fresh}\ \tau\quad\dfrac{\text{fresh}\ \alpha',\text{fresh}\ \beta'}{\forall\alpha.\forall\beta.(\alpha\to\beta)\to\alpha\to\beta\vdash\left<0\right> : (\alpha'\to\beta')\to\alpha'\to\beta'\dashv\varnothing}\ (\text{T-Poly-Inst})\quad\dfrac{\text{fresh}\ \gamma\quad\dfrac{}{\gamma \vdash\left<0\right> : \gamma\dashv\varnothing}\ (\text{T-Var-Term})}{\varnothing \vdash (\lambda\,\_\, .\, \left<0\right>) : \gamma \to \gamma\dashv\varnothing}\ (\text{T-Lam})}{\forall\alpha.\forall\beta.(\alpha\to\beta)\to\alpha\to\beta \vdash (\left<0\right>\ (\lambda\,\_\to\left<0\right>)) : \tau\dashv (\alpha'\to\beta')\to\alpha'\to\beta' = (\gamma\to\gamma) \to \tau}\ (\text{T-App})$

We can get $\{\alpha'=\beta'=\gamma,\tau=\alpha'\to\alpha'\}$ from the constraint $(\alpha'\to\beta')\to\alpha'\to\beta' = (\gamma\to\gamma) \to \tau$ (try to simulate the unification algorithm yourself by hand!), so $\mathrm{generalize}(\tau,\{(\alpha'\to\beta')\to\alpha'\to\beta' = (\gamma\to\gamma) \to \tau\})=\forall\tau'.\tau'\to\tau'$.

Let’s go through the whole process of another example (in de Bruijn form):

let _ = fun _ -> <0>
in if (<0> false) then (<0> 1) else 1

We know that fun _ -> <0> has type $\tau\to\tau$ where $\tau$ is a free variable: $\dfrac{\dfrac{\text{fresh}\ \tau\quad\dfrac{}{\tau \vdash\left<0\right> : \tau\dashv\varnothing}\ (\text{T-Var-Term})}{\varnothing \vdash (\lambda\,\_\, .\, \left<0\right>) : \tau \to \tau\dashv\varnothing}\ (\text{T-Lam})}{\varnothing\vdash\mathsf{let}\ \_=\lambda\,\_\, .\, \left<0\right>\ \mathsf{in}\ e_{\mathsf{in}}}\ (\text{T-Let})$ so we generalize it to $\forall\tau.\tau\to\tau$ and add it to the context when working on the if expression. Here’s the inference process for the predicate (<0> false): $\dfrac{\text{fresh}\ \tau_1\quad\dfrac{\text{fresh}\ \tau_1'}{\forall\tau.\tau\to\tau \vdash \left<0\right> : \tau_1'\to\tau_1'\dashv \varnothing}\ (\text{T-Poly-Inst})\quad \dfrac{}{\forall\tau.\tau\to\tau \vdash \mathsf{false} : \mathsf{bool}\dashv\varnothing}\ (\text{T-Bool})}{\forall\tau.\tau\to\tau \vdash (\left<0\right>\ \mathsf{false}) : \tau_1\dashv\tau_1'\to\tau_1' = \mathsf{bool} \to \tau_1}\ (\text{T-App})$

The constraint and the type of (<0> 1) can be obtained using a similar process. Let’s complete the inference: $\dfrac{\dfrac{\dfrac{\dots}{\forall\tau.\tau\to\tau \vdash (\left<0\right>\mathsf{false}) : \tau_1\dashv\tau_1'\to\tau_1' = \mathsf{bool} \to \tau_1}\quad\dfrac{\dots}{\forall\tau.\tau\to\tau \vdash (\left<0\right>1) : \tau_2\dashv\tau_2'\to\tau_2' = \mathsf{num} \to \tau_2}\quad\dfrac{}{\forall\tau.\tau\to\tau\vdash 1:\mathsf{num}\dashv\varnothing}}{\forall\tau.\tau\to\tau\vdash\mathsf{if}\ (\left<0\right>\mathsf{false})\ \mathsf{then}\ (\left<0\right>1)\ \mathsf{else}\ 1:\tau_2\dashv \tau_1'\to\tau_1' = \mathsf{bool} \to \tau_1,\tau_2'\to\tau_2' = \mathsf{num} \to \tau_2,\tau_1=\mathsf{bool},\tau_2=\mathsf{num}}\ (\text{T-If})}{\varnothing\vdash\mathsf{let}\ \_=\lambda\,\_\, .\, \left<0\right>\ \mathsf{in}\ \mathsf{if}\ (\left<0\right>\mathsf{false})\ \mathsf{then}\ (\left<0\right>1)\ \mathsf{else}\ 1:\tau_2\dashv\tau_1'\to\tau_1' = \mathsf{bool} \to \tau_1,\tau_2'\to\tau_2' = \mathsf{num} \to \tau_2,\tau_1=\mathsf{bool},\tau_2=\mathsf{num}}\ (\text{T-Let})$

Therefore the entire expression has type $\tau_2$ . The unification algorithm (again, try to simulate it by hand!) will generate a map $T$ where the root of $\tau_2$ corresponds to num.

Type Safety Proofs

We need to prove the progress and preservation theorems for our type inference system:

Progress: if $\varnothing\vdash e:\tau$ then either $e\ \mathsf{val}$ or there exists $e'$ such that $e\mapsto e'$ .
Preservation: if $\varnothing\vdash e:\tau$ and $e\mapsto e'$ , then $\varnothing\vdash e':\tau$ .

By saying $\Gamma\vdash e:\tau$ , we mean that $\Gamma\vdash e:\tau'\dashv C$ , and $\tau=\mathrm{generalize}(\tau',C)$ .

I’ll try to prove the theorems in later posts. This means that I don’t have the proofs yet, and everything presented in this blog post may be wrong. But I’m already exhausted after spending every weekend of the past month trying to make an interpreter that works ².

¹ If you are looking at the markdown of this post, you may have noticed the difference in the LaTeX code’s style. That’s because I found a LaTeX OCR website, and the code is partly generated by OCRing a screenshot of the assignment’s webpage.
² I haven’t implemented every feature in Lam for the same reason. Also, recursive types and modules aren’t interesting from a type inference perspective.