Writing a simple evaluator and type-checker in Haskell

This tutorial serves as a very short and quick summary of the first few chapters of TAPL. Huge thanks to the ##dependent@freenode community (pgiarrusso, lyxia, mietek, …) for clearing all my questions, which provided a good sense of the design of such systems.

1. Untyped simple language

1.1. Syntax

Our syntax, per BNF is defined as follows:

<term>  ::= <bool> | <num> | If <bool> Then <expr> Else <expr> | <arith>
<bool>  ::= T | F | IsZero <num>
<num>   ::= O
<arith> ::= Succ <num> | Pred <num>

For simplicity, we merge all them together in a single Term.

data Term =
    T
    | F
    | O
    | IfThenElse Term Term Term
    | Succ Term
    | Pred Term
    | IsZero Term
    deriving (Show, Eq)

1.2. Inference rules (evaluator)

The semantics we use here are based on so called small-step style, which state how a term is rewritten to a specific value, written t \to v. In contrast, big-step style states how a specific term evaluates to a final value, written t \Downarrow v.

Evaluation of a term is just pattern matching the inference rules. Given a term, it should produce a term:

eval :: Term -> Term

The list of inference rules:

Name Rule
E-IfTrue \text{If T Then } t_2 \text{ Else } t_3 \to t_2
E-IfFalse \text{If F Then } t_2 \text{ Else } t_3 \to t_3
E-If \frac{t_1 \to t'}{\text{If }t_1 \text{ Then } t_2 \text{ Else } t_3 \to \text{If }t' \text{ Then } t_2 \text{ Else } t_3}
E-Succ \frac{t_1 \to t'}{\text{Succ }t_1 \to \text{ Succ } t'}
E-PredZero \text{Pred O} \to \text{O}
E-PredSucc \text{Pred(Succ } k \text {)} \to k
E-Pred \frac{t_1 \to t'}{\text{Pred }t_1 \to \text{ Pred } t'}
E-IszeroZero \text{IsZero O} \to \text{T}
E-IszeroSucc \text{IsZero(Succ } k \text {)} \to \text{F}
E-IsZero \frac{t_1 \to t'}{\text{IsZero }t_1 \to \text{ IsZero } t'}

As an example, the rule E-IfTrue written using big-step semantics would be \frac{t_1 \Downarrow \text{T}, t2 \Downarrow v}{\text{If T} \text{ Then } t_2 \text{ Else } t_3 \Downarrow t_2}.

Given the rules, by pattern matching them we will reduce terms. Implementation in Haskell is mostly “copy-paste” according to the rules:

eval (IfThenElse T t2 t3) = t2
eval (IfThenElse F t2 t3) = t3
eval (IfThenElse t1 t2 t3) = let t' = eval t1 in IfThenElse t' t2 t3
eval (Succ t1) = let t' = eval t1 in Succ t'
eval (Pred O) = O
eval (Pred (Succ k)) = k
eval (Pred t1) = let t' = eval t1 in Pred t'
eval (IsZero O) = T
eval (IsZero (Succ t)) = F
eval (IsZero t1) = let t' = eval t1 in IsZero t'
eval _ = error "No rule applies"

1.3. Conclusion

As an example, evaluating the following:

Main> eval $ Pred $ Succ $ Pred O
Pred O

Corresponds to the following inference rules:

             ----------- E-PredZero
             pred O -> O
       ----------------------- E-Succ
       succ (pred O) -> succ O
------------------------------------- E-Pred
pred (succ (pred O)) -> pred (succ O)

However, if we do:

Main> eval $ IfThenElse O O O
*** Exception: No rule applies

It’s type-checking time!

2. Typed simple language

2.1. Type syntax

In addition to the previous syntax, we create a new one for types, so per BNF it’s defined as follows:

<type> ::= Bool | Nat

In Haskell:

data Type =
    TBool
    | TNat

2.2. Inference rules (type)

Getting a type of a term expects a term, and either returns an error or the type derived:

typeOf :: Term -> Either String Type

Next step is to specify the typing rules.

Name Rule
T-True \text{T : TBool}
T-False \text{F : TBool}
T-Zero \text{O : TNat}
T-If \frac{t_1\text{ : Bool},  t_2\text{ : }T, t_3\text{ : }T}{\text{If }t_1 \text{ Then } t_2 \text{ Else } t_3\text{ : }T}
T-Succ \frac{t\text{ : TNat }}{\text{Succ } t \text{ : TNat}}
T-Pred \frac{t\text{ : TNat }}{\text{Pred } t \text{ : TNat}}
T-IsZero \frac{t\text{ : TNat }}{\text{IsZero } t \text{ : TBool}}

Code in Haskell:

typeOf T = Right TBool
typeOf F = Right TBool
typeOf O = Right TNat
typeOf (IfThenElse t1 t2 t3) =
    case typeOf t1 of
        Right TBool ->
            let t2' = typeOf t2
                t3' = typeOf t3 in
                if t2' == t3'
                then t2'
                else Left "Types mismatch"
        _ -> Left "Unsupported type for IfThenElse"
typeOf (Succ k) =
    case typeOf k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Succ"
typeOf (Pred k) =
    case typeOf k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Pred"
typeOf (IsZero k) =
    case typeOf k of
        Right TNat -> Right TBool
        _ -> Left "Unsupported type for IsZero"

2.3. Conclusion

Going back to the previous example, we can now “safely” evaluate (by type-checking first), depending on type-check results:

Main> typeOf $ IfThenElse O O O
Left "Unsupported type for IfThenElse"
Main> typeOf $ IfThenElse T O (Succ O)
Right TNat
Main> typeOf $ IfThenElse F O (Succ O)
Right TNat
Main> eval $ IfThenElse T O (Succ O)
O
Main> eval $ IfThenElse F O (Succ O)
Succ O

3. Environments

Our neat language supports evaluation and type checking, but does not allow for defining constants. To do that, we will need kind of an environment which will hold information about constants.

type TyEnv = [(String, Type)] -- Type env
type TeEnv = [(String, Term)] -- Term env

We also extend our data type to contain TVar for defining variables, and meanwhile also introduce the Let ... in ... syntax:

data Term =
    ...
    | TVar String
    | Let String Term Term

Here are the rules for variables:

Name Rule
Add binding \frac{\Gamma, a \text{ : }T}{\Gamma \vdash a \text{ : }T}
Retrieve binding \frac{a \text{ : }T \in \Gamma}{\Gamma \vdash a \text{ : }T}

Haskell definitions:

addType :: String -> Type -> TyEnv -> TyEnv
addType varname b env = (varname, b) : env

getTypeFromEnv :: TyEnv -> String -> Maybe Type
getTypeFromEnv [] _ = Nothing
getTypeFromEnv ((varname', b) : env) varname = if varname' == varname then Just b else getTypeFromEnv env varname

Analogously for terms:

addTerm :: String -> Term -> TeEnv -> TeEnv
addTerm varname b env = (varname, b) : env

getTermFromEnv :: TeEnv -> String -> Maybe Term
getTermFromEnv [] _ = Nothing
getTermFromEnv ((varname', b) : env) varname = if varname' == varname then Just b else getTermFromEnv env varname

3.1. Inference rules (evaluator)

At this point we stop giving mathematical inference rules, but it’s a good homework if you want to do it 🙂

eval' is exactly the same as eval, with the following additions:
1. New parameter (the environment) to support retrieval of values for constants
2. Pattern matching for the new Let ... in ... syntax

eval' :: TeEnv -> Term -> Term
eval' env (TVar v) = case getTermFromEnv env v of
    Just ty -> ty
    _       -> error "No var found in env"
eval' env (Let v t t') = eval' (addTerm v (eval' env t) env) t'

We will modify IfThenElse slightly to allow for evaluating variables:

> eval' env (IfThenElse T t2 t3) = eval' env t2
> eval' env (IfThenElse F t2 t3) = eval' env t3
> eval' env (IfThenElse t1 t2 t3) = let t' = eval' env t1 in IfThenElse t' t2 t3

Copy-pasting the others:

eval' env (Succ t1) = let t' = eval' env t1 in Succ t'
eval' _ (Pred O) = O
eval' _ (Pred (Succ k)) = k
eval' env (Pred t1) = let t' = eval' env t1 in Pred t'
eval' _ (IsZero O) = T
eval' _ (IsZero (Succ t)) = F
eval' env (IsZero t1) = let t' = eval' env t1 in IsZero t'

Also since we modified IfThenElse to recursively evaluate, we also need to consider base types:

eval' _ T = T
eval' _ F = F
eval' _ O = O

3.2. Inference rules (type)

typeOf' is exactly the same as typeOf, with the only addition to support env (for retrieval of types for constants in an env) and the new let syntax.

typeOf' :: TyEnv -> Term -> Either String Type
typeOf' env (TVar v) = case getTypeFromEnv env v of
    Just ty -> Right ty
    _       -> Left "No type found in env"
typeOf' _ T = Right TBool
typeOf' _ F = Right TBool
typeOf' _ O = Right TNat
typeOf' env (IfThenElse t1 t2 t3) =
    case typeOf' env t1 of
        Right TBool ->
            let t2' = typeOf' env t2
                t3' = typeOf' env t3 in
                if t2' == t3'
                then t2'
                else Left $ "Type mismatch between " ++ show t2' ++ " and " ++ show t3'
        _ -> Left "Unsupported type for IfThenElse"
typeOf' env (Succ k) =
    case typeOf' env k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Succ"
typeOf' env (Pred k) =
    case typeOf' env k of
        Right TNat -> Right TNat
        _ -> Left "Unsupported type for Pred"
typeOf' env (IsZero k) =
    case typeOf' env k of
        Right TNat -> Right TBool
        _ -> Left "Unsupported type for IsZero"
typeOf' env (Let v t t') = case typeOf' env t of
   Right ty -> typeOf' (addType v ty env) t'
   _        -> Left "Unsupported type for Let"

Some examples:

Main> let termEnv = addTerm "a" O $ addTerm "b" (Succ O) $ addTerm "c" F []
Main> let typeEnv = addType "a" TNat $ addType "b" TNat $ addType "c" TBool []
Main> let e = IfThenElse T (TVar "a") (TVar "b") in (eval' termEnv e, typeOf' typeEnv e)
(O,Right TNat)
Main> let e = IfThenElse T (TVar "a") (TVar "c") in (eval' termEnv e, typeOf' typeEnv e)
(O,Left "Type mismatch between Right TNat and Right TBool")
Main> let e = IfThenElse T F (TVar "c") in (eval' termEnv e, typeOf' typeEnv e)
(F,Right TBool)
Main> let e = (Let "y" (TVar "a") (Succ (TVar "y"))) in eval' e termEnv 
Succ O
Main> let e = (Let "y" (TVar "a") (Succ (TVar "y"))) in typeOf' e typeEnv
Right TNat

4. Product and union

Products and unions are awesome, so we will implement them! We extend the value constructors as follows:

data Term =
    ...
    | Pair Term Term
    | EitherL Term
    | EitherR Term

…and for the type checker:

data Type =
    ...
    | TyVar String       -- constants, use later
    | TyPair Type Type   -- pairs, use later
    | TyEither Type Type -- sum, use later

We add handling to the evaluator:

eval' _ (Pair a b) = Pair a b
eval' env (EitherL a) = eval' env a
eval' env (EitherR a) = eval' env a

…and the type checker:

> typeOf' env (Pair a b) =
>     let a' = typeOf' env a
>         b' = typeOf' env b in
>         case a' of
>             Right ta -> case b' of
>                 Right tb -> Right $ TyPair ta tb
>                 Left _ -> Left "Unsupported type for Pair"
>             Left _  -> Left "Unsupported type for Pair"
> typeOf' env (EitherL a) = case (typeOf' env a) of
>     Right t -> Right $ TyEither t (TyVar "x")
>     _       -> Left "Unsupported type for EitherL"
> typeOf' env (EitherR a) = case (typeOf' env a) of
>     Right t -> Right $ TyEither (TyVar "x") t
>     _       -> Left "Unsupported type for EitherR"

Where TyVar "x" represents a polymorphic variable, and is unhandled since our system has no real support for polymorphism.

Main> let e = IfThenElse T (Pair (TVar "a") (TVar "b")) O in (eval' termEnv e, typeOf' typeEnv e)
(Pair (TVar "a") (TVar "b"),Left "Type mismatch between Right (TyPair TNat TNat) and Right TNat")
Main> let e = IfThenElse T (Pair (TVar "a") (TVar "b")) (Pair O O) in (eval' termEnv e, typeOf' typeEnv e)
(Pair (TVar "a") (TVar "b"),Right (TyPair TNat TNat))
Main> eval' termEnv (EitherL (TVar "a"))
O
Main> typeOf' typeEnv (EitherL (TVar "a"))
Right (TyEither TNat (TyVar "x"))

However, note for union, we have the following:

Main> let e = IfThenElse T (EitherL O) (EitherR F) in (eval' termEnv e, typeOf' typeEnv e)
(O,Left "Type mismatch between Right (TyEither TNat (TyVar \"x\")) and Right (TyEither TBool (TyVar \"x\"))")

This can be fixed by implementing a new function typeEq, that extends the Eq of Type, which would always pass the check for our fake polymorphic variable. It could look something like:

typeEq :: Type -> Type -> Bool
typeEq (TyEither x (TyVar "x")) (TyEither (TyVar "x") y) = True
typeEq x y = x == y

5. Conclusion

The evaluator and the type checker almost live in two separate worlds — they do two separate tasks. If we want to ensure the evaluator will produce the correct results, the first thing is to assure that the type-checker returns no error. It was really interesting to understand the mechanism of type-checking in-depth 🙂

TAPL starts with untyped lambda calculus and proceeds to typed lambda calculus, but is focusing only on type environment, while leaving the variables environment. While this is a good approach, I feel producing a system like in this tutorial is a good first step before jumping to lambda calculus.

My next steps are to finish reading TAPL, and continue playing with toy languages. The code used in this tutorial is available as a literate Haskell file. I also had some fun with De Bruijn index.

Advertisements

Igpay Igpay Atinlay

The other day I watched an interesting episode of Teen Titans Go with my kids. What caught our attention is the song that was part of the episode. You can listen to it here.

Despite the song being fun for my kids to listen to, my daughter asked me if those words made any sense at all. Do they? 🙂

At first I thought it was a simple rotation system. For example, “test” would turn to “estt”, and “scheme” would turn to “chemes”. This is what I told my daughter without any further analysis and we had fun playing with the words.

But then after a while we heard the song again and it seems our system isn’t what was used. Next thinking was, they would rotate the words, and append “ay” at the end of them. So I found the transcript and I noted this works, but you need to keep rotating until you reach a vowel. That’s it! I quickly sat and wrote an encoder in Scheme:

(define (word-to-piglatin str)
  (letrec [(vowels '(#\a #\e #\i #\o #\u))
           (l (string->list str))]
    (cond ((> (length (set-intersect vowels l)) 0)
           (if (member (car l) vowels)
               (list->string (append l '(#\a #\y)))
               (word-to-piglatin (list->string (append (cdr l) (list (car l)))))))
          (else (error "no vowels")))))

Now, just run it:

> (let ([song "pig pig latin is the dopest its real simple i will explain pig pig latin is the dopest say ay always at ends of words pig pig latin is the dopest"])
    (string-join (map word-to-piglatin (string-split song " ")) " "))
"igpay igpay atinlay isay ethay opestday itsay ealray implesay iay illway explainay igpay igpay atinlay isay ethay opestday aysay ayay alwaysay atay endsay ofay ordsway igpay igpay atinlay isay ethay opestday"

Ahh, close, but not really. If you look at the song lyrics, some words do not match. Specifically “isway” vs “isay”. So I thought maybe for some ending letters we append “ay” and for others we append “way”.

Constructing a table was the next step:

Ends in letter | Append
       a       | way
       b       | ay
       c       | ay
       d       | ay|way
       e       | way
       f       | ay|way
       g       |
       h       | ay|way
       i       | way
       j       |
       k       | way
       l       | ay|way
       m       | ay|way
       n       | ay|way
       o       |
       p       | ay|way
       q       |
       r       | ay
       s       | ay|way
       t       | ay|way
       u       |
       v       | ay
       w       | ay
       x       |
       y       | ay|way
       z       |

Nope, for some words it’s “ay”, for some it’s “way”, and for some it’s both. Also, for “on” it’s sometimes “onway” and sometimes “onay”. So it’s pretty random, but was still fun to look into! 🙂

EDIT: Thanks to my coworker Bartosz Budzanowski who told me that Pig Latin is really a thing! The rules are clearly stated on its Wikipedia page. Still, reverse engineering it was fun 🙂

Code updated:

(define (word-to-piglatin str [init #t])
  (letrec [(vowels '(#\a #\e #\i #\o #\u))
           (l (string->list str))]
    (cond ((> (length (set-intersect vowels l)) 0)
           (if (member (car l) vowels)
               (list->string (append l (if init '(#\w #\a #\y) '(#\a #\y))))
               (word-to-piglatin (list->string (append (cdr l) (list (car l)))) #f)))
          (else (error "no vowels")))))

Bonus: Haskell code

WordToPiglatin :: String -> String
WordToPiglatin "" = ""
WordToPiglatin s
    | isVowel (head s) = s ++ "way"
    | any isVowel s    = rotateUntil (isVowel . head) s ++ "ay"
    | otherwise        = s
    where
    isVowel c = c `elem` "aeiouAEIOU"
    rotateUntil _ [] = []
    rotateUntil p s = let res = rotate s in if p res then res else rotateUntil p res
    rotate (x:xs) = xs ++ [x]

CoC base terms – Type and Prop

  1. In mathematics, a judgement is a logical statement in the meta language
  2. An inference rule means that for a given list of judgements J_1, J_2, \ldots, J_n, we can infer that J, which we denote as \frac{J_1, J_2, \ldots, J_N}{J}
  3. Calculus of Constructions is a type theory based on intuitionistic logic.

If 1 and 2 don’t make much sense, I found this picture on Twitter to be useful:

For K being either \text{Type} or \text{Prop}, Calculus of Constructions has the following inference rule:

{\Gamma \ \vdash A \ : \ K \over {\Gamma, \ x:A \ \vdash \ x:A}}

What this rule means is that x can only be of type A, if A is of type \text{Type}, or A is of type \text{Prop}.

The questions that I had were:

  1. What are the (dis)advantages of limiting A to be K?
  2. Alternatively, what are the (dis)advantages of allowing infinite subtyping (allowing terms to be used as types)?
  3. Why do we have these base types \text{Type} and \text{Prop}?

First of all, we can already “emulate” something like “infinite subtyping” in Coq. For example, to “construct” foo : 3 : Nat : Type we can implement F : nat -> Type, then we have that F 3 : Type. Further, we can implement F' : F 3 -> Type, so F' foo is possible. It’s still of type Type but it kinda works.

But, what happens if we lift this restriction? Well that’s not an easy thing to do. At the very least we need some kind of well-formedness condition in place of A : K, otherwise we can get judgements like x : (\text{true} + 0) \ \vdash \ x : (\text{true} + 0). So plainly lifting that restriction not only allows arbitrary terms that are not types, it even allows ill-typed terms which do not type-check.

So foo : 3 : Nat : Type kind of makes no sense. To the left, we have the values we actually compute on; types classify those, and can in turn be classified. In other words, a : b means that a is an element of the “collection” b, but some of these elements are not collections.

So the reason why we have types/values distinction is computation and we can’t really bring them at the “same” level, even though dependent types try to blur the line between the two. We can have values in types, and types in values, but there’s still a difference.

This is why that judgement rule is so important 🙂

As usual, somebody had already considered this 🙂 I started a discussion on #coq@freenode and ##dependent@freenode and got some great answers. Thanks to pgiarrusso, lyxia, and mietek.