Datatypes and Recursion

Plan for this week

Last week:

  • built-in data types
    • base types, tuples, lists (and strings)
  • writing functions using pattern matching and recursion

This week:

  • user-defined data types
    • and how to manipulate them using pattern matching and recursion
  • more details about recursion







Representing complex data

We’ve seen:

  • base types: Bool, Int, Integer, Float
  • some ways to build up types: given types T1, T2

    • functions: T1 -> T2
    • tuples: (T1, T2)
    • lists: [T1]



Algebraic Data Types: a single, powerful technique for building up types to represent complex data

  • Lets you define your own data types
  • Tuples and lists are special cases










Building data types


Three key ways to build complex types/values:

  1. Product types (each-of): a value of T contains a value of T1 and a value of T2

  2. Sum types (one-of): a value of T contains a value of T1 or a value of T2

  3. Recursive types: a value of T contains a sub-value of the same type T










Product types

Tuples can do the job but there are two problems…

Can you spot them?







1. Verbose and unreadable

A type synonym for T: a name that can be used interchangeably with T







2. Unsafe

We want to catch this error at compile time!!!


Solution: construct two different datatypes










Record syntax

Haskell’s record syntax allows you to name the constructor parameters:










Building data types


Three key ways to build complex types/values:

  1. Product types (each-of): a value of T contains a value of T1 and a value of T2 [done]

  2. Sum types (one-of): a value of T contains a value of T1 or a value of T2

  3. Recursive types: a value of T contains a sub-value of the same type T










Example: NanoMarkdown

Suppose I want to represent a text document with simple markup

Each paragraph is either:

  • plain text (String)
  • heading: level and text (Int and String)
  • list: ordered? and items (Bool and [String])

I want to store all paragraphs in a list

But this does not type check!!!










Sum Types

Solution: construct a new type for paragraphs that is a sum (one-of) the three options!

Each paragraph is either:

  • plain text (String)
  • heading: level and text (Int & String)
  • list: ordered? and items (Bool & [String])










QUIZ

What is the type of Text "Hey there!"? i.e. How would GHCi reply to:

A. Syntax error

B. Type error

C. PText

D. String

E. Paragraph










Constructing datatypes

  • T is the new datatype

  • C1 .. Cn are the constructors of T


A value of type T is

  • either C1 v1 .. vk with vi :: T1i
  • or C2 v1 .. vl with vi :: T2i
  • or
  • or Cn v1 .. vm with vi :: Tni


You can think of a T value as a box:

  • either a box labeled C1 with values of types T11 .. T1k inside
  • or a box labeled C2 with values of types T21 .. T2l inside
  • or
  • or a box labeled Cn with values of types Tn1 .. Tnm inside
One-of Types
One-of Types

Constructing datatypes: Paragraph


Apply a constructor = pack some values into a box (and label it)

  • PText "Hey there!"
    • put "Hey there!" in a box labeled PText
  • PHeading 1 "Introduction"
    • put 1 and "Introduction" in a box labeled PHeading
  • Boxes have different labels but same type (Paragraph)
The Paragraph Type
The Paragraph Type

with example values:

The Paragraph Type
The Paragraph Type










QUIZ

What would GHCi say to

A. Syntax error

B. Type error

C. Paragraph

D. [Paragraph]

E. [String]










Example: NanoMD

Now I can create a document like so:



Now I want convert documents in to HTML.

I need to write a function:



How to tell what’s in the box?

  • Look at the label!










Pattern matching

Pattern matching = looking at the label and extracting values from the box

  • we’ve seen it before
  • but now for arbitrary datatypes












Dangers of pattern matching (1)

What would GHCi say to:


Dangers of pattern matching (2)

What would GHCi say to:







Dangers of pattern matching

Beware of missing and overlapped patterns

  • GHC warns you about overlapped patterns
  • GHC warns you about missing patterns when called with -W (use :set -W in GHCi)










Pattern-Match Expression

Everything is an expression?

We’ve seen: pattern matching in equations

Actually, pattern-match is also an expression

The code we saw earlier was syntactic sugar

is just for humans, internally represented as a case-of expression










QUIZ

What is the type of

A. Syntax error

B. Type error

C. String

D. Paragraph

E. Paragraph -> String










Pattern matching expression: typing

The case expression

has type T if

  • each e1eN has type T
  • e has some type D
  • each pattern1patternN is a valid pattern for D
    • i.e. a variable or a constructor of D applied to other patterns

The expression e is called the match scrutinee










QUIZ

What is the type of

A. Syntax error

B. Type error

C. Paragraph

D. Int

E. Paragraph -> Int










Building data types



Three key ways to build complex types/values:

  1. Product types (each-of): a value of T contains a value of T1 and a value of T2 [done]

    • Cartesian product of two sets: v(T) = v(T1) × v(T2)
  2. Sum types (one-of): a value of T contains a value of T1 or a value of T2 [done]

    • Union (sum) of two sets: v(T) = v(T1) ∪ v(T2)
  3. Recursive types: a value of T contains a sub-value of the same type T










Recursive types

Let’s define natural numbers from scratch:








A Nat value is:

  • either an empty box labeled Zero
  • or a box labeled Succ with another Nat in it!

Some Nat values:










Functions on recursive types

Recursive code mirrors recursive data

1. Recursive type as a parameter

Step 1: add a pattern per constructor

Step 2: fill in base case:

Step 2: fill in inductive case using a recursive call:










QUIZ

What does this evaluate to?

A. Syntax error

B. Type error

C. 2

D. Succ Zero

E. Succ (Succ Zero)










2. Recursive type as a result







3. Putting the two together










Lessons learned:

  • Recursive code mirrors recursive data
  • With multiple arguments of a recursive type, which one should I recurse on?
  • The name of the game is to pick the right inductive strategy!










Example: Calculator

I want to implement an arithmetic calculator to evaluate expressions like:

  • 4.0 + 2.9
  • 3.785.92
  • (4.0 + 2.9) * (3.78 - 5.92)

What is a Haskell datatype to represent these expressions?











How do we write a function to evaluate an expression?









Recursion is…

Building solutions for big problems from solutions for sub-problems

  • Base case: what is the simplest version of this problem and how do I solve it?
  • Inductive strategy: how do I break down this problem into sub-problems?
  • Inductive case: how do I solve the problem given the solutions for subproblems?









Lists

Lists aren’t built-in! They are an algebraic data type like any other:

  • List [1, 2, 3] is represented as Cons 1 (Cons 2 (Cons 3 Nil))

  • Built-in list constructors [] and (:) are just fancy syntax for Nil and Cons



Functions on lists follow the same general strategy:






What is the right inductive strategy for appending two lists?










Trees

Lists are unary trees with elements stored in the nodes:

Lists are unary trees
Lists are unary trees

How do we represent binary trees with elements stored in the nodes?

Binary trees with data at nodes
Binary trees with data at nodes










QUIZ: Binary trees I

What is a Haskell datatype for binary trees with elements stored in the nodes?

Binary trees with data at nodes
Binary trees with data at nodes

(A) data Tree = Leaf | Node Int Tree

(B) data Tree = Leaf | Node Tree Tree

(C) data Tree = Leaf | Node Int Tree Tree

(D) data Tree = Leaf Int | Node Tree Tree

(E) data Tree = Leaf Int | Node Int Tree Tree











Binary trees with data at nodes
Binary trees with data at nodes








Functions on trees








QUIZ: Binary trees II

What is a Haskell datatype for binary trees with elements stored in the leaves?

Binary trees with data at leaves
Binary trees with data at leaves

(A) data Tree = Leaf | Node Int Tree

(B) data Tree = Leaf | Node Tree Tree

(C) data Tree = Leaf | Node Int Tree Tree

(D) data Tree = Leaf Int | Node Tree Tree

(E) data Tree = Leaf Int | Node Int Tree Tree


















Why use Recursion?

  1. Often far simpler and cleaner than loops

    • But not always…
  2. Structure often forced by recursive data

  3. Forces you to factor code into reusable units (recursive functions)









Why not use Recursion?

  1. Slow

  2. Can cause stack overflow









Example: factorial



Lets see how fac 4 is evaluated:



Each function call <> allocates a frame on the call stack

  • expensive
  • the stack has a finite size

Can we do recursion without allocating stack frames?









Tail Recursion

Recursive call is the top-most sub-expression in the function body

  • i.e. no computations allowed on recursively returned value

  • i.e. value returned by the recursive call == value returned by function



QUIZ: Is this function tail recursive?

A. Yes

B. No









Tail recursive factorial

Let’s write a tail-recursive factorial!





Lets see how facTR is evaluated:

Each recursive call directly returns the result

  • without further computation

  • no need to remember what to do next!

  • no need to store the “empty” stack frames!









Why care about Tail Recursion?

Because the compiler can transform it into a fast loop


  • Tail recursive calls can be optimized as a loop

    • no stack frames needed!
  • Part of the language specification of most functional languages

    • compiler guarantees to optimize tail calls






That’s all folks!