Datatype-Generic Programming (in Agda)

Lin Tzu-Chi

What do we mean by generic?*

  • A single unified program
  • Abstraction from the differences in similar programs
  • Usually parametrization

* Jeremy Gibbons (2007): Datatype-generic Programming. In: Proceedings of the 2006 International Conference on Datatype-generic Programming, SSDGP’06, Springer-Verlag, Berlin, Heidelberg, pp. 1–71, doi:10.1007/978-3-540-76786-2_1. Available at http://dl.acm.org/citation.cfm?id=1782894.1782895.

Examples of Generic Programs

Genericity by Value

  • For loops and functions are generic.
#include <stdio.h>

void printn(int n);  /* declared here so main can call it */

int main(void) {
  printn(5);
  printn(10);
  return 0;
}

void printn(int n) {
  for (int i = 1; i <= n; i++) {
    for (int j = 0; j < i; j++)
      printf("*");
    printf("\n");
  }
}

Function in C

/* printn(5), written out by hand */
printf("*\n");
printf("**\n");
printf("***\n");
printf("****\n");
printf("*****\n");

/* printn(10), written out by hand */
printf("*\n");
printf("**\n");
printf("***\n");
printf("****\n");
printf("*****\n");
printf("******\n");
printf("*******\n");
printf("********\n");
printf("*********\n");
printf("**********\n");

Genericity by Type

  • Non-generic definition: separate datatypes and functions for lists of ℕ and lists of Char.
data ListN : Type where
  []  : ListN
  _∷_ : ℕ → ListN → ListN

append : ListN → ListN → ListN
append []       ys = ys
append (x ∷ xs) ys = x ∷ append xs ys

xs : ListN
xs = 0 ∷ 1 ∷ []

 Datatype and function definition in Agda (Haskell/OCaml/...)

data ListChar : Type where
  []  : ListChar
  _∷_ : Char → ListChar → ListChar

append : ListChar → ListChar → ListChar
append []       ys = ys
append (x ∷ xs) ys = x ∷ append xs ys

Genericity by Type

  • Generic definition: abstracts the element type (ℕ/Char) out of ListN/ListChar.
data List (A : Type) : Type where
  []  : List A
  _∷_ : A → List A → List A

append : ∀ {A} → List A → List A → List A
append []       ys = ys
append (x ∷ xs) ys = x ∷ append xs ys

xs : List String
xs = "str1" ∷ "str2" ∷ [] 

Parametrized Type in Agda (Haskell/OCaml/...)

Honorable Mentions

  • Generics in Java are genericity by type
  • Genericity by structure
    • C++ Standard Template Library
  • Genericity by stage
    • Metaprogramming
      • C++ Templates
      • Template Haskell
  • Genericity by ...

Common Pattern of Generic Programming

  1. Identify a family of similar programs
  2. Parametrize the parts in which they differ

Datatype-generic Programming (DGP)

We want similar functions on similar datatypes.

Take map, for example:

data List (A : Type) : Type where
  []  : List A
  _∷_ : A → List A → List A

mapList : (A → B) → List A → List B
mapList f []       = []
mapList f (x ∷ xs) = f x ∷ mapList f xs

map applies f to each element of a list.

Datatype-generic Programming (DGP)

  • We can find similar map functions on similar types
    • E.g. map on binary trees:
data Tree (A : Type) : Type where
  leaf : Tree A
  node : A → Tree A → Tree A → Tree A

mapTree : (A → B) → Tree A → Tree B
mapTree f leaf           = leaf
mapTree f (node x t₁ t₂) = node (f x) (mapTree f t₁) (mapTree f t₂)

Duplication Bad!

data List (A : Type) : Type where
  []  : List A
  _∷_ : A → List A → List A
mapList : (A → B) → List A → List B
mapList f []       = []
mapList f (x ∷ xs) = f x ∷ mapList f xs

data Tree (A : Type) : Type where
  leaf : Tree A
  node : A → Tree A → Tree A → Tree A
mapTree : (A → B) → Tree A → Tree B
mapTree f leaf           = leaf
mapTree f (node x t₁ t₂) = node (f x) (mapTree f t₁) (mapTree f t₂)

data RBTree (A : Type) : Color → Type where    
  leaf  : RBTree A Black   
  nodeR : A → RBTree A Black → RBTree A Black → RBTree A Red
  nodeB : {c1 c2 : Color}
        → A → RBTree A c1    → RBTree A c2    → RBTree A Black
mapRBTree : (A → B) → RBTree A c → RBTree B c
mapRBTree f leaf           = leaf
mapRBTree f (nodeR x t₁ t₂) = nodeR (f x) (mapRBTree f t₁) (mapRBTree f t₂)
mapRBTree f (nodeB x t₁ t₂) = nodeB (f x) (mapRBTree f t₁) (mapRBTree f t₂)

1. Similarities between Tree and List

  • Parametrized by an element type A
  • The first constructor takes no arguments
  • The second constructor takes values of the parameter type A and/or of the type being defined
data List (A : Type) : Type where
  []  : List A
  _∷_ : A → List A → List A

mapList : (A → B) → List A → List B
mapList f []       = []
mapList f (x ∷ xs) = f x ∷ mapList f xs

data Tree (A : Type) : Type where
  leaf : Tree A
  node : A → Tree A → Tree A → Tree A

mapTree : (A → B) → Tree A → Tree B
mapTree f leaf           = leaf
mapTree f (node x t₁ t₂) = node (f x) (mapTree f t₁) (mapTree f t₂)

2. mapList and mapTree share the same type scheme

(A → B) → T A → T B
data List (A : Type) : Type where
  []  : List A
  _∷_ : A → List A → List A

mapList : (A → B) → List A → List B
mapList f []       = []
mapList f (x ∷ xs) = f x ∷ mapList f xs

data Tree (A : Type) : Type where
  leaf : Tree A
  node : A → Tree A → Tree A → Tree A

mapTree : (A → B) → Tree A → Tree B
mapTree f leaf           = leaf
mapTree f (node x t₁ t₂) = node (f x) (mapTree f t₁) (mapTree f t₂)

3. Similarities between map definitions:

  1. One clause for each constructor

  2. The result of each clause is built with the same constructor, from the values bound in that clause

    • f is applied to values of the parameter type A

    • the map in question is applied recursively to values of the datatype itself

Virtue of DGP

  • 'Similarity' is captured formally
  • Establish shared properties (proof reuse), e.g.
\begin{aligned} \textit{map}\ \textit{id}\ \textit{xs} &= \textit{xs} \\ \textit{map}\ (f \circ g)\ \textit{xs} &= \textit{map}\ f\ (\textit{map}\ g\ \textit{xs}) \end{aligned}
mapList : (A → B) → List A → List B
mapList f []       = []
mapList f (x ∷ xs) = f x ∷ mapList f xs

mapTree : (A → B) → Tree A → Tree B
mapTree f leaf           = leaf
mapTree f (node x t₁ t₂) = node (f x) (mapTree f t₁) (mapTree f t₂)

Proof reuse enables better reasoning and optimization.
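To make the first law concrete, here is a minimal sketch of its proof for native lists. It assumes the Agda standard library and writes Type for Set, as the slides do; the name mapList-id is chosen for illustration only.

```agda
open import Data.List using (List; []; _∷_)
open import Function using (id)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)

-- The slides write Type for Agda's Set (assumed alias).
Type : Set₁
Type = Set

variable
  A B : Type

mapList : (A → B) → List A → List B
mapList f []       = []
mapList f (x ∷ xs) = f x ∷ mapList f xs

-- Identity law for mapList, by induction on the list.
mapList-id : (xs : List A) → mapList id xs ≡ xs
mapList-id []       = refl
mapList-id (x ∷ xs) = cong (x ∷_) (mapList-id xs)
```

A proof of the same law for mapTree would repeat this induction with one clause per constructor, which is exactly the duplication that a generic proof is meant to avoid.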

Typeclasses do not (satisfactorily) solve our problems!

class Functor f where
  fmap :: (a -> b) -> f a -> f b
  
instance Functor List where
  fmap f Nil         = Nil
  fmap f (Cons x xs) = Cons (f x) (fmap f xs)
  
instance Functor Tree where
  fmap f Leaf           = Leaf
  fmap f (Node x t1 t2) = Node (f x) (fmap f t1) (fmap f t2)

  • Typeclasses provide ad-hoc polymorphism rather than parametric polymorphism
  • The similarities between the instance definitions are not exploited

Cont. 2. Are typeclasses useful here?

map :: Functor f => (a -> b) -> f a -> f b

Datatype-Generic Programming

(in Agda)

Requirements for DGP

  • A generic representation for a family of datatypes
    • for datatypes that support map (List, Tree...)
  • Corresponding definition for generic functions

A Datatype Family Representation

Some of the datatype families mentioned earlier can be represented as follows:

\begin{aligned} \textit{Mono} &\Coloneqq \emptyset \mid E \mid I \mid \textit{Mono} \otimes \textit{Mono} \\ \textcolor{red}{\textit{Poly}} &\Coloneqq \textit{List}\ \textit{Mono} \end{aligned}
\begin{aligned} \textit{ListRep} &= [\, \emptyset , (E \otimes I)\, ] &: \textit{Poly} \\ \textit{TreeRep} &= [\, \emptyset , (E \otimes I \otimes I)\, ] &: \textit{Poly} \end{aligned}

Let's call it the polynomial representation, which can be seen as the shapes of the constructors.

data List (A : Type) : Type where
  []  : List A
  _∷_ : A → List A → List A
  
data Tree (A : Type) : Type where
  leaf : Tree A
  node : A → Tree A → Tree A → Tree A

Generic Definitions by Polynomials

Thanks to Agda's expressiveness, we can get a taste of the polynomial representation:

data Mono : Type where
  ∅   : Mono
  I   : Mono
  E   : Mono
  _⊗_ : Mono → Mono → Mono

Poly : Type
Poly = List Mono

ListRep : Poly
ListRep = ∅ ∷ (E ⊗ I) ∷ []

We can then define μ, which turns a representation into a datatype:

μ : Poly → Type → Type

Generic Definitions by Polynomials

μ : Poly → Type → Type
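The slides give only the type of μ. One possible definition, shown here as a sketch rather than as the author's code, interprets each monomial as a product and each polynomial as a sum, then ties the recursive knot with a single constructor con. The interpretation functions ⟦_⟧ᴹ and ⟦_⟧ are reconstructed from the way they are used elsewhere in these slides; their exact definitions, the Type alias and the value example are assumptions, and the imports come from the Agda standard library.

```agda
open import Data.Empty   using (⊥)
open import Data.Unit    using (⊤; tt)
open import Data.Sum     using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)
open import Data.List    using (List; []; _∷_)
open import Data.Nat     using (ℕ)

-- The slides write Type for Agda's Set (assumed alias).
Type : Set₁
Type = Set

data Mono : Type where
  ∅   : Mono
  I   : Mono
  E   : Mono
  _⊗_ : Mono → Mono → Mono

Poly : Type
Poly = List Mono

-- Interpret one constructor shape, given the element type A and
-- the type X that stands for recursive positions.
⟦_⟧ᴹ : Mono → Type × Type → Type
⟦ ∅     ⟧ᴹ _       = ⊤
⟦ E     ⟧ᴹ (A , X) = A
⟦ I     ⟧ᴹ (A , X) = X
⟦ M ⊗ N ⟧ᴹ AX      = ⟦ M ⟧ᴹ AX × ⟦ N ⟧ᴹ AX

-- A polynomial is a choice between its constructor shapes.
⟦_⟧ : Poly → Type × Type → Type
⟦ []    ⟧ _  = ⊥
⟦ M ∷ F ⟧ AX = ⟦ M ⟧ᴹ AX ⊎ ⟦ F ⟧ AX

-- Recursive positions are filled with μ F A itself.
data μ (F : Poly) (A : Type) : Type where
  con : ⟦ F ⟧ (A , μ F A) → μ F A

ListRep : Poly
ListRep = ∅ ∷ (E ⊗ I) ∷ []

-- The list 0 ∷ 1 ∷ [] in the generic encoding.
example : μ ListRep ℕ
example = con (inj₂ (inj₁ (0 , con (inj₂ (inj₁ (1 , con (inj₁ tt)))))))
```

With this reading, inj₁ selects the first constructor of the representation and inj₂ (inj₁ …) the second, which matches the patterns used in length′ and append′.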

Datatypes denoted by μ should behave the same as their native counterparts:

length′ : {A : Type} → μ ListRep A → ℕ
length′ (con (inj₁ tt))              = 0
length′ (con (inj₂ (inj₁ (x , xs)))) = suc (length′ xs)

length : {A : Type} → List A → ℕ
length []       = 0
length (x ∷ xs) = suc (length xs)

append′ : {A : Type} → μ ListRep A → μ ListRep A → μ ListRep A
append′ (con (inj₁ tt)) ys              = ys
append′ (con (inj₂ (inj₁ (x , xs)))) ys = 
  con (inj₂ (inj₁ (x , (append′ xs ys))))

append : {A : Type} → List A → List A → List A
append []       ys = ys
append (x ∷ xs) ys = x ∷ append xs ys

Generic Definitions by Polynomials

map : (F : Poly) → {A B : Type} → (A → B) → μ F A → μ F B
map F {A} {B} f (con xs) = con (mapᴾ F xs)
  where
    mapᴹ : (M : Mono) → ⟦ M ⟧ᴹ (A , μ F A) → ⟦ M ⟧ᴹ (B , μ F B)
    mapᴹ ∅       tt        = tt
    mapᴹ E       a         = f a        -- apply f to an element
    mapᴹ I       x         = map F f x  -- recursive call
    mapᴹ (M ⊗ N) (xs , ys) = mapᴹ M xs , mapᴹ N ys

    mapᴾ : (G : Poly) → ⟦ G ⟧ (A , μ F A) → ⟦ G ⟧ (B , μ F B)
    mapᴾ (M ∷ G) (inj₁ xs) = inj₁ (mapᴹ M  xs)  -- preserving
    mapᴾ (M ∷ G) (inj₂ xs) = inj₂ (mapᴾ G  xs)  -- constructor choice

A generic map function can thus be defined manually, as shown above.

Proofs can be established on generic definitions:

mapId : (F : Poly) (x : μ F A) → map F id x ≡ x

mapComp : (F : Poly) (f : B → C) (g : A → B) (x : μ F A)
        → map F (f ∘ g) x ≡ map F f (map F g x)

to/from

toList : ∀ {A} → μ ListF A → List A
toList con₁            = []
toList (con₂ (x , xs)) = x ∷ toList xs

fromList : ∀ {A} → List A → μ ListF A
fromList []       = con₁
fromList (x ∷ xs) = con₂ (x , (fromList xs))

toTree : ∀ {A} → μ TreeF A → Tree A
toTree con₁                 = leaf
toTree (con₂ (x , xs , ys)) = node x (toTree xs) (toTree ys)

fromTree : ∀ {A} → Tree A → μ TreeF A
fromTree leaf           = con₁
fromTree (node x xs ys) = con₂ (x , (fromTree xs) , (fromTree ys))
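As a sketch of why these conversions can be trusted, the block below checks one direction of the round trip for lists. It repeats an assumed definition of μ so that it stands alone, assumes that ListF is the list representation ∅ ∷ (E ⊗ I) ∷ [], and assumes that con₁ and con₂ are pattern synonyms for the first and second constructor; none of these definitions appear in the slides, and the name to-from is illustrative.

```agda
open import Data.Empty   using (⊥)
open import Data.Unit    using (⊤; tt)
open import Data.Sum     using (_⊎_; inj₁; inj₂)
open import Data.Product using (_×_; _,_)
open import Data.List    using (List; []; _∷_)
open import Relation.Binary.PropositionalEquality using (_≡_; refl; cong)

Type : Set₁
Type = Set

data Mono : Type where
  ∅ I E : Mono
  _⊗_   : Mono → Mono → Mono

Poly : Type
Poly = List Mono

-- Assumed interpretation of representations (not shown in the slides).
⟦_⟧ᴹ : Mono → Type × Type → Type
⟦ ∅     ⟧ᴹ _       = ⊤
⟦ E     ⟧ᴹ (A , X) = A
⟦ I     ⟧ᴹ (A , X) = X
⟦ M ⊗ N ⟧ᴹ AX      = ⟦ M ⟧ᴹ AX × ⟦ N ⟧ᴹ AX

⟦_⟧ : Poly → Type × Type → Type
⟦ []    ⟧ _  = ⊥
⟦ M ∷ F ⟧ AX = ⟦ M ⟧ᴹ AX ⊎ ⟦ F ⟧ AX

data μ (F : Poly) (A : Type) : Type where
  con : ⟦ F ⟧ (A , μ F A) → μ F A

-- Assumed: ListF is the list representation.
ListF : Poly
ListF = ∅ ∷ (E ⊗ I) ∷ []

-- Assumed pattern synonyms for the first and second constructor.
pattern con₁   = con (inj₁ tt)
pattern con₂ p = con (inj₂ (inj₁ p))

toList : ∀ {A} → μ ListF A → List A
toList con₁                   = []
toList (con₂ (x , xs))        = x ∷ toList xs
toList (con (inj₂ (inj₂ ())))  -- no third constructor

fromList : ∀ {A} → List A → μ ListF A
fromList []       = con₁
fromList (x ∷ xs) = con₂ (x , fromList xs)

-- Converting a native list to the encoding and back is the identity.
to-from : ∀ {A} (xs : List A) → toList (fromList xs) ≡ xs
to-from []       = refl
to-from (x ∷ xs) = cong (x ∷_) (to-from xs)
```

The opposite direction, fromList (toList t) ≡ t, would follow the same pattern by induction on t; together the two directions express that the encoding and the native type are isomorphic.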

Native DGP with Metaprogramming

Native is better

  • Readability
  • Interoperability
    • between different representations
    • with existing libraries

A Naive Solution

We can always convert between generic and native definitions, since the two are isomorphic (they behave the same).

toList   : ∀ {A} → μ ListF A → List A
fromList : ∀ {A} → List A → μ ListF A

mapList : (A → B) → List A → List B
mapList f = toList ∘ map ListF f ∘ fromList

Problem: Time & space inefficient, difficult to reason about, ...

We want translation at will!

Possible Solutions

  • A new programming language design
  • A compiler redesign for eliminating intermediate structures
  • Metaprogramming
    • Code generation & instrumentation

Existing Problems & Ongoing Work

  • Metaprogramming mechanism
  • Generic definition for generic definitions
  • Ornamentation
    • Describing relations between datatypes

Datatype-generic Programming

By zekt
