Datatype-Generic Programming for All

Viktor Lin

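About me

  • Research assistant at the Institute of Information Science, Academia Sinica
  • B.A. in Philosophy, National Taiwan University
  • Interests: programming languages, film, philosophy
  • Wants to write programs but doesn't know what to write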

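What I want to talk about today

  • Introduce both the surface and the substance of functional programming
  • Look at generic programming and polymorphism from the viewpoint of functional languages
  • Introduce datatype-generic programming for generalized algebraic data types (GADTs)
  • Show how polymorphism relates to functional programming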

A five-minute Haskell review

Haskell Datatype (ADT)

data Tree a = Leaf | Node a (Tree a) (Tree a)
  • Tree: the name of the datatype
  • Leaf, Node: the names of the constructors
  • a: a type variable
  • This single line defines a datatype together with its two constructors
Leaf :: Tree a
Node :: a -> Tree a -> Tree a -> Tree a
t1 :: Tree Int
t1 = Node 1 (Node 2 Leaf Leaf) Leaf

Haskell Function

height :: Tree a -> Int
height Leaf = 0
height (Node a t1 t2) = max (height t1) (height t2) + 1
  • Functions are defined by pattern matching
  • Each line is called a clause
  • height works on a Tree of any element type
> height (Node 'c' (Node 'o' Leaf (Node 'c' Leaf Leaf)) Leaf)
3

Generalized Algebraic Datatype (GADT)

data Tree a = Leaf | Node a (Tree a) (Tree a)
---------------------------------------------
data Tree a where
  Leaf :: Tree a
  Node :: a -> Tree a -> Tree a -> Tree a

Generalized Algebraic Datatype (GADT)

data Empty
data NonEmpty

data SafeList a b where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty
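
The extra index in SafeList lets the type checker rule out empty lists. A minimal sketch of how that might be used follows; safeHead and example are names introduced here for illustration and are not from the slides.

```haskell
{-# LANGUAGE GADTs #-}

data Empty
data NonEmpty

data SafeList a b where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty

-- Hypothetical helper: it only accepts lists indexed by NonEmpty,
-- so no clause for Nil is needed (Nil has index Empty).
safeHead :: SafeList a NonEmpty -> a
safeHead (Cons x _) = x

-- Taking the head of a two-element list.
example :: Int
example = safeHead (Cons 1 (Cons 2 Nil))
```

Writing safeHead Nil is rejected at compile time, because Empty and NonEmpty are different types.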

Haskell Typeclass

class Show a where
  show :: a -> String

instance Show Bool where
  show True = "True"
  show False = "False"

class Eq a where
  (==) :: a -> a -> Bool

instance Eq a => Eq (Tree a) where
  Leaf == Leaf = True
  Leaf == _ = False
  (Node a t1 t2) == (Node b u1 u2) = a == b && t1 == u1 && t2 == u2
  (Node a t1 t2) == _ = False

Kind Polymorphism

data Empty
data NonEmpty

data SafeList :: * -> * -> * where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty

What is a generic program?*

  • A "single" program
  • An abstraction over many programs
  • Usually achieved through parametrization

* Jeremy Gibbons (2007): Datatype-generic Programming. In: Proceedings of the 2006 International Conference on Datatype-generic Programming, SSDGP’06, Springer-Verlag, Berlin, Heidelberg, pp. 1–71, doi:10.1007/978-3-540-76786-2_1. Available at http://dl.acm.org/citation.cfm?id=1782894.1782895.

Examples of generic programs

Genericity by Value

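  • For loops and functions are the most basic generic programs
#include <stdio.h>

/* Declared before use so that main can call it */
void printn(int n);

int main(void) {
  printn(5);
  printn(10);
  return 0;
}

/* Print a triangle of stars with n rows */
void printn(int n) {
  for (int i = 1; i <= n; i++) {
    for (int j = 0; j < i; j++)
      printf("*");
    printf("\n");
  }
}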

Function in C

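/* Without the loops and the function: printn(5) written out by hand */
printf("*\n");
printf("**\n");
printf("***\n");
printf("****\n");
printf("*****\n");
/* printn(10) written out by hand */
printf("*\n");
printf("**\n");
printf("***\n");
printf("****\n");
printf("*****\n");
printf("******\n");
printf("*******\n");
printf("********\n");
printf("*********\n");
printf("**********\n");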

Genericity by Type

  • Defining List of Int and List of Char the non-generic way:
infixr 5 :+
data ListInt = Nil | Int :+ ListInt

append :: ListInt -> ListInt -> ListInt
append Nil         ys = ys
append (x :+ xs) ys = x :+ append xs ys

xs :: ListInt
xs = 0 :+ 1 :+ Nil

 Datatype and function in Haskell

data ListChar = Nil | Char :+ ListChar

append :: ListChar -> ListChar -> ListChar
append Nil       ys = ys
append (x :+ xs) ys = x :+ append xs ys

Genericity by Type

  • The generic definition: abstract over Int/Char
infixr 5 :+
data List a = Nil | a :+ List a

append :: List a -> List a -> List a
append Nil         ys = ys
append (x :+ xs) ys = x :+ append xs ys

xs :: List Int
xs = 0 :+ 1 :+ Nil

Parametrized Type in Haskell
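
To see what the parametrization buys, the sketch below reuses the single append at two element types. The names ints and chars, the deriving clause, and the fixity declaration are additions made for this illustration.

```haskell
infixr 5 :+

data List a = Nil | a :+ List a
  deriving (Eq, Show)

-- One definition, usable at every element type.
append :: List a -> List a -> List a
append Nil       ys = ys
append (x :+ xs) ys = x :+ append xs ys

-- Used at Int ...
ints :: List Int
ints = append (0 :+ 1 :+ Nil) (2 :+ Nil)

-- ... and at Char, without writing a second append.
chars :: List Char
chars = append ('a' :+ Nil) ('b' :+ 'c' :+ Nil)
```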

Honorable Mentions

  • Genericity by type
    • Generics in Java
  • Genericity by structure
    • C++ Standard Template Library (concept*, structure...)
  • Genericity by shape
    • Datatype-generic Programming
  • Genericity by ...

The common form of a generic program

  1. Find a family of similar programs
  2. Parametrize the pattern they share
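
The two steps above can be sketched on a small case. The functions below are illustrative and not from the slides; they use Haskell's built-in lists to stay short. sumList and productList are the "similar programs", and foldList is the result of parametrizing what differs between them.

```haskell
-- Step 1: two programs that differ only in the operator and the base value.
sumList :: [Int] -> Int
sumList []       = 0
sumList (x : xs) = x + sumList xs

productList :: [Int] -> Int
productList []       = 1
productList (x : xs) = x * productList xs

-- Step 2: turn the differing parts into parameters f and z.
foldList :: (a -> b -> b) -> b -> [a] -> b
foldList _ z []       = z
foldList f z (x : xs) = f x (foldList f z xs)

-- The originals become instances of the generic program.
sumList' :: [Int] -> Int
sumList' = foldList (+) 0

productList' :: [Int] -> Int
productList' = foldList (*) 1
```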

Datatype-generic Programming (DGP)

Similar datatypes should have similar functions. Take map as an example:

data List a = Nil | a :+ List a

mapList :: (a -> b) -> List a -> List b
mapList f Nil       = Nil
mapList f (x :+ xs) = f x :+ mapList f xs

mapList applies f to each element of a list

data Tree a = Leaf | Node a (Tree a) (Tree a)

mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f Leaf = Leaf
mapTree f (Node a t1 t2) = Node (f a) (mapTree f t1) (mapTree f t2)

We can define a similar map function for Tree

Duplication Bad!

data List a = Nil | a :+ List a

mapList :: (a -> b) -> List a -> List b
mapList f Nil       = Nil
mapList f (x :+ xs) = f x :+ mapList f xs

data Tree a = Leaf | Node a (Tree a) (Tree a)

mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f Leaf = Leaf
mapTree f (Node a t1 t2) = Node (f a) (mapTree f t1) (mapTree f t2)

{-# LANGUAGE GADTs, RankNTypes, PolyKinds, DataKinds #-}
data Color = Red | Black

data RBTree :: * -> Color -> * where
  LeafB :: RBTree a Black
  NodeR :: a -> RBTree a Black -> RBTree a Black -> RBTree a Red
  NodeB :: a -> RBTree a c1    -> RBTree a c2    -> RBTree a Black

mapRBTree :: (a -> b) -> RBTree a c -> RBTree b c
mapRBTree f LeafB = LeafB
mapRBTree f (NodeR a t1 t2) = NodeR (f a) (mapRBTree f t1) (mapRBTree f t2)
mapRBTree f (NodeB a t1 t2) = NodeB (f a) (mapRBTree f t1) (mapRBTree f t2)

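Where are mapList and mapTree alike?

1. Tree and List look alike!

  • Each has one type variable a (both are type constructors of kind * -> *)
  • The first constructor takes no arguments
  • The second constructor takes an a and the datatype being defined itself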
data List a where
  Nil :: List a
  (:+) :: a -> List a -> List a

mapList :: (a -> b) -> List a -> List b
mapList f Nil       = Nil
mapList f (x :+ xs) = f x :+ mapList f xs

data Tree a where
  Leaf :: Tree a
  Node :: a -> Tree a -> Tree a -> Tree a

mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f Leaf = Leaf
mapTree f (Node a t1 t2) = Node (f a) (mapTree f t1) (mapTree f t2)

2. The types of mapList and mapTree have the same form:

(a -> b) -> T a -> T b

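3. The two maps are defined in similar ways:

  1. One clause per constructor

  2. The result of each clause uses the same constructor as the value it received

    • f is applied to values of the parametrized type

    • map f is applied recursively to values of the same datatype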

Functor!?

class Functor f where
  fmap :: (a -> b) -> f a -> f b

instance Functor List where
  fmap f Nil       = Nil
  fmap f (x :+ xs) = f x :+ fmap f xs

instance Functor Tree where
  fmap f Leaf           = Leaf
  fmap f (Node x t1 t2) = Node (f x) (fmap f t1) (fmap f t2)
  • A typeclass gives ad-hoc polymorphism, not the universal polymorphism we are after
  • We still have to write the definition again for every datatype, and the similarity between the definitions goes unused

I hear there is a typeclass called Functor

We want one generic map to rule them all!

The functor laws capture part of the property we want:

\begin{aligned} \textit{map}\ \textit{id}\ \textit{xs} &= \textit{xs} \\ \textit{map}\ (f \circ g)\ \textit{xs} &= \textit{map}\ f\ (\textit{map}\ g\ \textit{xs}) \end{aligned}
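  • These are external properties that the programmer must check by hand, not internal properties that a Functor's map is guaranteed to have
  • The programmer has to check them again for every new definition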
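
The point about laws being external can be made concrete. The instance below is a deliberately wrong sketch, not from the slides: it type-checks against the Prelude's Functor class but drops every element after the first, so the identity law fails. The names sample and identityLawHolds are introduced here for illustration.

```haskell
infixr 5 :+

data List a = Nil | a :+ List a
  deriving (Eq, Show)

-- Well-typed but unlawful: the tail of the list is thrown away.
instance Functor List where
  fmap _ Nil      = Nil
  fmap f (x :+ _) = f x :+ Nil

sample :: List Int
sample = 1 :+ 2 :+ 3 :+ Nil

-- The law says fmap id xs == xs; here the left side is 1 :+ Nil.
identityLawHolds :: Bool
identityLawHolds = fmap id sample == sample
```

The compiler accepts this instance without complaint, and identityLawHolds evaluates to False.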

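What we know so far...

  • Many datatypes "feel" like they have a map
  • They also seem to have filter, fold...

What we still want to know...

  • Which datatypes are these "many datatypes"?
  • Can this feeling be written down formally?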

Datatype-Generic Programming

DGP needs...

  • A way to represent a family of datatypes (a generic representation)
    • Is it possible to represent every datatype that has a map?
  • A way to define generic functions
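
A representation that covers GADTs

Every datatype we have seen so far can be described by a single structure:

\begin{aligned} \textit{Mono} &\Coloneqq \emptyset \mid E \mid I \mid \textit{Mono} \otimes \textit{Mono} \\ \textcolor{red}{\textit{Poly}} &\Coloneqq \textit{List}\ \textit{Mono} \end{aligned}

We call it the polynomial representation. Reading the examples below against the definitions of List and Tree, ∅ appears to stand for a constructor without fields, E for a field of the element type a, I for a recursive occurrence of the datatype, and ⊗ for putting fields side by side:

\begin{aligned} \textit{ListRep} &= [\, \emptyset , (E \otimes I)\, ] &: \textit{Poly} \\ \textit{TreeRep} &= [\, \emptyset , (E \otimes I \otimes I)\, ] &: \textit{Poly} \end{aligned}

What is abstracted over is the shape of the constructors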
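
The grammar can be written down as ordinary Haskell data. This is a sketch under an assumed reading of the symbols (∅ as a constructor without fields, E as an element field, I as a recursive position); the constructor names Unit and :*: and the function recursiveArgs are invented for this illustration.

```haskell
infixr 6 :*:

-- Mono ::= ∅ | E | I | Mono ⊗ Mono
data Mono
  = Unit           -- ∅: a constructor with no fields (assumed reading)
  | E              -- a field of the element type a (assumed reading)
  | I              -- a recursive occurrence of the datatype (assumed reading)
  | Mono :*: Mono  -- ⊗: fields placed side by side
  deriving (Eq, Show)

-- Poly ::= List Mono, one entry per constructor.
type Poly = [Mono]

listRep :: Poly
listRep = [Unit, E :*: I]

treeRep :: Poly
treeRep = [Unit, E :*: I :*: I]

-- A function over shapes: how many recursive positions a constructor has.
recursiveArgs :: Mono -> Int
recursiveArgs Unit      = 0
recursiveArgs E         = 0
recursiveArgs I         = 1
recursiveArgs (l :*: r) = recursiveArgs l + recursiveArgs r
```

For instance, map recursiveArgs treeRep gives [0, 2]: Leaf has no subtrees and Node has two.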

data List a where
  Nil  :: List a
  (:+) :: a -> List a -> List a
  
data Tree a where
  Leaf :: Tree a
  Node :: a -> Tree a -> Tree a -> Tree a

I will probably skip the next part

data NP :: (k -> *) -> [k] -> * where
  Nil :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)

data NS :: (k -> *) -> [k] -> * where
  Z :: f x -> NS f (x ': xs)
  S :: NS f xs -> NS f (x ': xs)

A taste of the polynomial representation:

type family Code (a :: *) :: [[*]]

data Expr = Num Int | Add {left :: Expr, right :: Expr}

type instance Code Expr = '[ '[Int], '[Expr, Expr] ]

newtype I a = I a  -- identity functor: wraps a plain field

type SOP (f :: k -> *) (xss :: [[k]]) = NS (NP f) xss
type Rep a = SOP I (Code a)

class Generic (a :: *) where
  from :: a -> Rep a
  to :: Rep a -> a

instance Generic Expr where
  from (Num n) = Z (I n :* Nil)
  from (Add e f) = S (Z (I e :* I f :* Nil)) 
  
  to (Z (I n :* Nil)) = Num n
  to (S (Z (I e :* I f :* Nil))) =  Add e f

A generic program can...

  1. Use from to obtain the generic representation
  2. Operate on the generic structure
  3. Use to to convert back to the datatype
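
The three steps can be tried out in a self-contained sketch. It repeats the NP, NS, Code and Generic definitions from the earlier slides in a slightly different but equivalent syntax, and adds one small generic function. The helpers lengthNP, fieldsNS and fieldCount are names invented here and are not part of any library.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds, TypeFamilies, TypeOperators #-}

import Data.Kind (Type)

infixr 5 :*

newtype I a = I a

-- n-ary product: the fields of one constructor
data NP (f :: k -> Type) (xs :: [k]) where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)

-- n-ary sum: the choice of constructor
data NS (f :: k -> Type) (xs :: [k]) where
  Z :: f x -> NS f (x ': xs)
  S :: NS f xs -> NS f (x ': xs)

type family Code (a :: Type) :: [[Type]]
type Rep a = NS (NP I) (Code a)

class Generic (a :: Type) where
  from :: a -> Rep a
  to   :: Rep a -> a

data Expr = Num Int | Add Expr Expr
  deriving (Eq, Show)

type instance Code Expr = '[ '[Int], '[Expr, Expr] ]

instance Generic Expr where
  from (Num n)   = Z (I n :* Nil)
  from (Add e f) = S (Z (I e :* I f :* Nil))
  to (Z (I n :* Nil))            = Num n
  to (S (Z (I e :* I f :* Nil))) = Add e f

-- Step 2 helpers: they only look at the generic structure.
lengthNP :: NP f xs -> Int
lengthNP Nil       = 0
lengthNP (_ :* xs) = 1 + lengthNP xs

fieldsNS :: NS (NP f) xss -> Int
fieldsNS (Z np) = lengthNP np
fieldsNS (S ns) = fieldsNS ns

-- A generic function: number of fields of the constructor that built x.
fieldCount :: Generic a => a -> Int
fieldCount x = fieldsNS (from x)
```

fieldCount never mentions Expr, so it works for any type with a Generic instance. Composing to after from returns the original value.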

Scary Code

Anything that is an instance of Generic (e.g. Expr) can be converted to JSON:

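Writing generic functions is way too hard!

  • If you can write a Generic instance, you get Eq, Show, Lens, toJSON, fromJSON...
    • Even if you cannot write the instance
      • Libraries written by others are still pleasant to use
      • Even if you cannot use the libraries...
        • At least you know what writing a generic function "feels" like

Even if you don't know DGP...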

DGP in a nutshell

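  1. There is a generic representation that can describe many datatypes
  2. There is a mechanism that converts a native datatype into the generic representation (metaprogramming?)
  3. Write generic functions for every datatype that the representation can describe
  4. Convert with to/from
  5. (to-do) Use metaprogramming to generate the functions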
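
For step 2, GHC itself offers one such mechanism: the DeriveGeneric extension generates the conversion for the representation in GHC.Generics, which is a different encoding from the NP/NS one shown earlier. The sketch below only shows that the derived from and to exist and can be used; exprName and roundTrip are names made up for the example.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import GHC.Generics (Generic (..), Datatype (..))

-- The compiler writes the Generic instance for us.
data Expr = Num Int | Add Expr Expr
  deriving (Eq, Show, Generic)

-- The generic representation carries metadata such as the type's name.
exprName :: String
exprName = datatypeName (from (Num 1))

-- Converting to the representation and back yields the same value.
roundTrip :: Expr -> Expr
roundTrip = to . from
```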

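Summary & Reflections

A common "FP-flavored" design

  • Every type defines its own map, fold, filter...
  • Nothing guarantees that these functions behave consistently
  • They cannot be chained into combos

Polymorphism isn't strong enough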

pub trait Summary {
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}

Functor is a property of "type constructors of kind * -> *",

e.g. List :: * -> *, Tree :: * -> *

It cannot be fully expressed without higher-kinded types!
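
A small sketch of what higher-kinded types allow: the Summary trait above abstracts over a type, while the function below abstracts over a type constructor f of kind * -> *. incrementAll, onList and onMaybe are illustrative names; Functor here is the Prelude's class.

```haskell
-- f is not a type but a type constructor: it is applied to Int below.
incrementAll :: Functor f => f Int -> f Int
incrementAll = fmap (+ 1)

-- The same function at f = [] ...
onList :: [Int]
onList = incrementAll [1, 2, 3]

-- ... and at f = Maybe.
onMaybe :: Maybe Int
onMaybe = incrementAll (Just 41)
```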

Other notes

  • How do you write filter?
  • How do you write foldr?
  • Deriving?
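
On the last two questions, one possible answer in GHC is to let the compiler derive the instances. The sketch below uses the DeriveFunctor and DeriveFoldable extensions on the Tree type from the earlier slides. The names doubled, total, elements and evens are made up for the example. The "filter" shown collects matching elements into a list, since removing a node from a Tree has no obvious shape-preserving meaning.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}

data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Eq, Show, Functor, Foldable)

t1 :: Tree Int
t1 = Node 1 (Node 2 Leaf Leaf) Leaf

-- map comes from the derived Functor instance.
doubled :: Tree Int
doubled = fmap (* 2) t1

-- foldr and friends come from the derived Foldable instance.
total :: Int
total = sum t1

elements :: [Int]
elements = foldr (:) [] t1

-- A filter expressed as a fold that keeps only the even elements.
evens :: [Int]
evens = foldr (\x acc -> if even x then x : acc else acc) [] t1
```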

References

Datatype-Generic Programming for All of Us

By zekt
