Datatype-Generic Programming for All
Viktor Lin
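About Me
- Research assistant, Institute of Information Science, Academia Sinica
- B.A. in Philosophy, National Taiwan University
- Interests: programming languages, film, philosophy
- Wants to write programs but doesn't know what to write
What I Want to Talk About Today
- Introduce both the surface and the substance of Functional Programming
- Look at Generic Programming and Polymorphism from the perspective of functional languages
- Introduce Datatype-Generic Programming for generalized algebraic data types (GADTs)
- Show the relationship between Polymorphism and Functional Programming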
Review Haskell in Five Minutes
Haskell Datatype (ADT)
data Tree a = Leaf | Node a (Tree a) (Tree a)
- Tree: datatype name
- Leaf, Node: constructor names
- a: type variable
- This one line defines a datatype and its two constructors:
Leaf :: Tree a
Node :: a -> Tree a -> Tree a -> Tree a
t1 :: Tree Int
t1 = Node 1 (Node 2 Leaf Leaf) Leaf
Haskell Function
height :: Tree a -> Int
height Leaf = 0
height (Node a t1 t2) = max (height t1) (height t2) + 1
- Functions are defined by pattern matching
- Each line is called a clause
- height works on any Tree
> height (Node 'c' (Node 'o' Leaf (Node 'c' Leaf Leaf)) Leaf)
3
Generalized Algebraic Datatype (GADT)
data Tree a = Leaf | Node a (Tree a) (Tree a)
(the same definition in GADT syntax:)
data Tree a where
  Leaf :: Tree a
  Node :: a -> Tree a -> Tree a -> Tree a
Generalized Algebraic Datatype (GADT)
data Empty
data NonEmpty
data SafeList a b where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty
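To see what the Empty/NonEmpty index buys, here is a minimal sketch; the helpers safeHead and toListSL are hypothetical names, not from the talk. safeHead's type only admits non-empty lists, so applying it to Nil is rejected at compile time instead of failing at run time.

```haskell
{-# LANGUAGE GADTs #-}
-- Index types from the slide: they have no values and only tag the list type.
data Empty
data NonEmpty

data SafeList a b where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty

-- Hypothetical helper: the type only admits non-empty lists,
-- so no clause for Nil is needed.
safeHead :: SafeList a NonEmpty -> a
safeHead (Cons x _) = x

-- Hypothetical helper: forget the index and return an ordinary list.
toListSL :: SafeList a b -> [a]
toListSL Nil         = []
toListSL (Cons x xs) = x : toListSL xs
```

In GHCi, `safeHead (Cons 'x' Nil)` returns `'x'`, while `safeHead Nil` is a type error because `Empty` does not match `NonEmpty`.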
Haskell Typeclass
class Show a where
  show :: a -> String
instance Show Bool where
  show True  = "True"
  show False = "False"
class Eq a where
  (==) :: a -> a -> Bool
instance Eq a => Eq (Tree a) where
  Leaf == Leaf = True
  Leaf == _    = False
  (Node a t1 t2) == (Node b u1 u2) = a == b && t1 == u1 && t2 == u2
  (Node a t1 t2) == _ = False
Kind Polymorphism
data Empty
data NonEmpty
data SafeList :: * -> * -> * where
  Nil  :: SafeList a Empty
  Cons :: a -> SafeList a b -> SafeList a NonEmpty
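The signature above writes SafeList's kind out explicitly; kind polymorphism proper means one definition whose parameter may have any kind. A minimal sketch, assuming the PolyKinds extension; Tag, describe and the tag values are hypothetical names (base ships the same idea as `Data.Proxy.Proxy`). Either is used because it has the same kind, `* -> * -> *`, as SafeList.

```haskell
{-# LANGUAGE PolyKinds, KindSignatures #-}
-- Hypothetical: Tag's parameter t has a polymorphic kind k.
data Tag (t :: k) = Tag

tagInt :: Tag Int          -- k = *
tagInt = Tag

tagMaybe :: Tag Maybe      -- k = * -> *
tagMaybe = Tag

tagEither :: Tag Either    -- k = * -> * -> *
tagEither = Tag

-- One function that accepts tags of every kind.
describe :: Tag t -> String
describe Tag = "a tag"
```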
What Is a Generic Program?*
- A "single" program
- that is an abstraction over many programs
- usually achieved through parametrization
* Jeremy Gibbons (2007): Datatype-generic Programming. In: Proceedings of the 2006 International Conference on Datatype-generic Programming, SSDGP’06, Springer-Verlag, Berlin, Heidelberg, pp. 1–71, doi:10.1007/978-3-540-76786-2_1. Available at http://dl.acm.org/citation.cfm?id=1782894.1782895.
Examples of Generic Programs
Genericity by Value
- for loops and functions are the most basic generic programs
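#include <stdio.h>

/* Print a triangle of n rows; row i has i stars. Defined before main
   so the call below is not an implicit declaration. */
void printn(int n) {
    for (int i = 1; i <= n; i++) {
        for (int j = 0; j < i; j++)
            printf("*");
        printf("\n");
    }
}

int main(void) {
    printn(5);
    printn(10);
    return 0;
}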
Function in C
Spelled out without the function:
5:
printf("*"); printf("**"); printf("***"); printf("****"); printf("*****");
10:
printf("*"); printf("**"); printf("***"); printf("****"); printf("*****");
printf("******"); printf("*******"); printf("********"); printf("*********"); printf("**********");
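The same genericity by value, sketched in Haskell for comparison. triangle is a hypothetical name; it builds the whole string first instead of printing row by row.

```haskell
-- One definition, parametrized by the value n, replaces every
-- hand-written sequence of stars.
triangle :: Int -> String
triangle n = unlines [replicate i '*' | i <- [1 .. n]]

-- Counterpart of the C printn: print the triangle for n.
printn :: Int -> IO ()
printn = putStr . triangle
```

`printn 5` and `printn 10` then play the role of the two C calls.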
Genericity by Type
- Non-generic definitions of List of Int and List of Char:
-- right-associative, so 0 :+ 1 :+ Nil means 0 :+ (1 :+ Nil)
infixr 5 :+
data ListInt = Nil | Int :+ ListInt
append :: ListInt -> ListInt -> ListInt
append Nil ys = ys
append (x :+ xs) ys = x :+ append xs ys
xs :: ListInt
xs = 0 :+ 1 :+ Nil
Datatype and function in Haskell
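-- (in its own module: Nil, :+ and append would otherwise clash with the ListInt versions)
infixr 5 :+
data ListChar = Nil | Char :+ ListChar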
append :: ListChar -> ListChar -> ListChar
append Nil ys = ys
append (x :+ xs) ys = x :+ append xs ys
Genericity by Type
- Generic definition: abstract over Int/Char
infixr 5 :+
data List a = Nil | a :+ List a
append :: List a -> List a -> List a
append Nil ys = ys
append (x :+ xs) ys = x :+ append xs ys
xs :: List Int
xs = 0 :+ 1 :+ Nil
Parametrized Type in Haskell
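A small check that the single parametrized definition really serves both element types. ints and chars are illustrative values, and the `infixr` declaration is what lets chains like `0 :+ 1 :+ Nil` parse as intended.

```haskell
-- The parametrized List, right-associative so chains need no parentheses.
infixr 5 :+
data List a = Nil | a :+ List a
  deriving (Show, Eq)

append :: List a -> List a -> List a
append Nil       ys = ys
append (x :+ xs) ys = x :+ append xs ys

-- The same append used at two different element types.
ints :: List Int
ints = append (0 :+ 1 :+ Nil) (2 :+ Nil)

chars :: List Char
chars = append ('a' :+ Nil) ('b' :+ 'c' :+ Nil)
```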
Honorable Mentions
- Genericity by type
  - Generics in Java
- Genericity by structure
  - C++ Standard Template Library (concept*, structure...)
- Genericity by shape
  - Datatype-generic Programming
- Genericity by ...
Common Forms of Generic Programs
- Find a family of similar programs
- Parametrize the pattern they share
Datatype-generic Programming (DGP)
Similar datatypes should have similar functions. Take map as an example:
data List a = Nil | a :+ List a
mapList :: (a -> b) -> List a -> List b
mapList f Nil = Nil
mapList f (x :+ xs) = f x :+ mapList f xs
mapList: apply f to each element of a list
data Tree a = Leaf | Node a (Tree a) (Tree a)
mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f Leaf = Leaf
mapTree f (Node a t1 t2) = Node (f a) (mapTree f t1) (mapTree f t2)
We can define a similar map function for Tree as well
Duplication Bad!
{-# LANGUAGE GADTs, RankNTypes, PolyKinds, DataKinds #-}
data Color = Red | Black
data RBTree :: forall a. a -> Color -> * where
  LeafB :: RBTree a Black
  NodeR :: a -> RBTree a Black -> RBTree a Black -> RBTree a Red
  NodeB :: a -> RBTree a c1 -> RBTree a c2 -> RBTree a Black
mapRBTree :: (a -> b) -> RBTree a c -> RBTree b c
mapRBTree f LeafB = LeafB
mapRBTree f (NodeR a t1 t2) = NodeR (f a) (mapRBTree f t1) (mapRBTree f t2)
mapRBTree f (NodeB a t1 t2) = NodeB (f a) (mapRBTree f t1) (mapRBTree f t2)
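How are mapList and mapTree alike?
1. Tree and List are alike!
- Each has one type variable a (both are type constructors of kind * -> *)
- The first constructor takes no arguments
- The second constructor takes an a together with values of the datatype being defined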
data List a where
  Nil  :: List a
  (:+) :: a -> List a -> List a
mapList :: (a -> b) -> List a -> List b
mapList f Nil = Nil
mapList f (x :+ xs) = f x :+ mapList f xs
data Tree a where
  Leaf :: Tree a
  Node :: a -> Tree a -> Tree a -> Tree a
mapTree :: (a -> b) -> Tree a -> Tree b
mapTree f Leaf = Leaf
mapTree f (Node a t1 t2) = Node (f a) (mapTree f t1) (mapTree f t2)
2. The types of mapList and mapTree have the same form:
(a -> b) -> T a -> T b
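The shared shape (a -> b) -> T a -> T b can itself be written down once, with T as a parameter. A minimal sketch, assuming RankNTypes; the synonym MapOf is a hypothetical name. It captures only the common type: each body is still written by hand, which is exactly the duplication DGP aims to remove.

```haskell
{-# LANGUAGE RankNTypes #-}
infixr 5 :+
data List a = Nil | a :+ List a deriving (Show, Eq)
data Tree a = Leaf | Node a (Tree a) (Tree a) deriving (Show, Eq)

-- Hypothetical synonym: the common type, abstracted over the
-- type constructor t (of kind * -> *).
type MapOf t = forall a b. (a -> b) -> t a -> t b

mapList :: MapOf List
mapList _ Nil       = Nil
mapList f (x :+ xs) = f x :+ mapList f xs

mapTree :: MapOf Tree
mapTree _ Leaf           = Leaf
mapTree f (Node a t1 t2) = Node (f a) (mapTree f t1) (mapTree f t2)
```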
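3. The two maps are defined in similar ways:
- One clause per constructor
- Each clause builds its result with the same constructor it matched on
- f is applied to the values of the parameter type
- map f is applied recursively to values of the same datatype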
Functor!?
class Functor f where
  fmap :: (a -> b) -> f a -> f b
instance Functor List where
  fmap f Nil       = Nil
  fmap f (x :+ xs) = f x :+ fmap f xs
instance Functor Tree where
  fmap f Leaf           = Leaf
  fmap f (Node x t1 t2) = Node (f x) (fmap f t1) (fmap f t2)
- Typeclasses are ad-hoc polymorphism, not the universal polymorphism we are after
- We still redefine fmap for every datatype, without exploiting the similarity between the definitions
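I Heard There's a Typeclass Called Functor
We want one generic map that handles them all!
The Functor laws partly capture the properties we want:
fmap id = id
fmap (f . g) = fmap f . fmap g
- These are external properties the programmer has to check, not internal properties that every Functor's map is guaranteed to have
- The programmer has to check them again for every new definition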
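To make "checked by the programmer" concrete, here is a sketch that tests both laws on one sample tree; identityLaw, compositionLaw, badMap and sample are hypothetical names. badMap has the right type and the compiler accepts it, yet it breaks the identity law, and only the external check catches that.

```haskell
-- Tree from the talk with its (corrected) Functor instance.
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

instance Functor Tree where
  fmap _ Leaf           = Leaf
  fmap f (Node x t1 t2) = Node (f x) (fmap f t1) (fmap f t2)

-- The two Functor laws, checked at one value. Passing is evidence,
-- not proof: the laws must hold for every value and every f and g.
identityLaw :: (Functor t, Eq (t a)) => t a -> Bool
identityLaw x = fmap id x == x

compositionLaw :: (Functor t, Eq (t c)) => (b -> c) -> (a -> b) -> t a -> Bool
compositionLaw f g x = fmap (f . g) x == (fmap f . fmap g) x

-- Well-typed but lawless: it silently drops every left subtree.
badMap :: (a -> b) -> Tree a -> Tree b
badMap _ Leaf          = Leaf
badMap f (Node x _ t2) = Node (f x) Leaf (badMap f t2)

sample :: Tree Int
sample = Node 1 (Node 2 Leaf Leaf) Leaf
```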
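What we know so far...
- Many datatypes "feel" like they have a map
- They also seem to have filter, fold...
What we still want to know...
- Which datatypes are the "many"?
- Can this feeling be written down formally?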
Datatype-Generic Programming
DGP needs...
- A way to represent a family of datatypes (a generic representation)
- Can we represent every datatype that has a map?
- A way to define generic functions
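A Representation That Covers GADTs
Every datatype we have seen so far can be described with one common structure:
We call it the polynomial representation
What gets abstracted is the shape of the constructors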
data List a where
  Nil  :: List a
  (:+) :: a -> List a -> List a
data Tree a where
  Leaf :: Tree a
  Node :: a -> Tree a -> Tree a -> Tree a
The next part will probably be skipped
data NP :: (k -> *) -> [k] -> * where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)
-- right-associative, so I e :* I f :* Nil needs no parentheses
infixr 5 :*
data NS :: (k -> *) -> [k] -> * where
  Z :: f x -> NS f (x ': xs)
  S :: NS f xs -> NS f (x ': xs)
A taste of the polynomial representation:
type family Code (a :: *) :: [[*]]
data Expr = Num Int | Add {left :: Expr, right :: Expr}
type instance Code Expr = '[ '[Int], '[Expr, Expr] ]
-- identity wrapper around each field
newtype I a = I a
type SOP (f :: k -> *) (xss :: [[k]]) = NS (NP f) xss
type Rep a = SOP I (Code a)
class Generic (a :: *) where
  from :: a -> Rep a
  to   :: Rep a -> a
instance Generic Expr where
  from (Num n)   = Z (I n :* Nil)
  from (Add e f) = S (Z (I e :* I f :* Nil))
  to (Z (I n :* Nil))            = Num n
  to (S (Z (I e :* I f :* Nil))) = Add e f
A generic program can...
- use from to get the generic representation
- operate on the generic structure
- use to to convert back to the datatype
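Following those steps, a generic function only has to handle NS and NP, and then works for every Generic instance. A minimal self-contained sketch using the definitions from the slides (with `Type` from Data.Kind in place of `*`, plus fixity declarations); conIndex and conArity are hypothetical examples, not functions from the talk or from generics-sop.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, PolyKinds #-}
{-# LANGUAGE TypeFamilies, TypeOperators #-}
import Data.Kind (Type)

-- n-ary products (NP) and n-ary sums (NS) over type-level lists.
data NP :: (k -> Type) -> [k] -> Type where
  Nil  :: NP f '[]
  (:*) :: f x -> NP f xs -> NP f (x ': xs)
infixr 5 :*

data NS :: (k -> Type) -> [k] -> Type where
  Z :: f x -> NS f (x ': xs)
  S :: NS f xs -> NS f (x ': xs)

newtype I a = I a   -- identity wrapper for fields

type SOP (f :: k -> Type) (xss :: [[k]]) = NS (NP f) xss

-- Code lists each constructor as the list of its field types.
type family Code (a :: Type) :: [[Type]]
type Rep a = SOP I (Code a)

class Generic (a :: Type) where
  from :: a -> Rep a
  to   :: Rep a -> a

data Expr = Num Int | Add {left :: Expr, right :: Expr}
  deriving (Show, Eq)

type instance Code Expr = '[ '[Int], '[Expr, Expr] ]

instance Generic Expr where
  from (Num n)   = Z (I n :* Nil)
  from (Add e f) = S (Z (I e :* I f :* Nil))
  to (Z (I n :* Nil))            = Num n
  to (S (Z (I e :* I f :* Nil))) = Add e f

-- Position (0-based) of the constructor chosen in a sum.
nsIndex :: NS f xs -> Int
nsIndex (Z _) = 0
nsIndex (S n) = 1 + nsIndex n

-- Number of fields in a product.
npLength :: NP f xs -> Int
npLength Nil       = 0
npLength (_ :* xs) = 1 + npLength xs

-- Number of fields of whichever constructor was chosen.
nsArity :: NS (NP f) xss -> Int
nsArity (Z p) = npLength p
nsArity (S n) = nsArity n

-- Generic functions: written once, usable with any Generic instance.
conIndex :: Generic a => a -> Int
conIndex = nsIndex . from

conArity :: Generic a => a -> Int
conArity = nsArity . from
```

`conIndex (Add (Num 1) (Num 2))` is 1 and `conArity (Num 3)` is 1. Neither function mentions Expr, so any other type given a Code and a Generic instance gets them without extra code.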
Scary Codes
Any instance of Generic (e.g. Expr) can be converted to JSON:
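Writing generic functions is way too hard!
- If you can write a Generic instance, you get Eq, Show, Lens, toJSON, fromJSON...
- Even if you can't write the instance, libraries written by others are still very handy
- Even if you can't use the libraries, you at least get a feel for what writing a generic function is like
Even if you don't know DGP...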
DGP in a nutshell
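- A generic representation that can express many datatypes
- A mechanism that converts native datatypes into the generic representation (metaprogramming?)
- Generic functions written once for every datatype expressible in this representation
- Conversion with to/from
- (to-do) Generating functions with metaprogramming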
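Summary & Takeaways
Common "FP-flavored" designs
- Define a separate map, fold, filter... for each type
- Nothing guarantees that the different functions behave consistently
- They can't be combined into combos
Polymorphism isn't strong enough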
pub trait Summary {
    fn summarize(&self) -> String {
        String::from("(Read more...)")
    }
}
Functor is a property of "type constructors of kind * -> *", e.g. List :: * -> *, Tree :: * -> *
Without higher-kinded types, it cannot be fully expressed!
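In Haskell the parameter of Functor is exactly such a type constructor. A sketch with a hypothetical MyFunctor class that spells out the kind (Type -> Type, i.e. * -> *); incAll is written once against the class and reused for lists and trees. A Rust trait such as Summary above is implemented for concrete types, so there is no direct way to make an unapplied constructor like Vec its parameter.

```haskell
{-# LANGUAGE KindSignatures #-}
import Data.Kind (Type)

-- Hypothetical copy of Prelude's Functor with the kind written out:
-- the parameter f is a type constructor (Type -> Type, i.e. * -> *).
class MyFunctor (f :: Type -> Type) where
  myMap :: (a -> b) -> f a -> f b

data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Show, Eq)

instance MyFunctor [] where
  myMap = map

instance MyFunctor Tree where
  myMap _ Leaf         = Leaf
  myMap f (Node x l r) = Node (f x) (myMap f l) (myMap f r)

-- Written once against the class, reused for every instance.
incAll :: MyFunctor f => f Int -> f Int
incAll = myMap (+ 1)
```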
Further Notes
- How do we write filter?
- How do we write foldr?
- Deriving?
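For the last questions, one existing answer is GHC's deriving mechanism, a compiler feature rather than the DGP approach of this talk: with DeriveFunctor and DeriveFoldable the compiler writes fmap and foldr for Tree, and sum, length and friends come from Foldable. filterToList is a hypothetical sketch; a filter that returns a Tree again has no single obvious definition (what should replace a removed node?), so this one collects into a list.

```haskell
{-# LANGUAGE DeriveFunctor, DeriveFoldable #-}
-- GHC generates fmap (Functor) and foldr/foldMap (Foldable) for Tree.
data Tree a = Leaf | Node a (Tree a) (Tree a)
  deriving (Show, Eq, Functor, Foldable)

-- Hypothetical filter: keep matching elements, in the order foldr
-- visits them (node value, then left subtree, then right subtree).
filterToList :: (a -> Bool) -> Tree a -> [a]
filterToList p = foldr (\x acc -> if p x then x : acc else acc) []

sample :: Tree Int
sample = Node 1 (Node 2 Leaf (Node 3 Leaf Leaf)) Leaf
```

In GHCi, `sum sample` gives 6 and `filterToList odd sample` gives `[1,3]`.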
References
- Datatype-generic Programming by Jeremy Gibbons
- True Sums of Products by Edsko de Vries and Andres Löh
- Generic Programming of All Kinds by Alejandro Serrano and Victor Cacciari Miraldo
Datatype-Generic Programming for All of Us
By zekt