How to write a search engine in 15 lines of code
Paul Chiusano
@pchiusano
(an introduction to Unison)
@unisonweb
Programming: FUN!!!1!!
A troubling observation...
vast computing resources of civilization
A single OS process
The gap
Docker
Kubernetes
Terraform
Kafka
DynamoDB
S3
EC2
ElasticSearch
Kibana
Prometheus
Grafana
PagerDuty
etcd
ELB
Route 53
Consul
iptables
systemd
Flannel
Weave
Lambda
App Engine
rkt
CoreOS
Zookeeper
Redis
memcached
Initechinize
(okay, that one I made up)
Protobufs
Thrift
A better model
factorial : Number -> Number
factorial n =
  Vector.fold-left (*) 1 (Vector.range 1 (n + 1))
-- Evaluate factorial at another node
factorial-at : Node -> Number -> Remote Number
factorial-at alice n =
  do Remote
    Remote.transfer alice
    pure (factorial n)
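The `do Remote` block reads like ordinary sequential code, but `Remote.transfer` moves the rest of the computation to another node. A single-process Python simulation can make that control flow concrete. It assumes a `Remote` computation is modelled as a generator that yields hypothetical `Transfer(node)` requests; the `run` helper only records where each step would execute and does not serialize or send anything.

```python
from dataclasses import dataclass
from functools import reduce
import operator


@dataclass
class Transfer:
    """Hypothetical effect: continue the rest of this computation at `node`."""
    node: str


def factorial(n: int) -> int:
    # Same shape as the Unison version: fold (*) over the range 1..n
    return reduce(operator.mul, range(1, n + 1), 1)


def factorial_at(alice: str, n: int):
    # Mirrors: do Remote / Remote.transfer alice / pure (factorial n)
    yield Transfer(alice)
    return factorial(n)


def run(program, start: str = "local"):
    """Drive a generator-based 'Remote' program, tracking the current node."""
    location = start
    trace = [location]
    try:
        effect = next(program)
        while True:
            if isinstance(effect, Transfer):
                # A real runtime would ship the continuation to effect.node here
                location = effect.node
                trace.append(location)
            effect = program.send(None)
    except StopIteration as done:
        return done.value, trace


result, trace = run(factorial_at("alice", 5))
print(result, trace)  # 120 ['local', 'alice']
```

In a real distributed runtime, the continuation after the `yield` is what would travel to `alice`; here `run` just updates a location variable.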
-- apply a function f to two arguments
f x y
f (x + 1) y
-- type signature
sort : forall a . Order a -> Vector a -> Vector a
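Since `Order a` is an ordinary argument, callers can pass any ordering they like, including derived ones such as the `Order.by-2nd Hash.Order` used later in `search`. A rough Python analogue, assuming an `Order` is a three-way comparison function (`natural`, `reverse` and `by_2nd` are hypothetical helpers, not Unison library functions):

```python
from functools import cmp_to_key
from typing import Callable, List, TypeVar

A = TypeVar("A")

# An "Order a" modelled as a three-way comparison: negative, zero, or positive
Order = Callable[[A, A], int]


def natural(x, y) -> int:
    """Order values using Python's built-in < and >."""
    return (x > y) - (x < y)


def reverse(order: Order) -> Order:
    return lambda x, y: order(y, x)


def by_2nd(order: Order) -> Order:
    """Order pairs by their second component (loose analogue of Order.by-2nd)."""
    return lambda p, q: order(p[1], q[1])


def sort(order: Order, xs: List[A]) -> List[A]:
    # sort : forall a . Order a -> Vector a -> Vector a
    return sorted(xs, key=cmp_to_key(order))


print(sort(natural, [3, 1, 2]))                     # [1, 2, 3]
print(sort(reverse(natural), [3, 1, 2]))            # [3, 2, 1]
print(sort(by_2nd(natural), [("a", 2), ("b", 1)]))  # [('b', 1), ('a', 2)]
```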
-- Create an empty Index
Index.empty : forall k v . Remote (Index k v)
-- Insert a key value pair into the index
-- can use '∀' instead of 'forall'
Index.insert : ∀ k v . k -> v -> Index k v -> Remote Unit
-- Lookup a key in an index. May return None
Index.lookup : ∀ k v . k -> Index k v -> Remote (Optional v)
Persistent key-value storage
-- There's just a single value of type Unit, Unit!
Unit : Unit
-- Optional
Some 42 : Optional Number
None : Optional Number
index-example : Node -> Node -> Remote (Optional Text)
index-example alice bob = do Remote
  Remote.transfer alice
  ind := Index.empty -- create the index on alice
  Index.insert "Alice" "Jones" ind
  Index.insert "Bob" "Smith" ind
  Remote.transfer bob
  Index.lookup "Alice" ind
Key-value storage usage
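For readers who want to poke at the semantics, here is an in-memory Python stand-in for `Index` that replays `index-example`. It is only a sketch under loud assumptions: the real `Index` is persistent and lives on a node, whereas this class is a plain dictionary in one process, Python's `None` plays the role of Unison's `None`, and the two `Remote.transfer` steps are reduced to comments.

```python
from typing import Dict, Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class Index(Generic[K, V]):
    """In-memory stand-in for Index k v (no durability, no nodes)."""

    def __init__(self) -> None:
        self._data: Dict[K, V] = {}

    @classmethod
    def empty(cls) -> "Index[K, V]":
        # Index.empty : forall k v . Remote (Index k v)
        return cls()

    def insert(self, k: K, v: V) -> None:
        # Index.insert : k -> v -> Index k v -> Remote Unit
        self._data[k] = v

    def lookup(self, k: K) -> Optional[V]:
        # Index.lookup : k -> Index k v -> Remote (Optional v)
        return self._data.get(k)


def index_example() -> Optional[str]:
    ind: Index[str, str] = Index.empty()  # in Unison this happens on alice
    ind.insert("Alice", "Jones")
    ind.insert("Bob", "Smith")
    # in Unison, control transfers to bob here; the index is still reachable
    return ind.lookup("Alice")


print(index_example())  # Jones
```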
A search engine in 15 lines of code
A search index
Keyword | Set of URLs containing the keyword |
---|---|
programming | {haskell.org, lambda-the-ultimate.org, unisonweb.org ...} |
unison | {2016.fullstackfest.com/speakers, unisonweb.org, ... } |
scala | {scala-lang.org, scala.epfl.ch, ... } |
2016 olympics | {olympic.org/rio-2016, ... } |
... |
search for:
"unison programming"
Keyword | Set of URLs containing the keyword |
---|---|
programming | {haskell.org, lambda-the-ultimate.org, unisonweb.org ...} |
unison | {2016.fullstackfest.com/speakers, unisonweb.org, ... } |
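The answer to the query is the intersection of the URL sets for each keyword. A minimal Python sketch over a toy copy of the table, keeping only the fully listed URLs (the "..." entries and the two-word `2016 olympics` key are left out) and assuming queries are split on whitespace:

```python
from typing import Dict, List, Set

# Toy copy of the keyword table above
INDEX: Dict[str, Set[str]] = {
    "programming": {"haskell.org", "lambda-the-ultimate.org", "unisonweb.org"},
    "unison": {"2016.fullstackfest.com/speakers", "unisonweb.org"},
    "scala": {"scala-lang.org", "scala.epfl.ch"},
}


def search(query: str) -> Set[str]:
    """Return the URLs that contain every keyword of the query."""
    keywords: List[str] = query.split()
    # A keyword missing from the index contributes the empty set
    url_sets = [INDEX.get(k, set()) for k in keywords]
    if not url_sets:
        return set()
    return set.intersection(*url_sets)


print(search("unison programming"))  # {'unisonweb.org'}
```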
alias Url = Text
alias Keyword = Text
alias Set v = Index v Unit
alias SearchIndex = DIndex Keyword (Set Url)
search : Number -> Vector Keyword -> SearchIndex
       -> Remote (Vector Url)
search limit query ind = do Remote
  url-sets := Remote.traverse (k -> DIndex.lookup k ind) query
  zero = IndexedTraversal.empty
  url-sets := Remote.map (Optional.fold zero Index.traversal) url-sets
  merge = IndexedTraversal.intersect (Order.by-2nd Hash.Order)
  urls? = Vector.fold-balanced1 merge url-sets
  -- urls : Vector (Url, Hash Url)
  urls := IndexedTraversal.take-keys limit (Optional.get-or zero urls?)
  pure (Vector.map 1st urls)
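The data flow of `search` can be sketched in Python: look up each keyword, treat a missing keyword as an empty set, order each set by the hash of its URLs, intersect the sets in a balanced tree, and keep the first `limit` results. Assumptions: plain sorted lists replace `IndexedTraversal`, a dictionary replaces the distributed `DIndex`, SHA-256 stands in for `Hash Url`, and `url_hash`, `fold_balanced1` and `intersect` are hypothetical helpers. This `intersect` filters with a Python set rather than streaming.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Sequence, Set, TypeVar

T = TypeVar("T")


def url_hash(url: str) -> str:
    # Stand-in for `Hash Url`; any stable hash yields a fixed order
    return hashlib.sha256(url.encode()).hexdigest()


def fold_balanced1(f: Callable[[T, T], T], xs: Sequence[T]) -> Optional[T]:
    """Combine xs with f as a balanced tree; None for an empty sequence."""
    if not xs:
        return None
    if len(xs) == 1:
        return xs[0]
    mid = len(xs) // 2
    return f(fold_balanced1(f, xs[:mid]), fold_balanced1(f, xs[mid:]))


def intersect(a: List[str], b: List[str]) -> List[str]:
    # Both inputs are sorted by url_hash; filtering `a` keeps that order
    other = set(b)
    return [u for u in a if u in other]


def search(limit: int, query: List[str], index: Dict[str, Set[str]]) -> List[str]:
    # A missing keyword becomes the empty list (the `zero` of the Unison code)
    url_sets = [sorted(index.get(k, set()), key=url_hash) for k in query]
    urls = fold_balanced1(intersect, url_sets)
    # Optional.get-or zero, then take-keys limit
    return (urls or [])[:limit]
```

One plausible motivation for the balanced fold is that it keeps the merge tree's depth logarithmic in the number of keywords, so independent intersections can proceed in parallel.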
-- Pick the nodes responsible for a key, using rendezvous hashing
DIndex.nodes-for-key : ∀ k v . k -> DIndex k v -> Remote (Vector Node)
DIndex.nodes-for-key k ind = do Remote
  nodes := Index.keys ind
  hashes := Remote.traverse (node -> hash! (node, k)) nodes
  (nodes `Vector.zip` hashes)
    |> Vector.sort Hash.Order 2nd
    |> Vector.take DIndex.Replication-Factor
    |> Vector.map 1st
    |> pure
alias DIndex k v = Index Node (Index k v)
For key "Alice" and cluster node1, node2, node3:
compute hash (node1, "Alice"), hash (node2, "Alice"), ...
choose the node(s) whose hash value is highest
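Rendezvous (highest-random-weight) hashing fits in a few lines of Python. The slide describes choosing the node(s) with the highest hash, so this sketch sorts scores in descending order; any fixed choice works as long as every participant makes the same one. SHA-256 over a NUL-separated node/key string and `REPLICATION_FACTOR = 2` are assumptions standing in for `hash!` and `DIndex.Replication-Factor`.

```python
import hashlib
from typing import List

REPLICATION_FACTOR = 2  # hypothetical value for DIndex.Replication-Factor


def score(node: str, key: str) -> int:
    # hash (node, key): a stable 64-bit score for this node/key pair
    digest = hashlib.sha256(f"{node}\x00{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def nodes_for_key(key: str, nodes: List[str], r: int = REPLICATION_FACTOR) -> List[str]:
    """Keep the r nodes with the highest score for this key."""
    ranked = sorted(nodes, key=lambda n: score(n, key), reverse=True)
    return ranked[:r]


cluster = ["node1", "node2", "node3"]
print(nodes_for_key("Alice", cluster))
```

A useful property: if a node that was not selected for a key leaves the cluster, that key's replica set does not change, because the remaining nodes keep their relative ranking.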
Remote.spawn : Remote Node
-- spawn a node, transfer control there
-- then continue computation
do Remote
  n := Remote.spawn
  Remote.transfer n
  ...
Creating nodes
us-east : Node
eu-central : Node
...
Remote.spawn : Remote Node
Remote.spawn-at : Node -> Remote Node
-- Create 10,000 nodes and add them to a DIndex cluster
do Remote
  ind := DIndex.empty
  -- could also spawn at eu-central, or both regions!
  cluster := Remote.replicate 10000 (Remote.spawn-at us-east)
  Remote.traverse (n -> Remote.at' n (DIndex.join ind)) cluster
  ...
Creating nodes (cont)
How???
factorial-at alice n =
  do Remote
    Remote.transfer alice
    pure (factorial n)
factorial n =
  Vector.fold-left (*) 1 (Vector.range 1 (n + 1))
blah z =
  Vector.fold-left (*) 1 (Vector.range 1 (z + 1))
Using hashes for identity
#Q82jfkasdf823jbc192
factorial-at alice n =
  do Remote
    Remote.transfer alice
    pure (factorial n)
factorial-at alice n =
  do Remote
    Remote.transfer alice
    pure (#Q82jfkasdf823jbc192 n)
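Because a reference is replaced by the hash of the referenced definition, renaming `factorial` to `blah` (or `n` to `z`) cannot change what `factorial-at` means. A rough Python analogue hashes a function's syntax tree after replacing its own name and parameter names with positional placeholders. This only approximates the idea; Unison's actual hashing scheme is not reproduced here, and `definition_hash` is a hypothetical helper.

```python
import ast
import hashlib


class _Rename(ast.NodeTransformer):
    """Replace the listed names with positional placeholders."""

    def __init__(self, mapping: dict) -> None:
        self.mapping = mapping

    def visit_Name(self, node: ast.Name) -> ast.Name:
        new_id = self.mapping.get(node.id, node.id)
        return ast.copy_location(ast.Name(id=new_id, ctx=node.ctx), node)


def definition_hash(source: str) -> str:
    """Hash one function definition, ignoring its own name and parameter names."""
    fn = ast.parse(source).body[0]
    if not isinstance(fn, ast.FunctionDef):
        raise ValueError("expected a single function definition")
    mapping = {fn.name: "_self"}  # recursive calls refer to the definition itself
    for i, arg in enumerate(fn.args.args):
        mapping[arg.arg] = f"_arg{i}"
        arg.arg = f"_arg{i}"
    fn.name = "_self"
    fn.body = [_Rename(mapping).visit(stmt) for stmt in fn.body]
    # ast.dump omits line/column info by default, so layout does not affect the hash
    return hashlib.sha256(ast.dump(fn).encode()).hexdigest()[:20]


factorial_src = "def factorial(n):\n    return reduce(mul, range(1, n + 1), 1)\n"
blah_src = "def blah(z):\n    return reduce(mul, range(1, z + 1), 1)\n"
print(definition_hash(factorial_src) == definition_hash(blah_src))  # True
```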
factorial n =
  Vector.fold-left (*) 1 (Vector.range 1 (n + 1))
Implications: an immutable codebase
#Q82jfkasdf823jbc192
factorial-at alice n =
  do Remote
    Remote.transfer alice
    pure (#Q82jfkasdf823jbc192 n)
#zzzzzyyyl8as9dfasdl
factorial n = 43
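Since the old hash still names the old definition, "changing" `factorial` to `factorial n = 43` only adds a new definition and moves the name; `factorial-at` keeps calling `#Q82jfkasdf823jbc192`. A toy Python model of such an append-only codebase, assuming definitions are keyed by a hash of their body text (a simplification of hashing a normalized syntax tree); the `Codebase` class is hypothetical:

```python
import hashlib
from typing import Dict


class Codebase:
    """Append-only store: definitions are keyed by hash and never overwritten."""

    def __init__(self) -> None:
        self.definitions: Dict[str, str] = {}  # hash -> definition body
        self.names: Dict[str, str] = {}        # human-readable name -> hash

    def add(self, name: str, body: str) -> str:
        h = "#" + hashlib.sha256(body.encode()).hexdigest()[:16]
        self.definitions.setdefault(h, body)  # an existing hash is never replaced
        self.names[name] = h                  # only the name pointer moves
        return h


cb = Codebase()
old = cb.add("factorial", "n -> Vector.fold-left (*) 1 (Vector.range 1 (n + 1))")
# factorial-at is stored with the hash of factorial baked in, not the name
factorial_at = cb.add("factorial-at", f"alice n -> do Remote; Remote.transfer alice; pure ({old} n)")
new = cb.add("factorial", "n -> 43")

print(cb.names["factorial"] == new)           # True: the name points at the new body
print(old in cb.definitions[factorial_at])    # True: factorial-at still uses the old hash
```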
unisonweb.org
Contributors / advisors: Dan Doel, Sam Griffin, Ed Kmett, Arya Irani, Michael Pilquist ...
@unisonweb
Questions?
alias Html = Text
Http.get-url : Url -> Remote (Either Text Html)
Html.get-links : Html -> Vector Html.Link
Html.plain-textify : Html -> Text
Text.words : Text -> Vector Text
Web.crawl : Vector Url -> DIndex Keyword (Set Url) -> Remote Unit
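These signatures are enough to describe a breadth-first crawler that fills the keyword index. A Python sketch over a fake in-memory web, assuming the `WEB` dictionary stands in for `Http.get-url` (a missing page models the `Left` error case), regular expressions stand in for `Html.get-links` and `Html.plain-textify`, and a plain dictionary of sets replaces `DIndex Keyword (Set Url)`; the `max_pages` cap is a hypothetical addition.

```python
import re
from collections import deque
from typing import Dict, Optional, Set

# A fake web: url -> html (illustrative data only)
WEB: Dict[str, str] = {
    "a.org": '<p>unison programming</p><a href="b.org">b</a>',
    "b.org": '<p>scala programming</p><a href="a.org">a</a><a href="c.org">c</a>',
    "c.org": "<p>unison</p>",
}


def get_url(url: str) -> Optional[str]:
    return WEB.get(url)  # None models the Left (error) case


def get_links(html: str) -> list:
    return re.findall(r'href="([^"]+)"', html)


def plain_textify(html: str) -> str:
    return re.sub(r"<[^>]+>", " ", html)  # crude tag stripping


def words(text: str) -> list:
    return text.lower().split()


def crawl(seeds: list, index: Dict[str, Set[str]], max_pages: int = 100) -> None:
    """Breadth-first crawl that adds url to index[word] for every word on a page."""
    queue = deque(seeds)
    seen: Set[str] = set(seeds)
    fetched = 0
    while queue and fetched < max_pages:
        url = queue.popleft()
        html = get_url(url)
        if html is None:
            continue
        fetched += 1
        for word in words(plain_textify(html)):
            index.setdefault(word, set()).add(url)
        for link in get_links(html):
            if link not in seen:
                seen.add(link)
                queue.append(link)
```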
A1: crawler, IndexedTraversal
alias IndexedTraversal k v =
  ( Remote (Optional k)         -- first key
  , k -> Remote (Optional v)    -- lookup
  , k -> Remote (Optional k));  -- next valid key
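An `IndexedTraversal` is just three functions, so it can be implemented over any sorted store. A Python sketch over a dictionary, assuming `next_key(k)` returns the smallest key strictly greater than `k` (the slide only says "next valid key") and with the `Remote` wrappers dropped; `from_dict` and `items` are hypothetical helpers.

```python
from bisect import bisect_right
from typing import Callable, Dict, Iterator, List, NamedTuple, Optional, Tuple


class IndexedTraversal(NamedTuple):
    first_key: Callable[[], Optional[str]]
    lookup: Callable[[str], Optional[str]]
    next_key: Callable[[str], Optional[str]]


def from_dict(d: Dict[str, str]) -> IndexedTraversal:
    keys: List[str] = sorted(d)

    def first_key() -> Optional[str]:
        return keys[0] if keys else None

    def next_key(k: str) -> Optional[str]:
        i = bisect_right(keys, k)  # first key strictly greater than k
        return keys[i] if i < len(keys) else None

    return IndexedTraversal(first_key, d.get, next_key)


def items(t: IndexedTraversal) -> Iterator[Tuple[str, str]]:
    """Walk the traversal in key order."""
    k = t.first_key()
    while k is not None:
        v = t.lookup(k)
        if v is not None:
            yield k, v
        k = t.next_key(k)
```

Because `next_key` accepts any key, not only keys that are present, a consumer can jump ahead instead of stepping one entry at a time, which is what makes a fast intersection possible.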
A2: Fast streaming intersection
[ 1 2 3 16 45 48 65 100 109 ]   [ 1 13 14 109 ]   result: [ ]
[ 2 3 16 45 48 65 100 109 ]     [ 13 14 109 ]     result: [ 1 ]
[ 16 45 48 65 100 109 ]         [ 13 14 109 ]     result: [ 1 ]
[ 16 45 48 65 100 109 ]         [ 109 ]           result: [ 1 ]
[ 109 ]                         [ 109 ]           result: [ 1 ]
= [ 1 109 ]
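The trace above can be reproduced with a leapfrog-style intersection of two sorted lists: when the heads match, emit the key; otherwise jump the list with the smaller head forward to its first key that is at least the other head. A Python sketch, using `bisect_left` as the assumed "seek" operation (a Unison version would presumably use the traversal's next-key function):

```python
from bisect import bisect_left
from typing import List


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Intersect two ascending lists, seeking past runs of smaller keys."""
    out: List[int] = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i = bisect_left(a, b[j], i)  # jump to the first key in a that is >= b[j]
        else:
            j = bisect_left(b, a[i], j)  # jump to the first key in b that is >= a[i]
    return out


print(intersect_sorted([1, 2, 3, 16, 45, 48, 65, 100, 109], [1, 13, 14, 109]))  # [1, 109]
```

On the example, the seeks skip `2 3`, then `13 14`, then `16 ... 100` in one step each, matching the rows of the trace.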
A3: A better DIndex
- minimize # hashes per lookup
- replicate based on demand for key
- more advanced load-balancing
- decentralize cluster state
- Paxos / Raft as a Unison lib
A4: Eliminating diamond dependency problem
- Depend on two libs, alice and bob
- alice library depends on carol-v1
- bob library depends on carol-v2
- the problem is 95% artificial
- alice depends on carol-v1 only for 'factorial', bob depends on carol-v2 for 'quicksort' - NO CONFLICT!!
A4(a): Eliminating diamond dependency problem
- alice library depends on carol-v1
- bob library depends on carol-v2
- alice depends on carol-v1 only for 'factorial', bob depends on carol-v2 for (improved) 'factorial'
- why can't we allow both versions to be used??
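When dependencies are recorded as hashes of definitions rather than as package versions, both `factorial` implementations can be loaded side by side. A toy Python illustration; the `h` helper, the dictionaries, and the placeholder text for carol-v2's improved `factorial` are all hypothetical.

```python
import hashlib
from typing import Dict


def h(source: str) -> str:
    return "#" + hashlib.sha256(source.encode()).hexdigest()[:12]


carol_v1_factorial = "n -> Vector.fold-left (*) 1 (Vector.range 1 (n + 1))"
carol_v2_factorial = "n -> (improved factorial body)"  # hypothetical placeholder

# Each library records its dependencies as hashes, not (package, version) pairs
alice_deps = {"factorial": h(carol_v1_factorial)}
bob_deps = {"factorial": h(carol_v2_factorial)}

# A program using both libraries: the store is keyed by hash, so both versions coexist
store: Dict[str, str] = {}
for src in (carol_v1_factorial, carol_v2_factorial):
    store[h(src)] = src

print(len(store))  # 2
print(all(dep in store for deps in (alice_deps, bob_deps) for dep in deps.values()))  # True
```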
FSF 2016: How to write a search engine in 15 lines of code (an introduction to Unison)
By Paul Chiusano