Building a database in Clojure
The stuff it taught me
( https://github.com/robashton/cravendb )
Assumptions
You have played with Clojure a little
You're unafraid of parens
A bit of history (me)
A decade of enterprise .NET experience
a decade of enterprise JS experience
three years of Clojure fiddling
a year of enterprise Erlang experience
First off
You're doing JS wrong
You're doing C# wrong
No time to explain why - this is a talk about Clojure
I hated functional
programming at university
Functional programming
is important
(apparently - sigh)
"Teach me FP oh wise one"
So I wrote space invaders in clojurescript
a dozen times
It was (too) hard
I needed something easier
So I wrote a database in Clojure
What is clojure?
Functional
Dynamic
(Lisp)
Runs on the JVM
Very quick run through
Types
Lists ()
Vectors []
Maps {}
Functions ([])
Strings and ints
"Hello World"
1337
Keywords
:hello
:bob
(Just think of them as interned strings)
Lists
; A list of numbers (the quote stops Clojure trying to call 1 as a function)
'(1,2,3,4,5)
; A list of strings
'("hello" "world")
; A list of 'stuff'
'(1, "hello", {}, :bob)
vectors
; A vector of numbers
[1,2,3,4,5,6,7]
; A vector of strings
["hello", "world"]
; A vector of 'stuff'
[1, "hello", {}, :bob]
The Difference??
One uses square brackets
The other uses parens
(They have different performance
characteristics but let's not go into that)
Maps
{
:name "Rob Ashton"
:age 1337
:beard :awesome
}
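A quick sketch of what you can do with a map like that one. The name `person` is made up for the example; everything else is core Clojure. Keywords double up as lookup functions, and "updating" a map hands back a new map rather than changing the old one.

```clojure
;; 'person' is a hypothetical name for the map on the slide.
(def person {:name "Rob Ashton"
             :age 1337
             :beard :awesome})

;; A keyword can be called like a function to look itself up in a map.
(println (:name person))        ; Rob Ashton
(println (get person :age))     ; 1337

;; assoc returns a new map; the original is left untouched.
(def older (assoc person :age 1338))
(println (:age older))          ; 1338
(println (:age person))         ; 1337
```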
definitions
(def foo 5)
Functions
(fn [] "hello")
Functions
(def foo (fn [] "hello"))
; Call 'foo'
(foo)
Functions
(defn foo [] "hello")
; Call 'foo'
(foo)
got it?
Let's proceed.
Lesson #1 - Parens are your friend
(no (really (they (are (not (that (bad)))))))
Paredit
plug-in for vim
plug-in for emacs
If your editor doesn't have Paredit
your editor is wrong
rainbow braces
All the colours of the rainbow in your text editor
Quick demo of paredit in vim
Lesson #2 - The repl is king
Repl integration
vim has vim-redl/fireplace
emacs has cider/etc
Light table has its insta-repl
Demo (REPL)
Lesson #3 - Inside out, bottom up
Conway's game of life
- Any live cell with fewer than two live neighbours dies, as if caused by under-population.
- Any live cell with two or three live neighbours lives on to the next generation.
- Any live cell with more than three live neighbours dies, as if by overcrowding.
- Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
"Write me the function that gives me the next world state"
(defn next-world-state [world]
; Code goes here
)
No no no no no
"Any live cell with fewer than two live neighbours dies, as if caused by under-population."
Break it down
(defn starves? [is-alive, live-neighbours]
(and (< live-neighbours 2) is-alive))
only pass in the data you need
Where did is-alive come from?
Where did the number of neighbours come from?
Composability
(defn live-neighbours [cell])
(defn is-alive? [cell])
(defn dies? [cell]
(or
(starves? (is-alive? cell) (live-neighbours cell))
(overcrowds? (is-alive? cell) (live-neighbours cell))))
Lots of small expressions
You can reason about them
then compose them
then win
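Here is one way the small pieces could add up to the whole thing. This is a sketch, not the code from the talk: it assumes the world is a set of `[x y]` coordinates of live cells, and the names `neighbours`, `reproduces?` and `lives-next?` are invented for the example. Unlike the slide, `live-neighbours` takes the world as well as the cell so that it can actually count.

```clojure
;; Assumption: the world is a set of [x y] coordinates of live cells.
(defn neighbours [[x y]]
  (for [dx [-1 0 1] dy [-1 0 1] :when (not= [dx dy] [0 0])]
    [(+ x dx) (+ y dy)]))

(defn live-neighbours [world cell]
  ;; a set is also a function that returns the element if it is present
  (count (filter world (neighbours cell))))

;; Each rule only gets the two facts it needs.
(defn starves? [is-alive live-neighbours]
  (and is-alive (< live-neighbours 2)))

(defn overcrowds? [is-alive live-neighbours]
  (and is-alive (> live-neighbours 3)))

(defn reproduces? [is-alive live-neighbours]
  (and (not is-alive) (= live-neighbours 3)))

;; Compose the rules for a single cell.
(defn lives-next? [world cell]
  (let [alive (contains? world cell)
        n     (live-neighbours world cell)]
    (or (reproduces? alive n)
        (and alive
             (not (starves? alive n))
             (not (overcrowds? alive n))))))

;; Only live cells and their neighbours can be alive next turn.
(defn next-world-state [world]
  (let [candidates (into world (mapcat neighbours world))]
    (set (filter #(lives-next? world %) candidates))))

;; A vertical "blinker" flips to horizontal and back again.
(def blinker #{[1 0] [1 1] [1 2]})
(println (next-world-state blinker))
```

Every function above can be poked at in the REPL on its own before `next-world-state` exists at all, which is the point of working bottom up.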
Lesson #4 - Keep it flat
You can build arbitrarily complex structures
{
:path "/db"
:indexes [ {
:id "2"
:name "My index"
:pending [
{ :id "2"
:paths [ "/"
You can update arbitrary structures
(assoc-in my-map [:indexes 0 :path] "new-path")
But passing it around is asking for trouble
(defn add-index [db-state])
WHAT IS IN THAT MAP??
only take what you need
(defn add-index [indexes])
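A small sketch of what that buys you. The two-argument `add-index` and the shape of `db-state` are assumptions for illustration: the function only knows about a vector of indexes, and the caller decides where that vector lives in the bigger map.

```clojure
;; Hypothetical shape: 'indexes' is a plain vector of index maps.
(defn add-index [indexes index]
  (conj indexes index))

(def db-state {:path "/db" :indexes []})

;; The caller plugs the small function into the big structure.
;; update-in calls (add-index current-indexes new-index) for us.
(def new-state
  (update-in db-state [:indexes] add-index {:id "2" :name "My index"}))

(println new-state)
```

`add-index` can now be tested with nothing more than a vector, and it keeps working if the rest of the state map changes shape.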
Lesson #5 -
Let somebody else do the hard work
Clojars
https://clojars.org/
Lein deps
Built up on Maven :'(
Don't worry - no XML
project.clj
(defproject cravendb "0.1.0-SNAPSHOT"
:description "A clojure-oriented document-oriented database"
:url "http://robashton.github.io/cravendb"
:min-lein-version "2.2.0"
:dependencies [[org.clojure/clojure "1.5.1"]
; etc
])
Package - Http-kit
(run-server my-handler { :port 8080 })
Voila, http server running!
Package - liberator
Defining handlers for http-kit
Resource based
"Proper HTTP"
My document routes
(ANY "/document/:id" [id]
(resource
:allowed-methods [:put :get :delete :head]
:etag (fn [ctx] (etag-from-metadata ctx))
:put! (fn [ctx] (db/put-document instance id (read-body ctx)))
:delete! (fn [_] (db/delete-document instance id))
:handle-ok (fn [_] (db/load-document instance id))))
My dependencies (etc..)
[[org.clojure/clojure "1.5.1"]
[org.clojure/core.async "0.1.256.0-1bf8cf-alpha"]
[ring/ring-core "1.1.7"]
[org.clojure/data.csv "0.1.2"] ;; For load purposes
[com.cemerick/url "0.1.0"]
[liberator "0.9.0"]
[instaparse "1.2.2"]
[http-kit "2.1.12"]
[compojure "1.1.5"]
[serializable-fn "1.1.3"]
[clojurewerkz/vclock "1.0.0"]
[clj-time "0.6.0"]
[org.fusesource.leveldbjni/leveldbjni-all "1.7"]
[me.raynes/fs "1.4.4"]
[http.async.client "0.5.2"]
[org.clojure/tools.logging "0.2.6"]
[org.slf4j/slf4j-log4j12 "1.6.6"]
[org.clojure/core.incubator "0.1.3"]
[org.apache.lucene/lucene-core "4.4.0"]]
Lesson #6 -
Interop with legacy Java is great
(in fact it's pretty much Clojure's selling point)
There is a lot of OSS in Java
But Java sucks
Use Clojure!
Lucene
Classic Java
Searching/Indexing/Performance
You can simply add it as a dependency
[org.apache.lucene/lucene-core "4.4.0"]
[org.apache.lucene/lucene-queryparser "4.4.0"]
[org.apache.lucene/lucene-analyzers-common "4.4.0"]
Import it
(:import
(org.apache.lucene.analysis.standard StandardAnalyzer)
(org.apache.lucene.store FSDirectory RAMDirectory)
(org.apache.lucene.util Version)
(org.apache.lucene.index IndexWriterConfig IndexWriter DirectoryReader)
(org.apache.lucene.search IndexSearcher Sort SortField SortField$Type)
(org.apache.lucene.queryparser.classic QueryParser)
(org.apache.lucene.document Document Field Field$Store Field$Index
TextField IntField FloatField StringField)))
Use it
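; Create a RAM directory called 'dir'
(def dir (RAMDirectory.))
; Create an index writer over that dir (Lucene 4.x wants an analyzer and a config)
(def analyzer (StandardAnalyzer. Version/LUCENE_CURRENT))
(def writer (IndexWriter. dir (IndexWriterConfig. Version/LUCENE_CURRENT analyzer)))
; Commit so that there is an index on disk for a reader to open
(.commit writer)
; Create an index reader over that dir
(def reader (DirectoryReader/open dir))
; Query that reader ("content" is a placeholder default field name)
(.search (IndexSearcher. reader)
(.parse (QueryParser. Version/LUCENE_CURRENT "content" analyzer) "*:*")
10)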
Lesson #7 -
Interop with legacy Java hurts
Classes, interfaces
and factory factories
vs
Maps, Lists, Vectors
Java in Clojure
(defn create-index [file]
(let [analyzer (StandardAnalyzer. Version/LUCENE_CURRENT)
directory (FSDirectory/open file)
config (IndexWriterConfig. Version/LUCENE_CURRENT analyzer) ]
(LuceneIndex. analyzer directory config)))
It's pervasive
- Lucene abstracts to "objects"
- Clojure abstracts "data"
- Passing opaque "objects" around !== good Clojure
Solution
Hide APIs behind lists and maps
- Convert horrible data into plain old maps
- Convert Paging API access into recursive list generators
Example - querying
(defn lucene-seq
([page] (lucene-seq page (:results page)))
([page src]
(cond
(empty? (:results page)) ()
(empty? src) (lucene-seq ((:next page)))
:else (cons (first src) (lazy-seq (lucene-seq page (rest src)))))))
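To see the function above doing its job without Lucene in the room, here is a sketch with a fake paged source. `fake-page` is invented for the example; the only thing it shares with the real thing is the assumed shape, a map with `:results` and a `:next` function that fetches the following page. `lucene-seq` is repeated from the slide so the snippet runs on its own.

```clojure
(defn lucene-seq
  ([page] (lucene-seq page (:results page)))
  ([page src]
   (cond
     (empty? (:results page)) ()
     (empty? src) (lucene-seq ((:next page)))
     :else (cons (first src) (lazy-seq (lucene-seq page (rest src)))))))

;; Hypothetical stand-in for a paged search result.
(defn fake-page [all-results page-size offset]
  {:results (take page-size (drop offset all-results))
   :next    (fn [] (fake-page all-results page-size (+ offset page-size)))})

;; Ten results served three at a time look like one flat sequence.
(def hits (lucene-seq (fake-page (range 10) 3 0)))

(println (take 5 hits)) ; (0 1 2 3 4)
(println (count hits))  ; 10
```

The consumer never sees a page boundary, and the next page is only fetched once the current one has been used up.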
Lesson #8 -
Resources are a bitch
What's wrong with this?
(let [handle (open-file "foo.txt")]
(map to-user (read-lines handle)))
Clojure is Lazy
(def results
(let [handle (open-file "foo.txt")]
(map to-user (read-lines handle))))
(println results) ; CRASH
How do you hide this behind an API?
(get-all-the-lines-from "foo.txt")
Er... but now it's not lazy - bye bye memory
Don't try to hide resources
(with-open [handle (open-resource "foo.txt")]
(do-stuff-with-resource handle))
Resources
- Don't try to hide resource usage from end-user
- Give them an 'open' method
- Give them an API to operate over that resource
- Make them responsible for closing it
- Deal with it.
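A minimal sketch of that shape, using `clojure.java.io` from the standard library. `open-lines` and `read-lines` are hypothetical names for the "open" function and the API over the resource. The caller owns the handle, and anything lazy has to be forced (here with `doall`) before `with-open` closes it.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical API: one function to open, one to operate over the handle.
(defn open-lines [path] (io/reader path))
(defn read-lines [handle] (line-seq handle))

;; A throwaway file so the sketch can run anywhere.
(def path (doto (java.io.File/createTempFile "users" ".txt")
            (spit "alice\nbob\ncarol\n")))

(def first-two
  (with-open [handle (open-lines path)]
    ;; doall realises the lazy seq while the file is still open;
    ;; without it the reader is closed before anyone reads a line.
    (doall (take 2 (read-lines handle)))))

(println first-two) ; (alice bob)
```

Only two lines are ever pulled into memory, and the file is closed as soon as the `with-open` body returns.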
Lesson #9 -
Core.async solves problems
The problem
Several clients at once
HTTP GET
HTTP POST
HTTP POST
HTTP PUT
The problem
We keep data in memory for performance
Let's call this a collection of "indexes"
The problem
Clojure gives us constructs for managing access to shared state
; Atom called x with value of 1
(def x (atom 1))
(println x) ; Atom called 'x' value of 1
(println @x) ; 1
; Increase whatever is in x by '1'
(swap! x inc)
Solution #1
- Collections of atoms (or agents)
- Keep them behind an interface
- Coordinate all access to them
Problem?
It was a bloody mess.
Core.async
Channels and Processes (CSP)
An event loop
(defn event-loop [initial-state input]
(go
(loop [state initial-state]
(if-let [event (<! input)]
(recur (dispatch-event state event))))))
Event loops
- Can look after some private local state
- Can look after a collection of resources
- Can coordinate multi-threaded access over that state
- Look a lot like actors
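A runnable sketch of such a loop, assuming `org.clojure/core.async` is on the classpath. The `dispatch-event` handler and the `:add` event are made up for the example. All the state lives inside the loop, and the only way to touch it is to put an event on the channel.

```clojure
(require '[clojure.core.async :refer [chan go <! >!! <!! close!]])

;; Hypothetical handler: takes the current state and an event,
;; returns the next state.
(defn dispatch-event [state event]
  (case (:type event)
    :add (update-in state [:total] + (:amount event))
    state))

(defn event-loop [initial-state input]
  (go
    (loop [state initial-state]
      (if-let [event (<! input)]
        (recur (dispatch-event state event))
        state)))) ; channel closed: hand back the final state

(def input (chan 10))
(def result (event-loop {:total 0} input))

;; Any thread can send events; the loop handles them one at a time.
(>!! input {:type :add :amount 2})
(>!! input {:type :add :amount 3})
(close! input)

;; The go block's channel delivers the loop's return value.
(def final-state (<!! result))
(println final-state) ; {:total 5}
```

Because events are handled strictly in order by a single loop, there is no atom to coordinate and no lock to forget.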
Lesson #10 -
Should have used Erlang
I ended up with Clojure actors
(go
(loop [state (initial-state engine)]
(if-let [{:keys [cmd data]} (<! command-channel)]
(do
(debug "handling index loop command" cmd)
(recur (case cmd
:schedule-indexing (main-indexing-process state)
:notify-finished-indexing (main-indexing-process-ended state)
:removed-index state ;; WUH OH
:new-index (add-chaser state data)
:chaser-finished (finish-chaser state data)
:storage-request (storage-request state data))))
(do
(debug "being asked to shut down")
(wait-for-main-indexing state)
(wait-for-chasers state)
(close-open-indexes state)))))
They look like OTP actors
handle_info(schedule_indexing, State) ->
handle_info(finished_indexing, State) ->
handle_info(removed_index, State) ->
But has
- None of the guarantees
- None of the ops tooling
Oops.
All the native interop with leveldb
Should have just been native code embedded in Erlang
Lesson #0 -
share and learn
An introduction to Clojure via the medium of my database experiment
By Rob Ashton
This is not my building a database in clojure talk, this is a simple version I did for a non-clojure audience :)