Building a database in Clojure


The stuff it taught me


( https://github.com/robashton/cravendb )

Assumptions


You have played with Clojure a little
You're unafraid of parens


A bit of history (me)


A decade of enterprise .NET experience
A decade of enterprise JS experience
Three years of Clojure fiddling
A year of enterprise Erlang experience

First off


You're doing JS wrong
You're doing C# wrong
No time to explain why: this is a talk about Clojure

I hated functional programming at university

Functional programming is important

(apparently, sigh)

"Teach me FP oh wise one"




So I wrote Space Invaders in ClojureScript



a dozen times

It was (too) hard

I needed something easier




So I wrote a database in Clojure

What is Clojure?



Functional
Dynamic
(Lisp)
Runs on the JVM

Very quick run through


Types
Lists ()
Vectors []
Maps {}
Functions (fn [] ...)

Strings and ints



"Hello World"


1337 

Keywords



:hello
:bob 

(Just think of them as interned strings)
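A quick sketch of what "interned" buys you for these `:hello`-style keywords. The names `bob-a`, `bob-b` and `rob` are made up for illustration.

```clojure
;; Keywords with the same name are the same object (they are interned),
;; so comparing them is cheap.
(def bob-a :bob)
(def bob-b (keyword "bob"))   ; build a keyword from a string at runtime

(identical? bob-a bob-b)      ; true: one shared instance

;; Keywords are also functions that look themselves up in a map.
(def rob {:name "Rob Ashton" :beard :awesome})
(:beard rob)                  ; :awesome
(name :beard)                 ; "beard": back to a plain string
```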

Lists


; A list of numbers (quoted, so it isn't evaluated as a function call)
'(1 2 3 4 5)

; A list of strings
'("hello" "world")

; A list of 'stuff'
'(1 "hello" {} :bob)

Vectors


; A vector of numbers
[1 2 3 4 5 6 7]

; A vector of strings
["hello" "world"]

; A vector of 'stuff'
[1 "hello" {} :bob]

The Difference??



One uses square brackets
The other uses parens

(They have different performance characteristics, but let's not go into that)
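If you do want a peek at that difference, `conj` is the quickest way to see it: it adds an item wherever that is cheapest for the collection. A small sketch:

```clojure
;; conj adds wherever it is cheapest for the collection
(conj '(1 2 3) 0)    ; (0 1 2 3): lists grow at the front
(conj [1 2 3] 4)     ; [1 2 3 4]: vectors grow at the back

;; Vectors are indexed, so nth is effectively constant time;
;; on a list, nth has to walk from the head.
(nth [10 20 30] 2)   ; 30
(nth '(10 20 30) 2)  ; 30, but found by walking the list
```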

Maps

{
  :name "Rob Ashton"
  :age 1337
  :beard :awesome
}
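A sketch of working with that map. Every "change" returns a new map and leaves the original alone; the names `older` and `shaved` are illustrative only.

```clojure
(def rob {:name "Rob Ashton" :age 1337 :beard :awesome})

(get rob :name)                     ; "Rob Ashton"
(def older (assoc rob :age 1338))   ; a new map with :age replaced
(def shaved (dissoc rob :beard))    ; a new map without :beard

(:age older)                        ; 1338
(:age rob)                          ; 1337: the original map never changes
(contains? shaved :beard)           ; false
```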






 

Definitions



(def foo 5) 

Functions



(fn [] "hello")

Functions



(def foo (fn [] "hello"))

; Call 'foo'
(foo)

Functions


(defn foo [] "hello")

; Call 'foo'
(foo)
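The same forms with arguments, plus the short `#(...)` syntax for tiny anonymous functions. This is a sketch; `greet` is a made-up example function.

```clojure
;; A function with one argument
(defn greet [name]
  (str "hello " name))

(greet "world")                ; "hello world"

;; Functions are values, so you can pass them to other functions
(map greet ["Rob" "Bob"])      ; ("hello Rob" "hello Bob")

;; #(...) is shorthand for a small anonymous fn; % is its argument
(map #(str % "!") ["a" "b"])   ; ("a!" "b!")
```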

Got it?





Let's proceed.

Lesson #1 - Parens are your friend



(no (really (they (are (not (that (bad)))))))

Paredit


Plug-in for Vim
Plug-in for Emacs

If your editor doesn't have Paredit, your editor is wrong

Rainbow braces



All the colours of the rainbow in your text editor

Quick demo of Paredit in Vim

Lesson #2 - The REPL is king

REPL integration



Vim has vim-redl/fireplace
Emacs has CIDER (among others)
Light Table has its InstaREPL

Demo (REPL)

Lesson #3 - Inside out, bottom up

Conway's game of life

  • Any live cell with fewer than two live neighbours dies, as if caused by under-population.
  • Any live cell with two or three live neighbours lives on to the next generation.
  • Any live cell with more than three live neighbours dies, as if by overcrowding.
  • Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.



"Write me the function that gives me the next world state"



 (defn next-world-state [world]
     ; Code goes here
  )

No no no no no



"Any live cell with fewer than two live neighbours dies, as if caused by under-population."


Break it down


(defn starves? [is-alive live-neighbours]
  (and is-alive (< live-neighbours 2)))


Only pass in the data you need


Where did is-alive come from?
Where did the number of neighbours come from?

Composability


; Stubs: the bodies are left out here
(defn live-neighbours [cell])
(defn is-alive? [cell])

(defn dies? [cell]
  (or (starves? (is-alive? cell) (live-neighbours cell))
      (overcrowds? (is-alive? cell) (live-neighbours cell))))
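Taken all the way, the same bottom-up approach can produce the whole `next-world-state`. This is a sketch under my own assumptions: the world is a set of `[x y]` coordinates of live cells, `live-neighbours` takes the world as well as the cell (a lone cell can't know its neighbours), and `reproduces?`, `neighbours` and `alive-next?` are hypothetical names, not from the talk.

```clojure
;; The world is a set of live cells, each an [x y] pair (an assumption
;; made for this sketch).

;; The four rules as tiny predicates over plain data
(defn starves? [is-alive live-neighbours]
  (and is-alive (< live-neighbours 2)))

(defn overcrowds? [is-alive live-neighbours]
  (and is-alive (> live-neighbours 3)))

(defn reproduces? [is-alive live-neighbours]
  (and (not is-alive) (= live-neighbours 3)))

;; The eight cells surrounding [x y] (skips [0 0], the cell itself)
(defn neighbours [[x y]]
  (for [dx [-1 0 1]
        dy [-1 0 1]
        :when (not= 0 dx dy)]
    [(+ x dx) (+ y dy)]))

(defn live-neighbours [world cell]
  (count (filter world (neighbours cell))))   ; a set works as a predicate

(defn alive-next? [world cell]
  (let [alive (contains? world cell)
        n     (live-neighbours world cell)]
    (cond
      (starves? alive n)    false
      (overcrowds? alive n) false
      (reproduces? alive n) true
      :else                 alive)))   ; any other cell keeps its current state

;; Only live cells and their neighbours can be alive next generation
(defn next-world-state [world]
  (let [candidates (into world (mapcat neighbours world))]
    (set (filter #(alive-next? world %) candidates))))
```

For example, `(next-world-state #{[0 1] [1 1] [2 1]})` turns a horizontal "blinker" into the vertical `#{[1 0] [1 1] [1 2]}`. Each predicate can be checked at the REPL on its own before they are composed.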

Lots of small expressions



You can reason about them
then compose them
then win

Lesson #4 - Keep it flat

You can build arbitrarily complex structures


 {:path "/db"
  :indexes [{:id "2"
             :name "My index"
             :pending [{:id "2"
                        :paths ["/" ...

You can update arbitrary structures



(assoc-in my-map [:indexes 0 :path] "new-path")
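A sketch of the two path-based helpers against a cut-down version of the nested map above (the field values are illustrative). `assoc-in` sets a value at a path, while `update-in` applies a function, plus any extra arguments, to the value at that path.

```clojure
(def db-state
  {:path "/db"
   :indexes [{:id "2" :name "My index" :pending []}]})

;; Replace the value at a path (vector indexes work as path steps)
(assoc-in db-state [:indexes 0 :name] "Renamed index")

;; Apply a function to the value at a path: here (conj pending {:id "3"})
(update-in db-state [:indexes 0 :pending] conj {:id "3"})

;; db-state itself is untouched by either call
```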

But passing it around is asking for trouble



(defn add-index [db-state])

WHAT IS IN THAT MAP??

Only take what you need



(defn add-index [indexes])
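One way to wire that up, as a sketch: `add-index` only ever sees the indexes, and the caller decides where they live in the bigger state. The two-argument signature (current indexes plus the index to add) is my assumption, not cravendb's actual API.

```clojure
;; Hypothetical add-index: knows nothing about the rest of db-state
(defn add-index [indexes index]
  (conj indexes index))

(def db-state {:path "/db" :indexes []})

;; The caller plugs the narrow function into the wide structure
(update-in db-state [:indexes] add-index {:id "3" :name "Another index"})
;; => {:path "/db", :indexes [{:id "3", :name "Another index"}]}
```

Because `add-index` takes a plain vector, you can test it at the REPL with `(add-index [] {:id "1"})` without building a whole database state.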

Lesson #5 -

Let somebody else do the hard work



Clojars


https://clojars.org/


Lein deps



Built on top of Maven :'(

Don't worry - no XML

project.clj

(defproject cravendb "0.1.0-SNAPSHOT"
  :description "A clojure-oriented document-oriented database"
  :url "http://robashton.github.io/cravendb"
  :min-lein-version "2.2.0"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 ; etc.
                 ])

Package - http-kit



(run-server my-handler { :port 8080 })


Voila, http server running!
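http-kit serves Ring handlers, and a Ring handler is just a function from a request map to a response map. Here is a sketch of what `my-handler` might look like. The body is my own illustration, and the `run-server` call is left inside `comment` because it needs the http-kit dependency.

```clojure
;; A Ring handler: a plain function from a request map to a response map
(defn my-handler [request]
  {:status  200
   :headers {"Content-Type" "text/plain"}
   :body    (str "You asked for " (:uri request))})

;; Data in, data out, so it can be called without any server running
(my-handler {:request-method :get :uri "/db"})
;; => a response map with :status 200 and :body "You asked for /db"

;; Serving it for real (requires http-kit on the classpath)
(comment
  (require '[org.httpkit.server :refer [run-server]])
  (run-server my-handler {:port 8080}))
```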

Package - liberator


Defines Ring handlers (which http-kit can serve)
Resource based
"Proper HTTP"

My document routes


    (ANY "/document/:id" [id]
      (resource
        :allowed-methods [:put :get :delete :head]
        :etag (fn [ctx] (etag-from-metadata ctx))
        :put! (fn [ctx] (db/put-document instance id (read-body ctx)))
        :delete! (fn [_] (db/delete-document instance id))
        :handle-ok (fn [_] (db/load-document instance id))))

My dependencies (etc.)

 [[org.clojure/clojure "1.5.1"]
  [org.clojure/core.async "0.1.256.0-1bf8cf-alpha"]
  [ring/ring-core "1.1.7"]
  [org.clojure/data.csv "0.1.2"] ;; For load purposes
  [com.cemerick/url "0.1.0"]
  [liberator "0.9.0"]
  [instaparse "1.2.2"]
  [http-kit "2.1.12"]
  [compojure "1.1.5"]
  [serializable-fn "1.1.3"]
  [clojurewerkz/vclock "1.0.0"]
  [clj-time "0.6.0"]
  [org.fusesource.leveldbjni/leveldbjni-all "1.7"]
  [me.raynes/fs "1.4.4"]
  [http.async.client "0.5.2"]
  [org.clojure/tools.logging "0.2.6"]
  [org.slf4j/slf4j-log4j12 "1.6.6"]
  [org.clojure/core.incubator "0.1.3"]
  [org.apache.lucene/lucene-core "4.4.0"]

Lesson #6 - 

Interop with legacy Java is great




(in fact it's pretty much Clojure's selling point)

There is a lot of OSS in Java



But Java sucks
Use Clojure!
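The interop syntax in one place, as a small sketch using only JDK classes (`names` is a made-up example):

```clojure
(require '[clojure.string :as string])
(import '(java.util ArrayList UUID))

(def names (ArrayList.))   ; constructor: (ClassName. args)
(.add names "rob")         ; instance method: (.method object args)
(.add names "bob")
(.size names)              ; 2
(Math/max 3 7)             ; static method: (Class/method args), returns 7
(str (UUID/randomUUID))    ; a random UUID as a string

;; Java collections work directly with Clojure's sequence functions
(map string/upper-case names)   ; ("ROB" "BOB")
```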

Lucene


Classic Java
Searching/Indexing/Performance

Can simply add it as a dependency


                 [org.apache.lucene/lucene-core "4.4.0"]
                 [org.apache.lucene/lucene-queryparser "4.4.0"]
                 [org.apache.lucene/lucene-analyzers-common "4.4.0"]

Import it


  (:import
           (org.apache.lucene.analysis.standard StandardAnalyzer)
           (org.apache.lucene.store FSDirectory RAMDirectory)
           (org.apache.lucene.util Version)
           (org.apache.lucene.index IndexWriterConfig IndexWriter DirectoryReader)
           (org.apache.lucene.search IndexSearcher Sort SortField SortField$Type)
           (org.apache.lucene.queryparser.classic QueryParser)
           (org.apache.lucene.document Document Field Field$Store Field$Index
                                      TextField IntField FloatField StringField))) 

Use it


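; Create an in-memory directory called 'dir'
(def dir (RAMDirectory.))

; Create an index writer over that dir (Lucene 4.x wants an analyzer and a config)
(def analyzer (StandardAnalyzer. Version/LUCENE_CURRENT))
(def writer (IndexWriter. dir (IndexWriterConfig. Version/LUCENE_CURRENT analyzer)))
(.commit writer) ; a reader can't open a directory with no commits yet

; Create an index reader over that dir
(def reader (DirectoryReader/open dir))

; Query that reader for everything (*:* matches all documents)
(def searcher (IndexSearcher. reader))
(.search searcher (.parse (QueryParser. Version/LUCENE_CURRENT "name" analyzer) "*:*") 10)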

Lesson #7 - 

Interop with legacy Java hurts

Classes, interfaces

and factory factories



vs


Maps, Lists, Vectors

Java in Clojure


(defn create-index [file]
  (let [analyzer (StandardAnalyzer. Version/LUCENE_CURRENT)
        directory (FSDirectory/open file)
        config (IndexWriterConfig. Version/LUCENE_CURRENT analyzer) ]
    (LuceneIndex. analyzer directory config))) 

 

It's pervasive


  • Lucene abstracts to "objects"
  • Clojure abstracts "data"
  • Passing opaque "objects" around !== good Clojure

Solution


Hide APIs behind lists and maps

  • Convert horrible data into plain old maps
  • Convert paged API access into lazy, recursively built sequences
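A sketch of the first bullet, with `java.io.File` standing in for an opaque Lucene object. The helper `file->map` is hypothetical, not part of cravendb. The idea is to convert once, at the edge, and treat the result as plain data from then on.

```clojure
(import '(java.io File))

;; Hypothetical converter: pull out the fields we care about into a map
(defn file->map [^File f]
  {:name       (.getName f)
   :parent     (.getParent f)
   :directory? (.isDirectory f)})

;; From here on it's just data: sort, filter, print, compare...
(def files (map file->map [(File. "/tmp/b.txt") (File. "/tmp/a.txt")]))

(map :name (sort-by :name files))   ; ("a.txt" "b.txt")
```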

Example - querying


 (defn lucene-seq 
  ([page] (lucene-seq page (:results page)))
  ([page src]
   (cond
     (empty? (:results page)) ()
     (empty? src) (lucene-seq ((:next page)))
     :else (cons (first src) (lazy-seq (lucene-seq page (rest src)))))))
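To see the laziness at work, here is the same function run against a fake paging API. `fake-page` and the `fetches` counter are hypothetical. They mirror the shape `lucene-seq` expects: a map holding the current page's `:results` and a zero-argument `:next` function that fetches the following page.

```clojure
;; lucene-seq as on the slide, repeated so this block runs on its own
(defn lucene-seq
  ([page] (lucene-seq page (:results page)))
  ([page src]
   (cond
     (empty? (:results page)) ()
     (empty? src) (lucene-seq ((:next page)))
     :else (cons (first src) (lazy-seq (lucene-seq page (rest src)))))))

;; Hypothetical paged API: 10 hits in total, served 3 per page,
;; counting how many pages actually get fetched
(def fetches (atom 0))

(defn fake-page [offset]
  (swap! fetches inc)
  {:results (range offset (min 10 (+ offset 3)))
   :next    #(fake-page (+ offset 3))})

;; Taking 4 hits only touches the first two pages
(take 4 (lucene-seq (fake-page 0)))   ; (0 1 2 3)
```

The caller sees one flat sequence of hits and never learns that pages exist, which is the point of the "hide APIs behind lists" advice.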

Lesson #8 - 

Resources are a bitch

What's wrong with this?


(let [handle (open-file "foo.txt")]
  (map to-user (read-lines handle)))

Clojure is lazy


(def results
  (with-open [handle (open-file "foo.txt")]
    (map to-user (read-lines handle))))

(println results) ; CRASH: the file was closed before the lines were read
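A runnable version of that crash, where the handle is closed by `with-open` before the lazy sequence is walked. `open-file`, `read-lines` and `to-user` are hypothetical stand-ins built on `java.io.BufferedReader`, and `open-file` takes the text itself rather than a path so no file is needed on disk. `line-seq` reads one line as soon as it is called and the rest only on demand, which here happens after the reader is already closed.

```clojure
(import '(java.io BufferedReader StringReader))

;; Hypothetical stand-ins for the slide's helpers
(defn open-file [s] (BufferedReader. (StringReader. s)))
(defn read-lines [rdr] (line-seq rdr))
(defn to-user [line] {:name line})

;; with-open closes the reader on exit, but map hasn't read the
;; remaining lines yet (line-seq only pulled the first one)
(def results
  (with-open [handle (open-file "rob\nalice\nbob")]
    (map to-user (read-lines handle))))

;; Realising the seq now (as println would) hits the closed reader
(def outcome
  (try (doall results) :ok
       (catch Exception _ :stream-closed)))
```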

How do you hide this behind an API?



(get-all-the-lines-from "foo.txt")


Er... but now it's not lazy - bye bye memory

Don't try to hide resources



(with-open [handle (open-resource "foo.txt")]
  (do-stuff-with-resource handle))

Resources


  • Don't try to hide resource usage from end-user
  • Give them an 'open' method
  • Give them an API to operate over that resource
  • Make them responsible for closing it
  • Deal with it.
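A sketch of an API shaped along the bullets above. `open-source` and `users` are hypothetical names, and the source reads from a string rather than a file to keep the example self-contained. The library opens and reads lazily, and the caller owns the lifetime and finishes its work inside `with-open`.

```clojure
(import '(java.io BufferedReader StringReader))

;; "Give them an 'open' method"
(defn open-source [s]
  (BufferedReader. (StringReader. s)))

;; "Give them an API to operate over that resource": lazy, so it only
;; reads as much as the caller consumes
(defn users [^BufferedReader source]
  (map keyword (line-seq source)))

;; "Make them responsible for closing it": the caller chooses the scope
;; and realises whatever it needs before the scope closes
(def admins
  (with-open [src (open-source "rob\nalice\nbob")]
    (doall (filter #{:rob :bob} (users src)))))
```

The `doall` belongs to the caller, not the library. Only the caller knows how much of the data it needs while the resource is still open.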

Lesson #9 - 

Core.async solves problems

The problem

Several clients at once

HTTP GET
HTTP POST
HTTP POST
HTTP PUT

The problem


We keep data in memory for performance

Let's call this a collection of "indexes"

The problem

Clojure gives us constructs for managing access to shared state

; An atom called x with a value of 1
(def x (atom 1))

(println x)  ; prints the atom itself, not its value
(println @x) ; 1

; Increase whatever is in x by 1
(swap! x inc)
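A small sketch of why an atom is enough for a single piece of shared state: `swap!` retries its function if another thread changed the value in the meantime, so no update is lost. The thread setup is my own illustration.

```clojure
(def counter (atom 0))

;; Four threads each bump the same atom 1000 times
(let [threads (doall (for [_ (range 4)]
                       (Thread. ^Runnable (fn [] (dotimes [_ 1000] (swap! counter inc))))))]
  (doseq [^Thread t threads] (.start t))
  (doseq [^Thread t threads] (.join t)))

@counter   ; 4000
```

This works well for one value. The trouble the next slides describe starts when many atoms have to be kept consistent with each other.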

Solution #1


  1. Collections of atoms (or agents)
  2. Keep them behind an interface
  3. Coordinate all access to them

Problem?


It was a bloody mess.

Core.async


Channels and Processes (CSP)

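An event loop


(require '[clojure.core.async :refer [go <!]])

(defn event-loop [initial-state input]
  (go
    (loop [state initial-state]
      (if-let [event (<! input)]
        (recur (dispatch-event state event))
        state)))) ; input closed: the go channel receives the final state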

Event loops


  • Can look after some private local state
  • Can look after a collection of resources
  • Can coordinate multi-threaded access over that state
  • Look a lot like actors
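A sketch of such a loop end to end, assuming the core.async dependency is on the classpath. The event shapes (`:add-index`, `:remove-index`) and `dispatch-event` are hypothetical. The state is private to the loop, and the only way to touch it is to send an event down the channel.

```clojure
(require '[clojure.core.async :refer [chan go <! >!! <!! close!]])

;; Hypothetical events: {:cmd :add-index :data "by-name"} and friends
(defn dispatch-event [state {:keys [cmd data]}]
  (case cmd
    :add-index    (update-in state [:indexes] conj data)
    :remove-index (update-in state [:indexes] disj data)
    state))                                   ; unknown commands are ignored

(defn event-loop [initial-state input]
  (go
    (loop [state initial-state]
      (if-let [event (<! input)]
        (recur (dispatch-event state event))
        state))))                             ; channel closed: final state

(def input (chan))
(def done (event-loop {:indexes #{}} input))  ; go returns a channel

;; Any thread can send events; the loop handles them one at a time
(>!! input {:cmd :add-index :data "by-name"})
(>!! input {:cmd :add-index :data "by-age"})
(>!! input {:cmd :remove-index :data "by-name"})
(close! input)

(def final-state (<!! done))                  ; {:indexes #{"by-age"}}
```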

Lesson #10 - 

Should have used Erlang

I ended up with Clojure actors


  (go
    (loop [state (initial-state engine)]
      (if-let [{:keys [cmd data]} (<! command-channel)]
        (do
          (debug "handling index loop command" cmd)
          (recur (case cmd
                   :schedule-indexing (main-indexing-process state)
                   :notify-finished-indexing (main-indexing-process-ended state)
                   :removed-index state ;; WUH OH
                   :new-index (add-chaser state data)
                   :chaser-finished (finish-chaser state data)
                   :storage-request (storage-request state data))))
        (do
          (debug "being asked to shut down")
          (wait-for-main-indexing state)
          (wait-for-chasers state)
          (close-open-indexes state)))))

They look like OTP actors


 handle_info(schedule_indexing, State) ->
 handle_info(finished_indexing, State) ->
 handle_info(removed_index, State) ->

But has


  • None of the guarantees
  • None of the ops tooling


Oops.

All the native interop with LevelDB


Should have just been native code embedded in Erlang

Lesson #0 - 

Share and learn


An introduction to Clojure via the medium of my database experiment

By Rob Ashton


This is not my "Building a database in Clojure" talk; it's a simpler version I did for a non-Clojure audience :)
