And what it taught me
@robashton
I've not done Transducers yet
A decade of enterprise .NET experience
A decade of enterprise JS experience
three years of Clojure fiddling
a year of enterprise Erlang experience
Now a certified Haskell pusher
So I wrote a database in Clojure
github.com/robashton/cravendb
(demo)
Parens are your friend
An editor without Paredit is not an editor
(Vim and Emacs users rejoice o/)
(So Pretty!)
The REPL is Boss
Emacs has a gazillion attempts at this
Vim has vim-fireplace/redl
LightTable has InstaREPL
If your editor can't REPL your editor is broken
(Inside out, Bottom Up)
(defn next-world-state [world]
; Code goes here
)
"Any live cell with fewer than two live neighbours dies
as if caused by under-population."
(defn starves? [is-alive neighbour-count]
  (and (< neighbour-count 2) is-alive))
(defn live-neighbours [cell])
(defn is-alive? [cell])
(defn dies? [cell]
  (or
    (starves?
      (is-alive? cell) (live-neighbours cell))
    (overcrowds?
      (is-alive? cell) (live-neighbours cell))))
(defn is-alive? [cell]
(= (:current-state cell) :alive))
Any live cell
(defn neighbour-count [cell grid]
; whatever
)
How many neighbours?
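One way the neighbour question could be answered, as a rough sketch only. It assumes the grid is a map of [x y] to cell and that each cell carries a :position key; both of those, and the neighbour-positions helper, are made up for illustration and are not from the slides.

```clojure
;; Assumption: grid is a map of [x y] -> cell, and every cell is a map
;; holding its own :position next to :current-state.
(defn is-alive? [cell]
  (= (:current-state cell) :alive))

;; Hypothetical helper: the eight positions surrounding [x y].
(defn neighbour-positions [[x y]]
  (for [dx [-1 0 1]
        dy [-1 0 1]
        :when (not (and (zero? dx) (zero? dy)))]
    [(+ x dx) (+ y dy)]))

(defn neighbour-count [cell grid]
  (->> (neighbour-positions (:position cell))
       (keep grid)          ; positions missing from the grid are skipped
       (filter is-alive?)
       count))
```

Each function is small enough to poke at in the REPL on its own before the next one gets written on top of it.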
Keep it flat (if you can)
{
 :indexes [
   { :path "/indexes/ponies"
     :type :in-memory
     :lucene { :handle ... }
     :pending [ { :id 2 :paths [ "/bar" "/foo" ] } ] }
 ]
 ; etc
}
(assoc-in my-map [:indexes 0 :path] "new-path")
(defn add-index [db-state])
What's in db-state??
(indexes/update-path new-path id indexes)
Build modules around each flat data structure in application state
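A sketch of what such a module might look like. It assumes the indexes live in a flat map of index id to index description; that shape, and the add-pending function, are assumptions for illustration rather than what cravendb actually does.

```clojure
;; Would live in a namespace like `indexes`.
;; Assumption: `indexes` is a flat map of index id -> index description.

(defn update-path [new-path id indexes]
  (assoc-in indexes [id :path] new-path))

;; Hypothetical second operation on the same flat structure.
(defn add-pending [item id indexes]
  (update-in indexes [id :pending] (fnil conj []) item))
```

Callers hand over just the indexes map and get a new one back, so nobody has to know what else is in db-state.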
Let somebody else do the hard work
lein deps
maven
(no XML though!)
(defproject cravendb "0.1.0-SNAPSHOT"
:description "A clojure-oriented document-oriented database"
:url "http://robashton.github.io/cravendb"
:min-lein-version "2.2.0"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 ; etc
                 ])
(run-server my-handler { :port 8080 })
Defining handlers for http-kit
It's all about the resources
It's all about http correctness
(ANY "/document/:id" [id]
(resource
:allowed-methods [:put :get :delete :head]
:etag (fn [ctx] (etag-from-metadata ctx))
:put! (fn [ctx] (db/put-document instance id (read-body ctx)))
:delete! (fn [_] (db/delete-document instance id))
:handle-ok (fn [_] (db/load-document instance id))))
:dependencies [[org.clojure/clojure "1.5.1"]
[org.clojure/core.async "0.1.256.0-1bf8cf-alpha"]
[ring/ring-core "1.1.7"]
[org.clojure/data.csv "0.1.2"] ;; For load purposes
[com.cemerick/url "0.1.0"]
[liberator "0.9.0"]
[instaparse "1.2.2"]
[http-kit "2.1.12"]
[compojure "1.1.5"]
[serializable-fn "1.1.3"]
[clojurewerkz/vclock "1.0.0"]
[clj-time "0.6.0"]
[org.fusesource.leveldbjni/leveldbjni-all "1.7"]
[me.raynes/fs "1.4.4"]
[http.async.client "0.5.2"]
[org.clojure/tools.logging "0.2.6"]
[org.slf4j/slf4j-log4j12 "1.6.6"]
[org.clojure/core.incubator "0.1.3"]
[org.apache.lucene/lucene-core "4.4.0"]
[org.apache.lucene/lucene-queryparser "4.4.0"]
[org.apache.lucene/lucene-analyzers-common "4.4.0"]
[org.clojure/data.codec "0.1.0"]
[org.apache.lucene/lucene-highlighter "4.4.0"]]
Interop with legacy Java is GREAT
But Java sucks (so does Scala, before any of you get started)
(Classic Java, one of the best indexing systems around)
[org.apache.lucene/lucene-core "4.4.0"]
[org.apache.lucene/lucene-queryparser "4.4.0"]
[org.apache.lucene/lucene-analyzers-common "4.4.0"]
(Lol @ Java namespaces)
(:import
(org.apache.lucene.analysis.standard StandardAnalyzer)
(org.apache.lucene.store FSDirectory RAMDirectory)
(org.apache.lucene.util Version)
(org.apache.lucene.index IndexWriterConfig IndexWriter DirectoryReader)
(org.apache.lucene.search IndexSearcher Sort SortField SortField$Type)
(org.apache.lucene.queryparser.classic QueryParser)
(org.apache.lucene.document Document Field Field$Store Field$Index
TextField IntField FloatField StringField)))
; Create a RAM directory called 'dir'
(def dir (RAMDirectory.))
; Create an index writer over that dir
(def writer (IndexWriter. dir))
; Create an index reader over that dir
(def reader (IndexReader. dir))
; Query that reader
(IndexQuery reader "*")
Interop with legacy Java SUCKS
Classes + Interfaces + FactoryFactoryProvider
vs
Maps, Vectors, Lists
(defn create-index [file]
(let [analyzer (StandardAnalyzer. Version/LUCENE_CURRENT)
directory (FSDirectory/open file)
config (IndexWriterConfig. Version/LUCENE_CURRENT analyzer) ]
(LuceneIndex. analyzer directory config)))
Convert *everything* into maps and lists
(defn index-result-to-map [index-result]
  {
   :name (.getName index-result)
   :total-count (.getTotalCount index-result)
   :items (map index-item-to-map (.getItems index-result))
  })
;; Naive implementation
(defn get-results [amount skip index query]
  (let [real-count-to-request (+ amount skip)
        results (lucene/query index query real-count-to-request)
        still-needed-count (- amount (count results))]
    (if (> still-needed-count 0)
      ;; not enough came back, so ask again further along and join the two
      (concat results
              (get-results still-needed-count real-count-to-request index query))
      results)))
(defn lucene-producer [tx reader opts]
(fn [offset amount]
(->>
(lucene/query reader
(:filter opts)
(+ offset amount)
(:sort-by opts)
(:sort-order opts))
(drop offset)
(valid-documents tx))))
A producer function
(defn lucene-page
([producer page-size] (lucene-page producer 0 page-size))
([producer current-offset page-size]
{
:results (producer current-offset page-size)
:next (fn [] (lucene-page producer (+ current-offset page-size) page-size))
}))
State per page
And a recursive generator function
(defn lucene-seq
([page] (lucene-seq page (:results page)))
([page src]
(cond
(empty? (:results page)) ()
(empty? src) (lucene-seq ((:next page)))
:else (cons (first src) (lazy-seq (lucene-seq page (rest src)))))))
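To see the paging behave without Lucene around, here is a sketch that swaps the Lucene producer for a fake one. The range-producer function and its call counter are invented for this example; lucene-page and lucene-seq are repeated from the slides so the block runs on its own.

```clojure
(defn lucene-page
  ([producer page-size] (lucene-page producer 0 page-size))
  ([producer current-offset page-size]
   {:results (producer current-offset page-size)
    :next (fn [] (lucene-page producer (+ current-offset page-size) page-size))}))

(defn lucene-seq
  ([page] (lucene-seq page (:results page)))
  ([page src]
   (cond
     (empty? (:results page)) ()
     (empty? src) (lucene-seq ((:next page)))
     :else (cons (first src) (lazy-seq (lucene-seq page (rest src)))))))

;; Hypothetical stand-in for lucene-producer: serves the numbers
;; 0..total-1 and counts how many times it has been asked for a page.
(def calls (atom 0))

(defn range-producer [total]
  (fn [offset amount]
    (swap! calls inc)
    (take amount (drop offset (range total)))))

;; Seven results in pages of three, walked as one sequence.
(def all-results (lucene-seq (lucene-page (range-producer 7) 3)))
```

Taking only the first few items of all-results should only ask the producer for the pages needed to cover them, which is the whole point of hiding the paging behind a lazy sequence.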
Native resources are a pain
(let [handle (open-file "foo.txt")]
(map to-user (read-lines handle)))
(def results
(let [handle (open-file "foo.txt")]
(map to-user (read-lines handle))))
(println results) ; CRASH
(Haskell doesn't have this problem)
How do you build an API around this?
(get-all-the-lines-from "foo.txt")
Well now it's not lazy....
(with-open [handle (open-resource "foo.txt")]
  (do-stuff-with-resource handle))
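A sketch of the usual compromise: do the work inside with-open and force the sequence before the handle closes. Here open-resource and to-user are hypothetical stand-ins (an in-memory reader and a trivial conversion) so the example runs without a file on disk.

```clojure
(require '[clojure.string :as str])
(import '(java.io BufferedReader StringReader))

;; Hypothetical stand-in for opening a file: an in-memory reader.
(defn open-resource [text]
  (BufferedReader. (StringReader. text)))

;; Hypothetical conversion, standing in for to-user.
(defn to-user [line]
  {:name (str/trim line)})

(defn load-users [text]
  (with-open [handle (open-resource text)]
    ;; doall realises every line before with-open closes the handle
    (doall (map to-user (line-seq handle)))))

(load-users "alice\nbob\n")
```

Without the doall, the map would still be unrealised when with-open closes the reader, and the first attempt to read it would blow up. With it, everything is read eagerly, so the laziness is gone.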
Concurrency is something you still need to be aware of
Databases have multiple clients
HTTP GET
HTTP PUT
HTTP POST
HTTP GET
HTTP POST
Shared state
(A collection of in-memory indexes for example)
; Atom called x with value of 1
(def x (atom 1))
(println x) ; Atom called 'x' value of 1
(println @x) ; de-reference atom, get 1
; Increase whatever is in x by '1'
(swap! x inc)
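A small sketch of why swap! matters once there are several clients. It assumes the shared state is a map of index id to index description held in one atom; the register-index! function and the thread setup are made up for the example.

```clojure
;; Assumption: shared state is a map of index id -> description in one atom.
(def indexes (atom {}))

;; Hypothetical operation: each client registers an index under its own id.
(defn register-index! [id]
  (swap! indexes assoc id {:type :in-memory}))

;; Ten threads each register a hundred distinct indexes at the same time.
(let [threads (doall
                (for [t (range 10)]
                  (Thread. ^Runnable
                           (fn []
                             (dotimes [i 100]
                               (register-index! [t i]))))))]
  (doseq [^Thread th threads] (.start th))
  (doseq [^Thread th threads] (.join th)))

(count @indexes)
```

Because swap! retries when another thread got in first, none of the thousand registrations is lost. What it does not give you is any coordination between separate atoms or any ordering between clients.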
That's a bloody mess
Channels and Processes (CSP)
(defn event-loop [initial-state input]
  (go
    (loop [state initial-state]
      (if-let [event (<! input)]
        (recur (dispatch-event state event))))))
(Should have used Erlang)
(go
(loop [state (initial-state engine)]
(if-let [{:keys [cmd data]} (<! command-channel)]
(do
(debug "handling index loop command" cmd)
(recur (case cmd
:schedule-indexing (main-indexing-process state)
:notify-finished-indexing (main-indexing-process-ended state)
:removed-index state ;; WUH OH
:new-index (add-chaser state data)
:chaser-finished (finish-chaser state data)
:storage-request (storage-request state data))))
(do
(debug "being asked to shut down")
(wait-for-main-indexing state)
(wait-for-chasers state)
(close-open-indexes state))))))
handle_info(schedule_indexing, State) ->
handle_info(finished_indexing, State) ->
handle_info(removed_index, State) ->
Spyscope, polymorphism, records and transparent state
If we have time.
Share and Learn