February 2014 

Graph Database/Golang Meetup


7pm - Building a Neo4j driver using Go's database/sql/driver interface and the Cypher HTTP endpoints


Wes Freeman

@wefreema

What is it? Why did I write cq?

  • cq is short for Cypher Queries, and the name is a play on the pq library for PostgreSQL

  • There are already good Go/REST drivers out there, but I mostly just wanted to run Cypher
  • Saw Baron's talk on database/sql again, and it seemed to click
    • the row interface seemed like Cypher's tabular results
    • Cypher is very close to SQL in structure (statements, parameters)
  • Go is fun, and I thought writing a driver would be a good learning experience

First, a little about Neo4j's API

  • HTTP endpoints combined with REST
  • Cypher endpoint
    • send query + parameters
    • receive array of columns, array of arrays of data
  • Streaming HTTP/JSON

Cypher Example Request and Response

{
  "query":"MATCH (u:User)-[:FOLLOWS]->(f) WHERE u.username={user} RETURN f.username",
  "params":{
    "user":"wefreema"
  }
}

{
  "columns":["f.username"],
  "data":[["JnBrymn"], ["RyanDay2"], ...]
}
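A minimal Go sketch of that round trip, using only the standard library. The type and function names are hypothetical (not cq's API); the /db/data/cypher path is the Neo4j 2.x Cypher endpoint, and the request is built here but not sent.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// cypherRequest mirrors the JSON body sent to the Cypher endpoint.
type cypherRequest struct {
	Query  string                 `json:"query"`
	Params map[string]interface{} `json:"params"`
}

// cypherResponse mirrors the tabular JSON the endpoint sends back.
type cypherResponse struct {
	Columns []string        `json:"columns"`
	Data    [][]interface{} `json:"data"`
}

// newCypherRequest builds (but does not send) a POST to the Cypher endpoint.
// baseURL is assumed to look like "http://localhost:7474".
func newCypherRequest(baseURL, query string, params map[string]interface{}) (*http.Request, error) {
	body, err := json.Marshal(cypherRequest{Query: query, Params: params})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", baseURL+"/db/data/cypher", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Accept", "application/json")
	return req, nil
}

// decodeCypherResponse parses a response body into columns and rows.
func decodeCypherResponse(body []byte) (cypherResponse, error) {
	var resp cypherResponse
	err := json.Unmarshal(body, &resp)
	return resp, err
}

func main() {
	req, err := newCypherRequest("http://localhost:7474",
		"MATCH (u:User)-[:FOLLOWS]->(f) WHERE u.username={user} RETURN f.username",
		map[string]interface{}{"user": "wefreema"})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST http://localhost:7474/db/data/cypher

	resp, err := decodeCypherResponse([]byte(`{"columns":["f.username"],"data":[["JnBrymn"],["RyanDay2"]]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(resp.Columns, resp.Data) // [f.username] [[JnBrymn] [RyanDay2]]
}
```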

For Speed: The Transactional Endpoint

  • Similar to the normal Cypher endpoint, but batched and transactional (you can send multiple batches in the same Tx)
  • Begin a transaction
  • Send batches of Cypher statements
  • Receive batches of Cypher results
  • Rollback or commit a transaction
  • Optionally, do all of these in the same request, for a small Tx

  • Sustained 20-30k+ Cypher CREATE statements per second (depending on the statement), with my laptop as the server and a single-threaded cq client
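The steps above map onto a handful of URLs. This sketch assumes Neo4j 2.x's transactional paths: POST /db/data/transaction to begin, POST to the returned transaction URL to run more statements, POST to that URL plus /commit to commit, DELETE to roll back, and POST /db/data/transaction/commit for a one-shot Tx. The txOp type and txEndpoint helper are hypothetical.

```go
package main

import "fmt"

// txOp enumerates the operations of the transactional endpoint.
type txOp int

const (
	txBegin       txOp = iota // open a new transaction
	txExec                    // send another batch in an open transaction
	txCommit                  // commit an open transaction
	txRollback                // roll back an open transaction
	txBeginCommit             // begin + commit in a single request (small Tx)
)

// txEndpoint returns the HTTP method and URL for an operation.
// baseURL is e.g. "http://localhost:7474"; txURL is the transaction URL
// Neo4j returned on begin (unused for txBegin and txBeginCommit).
func txEndpoint(op txOp, baseURL, txURL string) (method, url string) {
	switch op {
	case txBegin:
		return "POST", baseURL + "/db/data/transaction"
	case txExec:
		return "POST", txURL
	case txCommit:
		return "POST", txURL + "/commit"
	case txRollback:
		return "DELETE", txURL
	default: // txBeginCommit
		return "POST", baseURL + "/db/data/transaction/commit"
	}
}

func main() {
	base := "http://localhost:7474"
	tx := base + "/db/data/transaction/9"
	for _, op := range []txOp{txBegin, txExec, txCommit, txRollback, txBeginCommit} {
		m, u := txEndpoint(op, base, tx)
		fmt.Println(m, u)
	}
}
```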

Transactional Endpoint Request

{
  "statements":[
    {
      "statement":"CREATE (u:User {username:{user}})",
      "parameters":{
        "user":"wefreema"
      }
    },
    {
      "statement":"MATCH (u:User)-[:FOLLOWS]->(f) WHERE u.username={user} RETURN f.username",
      "parameters":{
        "user":"wefreema"
      }
    }
  ]
}
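Building that body from Go structs is straightforward. The struct names and the encodeTx helper are hypothetical; HTML escaping is turned off only so that Cypher arrows like -> stay readable (the escaped form would still be valid JSON).

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

// statement is one entry of the "statements" array.
type statement struct {
	Statement  string                 `json:"statement"`
	Parameters map[string]interface{} `json:"parameters,omitempty"`
}

// txRequest is the body POSTed to the transactional endpoint.
type txRequest struct {
	Statements []statement `json:"statements"`
}

// encodeTx serializes the request without HTML escaping, so "->" is not
// rewritten as "-\u003e".
func encodeTx(req txRequest) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	enc.SetEscapeHTML(false)
	if err := enc.Encode(req); err != nil {
		return "", err
	}
	return strings.TrimSpace(buf.String()), nil // Encode appends a newline
}

func main() {
	params := map[string]interface{}{"user": "wefreema"}
	s, err := encodeTx(txRequest{Statements: []statement{
		{Statement: "CREATE (u:User {username:{user}})", Parameters: params},
		{Statement: "MATCH (u:User)-[:FOLLOWS]->(f) WHERE u.username={user} RETURN f.username", Parameters: params},
	}})
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```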

Transactional Endpoint Response

{
  "commit" : "http://localhost:7474/db/data/transaction/9/commit",
  "results" : [
    {
      "columns" : [ ],
      "data" : [ ]
    },
    {
      "columns" : [ "f.username" ],
      "data" : [ { "rest" : [ "JnBrymn" ] }, { "rest" : [ "technige" ] } ]
    } ],
  "transaction" : {
    "expires" : "Mon, 03 Feb 2014 13:26:48 +0000"
  },
  "errors" : [ ]
}

Really, that's all you need to get 90% of the way there

  • If you're careful about your Cypher, you can get away with just those two endpoints (normal Cypher and transactional Cypher) pretty easily.
  • And you can replace normal Cypher with transactional Cypher entirely, if you want to avoid implementing both.

  • But with only Cypher, you'll miss out on some features of the REST API's typed responses, which return URIs
    • Nodes: you'll need to make requests to get their Labels
    • Paths: you'll need to make requests to get Nodes/Rels  
    • GraphAlgorithm endpoints, etc.
    • Unmanaged extensions: hard to support in a standard way

Wishlist/Gotchas

  • I wish the streaming format were object streams (à la the Twitter streaming API), instead of inner arrays of objects
    • https://github.com/jexp/cypher_websocket_endpoint
  • I wish there were a format that contained all that you might care about for nodes (instead of requiring a separate request to fetch labels via REST)
  • I wish there were a way to get type metadata, as in SQL/JDBC. Unfortunately, we have to guess types by parsing, or coerce them based on expectations. I'm not sure this is possible in a streaming fashion, since (without more constraints) nodes can have the same property with different types
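To illustrate the guessing problem: with no type metadata, a driver can only infer types from the JSON text. This hypothetical sketch keeps numbers as json.Number, treats integral text as int64 and other numbers as float64, and cannot know that another node might store 2.5 in the same property.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// guessValue infers a Go type from a decoded JSON value, recursing into
// lists and maps. Strings, bools and nil pass through unchanged.
func guessValue(v interface{}) interface{} {
	switch x := v.(type) {
	case json.Number:
		if i, err := x.Int64(); err == nil {
			return i // text like "42" looks like an integer
		}
		if f, err := x.Float64(); err == nil {
			return f // text like "2.5" or "3.0" becomes float64
		}
		return x.String()
	case []interface{}:
		for i := range x {
			x[i] = guessValue(x[i])
		}
		return x
	case map[string]interface{}:
		for k, e := range x {
			x[k] = guessValue(e)
		}
		return x
	default:
		return x
	}
}

// decodeRow decodes one row of Cypher data, keeping numbers as json.Number
// so integers are not silently turned into float64.
func decodeRow(data []byte) ([]interface{}, error) {
	dec := json.NewDecoder(bytes.NewReader(data))
	dec.UseNumber()
	var row []interface{}
	if err := dec.Decode(&row); err != nil {
		return nil, err
	}
	for i := range row {
		row[i] = guessValue(row[i])
	}
	return row, nil
}

func main() {
	row, err := decodeRow([]byte(`[42, 2.5, "wefreema", [1, 2], null]`))
	if err != nil {
		panic(err)
	}
	for _, v := range row {
		fmt.Printf("%T %v\n", v, v)
	}
}
```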

Go's Database/sql

  • idiomatic API that wraps driver implementations
    • check out: http://go-database-sql.org/
  • supports/features:
    • simple queries, execs
    • prepared statements
    • primitive types (parameters/return values)
    • built-in connection pooling
    • transactions (more like connection affinity, but this turned out to be perfect for Neo4j's API)
    • arbitrary connection string (driver specific)

Example Usage

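import (
    "database/sql"
    "log"

    _ "github.com/wfreeman/cq" // registers the "neo4j-cypher" driver
)

db, _ := sql.Open("neo4j-cypher", "http://localhost:7474") // errors ignored for brevity

stmt, _ := db.Prepare(`
  match (n:User)-[:FOLLOWS]->(m:User)
  where n.screenName = {0}
  return m.screenName as friend
  limit 10
`)
defer stmt.Close()

rows, _ := stmt.Query("wefreema")
defer rows.Close() // release the pooled connection even if the loop exits early

var friend string
for rows.Next() {
    rows.Scan(&friend) // error handling omitted
    log.Println(friend)
}
if err := rows.Err(); err != nil {
    log.Fatal(err)
}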

Database/sql/driver Interfaces

http://golang.org/pkg/database/sql/driver/
// package database/sql/driver (summarized)
type Driver interface {
  Open(name string) (Conn, error)
}

type Conn interface {
  Prepare(query string) (Stmt, error)
  Close() error
  Begin() (Tx, error)
}

type Stmt interface {
  Close() error
  NumInput() int
  Exec(args []Value) (Result, error)
  Query(args []Value) (Rows, error)
}

type Rows interface {
  Columns() []string
  Close() error
  Next(dest []Value) error
}

type Tx interface {
  Commit() error
  Rollback() error
}

Connection

  • returned from Open(connString) (Conn, error)
  • needs to keep track of the connection information
  • initially was simply baseURL, but added cypherURL and transactionURL as an optimization (they don't change once you get them)
  • added transaction to keep track of transaction state
  • added userInfo/scheme for SSL and auth support
type conn struct {
  baseURL        string             // server base URL
  userInfo       *url.Userinfo      // auth user/pass
  scheme         string             // http or https
  cypherURL      string             // url to cypher endpoint
  transactionURL string             // url to transactional endpoint
  transaction    *cypherTransaction // pointer to current transaction
}
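A sketch of how Open might fill those fields from the connection string with net/url. parseConnString is hypothetical and the transaction field is left out. Neo4j's service root (GET /db/data/) advertises the cypher and transaction URLs, which is presumably how they are fetched once; this sketch simply hard-codes the default 2.x paths.

```go
package main

import (
	"fmt"
	"net/url"
)

// conn is a trimmed copy of the slide's struct (no transaction field).
type conn struct {
	baseURL        string
	userInfo       *url.Userinfo
	scheme         string
	cypherURL      string
	transactionURL string
}

// parseConnString splits a string such as "https://user:pass@host:7473"
// into the parts conn keeps. Credentials are kept out of baseURL.
func parseConnString(s string) (*conn, error) {
	u, err := url.Parse(s)
	if err != nil {
		return nil, err
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return nil, fmt.Errorf("unsupported scheme %q", u.Scheme)
	}
	base := u.Scheme + "://" + u.Host
	return &conn{
		baseURL:        base,
		userInfo:       u.User, // nil when no user:pass@ is present
		scheme:         u.Scheme,
		cypherURL:      base + "/db/data/cypher",
		transactionURL: base + "/db/data/transaction",
	}, nil
}

func main() {
	c, err := parseConnString("https://neo:secret@localhost:7473")
	if err != nil {
		panic(err)
	}
	pass, _ := c.userInfo.Password()
	fmt.Println(c.baseURL, c.userInfo.Username(), pass, c.transactionURL)
}
```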

Statements

  • created by Conn.Prepare(query) (Stmt, error)
  • need a reference to the connection to get URLs, check whether we're in a Tx, etc.
  • that's basically it! (then implement the Stmt interface)
type cypherStmt struct {
  c     *conn    // a reference back to the connection
  query *string  // the query string
}
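One Stmt method worth a closer look is NumInput: if it returns a value >= 0, database/sql checks the caller's argument count before calling Exec or Query, and -1 skips that check. A hypothetical implementation for positional {n} placeholders might return one more than the highest index; note that a {0} inside a string literal would be miscounted.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// placeholderRE matches positional Cypher parameters such as {0} or {12};
// named ones like {user} are ignored.
var placeholderRE = regexp.MustCompile(`\{(\d+)\}`)

// numInput returns how many positional arguments a query expects,
// or -1 (skip the check) if an index cannot be parsed.
func numInput(query string) int {
	n := 0
	for _, m := range placeholderRE.FindAllStringSubmatch(query, -1) {
		i, err := strconv.Atoi(m[1])
		if err != nil {
			return -1
		}
		if i+1 > n {
			n = i + 1
		}
	}
	return n
}

func main() {
	fmt.Println(numInput("MATCH (n:User) WHERE n.screenName = {0} RETURN n LIMIT {1}")) // 2
	fmt.Println(numInput("CREATE (u:User {username:{user}})"))                          // 0
}
```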

Parameters

  • idiomatic variadic function calls for parameters, out of the box
    // parameter for a Stmt
    rows, err := stmt.Query("wefreema")

    // query, parameters (automatically creates a Stmt)
    query := "with {0} as x return x"
    rows, err = db.Query(query, 123)
  • only works for primitives (the driver.Value types: nil, int64, float64, bool, []byte, string, time.Time)
    • until you set up a ValueConverter to convert other types into driver.Value types

Rows

  • interface: Columns(), Close(), Next(dest []driver.Value)
  • all three are pretty easy to write for the Cypher/Tx endpoints
    • Columns is already an array in the JSON response
    • Close() doesn't do much (no real connection)
    • Next() needs to parse the next data array member and load the []driver.Value
      • should be relatively easy to convert to a streaming JSON parser, given how this works!

ValueConverters

  • define a way to convert supported custom types into driver.Value types (cq currently uses JSON)
ConvertValue(v interface{}) (driver.Value, error)
  • defined a cq/types subpackage for custom Cypher types
  • supports these types (along with primitive wrappers):
[]int                  -> ArrayInt             (CTCollection<CTInt>)
[]int64                -> ArrayInt64           (CTCollection<CTInt>)
[]string               -> ArrayString          (CTCollection<CTString>)
[]float64              -> ArrayFloat64         (CTCollection<CTDouble>)
[]CypherValue          -> ArrayCypherValue     (CTCollection<CTAny>)
map[string]string      -> MapStringString      (CTMap)
map[string]CypherValue -> MapStringCypherValue (CTMap)
Node                   -> Node                 (CTNode)
Relationship           -> Relationship         (CTRelationship)    

Values

  • convert from custom types to driver.Value types; in cq they get wrapped in a CypherValue and JSON-encoded, for example:
func (ai ArrayInt) Value() (driver.Value, error) {
  b, err := json.Marshal(CypherValue{CypherArrayInt, ai.Val})
  return b, err
}
  • all cq/types provide a Value() that wraps them in a CypherValue and turns them into []byte via JSON

Scanners

  • implement Scan for custom types, so that they can populate themselves from the value, like this:
func (af *ArrayFloat64) Scan(value interface{}) error {
  if value == nil {
    return ErrScanOnNil
  }

  switch v := value.(type) {
  case []float64:
    af.Val = v
    return nil
  case CypherValue:
    // accept only a CypherValue tagged as a float array; the checked
    // assertion avoids a panic if Val holds something else
    if vals, ok := v.Val.([]float64); ok && v.Type == CypherArrayFloat64 {
      af.Val = vals
      return nil
    }
  }
  return fmt.Errorf("cq: invalid Scan value for %T: %T", af, value)
}

Transactions

  • needed to keep some Tx state, and a reference to the Tx, in both the Conn and the Stmt implementations
  • cq stores batches of 100 statements/parameters
    • automatically execs those batches (sends to Neo4j)
    • keeps track of Tx expiration, and sends a keepalive exec once it's halfway to the expiry time

Current State of CQ

  • working pretty well!
    • decent suite of tests and benchmarks covering the main functionality
    • Tx Exec() benchmarks at 30-40µs per CREATE statement
  • plan to implement
    • Query() for Tx
    • streaming JSON parser, for higher responsiveness and overall throughput
    • further optimization (maybe move from JSON to GOB)
    • configurable Tx batch size via connection string
    • cluster host list in connection string
  • come help out! https://waffle.io/wfreeman/cq

Go Gotchas

  • Unable to access the values passed to the user side of the API directly. There's a lot of indirection from database/sql to database/sql/driver functions of the same name, which made it confusing to implement, but I suppose it makes for a cleaner, more consistent user API
  • Discovered it was easy to leave connections open if you forget a Close(), if your Close() isn't called at the right point in the program, or even if you don't Close() soon enough
    • Easiest way to detect: set the open file limit very low (e.g. with ulimit -n)
    • Read the entire response body before Close()ing it, even if you think you already did: ioutil.ReadAll() can help with this

Building a Neo4j Driver for Go's Database/sql

By Wes Freeman
