Evaluation of Key-Value Stores for Distributed Locking Purposes

Piotr Grzesik

dr hab. inż. Dariusz Mrozek

Agenda

  1. Why do we need distributed locks?
  2. Which properties are important when evaluating distributed locking mechanisms?
  3. Selected key-value stores
  4. Related works
  5. Simulation environment
  6. Implementation details for selected solutions
  7. Performance experiments
  8. Results and concluding remarks

Distributed locks

Distributed locks are used to ensure that a certain resource is used by only one process at a given point in time, where that resource can be a database, an email server, or a mechanical appliance.

Distributed locks

  • Safety - property that satisfies the requirement of mutual exclusion, ensuring that only one client can acquire and hold the lock at any given moment

  • Deadlock-free - property that ensures that, eventually, it will be possible to acquire and hold a lock, even if the client currently holding the lock becomes unavailable

  • Fault tolerance - property ensuring that as long as the majority of the nodes of the underlying distributed locking mechanism are available, it is possible to acquire, hold, and release locks

Selected Key-Value stores

  • etcd - open source, distributed key-value store written in Go, currently developed under the Cloud Native Computing Foundation. Uses Raft as its underlying consensus algorithm.
  • Consul - open source, distributed key-value store written in Go, developed by Hashicorp. Uses Raft as its underlying consensus algorithm.
  • Zookeeper - open source, highly available coordination system written in Java, initially developed by Yahoo and currently maintained by the Apache Software Foundation. Uses the Zab atomic broadcast protocol.
  • Redis - open source, in-memory key-value database that offers optional durability. It was developed by Salvatore Sanfilippo and is written in ANSI C. The Redlock algorithm is used with Redis as a way to implement distributed locking.

Related works

  • In his research, Gyu-Ho Lee compared the key write performance of Zookeeper, etcd, and Consul with regard to disk bandwidth, network traffic, CPU utilization, and memory usage

  • Patrick Hunt, in his analysis, evaluated Zookeeper server latency under varying loads and configurations, observing that for a standalone server, adding CPU cores did not provide significant performance gains. He also observed that, in general, Zookeeper is able to handle more operations per second with a higher number of clients (around 4 times more operations per second with 10 clients than with 1)

Related works

  • The liveness and safety properties of Raft were presented in a paper by Diego Ongaro, the creator of the Raft consensus algorithm. The same properties of Zab, the algorithm underpinning Zookeeper, were presented by Junqueira, Reed, and Serafini.

  • The Redlock algorithm was developed and described by Salvatore Sanfilippo; however, the safety property of that algorithm was later disputed by Martin Kleppmann, who argued in his article that, under certain conditions, the safety property of Redlock might not hold.

  • Kyle Kingsbury, in his works, evaluated etcd, Consul, and Zookeeper with Jepsen, a framework for distributed systems verification. That process revealed that Consul and etcd experienced stale reads. This research prompted the maintainers of both etcd and Consul to provide mechanisms that enforce consistent reads.

Simulation environment

  • Infrastructure was deployed in the Amazon Web Services cloud

  • All solutions were deployed as 3-node clusters, with one additional machine acting as the client process

  • Each cluster node was deployed in a different availability zone within the same geographical region (eu-central-1), to tolerate the failure of a single availability zone while keeping latency between instances in the sub-millisecond range

  • The client application used for the simulations was developed in Python 3.7; it acquires a lock, simulates a short computation, releases the lock, and measures the time taken to acquire the lock (a sketch of this loop follows below)
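
A minimal sketch of that measurement loop is shown below. The lock object, iteration count, and work duration are assumptions standing in for the store-specific clients described in the following sections, not the exact experiment code.

    import time

    def run_simulation(lock, iterations=100, work_seconds=0.01):
        # `lock` is assumed to expose acquire()/release(); the actual
        # clients (python-etcd3, consul-lock, kazoo, redlock) can be
        # wrapped behind such an interface.
        acquire_times_ms = []
        for _ in range(iterations):
            started = time.perf_counter()
            lock.acquire()  # only the acquisition is timed
            acquire_times_ms.append((time.perf_counter() - started) * 1000)
            try:
                time.sleep(work_seconds)  # simulated short computation
            finally:
                lock.release()
        return acquire_times_ms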

Consul, etcd and Zookeeper configuration

Redis with Redlock configuration

etcd implementation details

  • etcd in version 3.3.9 was used, compiled with Go version 1.10.3

  • The cluster was configured to ensure a sequential consistency model, to satisfy the safety property

  • The client application was developed to use the python-etcd3 library, which implements locks using an atomic compare-and-swap mechanism (see the sketch below)

  • To ensure deadlock-free property, TTL for a lock is set, which releases the lock if the lock-holding client crashes or is network partitioned away

  • Underlying Raft consensus algorithm ensures that fault tolerance property is satisfied
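
As an illustration, a minimal lock acquisition with python-etcd3 might look as follows; the endpoint address and TTL value are assumptions, not the exact experiment configuration.

    import etcd3

    # Connect to one member of the 3-node cluster (address is illustrative).
    client = etcd3.client(host="10.0.0.1", port=2379)

    # python-etcd3's lock is lease-based: the ttl releases the lock
    # automatically if the holding client crashes (deadlock-free property).
    with client.lock("my-resource", ttl=10):
        pass  # critical section: the simulated short computation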

Consul implementation details

  • Consul in version 1.4.0 was used, compiled with Go version 1.11.1

  • Cluster was configured with "consistent" consistency mode, to satisfy safety property

  • The client application was developed to use the Python consul-lock library, which implements locks using the sessions mechanism with the check-and-set operation (see the sketch below)

  • According to the Consul documentation, depending on the selected health-checking mechanism of sessions, either the safety or the liveness property might be sacrificed. The selected implementation uses only the session TTL health check, which guarantees that both properties hold

  • TTL is applied at the session level and session timeout either deletes or releases all locks related to the session

  • Underlying Raft consensus algorithm ensures that fault tolerance property is satisfied
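
A minimal sketch of the underlying session and check-and-set mechanism is shown below, using the python-consul client rather than the consul-lock wrapper used in the experiments; addresses, key names, and TTL values are illustrative.

    import consul

    client = consul.Consul(host="10.0.0.1", consistency="consistent")

    # A session guarded only by its TTL; on expiry Consul releases the
    # session's locks (deadlock-free property).
    session_id = client.session.create(ttl=15, behavior="release")

    # `acquire` is Consul's check-and-set: the put succeeds only if no
    # other session currently holds the key (safety property).
    if client.kv.put("locks/my-resource", "owner-1", acquire=session_id):
        try:
            pass  # critical section
        finally:
            client.kv.put("locks/my-resource", "owner-1", release=session_id)
    client.session.destroy(session_id)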

Zookeeper implementation details

  • Zookeeper in version 3.4.13 was used, running on Java 1.7, with Open JDK Runtime Environment

  • The client application was developed to use the Python kazoo library, which offers a lock recipe taking advantage of Zookeeper's ephemeral and sequential znodes (see the sketch below)

  • On lock acquire, a client creates a znode with the ephemeral and sequential flags set and checks whether its sequence number is the lowest; if it is, the client holds the lock; otherwise it watches the preceding znode and is notified when the lock is released

  • Usage of ephemeral nodes ensures deadlock-free property

  • Underlying Zab algorithm ensures that fault tolerance and safety properties are satisfied
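
A minimal example with kazoo's lock recipe follows; the ensemble addresses, lock path, and identifier are illustrative.

    from kazoo.client import KazooClient

    # Connect to the Zookeeper ensemble (addresses are illustrative).
    zk = KazooClient(hosts="10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181")
    zk.start()

    # The Lock recipe creates an ephemeral, sequential znode under the
    # lock path and blocks until that znode has the lowest sequence number.
    with zk.Lock("/locks/my-resource", identifier="client-1"):
        pass  # critical section

    zk.stop()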

Redis with Redlock implementation details

  • Redis in version 5.0.3 was used

  • Client application was developed to use Python redlock library

  • The Redlock algorithm works by trying to sequentially acquire the lock on all independent Redis instances

  • The lock is considered acquired if it was successfully acquired on a majority of the nodes

  • In most cases, the majority requirement ensures the safety property

  • In specific circumstances (such as process pauses or clock jumps, as argued by Kleppmann), the safety property can be lost

  • Deadlock-free property is satisfied by using lock timeouts

  • Fault tolerance is satisfied: the algorithm tolerates the failure of a minority of nodes, i.e. up to floor((N-1)/2) of the N independent Redis instances (a usage sketch follows below)
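
A minimal sketch with the redlock-py client is shown below; the instance addresses and the 1000 ms validity time are illustrative assumptions.

    from redlock import Redlock

    # Three independent Redis instances (addresses are illustrative).
    dlm = Redlock([
        {"host": "10.0.0.1", "port": 6379, "db": 0},
        {"host": "10.0.0.2", "port": 6379, "db": 0},
        {"host": "10.0.0.3", "port": 6379, "db": 0},
    ])

    # lock() tries each instance in turn and succeeds only if a majority
    # grants the lock; the 1000 ms validity time doubles as the timeout
    # that makes the lock deadlock-free.
    my_lock = dlm.lock("my-resource", 1000)
    if my_lock:  # False is returned when no majority was reached
        try:
            pass  # critical section
        finally:
            dlm.unlock(my_lock)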

Performance experiments

  • Simulation for a workload where all processes try to acquire different lock keys

  • Simulation for a workload where all processes try to acquire the same lock key

  • Both simulations were performed for 1, 3 and 5 clients (a sketch of the workload driver follows below)
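
One way the two workloads might be driven is sketched below; the key-naming scheme and the stand-in work are assumptions, with the store-specific acquire/release calls from the previous sections slotting into client_process.

    import time
    from multiprocessing import Process

    def client_process(client_id, same_key):
        # The key choice selects the workload: one shared key (contended)
        # or a distinct key per client (uncontended).
        key = "shared-lock" if same_key else "lock-{}".format(client_id)
        time.sleep(0.01)  # stand-in for acquire / compute / release
        print("client", client_id, "finished workload on", key)

    def run_workload(num_clients, same_key):
        procs = [Process(target=client_process, args=(i, same_key))
                 for i in range(num_clients)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

    if __name__ == "__main__":
        for n in (1, 3, 5):
            run_workload(n, same_key=False)  # different lock keys workload
            run_workload(n, same_key=True)   # same lock key workload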

Summary of results for different keys simulation

Summary of results for same keys simulation

Results summary

  • Zookeeper, Consul and etcd show very little difference in performance between the two workloads

  • Redis offers stable and the best performance when clients access different lock keys, while becoming unstable for the workload involving multiple clients contending for the same key, with the 99th percentile of lock acquire time for 5 processes as high as 203.87 ms, over 10 times higher than the other tested solutions

  • Considering both workloads, the best-performing solution is Zookeeper, which in the worst-case scenario (5 processes, different lock keys) offers an average lock acquire time of 5.94 ms, with a maximum of 19.98 ms, a 90th percentile of 7.4 ms, and a 99th percentile of 10.17 ms.

  • If an upper bound on lock acquire time is not essential for a given application, then Redis is the best-performing solution, offering a 90th percentile lock acquire time of 1.3 ms for 5 concurrent processes

Bibliography

  • Consul documentation, https://www.consul.io/docs/index.html

  • etcd documentation, https://etcd.readthedocs.io/en/latest/

  • Hunt, P.: Zookeeper service latencies under various loads and configurations, https://cwiki.apache.org/confluence/display/ZOOKEEPER/ServiceLatencyOverview

  • Junqueira, F., Reed, B., Hunt, P., Konar, M.: Zookeeper: Wait-free coordination for internet-scale systems. In: Proceedings of the 2010 USENIX conference on USENIX Annual Technical Conference (June 2010)

  • Junqueira, F., Reed, B., Serafini, M.: Zab: High-performance broadcast for primary-backup systems. In: IEEE/IFIP 41st International Conference on Dependable Systems and Networks (DSN). pp. 245–256 (June 2011)

  • Kingsbury, K.: Jepsen: etcd and Consul (accessed on January 9th, 2019), https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul

Bibliography

  • Kingsbury, K.: Jepsen: Zookeeper, https://aphyr.com/posts/291-call-me-maybe-zookeeper

  • Kleppmann, M.: How to do distributed locking, http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html

  • Lee, G.H.: Exploring performance of etcd, zookeeper and consul consistent key-value datastores, https://coreos.com/blog/performance-of-etcd.html

  • Ongaro, D., Ousterhout, J.: In search of an understandable consensus algorithm. In: Proceedings of the 2014 USENIX conference on USENIX Annual Technical Conference. pp. 305–320 (June 2014)

  • Distributed locks with Redis, https://redis.io/topics/distlock
