Piotr Grzesik
dr hab. inż. Dariusz Mrozek
Distributed locks are used to ensure that a certain resource is used by only one process at a given point in time, where that resource can be a database, an email server, or a mechanical appliance.
Safety - the property that satisfies the requirement of mutual exclusion, ensuring that only one client can acquire and hold the lock at any given moment
Deadlock-free - the property that ensures that, eventually, it will be possible to acquire and hold a lock, even if the client currently holding it becomes unavailable
In his research, Gyu-Ho Lee compared the key write performance of Zookeeper, etcd and Consul with regard to disk bandwidth, network traffic, CPU utilization, and memory usage
Patrick Hunt, in his analysis, evaluated Zookeeper server latency under varying loads and configurations, observing that, for a standalone server, adding CPU cores did not provide significant performance gains. He also observed that, in general, Zookeeper is able to handle more operations per second with a higher number of clients (around 4 times more operations per second when going from 1 to 10 clients)
The liveness and safety properties of Raft were presented in a paper by Diego Ongaro, the creator of the Raft consensus algorithm. The same properties of Zab, the algorithm underpinning Zookeeper, were presented by Junqueira, Reed and Serafini.
The Redlock algorithm was developed and described by Salvatore Sanfilippo; however, its safety property was later disputed by Martin Kleppmann, who showed in his article that, under certain conditions, the safety property of Redlock might not hold.
Kyle Kingsbury, in his works, evaluated etcd, Consul and Zookeeper with Jepsen, a framework for distributed systems verification. That process revealed that Consul and etcd experienced stale reads, and this research prompted the maintainers of both etcd and Consul to provide mechanisms that enforce consistent reads.
Infrastructure was deployed in the Amazon Web Services cloud offering
All solutions were deployed as 3-node clusters, with one additional machine acting as the client process
Each cluster node was deployed in a different availability zone within the same geographical region, eu-central-1, to tolerate the failure of a single availability zone while keeping latency between instances in the sub-millisecond range
The client application used for simulations was developed in Python 3.7; it acquires a lock, simulates a short computation, releases the lock, and measures the time taken to acquire the lock (see the sketch below)
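A minimal sketch of such a measurement loop is shown below; the run_simulation name and the generic acquire()/release() lock interface are illustrative assumptions, not the exact code used in the experiments:

import time

def run_simulation(lock, iterations=100, work_seconds=0.01):
    # Repeatedly acquire the given lock, simulate a short computation while
    # holding it, release it, and record how long each acquisition took.
    # The generic acquire()/release() interface stands in for whichever
    # client library is being measured.
    acquire_times = []
    for _ in range(iterations):
        start = time.perf_counter()
        lock.acquire()                                  # blocks until granted
        acquire_times.append(time.perf_counter() - start)
        try:
            time.sleep(work_seconds)                    # simulated computation
        finally:
            lock.release()
    return acquire_times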
etcd in version 3.3.9 was used, compiled with Go version 1.10.3
The cluster was configured to ensure the sequential consistency model, in order to satisfy the safety property
The client application was developed to use the python-etcd3 library, which implements locks using the atomic compare-and-swap mechanism (see the sketch below)
To ensure the deadlock-free property, a TTL is set for each lock, which releases the lock if the lock-holding client crashes or is partitioned away from the network
The underlying Raft consensus algorithm ensures that the fault tolerance property is satisfied
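A minimal sketch of acquiring a lock through python-etcd3 under this configuration is shown below; the endpoint address, lock name and TTL value are illustrative assumptions:

import time
import etcd3

client = etcd3.client(host='10.0.1.10', port=2379)     # one cluster member

# The lock is backed by a lease, so the TTL releases it automatically if
# the holding client crashes or is partitioned away.
with client.lock('orders-job', ttl=10):
    time.sleep(0.01)                                    # simulated computation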
Consul in version 1.4.0 was used, compiled with Go version 1.11.1
The cluster was configured with the "consistent" consistency mode, to satisfy the safety property
The client application was developed to use the Python consul-lock library, which implements locks by using the sessions mechanism together with the check-and-set operation (see the sketch below)
According to the Consul documentation, depending on the selected session health-checking mechanism, either the safety or the liveness property might be sacrificed. The selected implementation uses only the session TTL health check, which guarantees that both properties hold
The TTL is applied at the session level, and a session timeout either deletes or releases all locks related to the session
The underlying Raft consensus algorithm ensures that the fault tolerance property is satisfied
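The sketch below illustrates the underlying mechanism using the lower-level python-consul client rather than the consul-lock wrapper used in the experiments; the host address, key name, session TTL and behaviour are illustrative assumptions:

import time
import consul

# "consistent" mode forces reads to go through the Raft leader.
c = consul.Consul(host='10.0.1.20', consistency='consistent')

# A session guarded only by a TTL health check; its expiry releases the lock.
session_id = c.session.create(ttl=15, behavior='release')

# Check-and-set: the key is bound to the session only if no one else holds it.
acquired = c.kv.put('locks/orders-job', 'client-1', acquire=session_id)
if acquired:
    try:
        time.sleep(0.01)                                # simulated computation
    finally:
        c.kv.put('locks/orders-job', 'client-1', release=session_id)
c.session.destroy(session_id)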
Zookeeper in version 3.4.13 was used, running on Java 1.7 with the OpenJDK Runtime Environment
The client application was developed to use the Python kazoo library, which offers a lock recipe that takes advantage of Zookeeper's ephemeral and sequential znodes (see the sketch below)
On lock acquisition, a client creates a znode with the ephemeral and sequential flags and checks whether its sequence number is the lowest; if so, it holds the lock, otherwise it watches the preceding znode and is notified when the lock is released
The usage of ephemeral znodes ensures the deadlock-free property, since they are removed automatically when the owning client's session ends
The underlying Zab algorithm ensures that the fault tolerance and safety properties are satisfied
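A minimal sketch of the kazoo lock recipe under this setup is shown below; the host list, lock path and identifier are illustrative assumptions:

import time
from kazoo.client import KazooClient

zk = KazooClient(hosts='10.0.1.30:2181,10.0.1.31:2181,10.0.1.32:2181')
zk.start()

# The recipe creates an ephemeral, sequential znode under the lock path; the
# client with the lowest sequence number holds the lock, the others watch
# their predecessor and are notified when it goes away.
lock = zk.Lock('/locks/orders-job', identifier='client-1')
with lock:                                              # blocks until acquired
    time.sleep(0.01)                                    # simulated computation

zk.stop()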
Redis in version 5.0.3 was used
The client application was developed to use the Python redlock library
The Redlock algorithm works by trying to sequentially acquire the lock on all independent Redis instances (see the sketch below)
The lock is considered acquired if it was successfully acquired on a majority of the nodes
In most cases, the majority requirement ensures the safety property
In specific circumstances, the safety property can be lost (e.g., due to process pauses or clock skew, as pointed out by Kleppmann)
The deadlock-free property is satisfied by using lock timeouts
Fault tolerance is satisfied; the algorithm tolerates the failure of a minority of the N independent Redis nodes, i.e., up to ⌈N/2⌉ - 1 failures
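A minimal sketch following the redlock-py flavour of the client API is shown below; the instance addresses, resource name and TTL are illustrative assumptions, and the exact class and method names may differ from the library used in the experiments:

import time
from redlock import Redlock

# Three independent Redis instances, not a replicated cluster.
dlm = Redlock([
    {"host": "10.0.1.40", "port": 6379, "db": 0},
    {"host": "10.0.1.41", "port": 6379, "db": 0},
    {"host": "10.0.1.42", "port": 6379, "db": 0},
])

# The lock is granted only if a majority of instances accepted it within
# the validity window; the 1000 ms TTL provides the deadlock-free property.
my_lock = dlm.lock("orders-job", 1000)
if my_lock:
    try:
        time.sleep(0.01)                                # simulated computation
    finally:
        dlm.unlock(my_lock)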
Simulation for a workload where all processes try to acquire different lock keys
Simulation for a workload where all processes try to acquire the same lock key
Both simulations were performed for 1, 3 and 5 clients
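A small illustration of how the two workloads differ in key selection is given below; the lock_key helper and the key naming scheme are hypothetical:

def lock_key(workload, client_id):
    # In the first workload every client process locks its own key, so there
    # is no contention; in the second all clients contend on a single key.
    if workload == "different-keys":
        return f"locks/client-{client_id}"
    return "locks/shared"

# e.g. the third of five clients in the contended workload:
key = lock_key("same-key", 2)                           # -> "locks/shared"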
Zookeeper, Consul and Redis offer stable and the best performance in the case of clients accessing different lock keys
Considering both workloads, the best-performing solution is Zookeeper, which in the worst-case scenario (5 processes, different lock keys) offers an average lock time of 5.94 ms, with a maximum of 19.98 ms, a 90th percentile of 7.4 ms, and a 99th percentile of 10.17 ms.
If the upper lock time bound is not essential for
Consul documentation, https://www.consul.io/docs/index.html
etcd documentation, https://etcd.readthedocs.io/en/latest/
Hunt, P.: Zookeeper service latencies under various loads and configurations, https://cwiki.apache.org/confluence/display/ZOOKEEPER/ServiceLatencyOverview
Junqueira, F., Reed, B., Hunt, P., Konar, M.: Zookeeper: Wait-free coordination for internet-scale systems. In: Proceedings of the 2010 USENIX conference on USENIX Annual Technical Conference (June 2010)
Junqueira, F., Reed, B., Serafini, M.: Zab: High-performance broadcast for primary-backup systems. In: IEEE/IFIP 41st International Conference on Dependable Systems and Networks (DSN). pp. 245–256 (June 2011)
Kingsbury, K.: Jepsen: etcd and Consul (accessed on January 9th, 2019), https://aphyr.com/posts/316-call-me-maybe-etcd-and-consul
Kingsbury, K.: Jepsen: Zookeeper, https://aphyr.com/posts/291-call-me-maybe-zookeeper
Kleppmann, M.: How to do distributed locking, http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html
Lee, G.H.: Exploring performance of etcd, zookeeper and consul consistent key-value datastores, https://coreos.com/blog/performance-of-etcd.html
Ongaro, D., Ousterhout, J.: In search of an understandable consensus algorithm. In: Proceedings of the 2014 USENIX conference on USENIX Annual Technical Conference. pp. 305–320 (June 2014)
Distributed locks with Redis, https://redis.io/topics/distlock