Benjamin Cane is a Distinguished Engineer at American Express. He has more than 16 years of experience with roles in both systems and software engineering. He leverages both his systems and software skills to build end-to-end platforms.
Consistency in Distributed Systems
* Warning: this is not a talk on distributed systems design patterns.
What are "Distributed Systems"
A distributed system is characterized as a collection of independent services/infrastructure that appears to the end-user as a single system.
Distributed Systems provide capabilities unavailable to Centralized Systems:
- Scale-out, adding more nodes vs. adding more CPU/Memory
- Reusability, common functionality can be a service offered to multiple platforms
- Availability, Centralized systems are a single point of failure
- Proximity, We can reduce latency for clients by deploying closer
- Cost-Effective, Distributed Systems rarely require expensive hardware solutions
Forms of Distributed Systems
Globally Distributed Platforms
Internet of Things/Edge Computing
and many more...
Distributed Systems are complex
While Distributed Systems solve many of the challenges introduced by Centralized Systems, they also have their own complexities.
Introducing the fallacies of Distributed Systems:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
Data Consistency is the Gordian Knot of Distributed Systems
The CAP theorem states that no distributed data system can simultaneously provide all three of Consistency, Availability, and Partition Tolerance.
We must pick two.
(C)onsistency, (A)vailability, (P)artition Tolerance
The PACELC theorem builds on the CAP theorem, stating an additional trade-off exists between Latency and Consistency.
This argues that any time you choose Consistency, you are giving up some level of performance (Latency).
(P)artitioning, (A)vailability, (C)onsistency, (E)lse, (L)atency, and (C)onsistency
Real World Experience
How does the real world compare with the CAP and PACELC theorems?
The theorems are right, but many people treat them as cut-and-dried decisions: Consistency vs. Low Latency.
In reality, these trade-offs happen in degrees. You can't have 100% consistency, 100% low latency, 100% availability, & 100% partition tolerance. But you can have 100% consistency, 80% low latency, 10% availability, & 30% partition tolerance.
Breaking It Down
Exploring the design decisions of Open Source Datastores
Redis, by default, is optimized for providing consistency and low latency but sacrifices availability and partition tolerance.
Redis has a single Primary Node for a given Key, trading Partition Tolerance for Consistency.
Internally, Redis provides Atomic operations by serializing and executing all commands sequentially. This, in turn, reduces performance by making command execution single-threaded.
Redis counteracts the costs of sequential execution by working exclusively in-memory, persisting to disk via frequent memory snapshots.
Replication is, by default, asynchronous but can be made synchronous with the WAIT command. This, however, causes each SET request to take longer.
Clustering reduces the impact of network partitioning on Redis by sharding keys across multiple primaries, but a single key can only have one primary.
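As a sketch of how Redis Cluster decides key ownership: the cluster spec maps every key to one of 16384 hash slots via CRC16 (XModem variant), and each slot is owned by exactly one primary. A minimal illustrative Python version (including hash-tag handling, which lets related keys land on the same primary) might look like this; it mirrors the published algorithm but is not Redis's own code:

```python
def crc16(data: bytes) -> int:
    """CRC16-CCITT (XModem), the variant the Redis Cluster spec uses."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """Map a key to one of 16384 hash slots, honoring {hash tags}."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        # A non-empty {tag} means only the tag is hashed,
        # forcing related keys onto the same slot (and primary).
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    return crc16(key.encode()) % 16384
```

Because keys sharing a hash tag (e.g. `{user1000}.following` and `{user1000}.followers`) map to the same slot, multi-key operations on them remain possible even in a sharded cluster.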
By default, MySQL, like most RDBMSs, focuses more on Consistency and Availability through persistence, sacrificing Low Latency and Partition Tolerance.
Like Redis, MySQL is designed to have a single active Primary for a given record.
Unlike Redis, writes are persisted to disk before being made available to users, with memory providing only query caching. This makes MySQL effectively slower than Redis but more resilient to node failures.
MySQL also uses asynchronous replication by default but can support semi-synchronous replication through plugins and additional layers such as Vitess.io.
It is possible to set up MySQL with multiple Primaries using Clustered file systems. However, this also has its own limitations and risks, as a failure in a clustered file system can bring down the whole database for all records.
External mechanisms such as Vitess can also provide sharding on top of MySQL, enabling the survival of network partitions for "some keys" similar to Redis Clustering.
With Cassandra, the focus is more on the Availability of data and Partition Tolerance than Consistency. In fact, Cassandra is considered an eventually consistent database.
Cassandra doesn't promise consistency of data; it promises that all nodes will eventually agree on a consistent value.
Unlike MySQL and Redis, there is no Primary owner of data within a Cassandra cluster. Instead, data is distributed to a minimum number of replicas, enabling it to survive most network partitioning scenarios.
Cassandra also ensures data is written to disk before acknowledging the write to clients, ensuring data persistence. Like MySQL, memory is used for query optimization only.
Cassandra supports synchronous replication at a query level (much like Redis WAIT); however, this type of query sacrifices write latency for data availability.
A unique value of Cassandra's approach is that it handles multi-cluster replication the same way it handles local cluster replication.
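Cassandra's per-query consistency levels can be reasoned about with the classic quorum rule: a read is guaranteed to overlap the latest acknowledged write when R + W > N, where N is the replication factor and R/W are the read and write consistency levels. A small illustrative check (not Cassandra code, just the arithmetic behind levels like QUORUM):

```python
def quorum(n: int) -> int:
    """Smallest majority of n replicas, e.g. QUORUM with RF=3 is 2."""
    return n // 2 + 1

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    """Reads are guaranteed to see the latest write when R + W > N."""
    return r + w > n

# QUORUM reads + QUORUM writes at RF=3: 2 + 2 > 3 -> consistent reads
# ONE reads + QUORUM writes at RF=3:    1 + 2 <= 3 -> may read stale data
```

This is why Cassandra can trade write latency for read consistency on a per-query basis: raising W or R restores the overlap, at the cost of waiting on more replicas.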
Multi-Cluster & Availability Zones
So far, we've mainly explored the challenges of data consistency within a single data center. Adding multiple clusters and availability zones multiplies the impact of these constraints.
- Replication between clusters is often asynchronous; it can also take longer as the latency between availability zones is typically much higher than local.
- Synchronous replication between clusters can be possible depending on the database, but at a high latency cost, with some opting for nearby availability zone deployments to reduce latency.
- The chances of network partitions across availability zones are much higher than within a single availability zone.
- Write conflicts between two clusters can be a challenge; which cluster's write is the correct write?
- Even highly consistent datastores like Redis can't promise strong consistency across clusters.
The Secret to Designing Consistency in Distributed Systems
A system that looks consistent is not
The first step in building a consistent distributed system is to accept that there is no true consistency in distributed systems. However, there are techniques for designing around consistency problems.
- Make it OK to use cached/old data
- When cached data is not good enough, you must go to a single source
- Avoid updating the same data from multiple clusters/availability zones (use Sharding)
- If you cannot avoid data conflicts, have a clear method of resolution (Databases use record timestamps)
- Replication will break, always have a path to recover
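One common resolution method from the list above is last-write-wins via record timestamps, the approach many databases (including Cassandra) use internally. A minimal illustrative sketch, with a hypothetical `Record` type standing in for a replicated row:

```python
from dataclasses import dataclass

@dataclass
class Record:
    value: str
    timestamp: float  # writer-assigned, e.g. time.time() at write

def resolve(a: Record, b: Record) -> Record:
    """Last-write-wins: the record with the newer timestamp survives.

    Ties break deterministically toward the first argument; real systems
    need a stable tiebreaker (e.g. replica ID) so all nodes agree.
    """
    return a if a.timestamp >= b.timestamp else b

# Two clusters wrote conflicting values for the same key:
winner = resolve(Record("v1", 100.0), Record("v2", 105.0))
# winner.value == "v2" -- the later write survives on every replica
```

Note the hidden trade-off: last-write-wins depends on reasonably synchronized clocks across clusters, and the "losing" write is silently discarded, which is why avoiding cross-cluster writes to the same data (sharding) is listed first.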
[Diagram: replication and recovery across Availability Zone 1, Availability Zone 2, and Availability Zone 3]
- Distributed Systems are the default for new platforms, solving many issues created by centralized systems.
- Distributed Systems also bring their own challenges (latency, failures, security, etc.)
- Consistency is still a challenge for distributed systems; the CAP and PACELC theorems hold.
- We can learn ways to deal with data consistency from Open Source database technologies.
- When designing distributed systems, know you cannot achieve true consistency.
- Use techniques such as sharding, caching, etc., to make a system feel consistent when it is actually eventually consistent.
Principal Engineer - American Express
Distributed Systems are not Consistent Systems
By Benjamin Cane