Benjamin Cane PRO
Distinguished Engineer @ American Express building payments systems. Author: https://amzn.to/3kuCFpz. Thoughts & opinions are my own.
* Warning: this is not a talk on distributed systems design patterns.
A distributed system is characterized as a collection of independent services/infrastructure that appears to the end-user as a single system.
Distributed Systems provide capabilities unavailable to Centralized Systems:
- Scale-out: adding more nodes vs. adding more CPU/Memory
- Reusability: common functionality can be a service offered to multiple platforms
- Availability: Centralized systems are a single point of failure
- Proximity: we can reduce latency for clients by deploying closer to them
- Cost-Effectiveness: Distributed Systems rarely require expensive hardware solutions
While Distributed Systems solve many of the challenges introduced by Centralized Systems, they also have their own complexities.
Introducing the fallacies of Distributed Systems:
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn't change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
The CAP theorem states that no distributed data system can provide all three of Consistency, Availability, and Partition Tolerance simultaneously.
We must pick two.
The PACELC theorem builds on the CAP theorem, stating an additional trade-off exists between Latency and Consistency.
This argues that even in normal operation, with no partition, choosing stronger consistency costs you latency.
How does the real world compare with the CAP and PACELC theorems?
The theorems are right, but many people treat them as cut-and-dried decisions: Consistency vs. Low Latency.
In reality, these trade-offs happen in degrees. You can't have 100% consistency, 100% low latency, 100% availability, & 100% partition tolerance, but you can have 100% consistency, 80% low latency, 10% availability, & 30% partition tolerance.
Redis, by default, is optimized for providing consistency and low latency but sacrifices availability and partition tolerance.
Redis has a single Primary Node for a given Key, trading Partition Tolerance for Consistency.
Internally, Redis provides atomic operations by serializing and executing all commands sequentially. This, in turn, reduces performance by making command execution single-threaded.
Redis counteracts the costs of sequential execution by working exclusively in-memory, persisting to disk via frequent memory snapshots.
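A minimal sketch of this idea (not Redis's actual implementation): funneling every command through a single execution thread makes each command atomic, because no two commands can ever interleave, even a read-modify-write like INCR issued by many concurrent clients.

```python
import queue
import threading

class ToyStore:
    """Toy in-memory store that, like Redis, executes all commands
    sequentially on one thread, making each command atomic."""

    def __init__(self):
        self.data = {}
        self.commands = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def _run(self):
        while True:
            cmd = self.commands.get()
            if cmd is None:  # shutdown sentinel
                break
            op, key, reply = cmd
            if op == "INCR":
                # This read-modify-write is safe: no other command can interleave.
                self.data[key] = self.data.get(key, 0) + 1
                reply.put(self.data[key])

    def incr(self, key):
        reply = queue.Queue()
        self.commands.put(("INCR", key, reply))
        return reply.get()  # block until the worker has executed the command

    def close(self):
        self.commands.put(None)
        self.worker.join()

store = ToyStore()
clients = [threading.Thread(target=lambda: [store.incr("hits") for _ in range(1000)])
           for _ in range(4)]
for c in clients: c.start()
for c in clients: c.join()
print(store.data["hits"])  # 4000: no lost updates despite 4 concurrent clients
store.close()
```

The same workload against a plain shared dict with no serialization could lose updates; the single command queue is what buys the atomicity, at the price of single-threaded execution.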
Replication is, by default, asynchronous but can be made synchronous with the WAIT command. This, however, causes each SET request to take longer.
Clustering reduces the impact of network partitioning on Redis by sharding keys across multiple primaries, but a single key can only have one primary.
By default, MySQL, like most RDBMSs, focuses more on Consistency and Availability through persistence, sacrificing Low Latency and Partition Tolerance.
Like Redis, MySQL is designed to have a single active Primary for a given record.
Unlike Redis, MySQL persists writes to disk before making them available to users, with memory providing only query caching. This makes MySQL effectively slower than Redis but more resilient to node failures.
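The contrast can be sketched with two toy stores (illustrative only, not either system's real storage engine): a durable store fsyncs to disk before acknowledging a write, while a memory-first store acknowledges immediately and persists later.

```python
import os
import tempfile

class DurableStore:
    """MySQL-style: the write is on disk before the client sees success."""
    def __init__(self, path):
        self.path = path
        self.cache = {}  # memory acts only as a read cache

    def set(self, key, value):
        with open(self.path, "a") as f:
            f.write(f"{key}={value}\n")
            f.flush()
            os.fsync(f.fileno())  # durability before acknowledgement
        self.cache[key] = value

class MemoryStore:
    """Redis-style: acknowledge from memory; snapshots to disk happen later."""
    def __init__(self):
        self.data = {}
    def set(self, key, value):
        self.data[key] = value  # acknowledged immediately

path = os.path.join(tempfile.mkdtemp(), "wal.log")
d, m = DurableStore(path), MemoryStore()
d.set("k", "v")
m.set("k", "v")
# After a crash, DurableStore's write survives on disk; MemoryStore's may not.
print(open(path).read())  # k=v
```

The extra fsync on every write is exactly where the latency goes, and exactly why the write survives a node failure.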
MySQL also uses asynchronous replication by default but can support semi-synchronous replication through plugins and additional layers such as Vitess.io.
It is possible to set up MySQL with multiple Primaries using Clustered file systems. However, this also has its own limitations and risks, as a failure in a clustered file system can bring down the whole database for all records.
External mechanisms such as Vitess can also provide sharding on top of MySQL, enabling the survival of network partitions for "some keys" similar to Redis Clustering.
With Cassandra, the focus is more on the Availability of data and Partition Tolerance than Consistency. In fact, Cassandra is considered an eventually consistent database.
Cassandra doesn't promise consistency of data; it promises that all nodes will eventually agree on a consistent value.
Unlike MySQL and Redis, there is no Primary owner of data within a Cassandra cluster. Instead, data is distributed to a minimum number of replicas, enabling it to survive most network partitioning scenarios.
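A sketch of this placement strategy, using a simplified consistent-hash ring (Cassandra's real partitioner and token assignment differ; node names and the one-token-per-node assumption are illustrative): a key's replicas are the next RF nodes clockwise from the key's position on the ring, and no replica is "the primary."

```python
import hashlib
from bisect import bisect_right

class Ring:
    """Sketch of Cassandra-style replica placement on a hash ring."""

    def __init__(self, nodes, rf=3):
        self.rf = rf
        # Assumption: one token per node, derived from the node name.
        self.tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value: str) -> int:
        return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")

    def replicas(self, key: str):
        # Walk clockwise from the key's token, collecting rf distinct nodes.
        t = self._hash(key)
        idx = bisect_right([tok for tok, _ in self.tokens], t)
        picked = []
        for i in range(len(self.tokens)):
            node = self.tokens[(idx + i) % len(self.tokens)][1]
            if node not in picked:
                picked.append(node)
            if len(picked) == self.rf:
                break
        return picked

ring = Ring(["n1", "n2", "n3", "n4", "n5"], rf=3)
print(ring.replicas("order:42"))  # three distinct nodes hold this key
```

Because ownership is spread around the ring rather than pinned to one primary, losing any single node (or one side of a partition) still leaves replicas reachable for most keys.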
Cassandra also ensures data is written to disk before acknowledging the write to clients, ensuring data persistence. Like MySQL, memory is used for query optimization only.
Cassandra supports synchronous replication at a query level (much like Redis WAIT); however, these queries sacrifice write latency for stronger consistency.
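This tunable trade-off rests on the classic quorum rule: with N replicas, choosing a write quorum W and a read quorum R such that W + R > N guarantees every read set overlaps the latest write set. A toy illustration of that rule (not Cassandra's implementation):

```python
import itertools

N, W, R = 3, 2, 2  # replication factor and write/read quorums; W + R > N

replicas = [{} for _ in range(N)]  # each replica maps key -> (timestamp, value)

def write(key, value, ts, targets):
    """A write is acknowledged once W replicas accept it."""
    assert len(targets) >= W
    for i in targets:
        replicas[i][key] = (ts, value)

def read(key, targets):
    """Read R replicas and return the value with the highest timestamp."""
    assert len(targets) >= R
    versions = [replicas[i][key] for i in targets if key in replicas[i]]
    return max(versions)[1]

write("balance", 100, ts=1, targets=[0, 1])  # replica 2 missed this write
# Because W + R > N, every R-replica read set overlaps the W-replica write set:
for read_set in itertools.combinations(range(N), R):
    assert read("balance", read_set) == 100
print("all quorum reads observed the latest write")
```

Raising W (up to writing to all N replicas, as with Redis WAIT) tightens consistency but adds latency to every write, which is the PACELC trade-off in miniature.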
A unique aspect of Cassandra's approach is that it handles multi-cluster replication the same way it handles local-cluster replication.
So far, we've mainly explored the challenges of data consistency within a single data center. Adding multiple clusters and availability zones compounds the impact of these constraints.
The first step in building a consistent distributed system is to accept that there is no absolute consistency in distributed systems. However, there are techniques for designing around consistency problems.
[Diagram: replication across Availability Zones 1, 2, and 3]
By Benjamin Cane
Consistency is one of the most challenging aspects of building and designing Distributed Systems. This presentation discusses why that challenge exists and explores ways open-source systems have addressed it.