set of properties that guarantee that database transactions are processed reliably
transaction = a single logical operation on the data (including multiple changes)
transaction example: transfer of funds from one bank account to another (lower the FROM account and raise the TO account)
A for Atomicity
Atomicity requires that each transaction is "all or nothing"
Subtract value of x from A and add it to B. Substracting from A has succeeded, but the system is unable to add the x value to B.
C for Consistency
The consistency property ensures that any transaction will bring the database from one valid state to another valid state.
Consistency is a very general term which demands that the data must meet all validation rules. Assume that transaction is going to subtract x from A without adding x to B. Before the transaction, A + B = y. After the transaction, A + B = y - 10.
or imagine a simple 1:n relation with a ON UPDATE/DELETE CASCADE that doesn't work properly. We'd end up with a Comment row related to a non-existent BlogPost row.
I for Isolation
The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed serially, i.e. one after the other.
T subtracts 10 from A
T adds 10 to B
T subtracts 10 from B
T adds 10 to A
If T waits for T to be completed, everything is fine. But if it doesn't, consider that T fails after subtracting and T comes inbetween...
D for Durability
Come what may, our data is safe.
Durability means that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors.
x is subtracted from A, everything is saved. After x has been added to B, the system returned "OK" message and stored the operation in the cache (RAM). Meanwhile, an electricity failure went out.
Eric Brewer stated it in 2000 during Symposium on Principles of Distributed Computing:
it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees:
Consistency (aCid - all DBMS nodes see the same data at the same time)
Availability (a guarantee that every request receives a response about whether it was successful or failed)
Partition Tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system)
Network Distributed Systems
CAP theorem applies to distributed DBMS used by big network systems (that's why we talk about Facebook, Google, etc).
If we are considering single-machine DBMS, Partition Tolerance is irrelevant. Consistency and Availablity are easly achievable on one machine.
in other words...
Consistency means that data is the same across the cluster, so you can read or write to/from any node and get the same data.
Availability means the ability to access the cluster even if a node in the cluster goes down.
Partition Tolerance means that the cluster continues to function even if there is a "partition" (communications break) between two nodes (both nodes are up, but can't communicate).
CA - data is consistent between all nodes - as long as all nodes are online - and you can read/write from any node and be sure that the data is the same, but if you ever develop a partition between nodes, the data will be out of sync (and won't re-sync once the partition is resolved)
CP - data is consistent between all nodes, and maintains partition tolerance (preventing data desync) by becoming unavailable when a node goes down
AP - nodes remain online even if they can't communicate with each other and will resync data once the partition is resolved, but you aren't guaranteed that all nodes will have the same data (either during or after the partition)
Andrew Oliver, InfoWorld:
Part of the reason there are so many different types of NoSQL databases lies in the CAP theorem, aka Brewer’s Theorem. The CAP theorem states you can provide only two out of the following three characteristics: consistency, availability, and partition tolerance.
Different datasets and different runtime rules cause you to make different trade-offs. Different database technologies focus on different trade-offs. The complexity of the data and the scalability of the system also come into play.
Relational databases are based on relational algebra, which is more or less an outgrowth of set theory. Relationships based on set theory are effective for many datasets, but where parent-child or distance of relationships are required, set theory isn’t very effective. You may need graph theory to efficiently design a data solution.
In other words, relational databases are overkill for data that can be effectively used as key-value pairs and underkill for data that needs more context. Overkill costs you scalability; underkill costs you performance.
an alternative to ACID:
- Basically Available
- Soft state (panta rei)
Eventual consistency (given a sufficiently long period of time over which no changes are sent, all updates can be expected to propagate eventually through the system and the replicas will be consistent)
stands in contrast to other, more traditional relational databases whose data has ACID
BASE: why ACID alternative?
Soft State. BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.
Basic Availability. The availability of BASE is achieved through supporting partial failures without total system failure. Here is a simple example: if users are partitioned across five database servers, BASE design encourages crafting operations in such a way that a user database failure impacts only the 20 percent of the users on that particular host. There is no magic involved, but this does lead to higher perceived availability of the system.
BASE vs ACID by means of CAP
CAP is basically a continuum along which BASE and ACID are on opposite ends.
CAP is Consistency, Availability, and Partition tolerance. Basically you can pick 2 of those but you can't do all 3.
- ACID focuses on Consistency and Availability.
- BASE focuses on Partition Tolerance and Availability and throws consistency out the window.
CAP as ACID/BASE continuum
You can decide how close you want to be to one end of the continuum or the other according to your priorities.