CAP Theorem

Article Info

Contributed by
3 authors

Last updated on
2022-02-15 11:53:30

Improve this article

Distributed Computing
ACID Transactions
CRUD
Two-Phase Commit Protocol
Command Query Responsibility Segregation
Byzantine Generals Problem

Article Versions

5 2022-02-15 11:53:30
3361,2033 5,3361

By devbot5S

Migrating blockquotes to markdown syntax
4 2020-04-30 14:21:25
2033,1445 4,2033

By arvindpdmn

Update to acronyms.
3 2019-06-10 02:44:24
1445,1443 3,1445

By arvindpdmn

Publishing. More images would be nice.
2 2019-06-09 18:00:04
1443,1382 2,1443

By lokesh.rawat

Completing content and images.
1 2019-05-28 08:20:34
1,1382

By lokesh.rawat

Version one. Some questions added.

Chat Room

Submitting ...

You are editing an existing chat message.

A well-design cloud-based application often stores its data across multiple servers. For faster response, data is often stored closer to clients in that geography. Due to the distributed nature of this system, it's impossible to design a perfect system. The network may be unreliable or slow at times. Therefore, there are trade-offs to be made. CAP Theorem gives system designers a method to think through and evaluate the trade-offs at the design stage.

The three parts of the CAP Theorem are Consistency, Availability, and Partition Tolerance. The theorem states that it's impossible to guarantee all three in a distributed data store. We can meet any two of them but not all three.

Over the years, designers have misinterpreted the CAP Theorem. To reflect read-world scenarios, modifications to the theorem have been proposed.

Discussion

What's the definition of CAP Theorem?
A formal definition of CAP Theorem is, "It is impossible in the asynchronous network model to implement a read/write data object that guarantees the following properties: availability, atomic consistency, in all fair executions (including those in which messages are lost)".
A simplified definition states that,
In a network subject to communication failures, it is impossible for any web service to implement an atomic read/write shared memory that guarantees a response to every request.
The word "atomic" used above means that although it's a distributed system, requests are modelled as if they are executing on a single node. This gives us an easy model for consistency.
Could you explain the CAP Theorem?
When partitioned, the service becomes inconsistent though still available. Source: Whittaker 2014.
The parts of the CAP Theorem can be understood as follows:
- Consistency: When a request is made, the server returns the right response. What is "right" depends on the service. For example, reading a value from a database might mean that the most recent write to that value should be returned.
- Availability: A request always receives a response from the server. No constraint is placed on how quickly the response must be received.
- Partition Tolerance: The underlying network is not reliable and servers may get partitioned into non-communicating groups. Despite this, the service should continue to work as desired.
As an example, consider two nodes G1 and G2 that have been partitioned. A client changes a value from v0 to v1 on G1. However, the same value is not updated on G2 due to the partition. Hence, when G2 is queried it returns the old value v0. Thus, the service is available but not consistent.
Sometimes the terms safety (consistency) and liveness (availability) are used in the generalized sense. Safety means "nothing bad ever happens". Liveness means "eventually something good happens".
What's the implication of the CAP Theorem when designing distributed systems?
When CAP Theorem was proposed, the understanding was that system designers had three options:
- CA Systems: Sacrifice partition tolerance. Single-site or cluster databases using two-phase commit are examples.
- CP Systems: Sacrifice availability. If there's a partition, for consistency we make the service unavailable: return a timeout error or lock operations.
- AP Systems: Sacrifice consistency. If there's a partition, we continue accepting requests but reconcile them later (writes) or return stale values (reads).
In practice, we deal with network partitions at least some of the time. The choice is really between consistency and availability. For databases, consistency can be achieved by enabling reads after completing writes on several nodes. Availability can be achieved by replicating data across nodes. In fact, permanent partitions are rare. So the choice is temporary.
Designers don't have to give up one of the three to build a distributed system. In fact, it's possible to have all three under normal network conditions. There's trade-off only when the network is partitioned. It's also helpful to think probabilistically. We can design a CA system if probability of a partition is far less than that of other systemic failures.
Could you share real-world applications of the CAP Theorem?
Databases that follow ACID (Atomicity, Consistency, Isolation, Durability) give priority to consistency. However, NoSQL distributed databases prefer availability over consistency since availability is often part of commercial service guarantees. So caching and logging were used for eventual consistency. This leads to what we call BASE (Basically Available, Soft-state, Eventually consistent). As examples, Zookeeper prefers consistency while Amazon's Dynamo prefers availability.
Maintaining consistency over a wide area network increases latency. Therefore, Yahoo's PNUTS system is inconsistent because it maintains remote copies asynchronously. A particular user's data is partitioned locally and accessed with low latency. Facebook's prefers to update a non-partitioned master copy. User has a more local but potentially stale copy, until it gets updated.
A web browser can go offline if it loses connection to the server. The web app can fall back to on-client persistent storage. Hence, availability is preferred over consistency to sustain long partitions. Likewise, Akamai's web caching offers best effort consistency with high level of availability.
In Google, primary partition usually resides within one datacenter, where both availability and consistency can be maintained. Outside this partition, service becomes unavailable.
What are some criticisms of the CAP Theorem?
Although the Theorem doesn't specify an upper bound on response time for availability, in practice, there's exists a timeout. CAP Theorem ignores latency, which is an important consideration in practice. Timeouts are often implemented in services. During a partition, if we cancel a request, we maintain consistency but forfeit availability. In fact, latency can be seen as another word for availability.
In NoSQL distributed databases, CAP Theorem has led to the belief that eventual consistency provides better availability than strong consistency. Some believe this is an outdated notion. It's better to factor in sensitivity to network delays.
CAP Theorem suggests a binary decision. In reality, it's a continuum. There are different degrees of consistency implemented via "read your writes, monotonic reads and causal consistency".

Milestones

1985

Comparing FLP with CAP. Source: Dinh 2016.

Originally presented in 1983, researchers Fisher, Lynch and Paterson (FLP) show that distributed consensus is impossible in a fault-tolerant manner in an asynchronous system. Distributed consensus is related to the problem of atomic storage addressed by CAP Theorem.

1996

Before CAP Theorem is formalized, researchers have been working on similar ideas. One example is the paper titled Trading Consistency for Availability in Distributed Systems by two researchers at Cornell University.

2000

Eric Brewer presents the CAP Theorem at the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC). It's early history can be traced to 1998 and first published in 1999. Brewer points out that distributed computing has unduly focused on computation and not on data.

2002

MIT researchers Seth Gilbert and Nancy Lynch offer a formal proof of the CAP Theorem in their paper titled Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. In asynchronous systems, the impossibility result is strong. In partially synchronous systems, we can achieve a practical compromise between consistency and availability.

2010

Daniel Abadi proposes the PACELC Theorem as an alternative to the CAP Theorem. When data is replicated, there's a trade-off between latency and consistency. PACELC makes this explicit: during partitions (P), trade-off is AC; else, trade-off is LC. Default versions of Dynamo, Cassandra, and Riak are PA/EL systems. VoltDB/H-Store and Megastore are PC/EC. MongoDB is PA/EC.

References

Article Stats

1207

Words

Authors

Edits

Chats

Likes

7225

Hits

Cite As

Devopedia. 2022. "CAP Theorem." Version 5, February 15. Accessed 2023-11-12. https://devopedia.org/cap-theorem

Contributed by
3 authors

Last updated on
2022-02-15 11:53:30

Improve this article

networking system distributed computing

Distributed Computing
ACID Transactions
CRUD
Two-Phase Commit Protocol
Command Query Responsibility Segregation
Byzantine Generals Problem

CAP Theorem

Discussion

Milestones

References

Further Reading

Article Stats

Cite As

See Also

CAP Theorem

Discussion

Milestones

References

Further Reading

Article Stats

Author-wise Stats for Article Edits

Cite As

See Also

Article Warnings

Login