Deciphering CAP: The Fundamental Trade-off in Distributed Systems

The CAP theorem is a framework that helps you assess the trade-offs when designing distributed systems. The trade-off is between Consistency, Availability, and Partition Tolerance.
These three properties are interrelated, with subtle differences, and they cannot all be guaranteed at the same time. Before we go any further into the CAP theorem, let's take a closer look at each of them:
Consistency is about whether the data a client sees is up to date. Every node in a strongly consistent system either returns the most recent data or returns an error. There is no in-between.
Note that consistency here is different from the consistency guarantee in databases with ACID transactions.
Availability focuses on always returning a response rather than on the data being fully up to date. You might see some stale data while an update is still propagating. In a highly available system, you are supposed to get a response unless the node itself has failed.
Partition Tolerance is the ability of a distributed system to continue to operate despite an arbitrary number of messages being lost, delayed, or dropped by the network between nodes.
A partition (network partition) is a communication failure that divides the nodes of a distributed system into two or more separate groups (partitions), where nodes within the same group can communicate but nodes in different groups cannot.
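As a rough illustration of that definition, a partition can be modeled as a list of node groups, with two nodes able to talk only when they share a group. The node names (n0, n1, n2) and the helper can_communicate are hypothetical, chosen only for this sketch.

```python
from typing import FrozenSet, List


def can_communicate(a: str, b: str, groups: List[FrozenSet[str]]) -> bool:
    """Return True if nodes a and b sit in the same group (same side of the partition)."""
    return any(a in group and b in group for group in groups)


# Healthy network: every node is in a single group.
healthy = [frozenset({"n0", "n1", "n2"})]

# Partitioned network: n2 is cut off from n0 and n1.
partitioned = [frozenset({"n0", "n1"}), frozenset({"n2"})]

print(can_communicate("n0", "n2", healthy))      # True
print(can_communicate("n0", "n2", partitioned))  # False
print(can_communicate("n0", "n1", partitioned))  # True
```

Real networks fail in messier ways (one-way links, slow links), so treat this as a mental model rather than a description of any particular system.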
The CAP theorem states that it is impossible for a distributed system to simultaneously provide more than two of these three guarantees: consistency, availability, and partition tolerance.
Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a (non-error) response, without guarantee that it contains the most recent write.
Partition Tolerance: The system continues to operate despite network failures (partitions) that isolate some nodes.
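To make the first two definitions concrete, here is a minimal sketch that checks a single read response against them. The Response type and the two predicate functions are assumptions made up for this example, not part of any real library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Response:
    value: Optional[str]      # None when the node returned an error
    is_error: bool = False


def is_consistent(resp: Response, latest_write: str) -> bool:
    """Consistency: the read returns the most recent write or an error."""
    return resp.is_error or resp.value == latest_write


def is_available(resp: Response) -> bool:
    """Availability: the request gets a non-error response."""
    return not resp.is_error


latest = "v2"
fresh = Response("v2")
stale = Response("v1")
error = Response(None, is_error=True)

for name, resp in [("fresh", fresh), ("stale", stale), ("error", error)]:
    print(name, is_consistent(resp, latest), is_available(resp))
# fresh True True
# stale False True
# error True False
```

A stale answer counts as available but not consistent, while an error counts as consistent but not available. Only the fresh answer satisfies both.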
This leads to three theoretical system configurations:
CP (Consistency & Partition Tolerance) System - sacrifices availability to support consistency & partition tolerance.
AP (Availability & Partition Tolerance) System - sacrifices consistency to support availability and partition tolerance.
CA (Consistency & Availability) System - sacrifices partition tolerance to support consistency & availability.
Ideally, if a network partition never occurred and data written to node n0 were automatically replicated to all other nodes, we could build a CA system, since both consistency and availability would be achieved. However, in real-world distributed systems network partitions are unavoidable, so when one occurs we are forced to choose between consistency and availability.
If we choose consistency, we get a CP system. If we choose availability, we get an AP system. Consistency and availability also pull against each other: relaxing consistency makes a system more available, because stale data can be served, while favoring availability means the data a client sees may not be up to date.
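The choice can be sketched with a toy two-replica cluster. Everything here (the Cluster class, the Unavailable exception, the "CP"/"AP" mode strings, the values v1 and v2) is hypothetical and heavily simplified. In particular, this CP mode refuses every request during a partition, which is one possible policy assumed for the sketch.

```python
class Unavailable(Exception):
    """Raised when a CP cluster refuses to answer rather than risk stale data."""


class Cluster:
    """Toy model: two replicas, n0 and n1, each holding a single value."""

    def __init__(self, mode: str) -> None:
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.data = {"n0": "v1", "n1": "v1"}
        self.partitioned = False

    def write(self, node: str, value: str) -> None:
        if not self.partitioned:
            # Healthy network: the write is replicated to every node.
            for name in self.data:
                self.data[name] = value
        elif self.mode == "CP":
            raise Unavailable("cannot replicate during a partition")
        else:
            # AP: accept the write locally; the replicas now diverge.
            self.data[node] = value

    def read(self, node: str) -> str:
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest write during a partition")
        return self.data[node]


ap = Cluster("AP")
ap.partitioned = True
ap.write("n0", "v2")
print(ap.read("n0"))  # v2
print(ap.read("n1"))  # v1 (stale, but the request was answered)

cp = Cluster("CP")
cp.partitioned = True
try:
    cp.read("n1")
except Unavailable as exc:
    print("CP read refused:", exc)
```

During the partition the AP cluster keeps answering but n1 serves stale data, while the CP cluster returns an error instead of a possibly outdated value.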
Once the choice is down to consistency versus availability, it is hard to say that one kind of system is better than the other, because it depends entirely on the type of application you are building. A banking or financial app would require strong consistency, since it is better to make the customer wait than to show an incorrect balance. Conversely, for a social media feed it makes much more sense to show something, even if it is not the latest, than to throw errors everywhere.





