Consistency of Distributed Transactions
In the previous two blogs, we discussed consistency in two different contexts. In ACID, consistency means a data validity guarantee, which depends heavily on atomicity and isolation. In the CAP theorem, consistency means that a data system provides the same view of the data across all nodes. The ACID definition focuses on a property of the data, while the CAP definition describes a property of the system in a distributed environment. In this blog, I will combine them and discuss the consistency of distributed transactions. I will start from the different facets of consistency and discuss implementation ideas later.
Linearizability, also known as atomic consistency, is the strongest level of consistency that a distributed data system can offer. It requires two characteristics:
- All replicas behave as if there were only one copy of the data.
- All operations on the data are atomic.
To crystallize the definition, let us look at a few examples from the client’s view. In the figure, each bar represents a request from the client, where the start and end of the bar mark the times when the request was sent and the response was received. Due to varying network delays, the client has no way to know exactly when the request is processed. Assume the data system offers only “read by key” and “set by key” operations.
If a write and a read are concurrent, as shown in figure 1, the read operation can return either the old value (0) or the new value (1). Therefore, we cannot claim that such a system is linearizable, since it behaves as if there were two copies of the data. In a linearizable system, we can imagine that there must be a point in time when the value of x atomically flips from the old value to the new value. Thus, any subsequent read request should return the new value, even if the write operation has not yet completed. Figure 2 shows how a linearizable system acts in a similar scenario, where the vertical line in each bar indicates when the operation takes effect on the server side.
Although I use concurrent write/read operations in the example, it is important to distinguish linearizability from serializability. Linearizability, as a consistency level, focuses on data recency. In the example above, linearizability does not guarantee that a group of operations can be isolated and executed as a transaction. It only guarantees that the response of a read request never flips back from the new value to the old one.
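The “no flip back” recency guarantee can be sketched as a small check over a history of client reads. This is a minimal illustration I wrote for this post, not a real linearizability checker; it assumes a single write of x and records each read as (start time, end time, value returned).

```python
# Hypothetical sketch: verify that reads of a single key never flip back
# from the new value to the old value after one write of x.
def reads_are_linearizable(reads, old, new):
    """Once any read returns the new value, every read that starts
    after that read ends must also return the new value."""
    # Earliest completion time of a read that observed the new value.
    flip_deadline = min((end for start, end, v in reads if v == new),
                        default=None)
    if flip_deadline is None:
        return True  # no read has seen the new value yet
    # A read starting after that point must not return the old value.
    return all(not (start > flip_deadline and v == old)
               for start, end, v in reads)

# A later read flips back to the old value -> not linearizable.
history = [(1, 2, 1),   # this read observed the new value 1
           (3, 4, 0)]   # this later read returned the old value 0
print(reads_are_linearizable(history, old=0, new=1))  # False
```

Swapping the two results (old value first, new value later) makes the history linearizable, since the value only flips once.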
Every distributed linearizable system requires some form of consensus algorithm. Therefore, the throughput of a linearizable system is limited. In most use cases, it is not worth paying this performance cost for linearizability. However, a few use cases deserve special attention, since they usually require a global view across all data partitions. If a solution targets the following use cases, the architecture should consider linearizability:
- Locking mechanisms and leader election
- Constraints and uniqueness guarantees
Additionally, although linearizability is not the only solution to race conditions, it is the simplest to understand. Therefore, it is worth trying linearizability when there are requirements to orchestrate heterogeneous channels with timing dependencies.
Consistency vs. Ordering
A linearizable system behaves as if every operation takes effect atomically at one point in time. This definition implies that operations are executed in a well-defined order. If there is only one copy of the data, all operations can be sorted into a single global order, called a total order. In a distributed system, all data replicas must maintain this same order; for this reason, linearizability is equivalent to total order broadcast.
If, instead, a system only preserves the order among logically related operations, we call that order a partial order. Partially ordered operations are usually logically dependent; therefore, such a partial order is also called a causal order. A causal order cannot sort concurrent operations, or operations that touch unrelated data partitions.
Clearly, causality is a weaker consistency guarantee than linearizability. However, since it captures the logical relationships among related operations, it is widely adopted by many distributed data systems. Most data systems achieve causality through Lamport timestamps, proposed by Leslie Lamport in 1978.
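A Lamport clock is simple enough to sketch in a few lines. The class below is my own minimal illustration (names are not from any real library): each node keeps a counter, stamps every event with (counter, node id), and fast-forwards its counter past any timestamp it receives, so the resulting tuple order respects causality.

```python
# Minimal sketch of Lamport timestamps (illustrative, not a real library).
class LamportClock:
    def __init__(self, node_id):
        self.node_id = node_id
        self.counter = 0

    def tick(self):
        """Local event: increment the counter and stamp the event."""
        self.counter += 1
        return (self.counter, self.node_id)

    def send(self):
        """Attach a timestamp to an outgoing message."""
        return self.tick()

    def receive(self, timestamp):
        """On receipt, fast-forward past the sender's counter."""
        remote_counter, _ = timestamp
        self.counter = max(self.counter, remote_counter) + 1
        return (self.counter, self.node_id)

a, b = LamportClock("A"), LamportClock("B")
t1 = a.send()        # (1, "A")
t2 = b.receive(t1)   # (2, "B") -- causally after t1
t3 = b.send()        # (3, "B")
print(t1 < t2 < t3)  # True: tuple comparison gives a total order
```

Note the caveat implied above: Lamport timestamps impose a total order consistent with causality, but comparing two timestamps cannot tell you whether the operations were actually concurrent.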
Another perspective on consistency is atomic commitment. In a single-host application, atomic commitment depends on the storage engine making the request durable: first the data, then the commit record. Therefore, the key deciding moment for whether the transaction commits or aborts is when the storage engine durably writes the commit record.
However, things become more complicated when multiple machines are involved. It is possible for a transaction to be committed on only some of the machines. The reasons include:
- Some nodes may detect a constraint violation or conflict
- Requests may be lost due to network failures.
- Some nodes may crash and recover to a previous state that lacks the recent commits.
To solve this issue, the two-phase commit (2PC) strategy was proposed. As the name indicates, there are two phases to commit a transaction:
- Prepare phase. During this phase, a coordinator sends a prepare request with the transaction data to all participants. Each participant checks locally and votes yes to the prepare request if it can fulfill it; otherwise, it rejects the prepare request. This is the decision moment for a participant to commit or abort. Once a participant votes yes to a request, it cannot reject the transaction later.
- Commit phase. After the coordinator confirms yes votes from all participants, it sends a commit request to all participants. This is the decision moment for the coordinator to commit the transaction. Once the coordinator decides to commit, it must guarantee that the commit request is acknowledged by all participants, retrying indefinitely if necessary. Each participant must acknowledge the commit request.
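The two phases above can be sketched as a single-process simulation. This is an illustrative toy (class and function names are my own, and real 2PC also needs durable logging and retries over a network), but it shows the two decision moments: each participant decides at prepare time, and the coordinator decides once all votes are in.

```python
# Toy sketch of two-phase commit; not a production protocol.
class Participant:
    def __init__(self, name, will_accept=True):
        self.name = name
        self.will_accept = will_accept  # simulate a local constraint check
        self.state = "init"

    def prepare(self, txn):
        # Decision moment for the participant: after voting yes,
        # it must be able to commit later (in reality, it logs the vote).
        if self.will_accept:
            self.state = "prepared"
            return True
        self.state = "aborted"
        return False

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"

def two_phase_commit(coordinator_log, participants, txn):
    # Phase 1: prepare -- collect votes from every participant.
    votes = [p.prepare(txn) for p in participants]
    if all(votes):
        # Decision moment for the coordinator: log the outcome durably,
        # then the commit must be delivered to every participant
        # (retrying as long as it takes in a real system).
        coordinator_log.append(("commit", txn))
        for p in participants:
            p.commit(txn)
        return "committed"
    coordinator_log.append(("abort", txn))
    for p in participants:
        p.abort(txn)
    return "aborted"

log = []
nodes = [Participant("n1"), Participant("n2", will_accept=False)]
print(two_phase_commit(log, nodes, txn="t1"))  # aborted: one no vote
```

A single rejecting participant aborts the whole transaction, which is exactly why a yes vote is binding: the coordinator's commit decision assumes every prepared participant can still follow through.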
What is next?
After the high-level theoretical concepts of the last few posts, we will put things together and discuss an architectural view of distributed data systems in the next post.