State Machine Replication: A Brief History

The development of state machine replication

EssayDecember 12, 2020 - growing Distributed Systems, State Machine Replication

In 1975, Paul R. Johnson and Robert H. Thomas at BBN Technologies faced a challenge while developing a database for user authentication and accounting: how to create a system that was both reliable and efficient across the early ARPANET's distributed network. This led them to tackle two critical problems that would later become fundamental challenges in distributed computing.

The first problem was reliability: ensuring the system would remain available even if one copy of the database failed. The second was performance: optimizing data access for users spread across geographically distributed ARPANET nodes. In solving these challenges, they encountered an even more fundamental problem: how to maintain consistency between multiple copies of data. This "maintenance of duplicate databases" problem required careful consideration of message ordering, timing, and conflict resolution—concepts that remain central to modern distributed databases.

They documented their solution in RFC 677, proposing a distributed database system where multiple Database Management Processes (DBMPs) maintain independent copies and synchronize changes over a network. The design was elegantly minimal: each database entry consisted of a 5-tuple containing a Selector, Value, Deletion flag, Creation Timestamp, and Modification Timestamp. The system supported four basic operations: selection, assignment, creation, and deletion.

Their core innovation was using physical timestamps to establish event ordering. They showed that consistent replicated state was achievable through the use of ordered operations, which was a precursor to state machine replication. Each timestamp combined physical time with a DBMP ID, using the ID as a tiebreaker when physical times matched. When one DBMP detected modifications from another, it compared timestamps to either apply or reject the update. While this approach effectively handled basic operations and garbage collection, it had inherent limitations: clock synchronization issues could cause anomalous behaviors, such as "newer" updates appearing to be overwritten by "older" ones, and network partitions could create temporary inconsistencies.

Reviewing this work, Leslie Lamport identified a deeper problem rooted in special relativity: there is no invariant total ordering of events in space-time. Two observers can witness the same events in different orders, and the only certainty lies in causality—we know event A happened before event B only if A caused B. This insight revealed how the RFC 677 algorithm could violate causality. In correcting the algorithm, Lamport made a crucial discovery: any distributed system can be modeled as a totally ordered stream, but that ordering must respect causal relationships rather than rely on physical time.

In July 1978, Lamport published "Time Clocks and the Ordering of Events in a Distributed System". The paper introduced logical 'Lamport' clocks to determine causality between events, replacing physical clocks. These logical clocks guarantee happens-before ordering for causally linked events, though they don't enforce a single global order. By showing that distributed processes executing the same operations in any causally-consistent order would reach the same final state, Lamport generalized earlier insights about replicated systems. Multiple causally-correct orderings can exist, potentially differing from human observers' perceived sequence, yet still maintaining system correctness. Lamport's key insight—that preserving happens-before relationships suffices for correct distributed system behavior— showed how replicated systems could maintain consistency without relying on synchronized physical time, laying the foundation for modern distributed systems theory.

Lamport's subsequent paper, The Implementation of Reliable Distributed Multiprocess Systems, expanded his theory to handle both fail-stop and Byzantine failures. The work introduced state machine replication as a solution for building reliable, highly contended systems requiring strict correctness guarantees. Lamport described the system as being implemented by a state machine, which he referred to as a single-user sequential user machine with deterministic mapping between input commands and output responses, then extended this model to multiple users through a 'sequencer' component that orders commands into a single stream and distributes results.

The logical system model is shown below:

Logical System Model

As the paper investigates systems that need to respond within predefined time limits, it confronts the challenges posed by component failures, such as the user machine itself failing. To address this issue, Lamport introduces an Acceptor component. This Acceptor is designed to always accept commands unless a component fails. While this concept isn't directly related to state machine replication, it opens up important avenues of research into failure detectors and distributed consensus.

Logical System Model with Acceptor

The paper also discusses the formation of quorums using a majority of members, a concept that has become a common feature in consensus-based systems today. This approach to quorum formation helps ensure system reliability even in the face of some component failures.

Through these concepts and discussions, Lamport's paper lays crucial groundwork for understanding and implementing reliable distributed systems, particularly those requiring strong consistency and fault tolerance using state machines.

To follow up on the outcome of RFC 677: Although it didn't achieve perfect database consistency, the work of Paul R. Johnson and Robert H. Thomas introduced the Last-Writer-Wins (LWW) Register. This is now regarded as one of the earliest examples of a CRDT (Conflict-free Replicated Data Type). LWW establishes a total order of operations but sacrifices the ability to preserve concurrent updates. Despite this trade-off, LWW registers remain in use in many modern distributed systems.


References and Further Reading

  • Paul R. Johnson, Robert H. Thomas, "The Maintenance of Duplicate Databases", RFC 677, January 1975, RFC 677
  • L. Lamport, “Time, clocks, and the ordering of events in a distributed system,” Commun. ACM, vol. 21, no. 7, pp. 558–565, Jul. 1978, doi: 10.1145/359545.359563.
  • L. Lamport, “The implementation of reliable distributed multiprocess systems,” Computer Networks (1976), vol. 2, no. 2, pp. 95–114, May 1978, doi: 10.1016/0376-5075(78)90045-4.

Changelog

  • October 27, 2024Updated with some additional details about the lasting impact of RFC 677
  • October 26, 2024Rewrote the original article into a narrative style
  • December 12, 2020Initial version

The colors used in the diagrams in this post are sourced from The Narrows, which is both a section of Zion canyon, and a hike in Zion National Park, Utah, USA.