Zero-downtime state machines are a painful and costly requirement to meet in distributed systems. This article outlines techniques for implementing them, with a particular focus on state machines running in a sequencer-based architecture.
Why Zero Downtime State Machines?
Zero-downtime state machines are needed by distributed systems that must be available 24x7. Examples include crypto exchanges, lending platforms, payment platforms, and other DeFi applications where clients expect to be able to submit transactions at any time.
What's the Problem?
The challenges are in three areas:
- State Machine Logic Changes: State machines have to evolve over time. How do you ensure that changes to the state machine logic remain compatible with the existing state?
- State Machine Replication: In a distributed system, state machines are replicated across multiple machines for availability. What happens when different replicas run different versions of the state machine logic?
- State Machine State Persistence: In a distributed system, state machines persist their state to durable storage, such as databases, so that they can recover from failures. What happens when a newer version of a state machine needs to read state stored by an older version?
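To make the replication problem concrete, here is a minimal Python sketch of two replicas replaying the same sequenced command log with different logic versions. Everything in it is hypothetical and chosen only for illustration: the `Balances` state, the transfer command shape, the `apply_v1` and `apply_v2` functions, and the 1-unit fee introduced in version 2. It is not taken from any particular sequencer implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Balances:
    """Replicated state: account name -> balance (hypothetical schema)."""
    accounts: dict = field(default_factory=dict)


def apply_v1(state, cmd):
    """Version 1 logic (assumed): a transfer moves the full amount."""
    src, dst, amount = cmd
    state.accounts[src] = state.accounts.get(src, 0) - amount
    state.accounts[dst] = state.accounts.get(dst, 0) + amount


def apply_v2(state, cmd):
    """Version 2 logic (assumed): same transfer, plus a 1-unit fee paid by the sender."""
    src, dst, amount = cmd
    state.accounts[src] = state.accounts.get(src, 0) - amount - 1
    state.accounts[dst] = state.accounts.get(dst, 0) + amount
    state.accounts["fees"] = state.accounts.get("fees", 0) + 1


def replay(apply_fn, log):
    """Apply every command in sequence order, starting from an empty state."""
    state = Balances()
    for cmd in log:
        apply_fn(state, cmd)
    return state.accounts


# The same ordered log, as a sequencer would deliver it to every replica.
log = [("alice", "bob", 10), ("bob", "carol", 4)]

replica_a = replay(apply_v1, log)  # replica still running the old logic
replica_b = replay(apply_v2, log)  # replica already upgraded

print(replica_a)
print(replica_b)
print("replicas agree:", replica_a == replica_b)
```

Both replicas see identical input in identical order, yet they end in different states because the logic differs. The same mismatch affects persistence: a snapshot written by the version 1 logic has no `fees` entry, so version 2 code reading it has to decide how to treat the missing field.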
Techniques
WIP