Determinism is a core requirement for state machine replication to provide fault tolerance. In this post, we’ll look at a number of techniques to validate the determinism of your state machine with and without adding latency. These techniques can be used in systems built on Aeron Cluster, or in Sequencer based systems that make use of State Machine Replication.
Hashing the State
A common approach for validating state machine determinism involves comparing state hashes across replicas. The state machine maintains its data either as an in-memory object graph or, when optimizing for latency, as buffers accessed through flyweights generated by serialization libraries like Simple Binary Encoding.
The distribution and comparison of these hashes varies by implementation - in Aeron Cluster, the leader broadcasts its hash for other instances to verify against, while Sequencer-based systems have all instances compute and publish their hashes to an external watchdog process for comparison. This hash comparison reveals whether the state machine maintains deterministic behavior across all replicas.
Constraints and considerations
- Adding a hash comparison step will add some overhead to the state machine. The exact overhead will depend on the data volumes, serialization format and the hashing algorithm used. During this time, the state machine will be unable to process requests and the system will be paused.
- The hash must be computed on all data that is held in the state machine. It is not sufficient to hash only the parts of the state that are being updated. This means that every boolean, integer, string, etc must be included in the hash. This is especially complex when some state resides in buffers accessed through flyweights and additional state held in standard primatives and objects.
- The hash does not need to be cryptographically secure. A fast, non-cryptographic hash function can be used, as long as it is deterministic, and the probability of collisions is acceptably low for the given amount of data being hashed. In large systems running with buffers accessed through flyweights, a merkle tree can be used to reduce the amount of data that needs to be hashed by tracking only the changed subsets.
- In a Sequencer-based system, running on Aeron, the hash values can be emitted via Aeron Counters which in turn can be externally observed by a watchdog process. Be sure to compare the hash values as of the same sequence number.
Comparing Egress
An alternative approach to validating determinism is to compare the egress of the state machine. In this approach, the state machine is configured to publish its state to an external process. In an Aeron Cluster, followers can observe the egress of the leader (assuming that it is recorded into a dedicated egress Aeron Archive) and compare its log against its own using byte-wise comparison.
In a Sequencer-based system that does not deduplicate messages, a watchdog can observe the output of each state machine instance and watch for any discrepancies. In cases where messages are deduplicated, hash comparison can be used instead.
Constraints and considerations
- Adding an egress comparison step to Aeron Cluster will add some overhead to the overall process. The overhead can however be kept isolated to the followers, and the comparison can be done in parallel on an isolated thread with normal processing continuing uninterrupted.
What happens when processes diverge?
Typically, when processes diverge it implies that the system’s overall state is now inconsistent. The exact implications depend on the system’s design and the type of state machine and the business it supports. In critical systems, such as financial trading systems, divergence is typically not tolerated.
In the case of Aeron Cluster, the follower will detect the divergence and make itself unable to become a leader following a leader change. This means that the system is now in a degraded state, and depending on how much divergence there is, it may be necessary to manually intervene to resolve the divergence and restore normal operation following a leader change.
In a Sequencer-based system, the outcome varies based on the system's design. In some systems, with state machine instances running active/active and racing 00i.e. all the active instances are processing requests and responding, with the sequencer deduplicating requests using some simplistic mechanism such as a message sequence number, then an instance of the diveged state machine will be terminated. It doesn't matter which instance is terminated, as they will both be diverged. Racing active/active configurations need to be carefully considered before being deployed as non-deterministic behavior will result in diverged state machines which cannot be recovered from.
In other Sequencer-based systems with active/passive configurations, the behavior is typically in line with Aeron Cluster. The diverged passive instance will not take over from the active instance. If possible, the passive instance will be able to transfer its state from a recent snapshot of the previous active instance and recover into a deterministic state. If this is not possible, the diverged passive instance will not become active following an active instance becoming unavailable. In this case, a human intervention will be required to resolve the divergence and restore normal operation.