Pattern

State Machine Determinism Validation

November 11, 2024 - State Machine Replication - just planted
5 mins

Determinism is a core requirement for state machine replication to provide fault tolerance. In this post, we’ll look at a number of techniques to validate the determinism of your state machine with and without adding latency. These techniques can be used in systems built on Aeron Cluster, or in Sequencer based systems that make use of State Machine Replication.

Hashing the State

A common approach for validating state machine determinism involves comparing state hashes across replicas. The state machine maintains its data either as an in-memory object graph or, when optimizing for latency, as buffers accessed through flyweights generated by serialization libraries like Simple Binary Encoding.

The distribution and comparison of these hashes varies by implementation - in Aeron Cluster, the leader broadcasts its hash for other instances to verify against, while Sequencer-based systems have all instances compute and publish their hashes to an external watchdog process for comparison. This hash comparison reveals whether the state machine maintains deterministic behavior across all replicas.

Constraints and considerations

  • Adding a hash comparison step will add some overhead to the state machine. The exact overhead will depend on the data volumes, serialization format and the hashing algorithm used. During this time, the state machine will be unable to process requests and the system will be paused.
  • The hash must be computed on all data that is held in the state machine. It is not sufficient to hash only the parts of the state that are being updated. This means that every boolean, integer, string, etc must be included in the hash. This is especially complex when some state resides in buffers accessed through flyweights and additional state held in standard primatives and objects.
  • The hash does not need to be cryptographically secure. A fast, non-cryptographic hash function can be used, as long as it is deterministic, and the probability of collisions is acceptably low for the given amount of data being hashed. In large systems running with buffers accessed through flyweights, a merkle tree can be used to reduce the amount of data that needs to be hashed by tracking only the changed subsets.
  • In a Sequencer-based system, running on Aeron, the hash values can be emitted via a dedicated Aeron side channel or via Aeron Counters which in turn can be externally observed by a watchdog process. Be sure to compare the hash values as of the same sequence number.

Comparing Egress

An alternative approach to validating determinism is to compare the egress of the state machine. In this approach, the state machine is configured to publish its state to an external process. In an Aeron Cluster, followers can observe the egress of the leader (assuming that it is recorded into a dedicated egress Aeron Archive) and compare its log against its own using byte-wise comparison.

In a Sequencer-based system that does not deduplicate messages, a watchdog can observe the output of each state machine instance and watch for any discrepancies. In cases where messages are deduplicated, hash comparison can be used instead.

Constraints and considerations

  • Adding an egress comparison step to Aeron Cluster will add some overhead to the overall process. The overhead can however be kept isolated to the followers, and the comparison can be done in parallel on an isolated thread with normal processing continuing uninterrupted.

What happens when processes diverge?

Typically, when processes diverge it implies that the system’s overall state is now inconsistent. The exact implications depend on the system’s design and the type of state machine and the business it supports. In critical systems, such as financial trading systems, divergence is typically not tolerated.

In the case of Aeron Cluster, the follower will detect the divergence and make itself unable to become a leader following a leader change. This means that the system is now in a degraded state, and depending on how much divergence there is, it may be necessary to manually intervene to resolve the divergence and restore normal operation following a leader change.

In a sequencer-based system, the outcome depends on the system’s design and requirements. For example, some systems with non-critical state may run state machine instances in an active/active configuration, where all active instances process commands simultaneously (aka racing). In such cases, the sequencer deduplicates the state machine’s output events using a simple mechanism, such as an emitted message sequence number. If divergence occurs among the state machine instances, one of the diverged instances will be terminated. The specific instance terminated is inconsequential, as all diverged instances are effectively invalid.

For systems managing critical state and running in an active/active configuration, these divergences are particularly problematic - they require manual intervention by engineers to investigate, reconcile the differences, and restore the system to a consistent state.

In other Sequencer-based systems with active/passive configurations, the behavior is typically in line with Aeron Cluster. The diverged passive instance will not take over from the active instance. If possible, the passive instance will be able to transfer its state from a recent snapshot of the previous active instance and be re-initialized into a consistent state. If this is not possible, the diverged passive instance should not become active following an active instance becoming unavailable. In this case, a human intervention will be required to resolve the divergence and restore normal operation.


Changelog

  • November 11, 2024Initial outline

The header image was generated using Midjourney, and refined with the help of Photoshop.