When multiple distributed services need to snapshot their state, the tried-and-tested Chandy-Lamport snapshot algorithm remains the standard. The authors of this paper claim a new protocol that extends distributed snapshots to cloud services with external interactions, allowing partial snapshots while still retaining causal consistency.
This work presents Beaver, the first ‘partial’ snapshot protocol that extends the capability of distributed snapshots to cloud services with external interactions. Beaver provides the same basic abstraction as other snapshot protocols—for any event whose effects are observed in the snapshot, all other events that ‘happened-before’ are also included.
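To make that happened-before guarantee concrete, here is a minimal Python sketch of checking whether a snapshot (a "cut") is causally consistent. It is not Beaver's protocol and is not taken from the paper. It assumes every event carries a vector clock, and the names `is_consistent_cut`, `EVENTS` and the cut representation are my own illustration.

```python
from typing import List, Tuple

# Assumption for illustration: each event is tagged with a vector clock,
# where entry j counts the events on process j that happened-before
# (or are) this event.
VectorClock = Tuple[int, ...]


def is_consistent_cut(cut: List[int], events: List[List[VectorClock]]) -> bool:
    """Return True if the cut is causally consistent.

    cut[i] is the number of events from process i included in the snapshot.
    events[i][k] is the vector clock of the (k+1)-th event on process i.
    """
    for i, count in enumerate(cut):
        if count == 0:
            continue  # nothing from process i is in the snapshot
        # Vector clocks only grow along a process, so checking the last
        # included event covers all earlier events on that process.
        last_included = events[i][count - 1]
        for j, seen in enumerate(last_included):
            if seen > cut[j]:
                # The snapshot observes an effect whose cause on process j
                # was left out, so happened-before is violated.
                return False
    return True


# Hypothetical two-process history:
#   P0: local event (1, 0), then a send (2, 0)
#   P1: receives that message (2, 1)
EVENTS = [[(1, 0), (2, 0)], [(2, 1)]]

if __name__ == "__main__":
    print(is_consistent_cut([2, 1], EVENTS))  # send and receive both included
    print(is_consistent_cut([1, 1], EVENTS))  # receive included without its send
```

The second cut is rejected because it includes the receive on P1 but not the send on P0 that caused it. A snapshot protocol offering this abstraction must never produce such a cut.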
I will add the paper to my list of papers to dig deeper into. Aleksey Charapko also has this on his distributed systems reading list for Fall 2024.