Isolating business logic from I/O operations is essential in a low-latency application environment. This allows us to deploy techniques such as natural batching, which provides zero-wait batching of work. This, in turn, will enable us to improve application throughput.
The following diagram illustrates this approach when we have an Aeron Cluster connected process with isolated business logic:
The way it works is as follows:
- The business logic is implemented as an Agrona Agent, taking work from one or more ingress RingBuffers and performing it on the duty cycle (i.e. within the `doWork` method).
- The business logic will publish outbound messages to an egress RingBuffer as needed. If responding to an inbound ingress message, the outbound message will be published to the egress RingBuffer specific to that Agent.
- If there is any additional I/O needed (for example, to read and write messages from a FIX engine or a web socket server), this work should be performed in a separate agent. This separate agent will take work from its own ingress RingBuffer and perform the I/O operations on its own duty cycle.
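The duty cycle and natural batching described above can be sketched with the JDK alone. This is a minimal illustration, not the Agrona API: `DutyCycle`, `BusinessLogicAgent`, `BATCH_LIMIT` and the `"ack:"` reply format are hypothetical names, and plain `java.util.Queue` instances stand in for the ingress and egress RingBuffers so that the example runs on a single thread. In a real deployment the queues would be ring buffers carrying binary-encoded messages.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical stand-in for an agent: one pass of work per call, returning the work count. */
interface DutyCycle {
    int doWork();
}

/** Business-logic agent that drains its ingress queue and replies on its egress queue. */
final class BusinessLogicAgent implements DutyCycle {
    // Arbitrary cap (an assumption) so that a single cycle cannot run unbounded.
    private static final int BATCH_LIMIT = 64;

    private final Queue<String> ingress; // stand-in for the ingress RingBuffer
    private final Queue<String> egress;  // stand-in for the egress RingBuffer

    BusinessLogicAgent(Queue<String> ingress, Queue<String> egress) {
        this.ingress = ingress;
        this.egress = egress;
    }

    @Override
    public int doWork() {
        int processed = 0;
        String message;
        // Natural batching: take whatever has accumulated since the last cycle,
        // up to the cap, without ever waiting for a batch to fill.
        while (processed < BATCH_LIMIT && (message = ingress.poll()) != null) {
            egress.offer("ack:" + message); // placeholder for real business logic
            processed++;
        }
        // A return value of 0 signals that there was nothing to do this cycle,
        // which is what an idle strategy would use to decide whether to back off.
        return processed;
    }
}
```

The batch size is never configured up front: a quiet system processes one message per cycle with no added delay, while a busy system processes whatever backlog has built up in a single pass.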
Constraints and considerations
- Each Agent runs in its own dedicated thread. This means that we need to be mindful of the number of threads we create, and ensure that together they do not demand more CPU than is available. The Idle Strategies deployed matter here: a busy-spinning strategy, for example, occupies a core even when its Agent has no work.
- Each Agent's ingress and egress RingBuffers are isolated from each other. When sending messages across a RingBuffer, an efficient binary mechanism should be used, such as Simple Binary Encoding or similar.
- All communication across Agents MUST be performed via the RingBuffer. There can be no direct memory access or any concurrent data structures shared between Agents. This ensures that each Agent can progress independently, and removes a class of concurrency bugs.
- Consider using Aeron Counters to capture observability data. With the ring buffers, you can keep track of the byte position difference between the producer and consumer. This in turn can be used to understand the backlog of bytes (which can be approximated to the amount of pending work) for each Agent, and the percentage of the RingBuffer that is occupied. See also Failure Detection in Agrona Streaming Applications.
- For the lowest latency, pin each Agent to a dedicated CPU core and use the right idle strategy for your use case.
- When using Aeron Cluster clients, it is generally recommended to take this approach so that slow application logic does not impact the health of the Aeron Cluster connection. Within the Aeron Cluster client agent, keep-alives can be performed on the duty cycle without impacting the processing of business logic.
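
The backlog and occupancy figures mentioned in the observability point above reduce to simple arithmetic on the producer and consumer positions. The sketch below shows only that arithmetic. `RingBufferBacklog` and its method names are hypothetical, and it assumes the positions and capacity have already been read from the ring buffer (Agrona's `RingBuffer` exposes `producerPosition()`, `consumerPosition()` and `capacity()`). Publishing the results to an Aeron Counter is left out.

```java
/** Backlog metrics derived from a ring buffer's producer and consumer byte positions. */
final class RingBufferBacklog {
    private RingBufferBacklog() {
    }

    /** Bytes written by the producer that the consumer has not yet read. */
    static long backlogBytes(long producerPosition, long consumerPosition) {
        return producerPosition - consumerPosition;
    }

    /** Share of the ring buffer currently occupied, as a percentage of its capacity. */
    static double occupancyPercent(long producerPosition, long consumerPosition, int capacityBytes) {
        if (capacityBytes <= 0) {
            throw new IllegalArgumentException("capacityBytes must be positive");
        }
        return 100.0 * backlogBytes(producerPosition, consumerPosition) / capacityBytes;
    }
}
```

For example, with a producer position of 4096, a consumer position of 3072 and a 65536-byte buffer, the backlog is 1024 bytes and the occupancy is 1.5625%. Sampling these values on each Agent's duty cycle gives a cheap view of how far each consumer is falling behind.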