The state of Aeron-based applications can be observed without adding latency by using a dedicated external process.
Aeron captures a large amount of data about the state of Aeron and the Media Driver. This data can be used to observe the state of the Aeron-based application, including the number of messages sent and received, the number of errors, and the back pressure event count.
Additionally, applications that use Aeron can add their own custom counters to the cnc.dat file. These counters can be used to track the state of the application itself, such as the number of business messages processed, the number of application errors, application liveness, etc. Counters stored within the cnc.dat file do not need to be directly related to Aeron.
Constraints and considerations around using this pattern
For this to work, the application process and the observability process must run on the same machine (or, if you're running Kubernetes, in the same pod on the same Kubernetes cluster node).
This is a requirement because Aeron uses shared memory to store the data within the cnc.dat file.
If your server is at or near capacity, you may not have enough CPU cycles left to run the observability process without adding latency to your application.
While reading the data from the cnc.dat file is relatively cheap, the observability process will need to parse the data, convert it into a format suitable for the observability system, and provide a REST or other API to access the data.
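As an illustration of the conversion step, many counters are cumulative totals, while some observability systems expect the change since the previous sample. The following is a minimal sketch of that conversion using only the JDK. The class name CounterDeltaTracker and its delta method are hypothetical and not part of Aeron, and the sketch assumes a counter id keeps referring to the same counter between samples.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Aeron): remembers the last sampled value
 * of each counter and reports the change since the previous sample.
 * Not thread-safe; intended to be called from a single polling thread.
 */
final class CounterDeltaTracker
{
    // Last observed value, keyed by counter id.
    private final Map<Integer, Long> previous = new HashMap<>();

    /**
     * Record a new sample and return the change since the previous one.
     * The first sample for a counter id returns 0 because there is no baseline yet.
     */
    long delta(final int counterId, final long value)
    {
        final Long last = previous.put(counterId, value);
        return last == null ? 0L : value - last;
    }
}
```

The work per sample is a single map lookup, which keeps the CPU cost of the observability process low.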
Counter space is a finite resource. By default, Aeron allocates 8192 slots for counters in the cnc.dat file.
If your application adds counters and there is not enough counter space left, Aeron will not be able to add the new counters.
Additionally, if your application does not correctly manage the counter space it has allocated, you may experience paging and other memory related issues.
Note that Aeron supports both ephemeral and persistent counter space. Ephemeral counters are tied to the lifecycle of the Aeron client that created them.
Once the Aeron client is closed, the counters are released within a few seconds.
Persistent counters are not tied to the lifecycle of a client, and will persist until the cnc.dat file is deleted.
Reading counter data
The counter data is read from the cnc.dat file. The Aeron client provides a countersReader() method, which returns a CountersReader that can be used to read the counter data.
A simple way to read this data is to use the CounterConsumer interface:
@FunctionalInterface
public interface CounterConsumer
{
    /**
     * Accept the value for a counter.
     *
     * @param value     of the counter.
     * @param counterId of the counter
     * @param label     for the counter.
     */
    void accept(long value, int counterId, String label);
}
This can be used in the observability process to read the counter data from the cnc.dat file once the Aeron client has connected to the Media Driver.
...
final Aeron aeron = Aeron.connect(aeronCtx);
// where this::printCounter is an instance method that
// matches the CounterConsumer interface
aeron.countersReader().forEach(this::printCounter);
...
From the printCounter method, you can then publish the counter data in whatever format is compatible with your observability system.
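As one possible illustration of the publishing step, the sketch below collects counter values as lines in the Prometheus text exposition format. It uses only the JDK. The class PrometheusCounterFormatter, the metric name aeron_counter, and the id and label label names are all assumptions made for this example and are not part of Aeron. The accept method deliberately has the same shape as CounterConsumer.accept, so a method reference to it could be passed to forEach.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Aeron): turns counter readings into lines
 * of the Prometheus text exposition format.
 * Not thread-safe; intended to be driven by a single polling thread.
 */
final class PrometheusCounterFormatter
{
    private final List<String> lines = new ArrayList<>();

    // Same parameter shape as CounterConsumer.accept(long, int, String).
    // The metric name "aeron_counter" is an assumption for this sketch.
    void accept(final long value, final int counterId, final String label)
    {
        lines.add("aeron_counter{id=\"" + counterId + "\",label=\"" + escape(label) + "\"} " + value);
    }

    // Escape the characters that are special inside a Prometheus label value.
    static String escape(final String raw)
    {
        return raw.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    // Return everything collected so far and reset for the next poll.
    String render()
    {
        final String body = lines.isEmpty() ? "" : String.join("\n", lines) + "\n";
        lines.clear();
        return body;
    }
}
```

In the observability process, this could be used as aeron.countersReader().forEach(formatter::accept), followed by formatter.render() whenever the metrics are requested. Other observability systems will need a different output format, but the same consumer shape applies.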