This is a written version of an interactive talk given in 2024. See the related source code examples on GitHub. The target audience was intermediate Java developers who were building systems using Aeron, and who were interested in learning more about Simple Binary Encoding.
Aeron does not enforce any specific type of encoding for the data that your application code sends, but for best performance it is generally recommended to make use of efficient binary formats, such as Simple Binary Encoding.
Simple Binary Encoding has its roots in the FIX Trading Community, and the core specifications can be found online at the FIX Trading Community site. The implementation of SBE can be found at Simple Binary Encoding.
We'll start by looking at how to generate Java code from an XML schema, and then we'll look at the generated code itself.
Generating Java Code with SBE
The Simple Binary Encoding (SBE) tool generates Java code from an XML schema. The SBE tool is a standalone JAR file that can be run from the command line or integrated into a build system.
Gradle Sample
A simple, fixed-length message type is defined in the following XML schema.
Assume that it, along with some additional messages and some top-level metadata, is saved in a file named schema-01.xml:
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
package="com.shaunlaurens.pa"
id="1000"
version="1"
semanticVersion="pa0.1"
description="Schema 1 for the PA samples, version 0.1">
<types>
<composite name="messageHeader"
description="Message identifiers and length of message root">
<type name="blockLength" primitiveType="uint16"/>
<type name="templateId" primitiveType="uint16"/>
<type name="schemaId" primitiveType="uint16"/>
<type name="version" primitiveType="uint16"/>
</composite>
</types>
<sbe:message name="MessageType1" id="1"
description="A simple, fixed length message type">
<field name="field1" id="1" type="int64"/>
<field name="field2" id="2" type="int32"/>
<field name="field3" id="3" type="int64"/>
</sbe:message>
</sbe:messageSchema>
We can then use the SBE tool to generate Java code for this schema. The following Gradle file (note, the sample uses Kotlin Script) includes the core components to generate the Java code and compile it:
plugins {
    `java-library`
}
@Suppress("DEPRECATION")
val generatedDir = file("${buildDir}/generated/src/main/java") // (1)
val codecGeneration = configurations.create("codecGeneration") // (2)
dependencies {
    "codecGeneration"(libs.sbe) // (3)
    implementation(libs.agrona)
    testImplementation(libs.bundles.testing)
}
sourceSets {
    main {
        java.srcDir(generatedDir) // (4)
    }
}
tasks {
    task("generateCodecs", JavaExec::class) {
        group = "sbe"
        val codecsFile = "src/main/resources/schema-01.xml" // (5)
        val xsdFile = "src/main/resources/fpl/sbe.xsd" // (6)
        mainClass.set("uk.co.real_logic.sbe.SbeTool")
        args = listOf(codecsFile)
        inputs.files(codecsFile, xsdFile)
        outputs.dir(generatedDir) // (7)
        classpath = codecGeneration
        systemProperties["sbe.output.dir"] = generatedDir // (8)
        systemProperties["sbe.validation.xsd"] = xsdFile
        systemProperties["sbe.validation.stop.on.error"] = "true"
        systemProperties["sbe.target.language"] = "Java"
    }
    compileJava {
        dependsOn("generateCodecs")
    }
}
- (1) This is the directory to which the generated code is written.
- (2) This creates a Gradle configuration for the code generation dependencies.
- (3) Our codec generation task depends on the SBE library.
- (4) We must tell Gradle to include the generated code in the source set.
- (5) The location of the schema file.
- (6) The location of the SBE XSD file, as used for validation.
- (7) The generated code is written to generatedDir; this informs the Gradle task of the output directory.
- (8) The SBE tool is additionally configured to output Java code to generatedDir.
Investigating Fixed Length Messages
Fixed length messages are the simplest form of message type in SBE. They are defined by a fixed number of bytes and are typically used for messages that need the highest level of performance.
We will continue to use the simple fixed-length message type we defined when generating the Java code earlier.
This message is defined in the following XML schema, schema-01.xml:
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
package="com.shaunlaurens.pa"
id="1000"
version="1"
semanticVersion="pa0.1"
description="Schema 1 for the PA samples, version 0.1">
<types>
<composite name="messageHeader"
description="Message identifiers and length of message root">
<type name="blockLength" primitiveType="uint16"/>
<type name="templateId" primitiveType="uint16"/>
<type name="schemaId" primitiveType="uint16"/>
<type name="version" primitiveType="uint16"/>
</composite>
</types>
<sbe:message name="MessageType1" id="1"
description="A simple, fixed length message type">
<field name="field1" id="1" type="int64"/>
<field name="field2" id="2" type="int32"/>
<field name="field3" id="3" type="int64"/>
</sbe:message>
</sbe:messageSchema>
When run through the SBE tool, this schema file will result in the following Java code being generated:
└── src
└── main
└── java
└── com
└── shaunlaurens
└── pa
├── MessageType1Decoder.java
├── MessageType1Encoder.java
├── MessageHeaderDecoder.java
├── MessageHeaderEncoder.java
├── MetaAttribute.java
└── package-info.java
Let's investigate the generated code.
Message Type 1 Encoder and Decoder
Static header information
Both the encoder and decoder classes for the message type include fixed attributes for the header data we defined in the schema.
public static final int BLOCK_LENGTH = 20;
public static final int TEMPLATE_ID = 1;
public static final int SCHEMA_ID = 1000;
public static final int SCHEMA_VERSION = 1;
public static final String SEMANTIC_VERSION = "pa0.1";
We can see:
- BLOCK_LENGTH is the total length of the message type in bytes. This only includes the message fields and does not include any header information. In our case, the message type is 20 bytes long with 8 bytes for field1, 4 bytes for field2, and 8 bytes for field3.
- TEMPLATE_ID is the unique identifier for the message type, as we defined in the schema.
- SCHEMA_ID is the unique identifier for the schema, again, with the value set to what was provided in the schema.
- SCHEMA_VERSION is the version of the schema.
- SEMANTIC_VERSION is a human-readable version of the schema that we defined in the schema.
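The arithmetic behind the block length can be sketched in a few lines of plain Java. This is only an illustration: the class and method names are hypothetical, and it assumes the schema does not specify explicit field offsets or padding, so each field starts where the previous one ended.

```java
import java.util.Arrays;

// Hypothetical illustration; not part of the SBE-generated code.
final class FixedLayoutSketch
{
    // Sizes in bytes of the MessageType1 fields, in schema order:
    // field1 (int64), field2 (int32), field3 (int64).
    static final int[] FIELD_SIZES = {Long.BYTES, Integer.BYTES, Long.BYTES};

    // Each field starts where the previous field ended.
    static int[] offsets(final int[] sizes)
    {
        final int[] offsets = new int[sizes.length];
        int next = 0;
        for (int i = 0; i < sizes.length; i++)
        {
            offsets[i] = next;
            next += sizes[i];
        }
        return offsets;
    }

    // The block length is the sum of the fixed-length field sizes.
    static int blockLength(final int[] sizes)
    {
        return Arrays.stream(sizes).sum();
    }

    public static void main(final String[] args)
    {
        System.out.println(Arrays.toString(offsets(FIELD_SIZES))); // [0, 8, 12]
        System.out.println(blockLength(FIELD_SIZES));              // 20
    }
}
```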
Field encoding and decoding
Message Type 1 is a simple, fixed-length message type. It includes three fields: field1, field2, and field3.
Field 1 is an int64, field 2 is an int32, and field 3 is an int64.
Visually, the message type content (excluding any header information), looks like this with each block representing a byte:
Within the generated Java code, we can see that the MessageType1Encoder and MessageType1Decoder classes use fixed byte offsets for reading and writing the data.
Each field's offset is the sum of the lengths of the fields that precede it in the message type.
...
public long field1()
{
return buffer.getLong(offset + 0, BYTE_ORDER);
}
public int field2()
{
return buffer.getInt(offset + 8, BYTE_ORDER);
}
public long field3()
{
return buffer.getLong(offset + 12, BYTE_ORDER);
}
...
In much the same way, we can see the writing also uses fixed byte offsets:
public MessageType1Encoder field1(final long value)
{
buffer.putLong(offset + 0, value, BYTE_ORDER);
return this;
}
public MessageType1Encoder field2(final int value)
{
buffer.putInt(offset + 8, value, BYTE_ORDER);
return this;
}
public MessageType1Encoder field3(final long value)
{
buffer.putLong(offset + 12, value, BYTE_ORDER);
return this;
}
Some key points to note about this fixed-length message type:
- None of the header information is included in the encoder and decoder data access. The header values are hard-coded into the encoder and decoder classes as constants. We can make use of the MessageHeaderEncoder and MessageHeaderDecoder classes to encode and decode the header information separately.
- The offsets are fixed and calculated based on the field's position in the message type. There is nothing within the message body itself that indicates the length of the fields, so if a decoder is placed over a buffer containing the same number of bytes, but a different message type, then the decoder will still go ahead and read it. This starts to get interesting as we move to evolving schemas.
- The generated code is efficient and performs well, but it is also low-level and requires careful handling to ensure correctness. In this scenario, the offsets are fixed and known at compile time, so there is no need to calculate them at runtime. This can lead to very fast encoding and decoding of messages. We are also able to read and write using these decoders in any order. This is not always the case with SBE.
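To make the "any order" point concrete, here is a minimal hand-written flyweight over a plain java.nio.ByteBuffer. It is a sketch rather than the generated code: the class name is hypothetical, and it assumes little-endian byte order (the SBE default) in place of the generated BYTE_ORDER constant.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical hand-written flyweight; not the SBE-generated class.
final class FixedOffsetFlyweightSketch
{
    private ByteBuffer buffer;
    private int offset;

    FixedOffsetFlyweightSketch wrap(final ByteBuffer buffer, final int offset)
    {
        // Assumption: little-endian, as with the SBE default byte order.
        this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
        this.offset = offset;
        return this;
    }

    // Reads use compile-time constant offsets, so no state is needed.
    long field1() { return buffer.getLong(offset + 0); }
    int field2() { return buffer.getInt(offset + 8); }
    long field3() { return buffer.getLong(offset + 12); }

    // Writes use the same constant offsets.
    FixedOffsetFlyweightSketch field1(final long value)
    {
        buffer.putLong(offset + 0, value);
        return this;
    }

    FixedOffsetFlyweightSketch field2(final int value)
    {
        buffer.putInt(offset + 8, value);
        return this;
    }

    FixedOffsetFlyweightSketch field3(final long value)
    {
        buffer.putLong(offset + 12, value);
        return this;
    }

    public static void main(final String[] args)
    {
        final ByteBuffer buffer = ByteBuffer.allocate(64);
        final FixedOffsetFlyweightSketch message =
            new FixedOffsetFlyweightSketch().wrap(buffer, 0);

        // Written in a scrambled order, read in another scrambled order.
        message.field3(6789L).field1(1234L).field2(4321);
        System.out.println(message.field2() + " " + message.field3() + " " + message.field1());
    }
}
```

Because every accessor addresses its own fixed slot, the order of calls has no effect on the result for this message type.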
Message Header Decoder and Encoder
The SBE tool generates a MessageHeaderEncoder and MessageHeaderDecoder for each schema that includes the messageHeader composite type.
These classes can be used to read and write header data on the buffer, independently of the payload data.
In much the same way as the MessageType1 encoder and decoder, the header encoder and decoder classes use fixed byte offsets for reading and writing the data, following the layout of the messageHeader composite we supplied earlier.
<composite name="messageHeader"
description="Message identifiers and length of message root">
<type name="blockLength" primitiveType="uint16"/>
<type name="templateId" primitiveType="uint16"/>
<type name="schemaId" primitiveType="uint16"/>
<type name="version" primitiveType="uint16"/>
</composite>
This is reflected in the generated Java code as follows:
public int blockLength()
{
return (buffer.getShort(offset + 0, BYTE_ORDER) & 0xFFFF);
}
public int templateId()
{
return (buffer.getShort(offset + 2, BYTE_ORDER) & 0xFFFF);
}
public int schemaId()
{
return (buffer.getShort(offset + 4, BYTE_ORDER) & 0xFFFF);
}
public int version()
{
return (buffer.getShort(offset + 6, BYTE_ORDER) & 0xFFFF);
}
The & 0xFFFF mask ensures that the value is treated as an unsigned 16-bit value (as requested in the schema with uint16), since Java's short is signed. The encoder methods do the reverse and narrow the int back to a short:
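The effect of the mask is easy to check in isolation. The following sketch (the class and method names are hypothetical) stores the largest uint16 value in a Java short and reads it back with and without the mask.

```java
// Hypothetical illustration of reading a uint16 held in a signed Java short.
final class UnsignedShortSketch
{
    // Widens to int and keeps only the low 16 bits, discarding sign extension.
    static int readUint16(final short raw)
    {
        return raw & 0xFFFF;
    }

    public static void main(final String[] args)
    {
        final short raw = (short)65535;      // bit pattern 0xFFFF
        System.out.println(raw);             // -1
        System.out.println(readUint16(raw)); // 65535
    }
}
```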
public MessageHeaderEncoder blockLength(final int value)
{
buffer.putShort(offset + 0, (short)value, BYTE_ORDER);
return this;
}
public MessageHeaderEncoder templateId(final int value)
{
buffer.putShort(offset + 2, (short)value, BYTE_ORDER);
return this;
}
public MessageHeaderEncoder schemaId(final int value)
{
buffer.putShort(offset + 4, (short)value, BYTE_ORDER);
return this;
}
public MessageHeaderEncoder version(final int value)
{
buffer.putShort(offset + 6, (short)value, BYTE_ORDER);
return this;
}
We typically do not interact directly with the MessageHeaderEncoder. Instead, we can simply use wrapAndApplyHeader within the MessageType1Encoder to apply the header to the buffer.
public MessageType1Encoder wrapAndApplyHeader(
final MutableDirectBuffer buffer, final int offset,
final MessageHeaderEncoder headerEncoder)
{
headerEncoder
.wrap(buffer, offset)
.blockLength(BLOCK_LENGTH)
.templateId(TEMPLATE_ID)
.schemaId(SCHEMA_ID)
.version(SCHEMA_VERSION);
return wrap(buffer, offset + MessageHeaderEncoder.ENCODED_LENGTH);
}
Safely consuming messages from buffers
Unless you're using SBE in a very controlled environment (for example, when you're moving data across an Agrona ringbuffer within an application), you should always make use of the headers. This will allow the decoder to correctly identify the template id, and allow the application to validate the schema id and schema version before attempting to decode the payload.
private static final UnsafeBuffer BUFFER =
    new UnsafeBuffer(ByteBuffer.allocate(256));
private static final MessageHeaderEncoder MESSAGE_HEADER_ENCODER =
    new MessageHeaderEncoder();
private static final MessageHeaderDecoder MESSAGE_HEADER_DECODER =
    new MessageHeaderDecoder();
private static final MessageType1Encoder MESSAGE_TYPE1_ENCODER =
    new MessageType1Encoder();
private static final MessageType1Decoder MESSAGE_TYPE1_DECODER =
    new MessageType1Decoder();
@Test
public void encodeDecode()
{
    final int bufferOffset = 0;
    MESSAGE_TYPE1_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_ENCODER) // (1)
        .field1(1234L)
        .field2(4321)
        .field3(6789L); // (2)
    MESSAGE_HEADER_DECODER.wrap(BUFFER, bufferOffset); // (3)
    assertEquals(1, MESSAGE_HEADER_DECODER.templateId());
    assertEquals(20, MESSAGE_HEADER_DECODER.blockLength());
    assertEquals(1000, MESSAGE_HEADER_DECODER.schemaId());
    // Check the version carried in the header, not the decoder's compiled-in constant.
    assertEquals(1, MESSAGE_HEADER_DECODER.version());
    MESSAGE_TYPE1_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_DECODER); // (4)
    final long field1 = MESSAGE_TYPE1_DECODER.field1();
    final int field2 = MESSAGE_TYPE1_DECODER.field2();
    final long field3 = MESSAGE_TYPE1_DECODER.field3();
    assertEquals(1234L, field1);
    assertEquals(4321, field2);
    assertEquals(6789L, field3);
}
- (1) We wrap and apply the header so that the header data is written automatically from the hard-coded values.
- (2) When writing with SBE, a best practice is to write the fields in the exact order defined in the schema. Fields not written in order for a non-fixed length message type can lead to incorrect encoding and decoding.
- (3) The header decoder wraps the buffer and reads the header data. This is a critical step to ensure the correct schema is used for decoding the payload. The MessageType1Decoder will raise an exception if the template id does not match the expected template id. Note that it does not check the schema id or version; this is up to the application. Note that this step is not necessary if you have certainty that the buffer contains only one type of message.
- (4) Now that we are certain of the template id, we can wrap and apply the header to the decoder and read the fields in the correct order.
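The application-level checks mentioned above could look something like the following sketch. It is written against a plain little-endian java.nio.ByteBuffer rather than the generated decoders, so it stays self-contained. The class, the method names, and the policy of rejecting any version other than 1 are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper; the real checks would use MessageHeaderDecoder.
final class HeaderCheckSketch
{
    static final int HEADER_LENGTH = 8;         // four uint16 fields
    static final int EXPECTED_SCHEMA_ID = 1000; // from schema-01.xml
    static final int EXPECTED_VERSION = 1;      // assumed policy: exact match
    static final int MESSAGE_TYPE_1_TEMPLATE_ID = 1;

    // All methods below assume a little-endian buffer.
    static ByteBuffer newBuffer(final int capacity)
    {
        return ByteBuffer.allocate(capacity).order(ByteOrder.LITTLE_ENDIAN);
    }

    static void writeHeader(
        final ByteBuffer buffer, final int offset, final int blockLength,
        final int templateId, final int schemaId, final int version)
    {
        buffer.putShort(offset, (short)blockLength);
        buffer.putShort(offset + 2, (short)templateId);
        buffer.putShort(offset + 4, (short)schemaId);
        buffer.putShort(offset + 6, (short)version);
    }

    private static int uint16(final ByteBuffer buffer, final int index)
    {
        return buffer.getShort(index) & 0xFFFF;
    }

    // Returns the offset of the message body, or throws if the header is not acceptable.
    static int bodyOffsetIfAcceptable(final ByteBuffer buffer, final int offset)
    {
        final int templateId = uint16(buffer, offset + 2);
        final int schemaId = uint16(buffer, offset + 4);
        final int version = uint16(buffer, offset + 6);

        if (schemaId != EXPECTED_SCHEMA_ID)
        {
            throw new IllegalStateException("Unexpected schema id: " + schemaId);
        }
        if (version != EXPECTED_VERSION)
        {
            throw new IllegalStateException("Unexpected schema version: " + version);
        }
        if (templateId != MESSAGE_TYPE_1_TEMPLATE_ID)
        {
            throw new IllegalStateException("Unexpected template id: " + templateId);
        }
        return offset + HEADER_LENGTH;
    }
}
```

In a system that handles several message types, the template id check would typically become a switch that selects the matching decoder.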
Investigating Variable Length Messages
Fixed-length messages are the simplest form of message type in SBE. However, there are many cases where variable-length data fields need to be sent. To address this, we will create a new schema that includes a message type with two variable-length fields. We will then investigate how the generated Java code handles these fields.
This message is defined in the following XML schema, schema-02.xml:
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
package="com.shaunlaurens.pa.schema2"
id="1001"
version="1"
semanticVersion="pa0.1"
description="Schema 2 for the PA samples, version 0.1">
<types>
<composite name="messageHeader"
description="Message identifiers and length of message root">
<type name="blockLength" primitiveType="uint16"/>
<type name="templateId" primitiveType="uint16"/>
<type name="schemaId" primitiveType="uint16"/>
<type name="version" primitiveType="uint16"/>
</composite>
<composite name="varStringEncoding"> <!-- (1) -->
<type name="length" primitiveType="uint32" maxValue="1073741824"/>
<type name="varData" primitiveType="uint8" length="0"
characterEncoding="UTF-8"/>
</composite>
</types>
<sbe:message name="MessageType2" id="1"
description="A message with two var length fields">
<field name="field1" id="1" type="int64"/>
<data name="field2" id="2" type="varStringEncoding"/> <!-- (2) -->
<data name="field3" id="3" type="varStringEncoding"/> <!-- (3) -->
</sbe:message>
</sbe:messageSchema>
- (1) We have added a new composite type, varStringEncoding, which includes a length field and a varData field. The length field is a uint32 that defines the length of the varData field. The varData field is a uint8 that is variable length and uses UTF-8 character encoding to store strings.
- (2) We have added a new field, field2, to the MessageType2 message type. This field uses the varStringEncoding composite type we defined.
- (3) We have added a new field, field3, to the MessageType2 message type. This field also uses the varStringEncoding composite type we defined.
When run through the SBE tool, this schema file will result in the following Java code being generated:
└── src
└── main
└── java
└── com
└── shaunlaurens
└── pa
└── schema2
├── MessageHeaderDecoder.java
├── MessageHeaderEncoder.java
├── MessageType2Decoder.java
├── MessageType2Encoder.java
├── MetaAttribute.java
├── VarStringEncodingDecoder.java
├── VarStringEncodingEncoder.java
└── package-info.java
We will focus on the generated Message Type 2 encoder and decoder code. The VarStringEncodingEncoder and VarStringEncodingDecoder classes are also generated, but are unused. The other classes are much the same as before.
Message Type 2 Encoder and Decoder
Static header information
Both the encoder and decoder classes for the message type include fixed attributes for the header data we defined in the schema. Of particular interest is the BLOCK_LENGTH, which is now only the length of the fixed length field field1.
public static final int BLOCK_LENGTH = 8;
public static final int TEMPLATE_ID = 1;
public static final int SCHEMA_ID = 1001;
public static final int SCHEMA_VERSION = 1;
public static final String SEMANTIC_VERSION = "pa0.1";
We can see:
- BLOCK_LENGTH is actually the total length of the fixed length message fields in bytes. In our case, the message type has 8 bytes for field1. Other fields are not counted.
- TEMPLATE_ID is the unique identifier for the message type, as we defined in the schema.
- SCHEMA_ID is the unique identifier for the schema, again, with the value set to what was provided in the schema.
- SCHEMA_VERSION is the version of the schema.
- SEMANTIC_VERSION is a human-readable version of the schema that we defined in the schema.
Variable-length field decoding
Message Type 2 adds complexity with two variable length string fields. Visually, one might expect that the buffer content (excluding any header information), looks like this with each block representing a byte:
We will correct this diagram later.
Within the generated Java code, we can see that the MessageType2Encoder and MessageType2Decoder classes use fixed byte offsets for reading and writing the fixed data. This is the same as message type 1.
We will focus on the variable length fields in the MessageType2Encoder and MessageType2Decoder classes.
public String field2()
{
final int headerLength = 4;
final int limit = parentMessage.limit();
final int dataLength = (int)(buffer.getInt(limit, BYTE_ORDER)
& 0xFFFF_FFFFL);
parentMessage.limit(limit + headerLength + dataLength);
if (0 == dataLength)
{
return "";
}
final byte[] tmp = new byte[dataLength];
buffer.getBytes(limit + headerLength, tmp, 0, dataLength);
return new String(tmp, java.nio.charset.StandardCharsets.UTF_8);
}
...
public String field3()
{
final int headerLength = 4;
final int limit = parentMessage.limit();
final int dataLength = (int)(buffer.getInt(limit, BYTE_ORDER)
& 0xFFFF_FFFFL);
parentMessage.limit(limit + headerLength + dataLength);
if (0 == dataLength)
{
return "";
}
final byte[] tmp = new byte[dataLength];
buffer.getBytes(limit + headerLength, tmp, 0, dataLength);
return new String(tmp, java.nio.charset.StandardCharsets.UTF_8);
}
Some things of interest in these two methods:
- There are no more fixed offsets for reading the data. Instead, internal state called limit is used to track the position in the buffer.
- Each method first reads the length of the data from the buffer, then reads the data itself.
- The limit is held by parentMessage, and each read advances it past the length prefix and the data.
- Excluding the state held within the limit, the reads of field2 and field3 are identical, so how could the decoder know which field it is reading unless they are read in the order written?
- Given the identical nature of the reads, it seems safe to assume that out of order reads will result in invalid data.
Out of order reads after correct order writes
Let's try a quick experiment to see what happens when we read the fields out of order:
private static final UnsafeBuffer BUFFER =
    new UnsafeBuffer(ByteBuffer.allocate(256));
private static final MessageType2Decoder MESSAGE_TYPE_2_DECODER =
    new MessageType2Decoder();
private static final MessageType2Encoder MESSAGE_TYPE_2_ENCODER =
    new MessageType2Encoder();
private static final MessageHeaderDecoder MESSAGE_HEADER_DECODER =
    new MessageHeaderDecoder();
private static final MessageHeaderEncoder MESSAGE_HEADER_ENCODER =
    new MessageHeaderEncoder();
@Test
public void testEncodingDecodingWrongOrderRead()
{
    final int bufferOffset = 0;
    MESSAGE_TYPE_2_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_ENCODER)
        .field1(1234L)
        .field2("this is string field 2") // (1)
        .field3("this is string field 3"); // (2)
    MESSAGE_TYPE_2_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_DECODER);
    final long field1 = MESSAGE_TYPE_2_DECODER.field1();
    final String field3 = MESSAGE_TYPE_2_DECODER.field3(); // (3)
    final String field2 = MESSAGE_TYPE_2_DECODER.field2(); // (4)
    assertEquals(1234L, field1);
    assertEquals("this is string field 3", field2); // INVALID (5)
    assertEquals("this is string field 2", field3); // INVALID (6)
}
- (1) We write the fields in the correct order. String 'this is string field 2' goes to field 2.
- (2) We write the fields in the correct order. String 'this is string field 3' goes to field 3.
- (3) We read field 3 before field 2, so the decoder reads the first var data entry, which holds field 2's value, and advances the limit past it.
- (4) We read field 2 after field 3, so the decoder reads the second var data entry, which holds field 3's value.
- (5) Field 2 is read as field 3's value, so the assertion passes.
- (6) Field 3 is read as field 2's value, so the assertion passes.
The above test passes - despite the asserts marked 5 and 6 expecting the wrong data. This is because each read simply consumes whatever var data sits at the current limit: the read in 3 consumes field 2's data, and the decoder continues from there to read field 3's data in 4.
Variable-length field encoding
public MessageType2Encoder field2(final String value)
{
final byte[] bytes = (null == value || value.isEmpty()) ?
org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY :
value.getBytes(java.nio.charset.StandardCharsets.UTF_8);
final int length = bytes.length;
if (length > 1073741824)
{
throw new IllegalStateException("length > maxValue for type: " + length);
}
final int headerLength = 4;
final int limit = parentMessage.limit();
parentMessage.limit(limit + headerLength + length);
buffer.putInt(limit, length, BYTE_ORDER);
buffer.putBytes(limit + headerLength, bytes, 0, length);
return this;
}
...
public MessageType2Encoder field3(final String value)
{
final byte[] bytes = (null == value || value.isEmpty()) ?
org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY :
value.getBytes(java.nio.charset.StandardCharsets.UTF_8);
final int length = bytes.length;
if (length > 1073741824)
{
throw new IllegalStateException("length > maxValue for type: " + length);
}
final int headerLength = 4;
final int limit = parentMessage.limit();
parentMessage.limit(limit + headerLength + length);
buffer.putInt(limit, length, BYTE_ORDER);
buffer.putBytes(limit + headerLength, bytes, 0, length);
return this;
}
In much the same way as the decoder, the encoder uses the limit held by parentMessage to track the position at which the next variable-length field is written.
The writing of the data length followed by the data itself is the same for both fields.
Out of order writes, correct order reads
@Test
public void testEncodingDecodingWrongOrderWrite()
{
    final int bufferOffset = 0;
    MESSAGE_TYPE_2_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_ENCODER)
        .field1(1234L)
        .field3("this is field three") // WRONG ORDER (1)
        .field2("this is field two"); // WRONG ORDER (2)
    MESSAGE_TYPE_2_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_DECODER);
    final long field1 = MESSAGE_TYPE_2_DECODER.field1();
    final String field2 = MESSAGE_TYPE_2_DECODER.field2(); // CORRECT ORDER (3)
    final String field3 = MESSAGE_TYPE_2_DECODER.field3(); // CORRECT ORDER (4)
    assertEquals(1234L, field1);
    assertEquals("this is field three", field2); // INVALID (5)
    assertEquals("this is field two", field3); // INVALID (6)
}
- (1) We write the fields in the wrong order. String 'this is field three' is written through field3 first, so it lands at the position where field 2's data belongs and advances the limit.
- (2) We continue to write the fields in the incorrect order. String 'this is field two' is written through field2 after we wrote field 3.
- (3) We read field 2 before field 3, which is the correct order.
- (4) We read field 3 after field 2.
- (5) Field 2 is incorrectly read as field 3's value, so the assertion passes.
- (6) Field 3 is incorrectly read as field 2's value, so the assertion passes.
The above test passes - despite the asserts marked 5 and 6 expecting the wrong data. This is because the first write placed field 3's data at the limit where field 2's data belongs. The decoder behaves correctly - it is the buffer data that is invalid.
Out of order reads and writes
In case you were wondering: if you happen to both write and read the fields in the same wrong order, the test will pass with the correct data in the correct fields. This is despite the fact that the buffer data has the fields in the incorrect order.
@Test
public void testEncodingDecodingWrongOrderWriteAndRead()
{
final int bufferOffset = 0;
MESSAGE_TYPE_2_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
MESSAGE_HEADER_ENCODER)
.field1(1234L)
.field3("this is field three") // WRONG ORDER
.field2("this is field two"); // WRONG ORDER
MESSAGE_TYPE_2_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
MESSAGE_HEADER_DECODER);
final long field1 = MESSAGE_TYPE_2_DECODER.field1();
final String field3 = MESSAGE_TYPE_2_DECODER.field3(); // WRONG ORDER
final String field2 = MESSAGE_TYPE_2_DECODER.field2(); // WRONG ORDER
assertEquals(1234L, field1);
assertEquals("this is field two", field2); // CORRECT DATA
assertEquals("this is field three", field3); // CORRECT DATA
}
The above test passes, though the buffer data is not in the expected order. The corrected buffer diagram for this particular message type is thus impacted by the order of writes:
Repeating Groups with Variable Length Fields
There are cases in which a variable length data field is needed within a group within a message. For example, if we are transmitting instrument information, we may have a group of instruments, each with a variable length instrument name.
We will create another schema, schema-03.xml, to demonstrate this.
<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
package="com.shaunlaurens.pa.schema3"
id="1002"
version="1"
semanticVersion="pa0.1"
description="Schema 3 for the PA samples, version 0.1">
<types>
<composite name="messageHeader"
description="Message identifiers and length of message root">
<type name="blockLength" primitiveType="uint16"/>
<type name="templateId" primitiveType="uint16"/>
<type name="schemaId" primitiveType="uint16"/>
<type name="version" primitiveType="uint16"/>
</composite>
<composite name="varStringEncoding">
<type name="length" primitiveType="uint32" maxValue="1073741824"/>
<type name="varData" primitiveType="uint8" length="0"
characterEncoding="UTF-8"/>
</composite>
<composite name="groupSizeEncoding"
description="Repeating group dimensions."> <!-- (1) -->
<type name="blockLength" primitiveType="uint16"/>
<type name="numInGroup" primitiveType="uint16"/>
</composite>
</types>
<sbe:message name="MessageType3" id="3"
description="Message Type with a repeating group">
<field name="field1" id="1" type="int64"/>
<group name="group1" id="10"
dimensionType="groupSizeEncoding"> <!-- (2) -->
<field name="groupField1" id="11" type="int64"/>
<data name="groupField2" id="12" type="varStringEncoding"/>
<data name="groupField3" id="13" type="varStringEncoding"/>
</group>
<data name="field2" id="2" type="varStringEncoding"/>
</sbe:message>
</sbe:messageSchema>
- (1) The group size encoding composite type is used to provide the block length and the number of entries in the repeating group. This is a common pattern for repeating groups in SBE messages.
- (2) We add a repeating group after the fixed length fields, but before the variable length fields.
We are again going to focus on the MessageType3 encoder and decoder generated by the SBE tool.
MessageType3 Encoder
public void wrap(final MutableDirectBuffer buffer, final int count)
{
if (count < 0 || count > 65534)
{
throw new
IllegalArgumentException("count outside allowed range: count="
+ count);
}
if (buffer != this.buffer)
{
this.buffer = buffer;
}
index = 0;
this.count = count;
final int limit = parentMessage.limit();
initialLimit = limit;
parentMessage.limit(limit + HEADER_SIZE);
buffer.putShort(limit + 0, (short)8, BYTE_ORDER); // block length (1)
buffer.putShort(limit + 2, (short)count, BYTE_ORDER); // numInGroup (2)
}
- (1) The block length is set to 8 bytes for the repeating group. This is the sum of the lengths of the fixed length fields in each group entry.
- (2) The number of entries in the repeating group is set.
The wrap method is called by the parent message encoder to set up the buffer for encoding the repeating group.
The blockLength and numInGroup fields are defined in the groupSizeEncoding composite type in the schema.
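The resulting layout (a group header, then each entry's fixed block followed by its var data) can be sketched with a plain java.nio.ByteBuffer. This is a simplified, hypothetical model rather than the generated code: it assumes little-endian byte order and gives each entry one int64 and a single var data field, whereas group1 in the schema has two.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of a repeating group with var data.
final class GroupLayoutSketch
{
    static final int GROUP_HEADER_SIZE = 4;  // blockLength (uint16) + numInGroup (uint16)
    static final int ENTRY_BLOCK_LENGTH = 8; // one int64 per entry
    static final int VAR_HEADER_LENGTH = 4;  // uint32 length prefix

    // Writes the group starting at 'start' and returns the limit after the last entry.
    static int encode(final ByteBuffer buffer, final int start, final long[] ids, final String[] names)
    {
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        buffer.putShort(start, (short)ENTRY_BLOCK_LENGTH);
        buffer.putShort(start + 2, (short)ids.length);
        int limit = start + GROUP_HEADER_SIZE;
        for (int i = 0; i < ids.length; i++)
        {
            buffer.putLong(limit, ids[i]); // fixed block of this entry
            limit += ENTRY_BLOCK_LENGTH;
            final byte[] bytes = names[i].getBytes(StandardCharsets.UTF_8);
            buffer.putInt(limit, bytes.length); // var data follows the fixed block
            for (int j = 0; j < bytes.length; j++)
            {
                buffer.put(limit + VAR_HEADER_LENGTH + j, bytes[j]);
            }
            limit += VAR_HEADER_LENGTH + bytes.length;
        }
        return limit;
    }

    // Walks the group using the block length and count found in the group header.
    static List<String> decode(final ByteBuffer buffer, final int start)
    {
        buffer.order(ByteOrder.LITTLE_ENDIAN);
        final int blockLength = buffer.getShort(start) & 0xFFFF;
        final int count = buffer.getShort(start + 2) & 0xFFFF;
        int limit = start + GROUP_HEADER_SIZE;
        final List<String> entries = new ArrayList<>();
        for (int i = 0; i < count; i++)
        {
            final long id = buffer.getLong(limit);
            limit += blockLength;
            final int length = buffer.getInt(limit);
            final byte[] bytes = new byte[length];
            for (int j = 0; j < length; j++)
            {
                bytes[j] = buffer.get(limit + VAR_HEADER_LENGTH + j);
            }
            limit += VAR_HEADER_LENGTH + length;
            entries.add(id + "=" + new String(bytes, StandardCharsets.UTF_8));
        }
        return entries;
    }
}
```

Note that the position of every entry after the first depends on the var data lengths before it, which is why the group also relies on the shared limit.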
Encoding the Repeating Group
public Group1Encoder groupField1(final long value)
{
buffer.putLong(offset + 0, value, BYTE_ORDER);
return this;
}
...
public Group1Encoder groupField2(final String value)
{
final byte[] bytes = (null == value || value.isEmpty()) ?
org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY :
value.getBytes(java.nio.charset.StandardCharsets.UTF_8);
final int length = bytes.length;
if (length > 1073741824)
{
throw new IllegalStateException("length > maxValue for type: "
+ length);
}
final int headerLength = 4;
final int limit = parentMessage.limit();
parentMessage.limit(limit + headerLength + length);
buffer.putInt(limit, length, BYTE_ORDER);
buffer.putBytes(limit + headerLength, bytes, 0, length);
return this;
}
...
public Group1Encoder groupField3(final String value)
{
final byte[] bytes = (null == value || value.isEmpty()) ?
org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY :
value.getBytes(java.nio.charset.StandardCharsets.UTF_8);
final int length = bytes.length;
if (length > 1073741824)
{
throw new IllegalStateException("length > maxValue for type: "
+ length);
}
final int headerLength = 4;
final int limit = parentMessage.limit();
parentMessage.limit(limit + headerLength + length);
buffer.putInt(limit, length, BYTE_ORDER);
buffer.putBytes(limit + headerLength, bytes, 0, length);
return this;
}
The groupField1 field is fixed length and is written relative to the offset at which this particular group entry starts.
This is consistent with the SBE pattern for encoding fixed length fields.
In a similar way, the groupField2 and groupField3 fields are variable length fields that follow the patterns from the simple variable length messages.
They, too, make use of parentMessage.limit() to hold state about position, and are subject to similar issues when writes and reads are applied out of order.
Encoding the Variable Length Field after the repeating group
public MessageType3Encoder field2(final String value)
{
final byte[] bytes = (null == value || value.isEmpty()) ?
org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY :
value.getBytes(java.nio.charset.StandardCharsets.UTF_8);
final int length = bytes.length;
if (length > 1073741824)
{
throw new IllegalStateException("length > maxValue for type: " +
length);
}
final int headerLength = 4;
final int limit = parentMessage.limit();
parentMessage.limit(limit + headerLength + length);
buffer.putInt(limit, length, BYTE_ORDER);
buffer.putBytes(limit + headerLength, bytes, 0, length);
return this;
}
The field2 method once again follows the same pattern of using the limit() value for state.
MessageType3 Decoder
As can be seen from the encoder, the decoder for MessageType3 is in line with the other decoders we have already seen, except for the repeating group.
Recommendations
While messages composed solely of fixed-length fields can accommodate any order of reads and writes, messages containing multiple variable-length fields cannot. Because of this inconsistency, it is safest to always read and write fields in the order defined in the schema, regardless of field type.
To enforce the order in which fields are written and read, you can utilize the precedence checks feature of the SBE tool.
This approach helps prevent subtle bugs that can be challenging to identify.
Precedence checks are enabled by setting sbe.generate.precedence.checks=true when running the SBE tool.
All the above examples involving an invalid order of reads or writes will fail if precedence checks are enabled.
An IllegalStateException is raised with the message “Illegal field access order” and includes information about the offending field access.
For more details, refer to the SBE Wiki entry on Safe Flyweight Usage.
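To give a feel for what such a check does, here is a deliberately tiny state machine that rejects out of order access to two var data fields. It is a hypothetical illustration only: the code SBE generates for precedence checks is considerably more involved, and the class, state names, and exact message text below are assumptions.

```java
// Hypothetical illustration of an access-order check; not the SBE-generated code.
final class OrderCheckedAccessSketch
{
    private enum State { WRAPPED, FIELD2_DONE, FIELD3_DONE }

    private State state = State.WRAPPED;

    // field2 is only legal immediately after wrapping.
    void onField2Accessed()
    {
        if (state != State.WRAPPED)
        {
            throw new IllegalStateException(
                "Illegal field access order: field2 accessed in state " + state);
        }
        state = State.FIELD2_DONE;
    }

    // field3 is only legal once field2 has been accessed.
    void onField3Accessed()
    {
        if (state != State.FIELD2_DONE)
        {
            throw new IllegalStateException(
                "Illegal field access order: field3 accessed in state " + state);
        }
        state = State.FIELD3_DONE;
    }

    public static void main(final String[] args)
    {
        final OrderCheckedAccessSketch inOrder = new OrderCheckedAccessSketch();
        inOrder.onField2Accessed();
        inOrder.onField3Accessed(); // fine

        final OrderCheckedAccessSketch outOfOrder = new OrderCheckedAccessSketch();
        try
        {
            outOfOrder.onField3Accessed(); // field3 before field2
        }
        catch (final IllegalStateException ex)
        {
            System.out.println(ex.getMessage());
        }
    }
}
```

The trade-off is a small amount of extra state and branching per access in exchange for failing fast instead of silently swapping data.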
And with that, we have covered the basics of Simple Binary Encoding.
The colors used in the diagrams in this post are sourced from Jökulsárlón, a glacial lagoon in Iceland.