Simple Binary Encoding: An Overview

Talk - October 26, 2024 - SBE

This is a written version of an interactive talk given in 2024. See the related source code examples in GitHub. The target audience was intermediate Java developers who were building systems using Aeron, and who were interested in learning more about Simple Binary Encoding.

Aeron does not enforce any specific type of encoding for the data that your application code sends, but for best performance it is generally recommended to make use of efficient binary formats, such as Simple Binary Encoding.

Simple Binary Encoding has its roots in the FIX Trading Community, and the core specifications can be found online at the FIX Trading Community site. The implementation of SBE can be found at Simple Binary Encoding.

We'll start by looking at how to generate Java code from an XML schema, and then we'll look at the generated code itself.

Generating Java Code with SBE

The Simple Binary Encoding (SBE) tool generates Java code from an XML schema. The SBE tool is a standalone JAR file that can be run from the command line or integrated into a build system.

Gradle Sample

A simple, fixed-length message type is defined in the following XML schema. Assume that it, along with some additional messages and some top level metadata, is saved in a file named schema-01.xml:

<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="com.shaunlaurens.pa"
                   id="1000"
                   version="1"
                   semanticVersion="pa0.1"
                   description="Schema 1 for the PA samples, version 0.1">
    <types>
        <composite name="messageHeader"
                   description="Message identifiers and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
    </types>

    <sbe:message name="MessageType1" id="1"
                 description="A simple, fixed length message type">
        <field name="field1" id="1" type="int64"/>
        <field name="field2" id="2" type="int32"/>
        <field name="field3" id="3" type="int64"/>
    </sbe:message>

</sbe:messageSchema>

We can then use the SBE tool to generate Java code for this schema. The following Gradle file (note, the sample uses Kotlin Script) includes the core components to generate the Java code and compile it:

plugins {
    `java-library`
}

@Suppress("DEPRECATION")
val generatedDir = file("${buildDir}/generated/src/main/java") // (1)
val codecGeneration = configurations.create("codecGeneration") // (2)

dependencies {
    "codecGeneration"(libs.sbe) // (3)
    implementation(libs.agrona)
    testImplementation(libs.bundles.testing)
}

sourceSets {
    main {
        java.srcDir(generatedDir) // (4)
    }
}

tasks {
    task("generateCodecs", JavaExec::class) {
        group = "sbe"
        val codecsFile = "src/main/resources/schema-01.xml" // (5)
        val xsdFile = "src/main/resources/fpl/sbe.xsd" // (6)
        mainClass.set("uk.co.real_logic.sbe.SbeTool")
        args = listOf(codecsFile)
        inputs.files(codecsFile, xsdFile)
        outputs.dir(generatedDir) // (7)
        classpath = codecGeneration
        systemProperties["sbe.output.dir"] = generatedDir // (8)
        systemProperties["sbe.validation.xsd"] = xsdFile
        systemProperties["sbe.validation.stop.on.error"] = "true"
        systemProperties["sbe.target.language"] = "Java"
    }

    compileJava {
        dependsOn("generateCodecs")
    }
}
  1. The directory to which the generated code is written.
  2. Creates a Gradle configuration that holds the dependencies needed for code generation.
  3. The codec generation task depends on the SBE library.
  4. We must tell Gradle to include the generated code in the main source set.
  5. The location of the schema file.
  6. The location of the SBE XSD file, used for validation.
  7. The generated code is written to generatedDir; this informs the Gradle task of its output directory.
  8. The SBE tool is additionally configured to output Java code to generatedDir.

Investigating Fixed Length Messages

Fixed length messages are the simplest form of message type in SBE. They are defined by a fixed number of bytes and are typically used for messages that need the highest level of performance.

We will continue to use the simple fixed-length message type we defined earlier when generating the Java code. This message is defined in the following XML schema, schema-01.xml:

<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="com.shaunlaurens.pa"
                   id="1000"
                   version="1"
                   semanticVersion="pa0.1"
                   description="Schema 1 for the PA samples, version 0.1">
    <types>
        <composite name="messageHeader"
                   description="Message identifiers and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
    </types>

    <sbe:message name="MessageType1" id="1"
                 description="A simple, fixed length message type">
        <field name="field1" id="1" type="int64"/>
        <field name="field2" id="2" type="int32"/>
        <field name="field3" id="3" type="int64"/>
    </sbe:message>

</sbe:messageSchema>

When run through the SBE tool, this schema file will result in the following Java code being generated:

└── src
    └── main
        └── java
            └── com
                └── shaunlaurens
                    └── pa
                        ├── MessageType1Decoder.java
                        ├── MessageType1Encoder.java
                        ├── MessageHeaderDecoder.java
                        ├── MessageHeaderEncoder.java
                        ├── MetaAttribute.java
                        └── package-info.java

Let's investigate the generated code.

Message Type 1 Encoder and Decoder

Static header information

Both the encoder and decoder classes for the message type include fixed attributes for the header data we defined in the schema.

public static final int BLOCK_LENGTH = 20;
public static final int TEMPLATE_ID = 1;
public static final int SCHEMA_ID = 1000;
public static final int SCHEMA_VERSION = 1;
public static final String SEMANTIC_VERSION = "pa0.1";

We can see:

  • BLOCK_LENGTH is the total length of the message type in bytes.
    This only includes the message fields and does not include any header information. In our case, the message type is 20 bytes long with 8 bytes for field1, 4 bytes for field2, and 8 bytes for field3.
  • TEMPLATE_ID is the unique identifier for the message type, as we defined in the schema.
  • SCHEMA_ID is the unique identifier for the schema, again, with the value set to what was provided in the schema.
  • SCHEMA_VERSION is the version of the schema.
  • SEMANTIC_VERSION is a human-readable version of the schema that we defined in the schema.
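
The BLOCK_LENGTH arithmetic can be reproduced in a few lines of plain Java. This is only a sketch: BlockLengthSketch and its members are hypothetical names chosen for illustration and are not part of the generated code.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the SBE-generated code.
final class BlockLengthSketch
{
    // Encoded sizes of the MessageType1 fields, in schema order.
    static final int FIELD1_LENGTH = Long.BYTES;    // int64
    static final int FIELD2_LENGTH = Integer.BYTES; // int32
    static final int FIELD3_LENGTH = Long.BYTES;    // int64

    // The block length is the sum of the fixed-length field sizes; no header bytes are counted.
    static int blockLength()
    {
        return FIELD1_LENGTH + FIELD2_LENGTH + FIELD3_LENGTH;
    }

    // Each field starts where the fields before it end.
    static int[] fieldOffsets()
    {
        return new int[] {0, FIELD1_LENGTH, FIELD1_LENGTH + FIELD2_LENGTH};
    }

    public static void main(final String[] args)
    {
        System.out.println("blockLength = " + blockLength());
        System.out.println("fieldOffsets = " + Arrays.toString(fieldOffsets()));
    }
}
```

Running main prints a block length of 20 and field offsets of [0, 8, 12], matching the 8 + 4 + 8 breakdown described above.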

Field encoding and decoding

Message Type 1 is a simple, fixed-length message type. It includes three fields: field1, field2, and field3. Field 1 is an int64, field 2 is an int32, and field 3 is an int64. Visually, the message type content (excluding any header information), looks like this with each block representing a byte:

[Figure: Message Type 1 - Byte layout. field1 occupies bytes 0-7, field2 bytes 8-11, and field3 bytes 12-19.]

Within the generated Java code, we can see that the MessageType1Encoder and MessageType1Decoder classes use fixed byte offsets for reading and writing the data. Each offset is calculated from the field's position in the message type along with the lengths of the fields before it.

...
public long field1()
{
    return buffer.getLong(offset + 0, BYTE_ORDER);
}

public int field2()
{
    return buffer.getInt(offset + 8, BYTE_ORDER);
}

public long field3()
{
    return buffer.getLong(offset + 12, BYTE_ORDER);
}
...

In much the same way, we can see the writing also uses fixed byte offsets:

public MessageType1Encoder field1(final long value)
{
    buffer.putLong(offset + 0, value, BYTE_ORDER);
    return this;
}

public MessageType1Encoder field2(final int value)
{
    buffer.putInt(offset + 8, value, BYTE_ORDER);
    return this;
}

public MessageType1Encoder field3(final long value)
{
    buffer.putLong(offset + 12, value, BYTE_ORDER);
    return this;
}

Some key points to note about this fixed-length message type:

  • None of the header information is included in the encoder and decoder data access. The header information is hard-coded into the encoder and decoder classes. We can make use of the MessageHeaderEncoder and MessageHeaderDecoder classes to encode and decode the header information separately.
  • The offsets are fixed and calculated based on the field's position in the message type. There is nothing within the message itself that indicates the length of the fields, so if a decoder is placed over a buffer that contains the same number of bytes but holds a different message type, the decoder will still go ahead and read it. This starts to get interesting as we move to evolving schemas.
  • The generated code is efficient and performs well, but it is also low-level and requires careful handling to ensure correctness. In this scenario, the offsets are fixed and known at compile time, so there is no need to calculate them at runtime. This can lead to very fast encoding and decoding of messages. We are also able to read and write using these decoders in any order. This is not always the case with SBE.
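
To make the order-independence point concrete, here is a minimal sketch that mimics the fixed-offset access pattern using java.nio.ByteBuffer rather than Agrona and the generated codecs. FixedOffsetSketch is a hypothetical name, and little-endian byte order is an assumption made for the sketch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the generated codec, using java.nio instead of Agrona.
final class FixedOffsetSketch
{
    static final int FIELD1_OFFSET = 0;
    static final int FIELD2_OFFSET = 8;
    static final int FIELD3_OFFSET = 12;
    static final int BLOCK_LENGTH = 20;

    // Writes the fields in reverse schema order; absolute puts make the order irrelevant.
    static ByteBuffer encodeInReverse(final long field1, final int field2, final long field3)
    {
        // Byte order is an assumption of this sketch.
        final ByteBuffer buffer = ByteBuffer.allocate(BLOCK_LENGTH).order(ByteOrder.LITTLE_ENDIAN);
        buffer.putLong(FIELD3_OFFSET, field3);
        buffer.putInt(FIELD2_OFFSET, field2);
        buffer.putLong(FIELD1_OFFSET, field1);
        return buffer;
    }

    static long field1(final ByteBuffer buffer)
    {
        return buffer.getLong(FIELD1_OFFSET);
    }

    static int field2(final ByteBuffer buffer)
    {
        return buffer.getInt(FIELD2_OFFSET);
    }

    static long field3(final ByteBuffer buffer)
    {
        return buffer.getLong(FIELD3_OFFSET);
    }

    public static void main(final String[] args)
    {
        final ByteBuffer buffer = encodeInReverse(1234L, 4321, 6789L);
        // Reads can also happen in any order because every offset is absolute.
        System.out.println(field2(buffer) + " " + field3(buffer) + " " + field1(buffer));
    }
}
```

Because every access names an absolute offset, no cursor state is involved, which is why the write and read order does not matter for this message type.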

Message Header Decoder and Encoder

The SBE tool generates a MessageHeaderEncoder and MessageHeaderDecoder for each schema that includes the messageHeader composite type. These classes can be used to read and write header data on the buffer, independently of the payload data.

In much the same way as the MessageType1 encoder and decoder, the header encoder and decoder classes use fixed byte offsets for reading and writing the data, with the layout following the messageHeader composite we supplied earlier.

 <composite name="messageHeader"
                   description="Message identifiers and length of message root">
    <type name="blockLength" primitiveType="uint16"/>
    <type name="templateId" primitiveType="uint16"/>
    <type name="schemaId" primitiveType="uint16"/>
    <type name="version" primitiveType="uint16"/>
</composite>

This is reflected in the generated Java code as follows:

public int blockLength()
{
    return (buffer.getShort(offset + 0, BYTE_ORDER) & 0xFFFF);
}

public int templateId()
{
    return (buffer.getShort(offset + 2, BYTE_ORDER) & 0xFFFF);
}

public int schemaId()
{
    return (buffer.getShort(offset + 4, BYTE_ORDER) & 0xFFFF);
}

public int version()
{
    return (buffer.getShort(offset + 6, BYTE_ORDER) & 0xFFFF);
}

The usage of the & 0xFFFF is to ensure that the value is treated as an unsigned short (as requested in the schema with uint16), as Java does not have unsigned types.

public MessageHeaderEncoder blockLength(final int value)
{
    buffer.putShort(offset + 0, (short)value, BYTE_ORDER);
    return this;
}

public MessageHeaderEncoder templateId(final int value)
{
    buffer.putShort(offset + 2, (short)value, BYTE_ORDER);
    return this;
}

public MessageHeaderEncoder schemaId(final int value)
{
    buffer.putShort(offset + 4, (short)value, BYTE_ORDER);
    return this;
}

public MessageHeaderEncoder version(final int value)
{
    buffer.putShort(offset + 6, (short)value, BYTE_ORDER);
    return this;
}
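
The cast on write and the mask on read are two halves of the same trick. The following sketch isolates them; UnsignedShortSketch, narrow, and widen are hypothetical names for illustration and do not appear in the generated code.

```java
// Hypothetical illustration of the uint16 handling; not part of the generated code.
final class UnsignedShortSketch
{
    // Mirrors the encoder: keep only the low 16 bits of the int.
    static short narrow(final int value)
    {
        return (short)value;
    }

    // Mirrors the decoder: reinterpret the 16 stored bits as a value in 0..65535.
    static int widen(final short stored)
    {
        return stored & 0xFFFF;
    }

    public static void main(final String[] args)
    {
        final short stored = narrow(40000);
        System.out.println("stored as signed short = " + stored);
        System.out.println("read back with mask    = " + widen(stored));
    }
}
```

A value such as 40000 does not fit in a signed short, so it is stored as -25536, and the mask recovers 40000 on the way out. The standard library's Short.toUnsignedInt performs the same conversion as the mask.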

We typically do not interact directly with the MessageHeaderEncoder - we can simply use wrapAndApplyHeader within the MessageType1Encoder to apply the header to the buffer.

public MessageType1Encoder wrapAndApplyHeader(
    final MutableDirectBuffer buffer, final int offset,
    final MessageHeaderEncoder headerEncoder)
{
    headerEncoder
        .wrap(buffer, offset)
        .blockLength(BLOCK_LENGTH)
        .templateId(TEMPLATE_ID)
        .schemaId(SCHEMA_ID)
        .version(SCHEMA_VERSION);

    return wrap(buffer, offset + MessageHeaderEncoder.ENCODED_LENGTH);
}

Safely consuming messages from buffers

Unless you're using SBE in a very controlled environment (for example, when you're moving data across an Agrona ring buffer within an application), you should always make use of the headers. This will allow the decoder to correctly identify the template id, and allow the application to validate the schema id and schema version before attempting to decode the payload.

private static final UnsafeBuffer BUFFER =
        new UnsafeBuffer(ByteBuffer.allocate(256));
private static final MessageHeaderEncoder MESSAGE_HEADER_ENCODER =
        new MessageHeaderEncoder();
private static final MessageHeaderDecoder MESSAGE_HEADER_DECODER =
        new MessageHeaderDecoder();
private static final MessageType1Encoder MESSAGE_TYPE1_ENCODER =
        new MessageType1Encoder();
private static final MessageType1Decoder MESSAGE_TYPE1_DECODER =
        new MessageType1Decoder();

@Test
public void encodeDecode()
{
    final int bufferOffset = 0;

    MESSAGE_TYPE1_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
                    MESSAGE_HEADER_ENCODER) // (1)
        .field1(1234L)
        .field2(4321)
        .field3(6789L); // (2)

    MESSAGE_HEADER_DECODER.wrap(BUFFER, bufferOffset); // (3)

    assertEquals(1, MESSAGE_HEADER_DECODER.templateId());
    assertEquals(20, MESSAGE_HEADER_DECODER.blockLength());
    assertEquals(1000, MESSAGE_HEADER_DECODER.schemaId());
    assertEquals(1, MESSAGE_HEADER_DECODER.version());

    MESSAGE_TYPE1_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
            MESSAGE_HEADER_DECODER); // (4)
    final long field1 = MESSAGE_TYPE1_DECODER.field1();
    final int field2 = MESSAGE_TYPE1_DECODER.field2();
    final long field3 = MESSAGE_TYPE1_DECODER.field3();

    assertEquals(1234L, field1);
    assertEquals(4321, field2);
    assertEquals(6789L, field3);
}
  1. We wrap and apply the header to have the header data automatically written per the hard coded field values.
  2. When writing with SBE, a best practice is to write the fields in the exact order defined in the schema. Fields not written in order for a non-fixed length message type can lead to incorrect encoding and decoding.
  3. The header decoder wraps the buffer and reads the header data. This is a critical step to ensure the correct schema is used for decoding the payload. The MessageType1Decoder will raise an exception when wrapAndApplyHeader is called if the template id does not match the expected template id, but it does not check the schema id or version; that is up to the application. This step is not necessary if you are certain that the buffer contains only one type of message.
  4. Now that we are certain of the template id, we can wrap and apply the header to the decoder and read the fields in the correct order.
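
The application-level checks mentioned in step 3 could look something like the sketch below. It uses java.nio.ByteBuffer in place of Agrona and the generated codecs so that it is self-contained; HeaderCheckSketch and its methods are hypothetical, little-endian byte order is assumed, and rejecting every version other than the expected one is a policy chosen for the sketch rather than something SBE mandates.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch; real code would use the generated MessageHeaderDecoder.
final class HeaderCheckSketch
{
    static final int HEADER_LENGTH = 8;
    static final int EXPECTED_SCHEMA_ID = 1000;
    static final int EXPECTED_VERSION = 1;
    static final int MESSAGE_TYPE1_TEMPLATE_ID = 1;
    static final int MESSAGE_TYPE1_BLOCK_LENGTH = 20;

    // Builds a header followed by a MessageType1-sized body with only field1 populated.
    static ByteBuffer encode(final int schemaId, final int version, final int templateId, final long field1)
    {
        final ByteBuffer buffer = ByteBuffer
            .allocate(HEADER_LENGTH + MESSAGE_TYPE1_BLOCK_LENGTH)
            .order(ByteOrder.LITTLE_ENDIAN); // assumed byte order
        buffer.putShort(0, (short)MESSAGE_TYPE1_BLOCK_LENGTH);
        buffer.putShort(2, (short)templateId);
        buffer.putShort(4, (short)schemaId);
        buffer.putShort(6, (short)version);
        buffer.putLong(HEADER_LENGTH, field1);
        return buffer;
    }

    // Validates the header before touching the payload.
    static long decodeField1(final ByteBuffer buffer)
    {
        final int templateId = buffer.getShort(2) & 0xFFFF;
        final int schemaId = buffer.getShort(4) & 0xFFFF;
        final int version = buffer.getShort(6) & 0xFFFF;

        if (schemaId != EXPECTED_SCHEMA_ID || version != EXPECTED_VERSION)
        {
            throw new IllegalStateException("unexpected schema: id=" + schemaId + " version=" + version);
        }
        if (templateId != MESSAGE_TYPE1_TEMPLATE_ID)
        {
            throw new IllegalStateException("unexpected templateId: " + templateId);
        }

        // Only now is it reasonable to interpret the payload as MessageType1.
        return buffer.getLong(HEADER_LENGTH);
    }

    public static void main(final String[] args)
    {
        System.out.println(decodeField1(encode(1000, 1, 1, 1234L)));
    }
}
```

A system that supports schema evolution would likely accept a range of versions instead of a single value, so treat the strict equality check as a placeholder.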

Investigating Variable Length Messages

Fixed-length messages are the simplest form of message type in SBE. However, there are many cases where variable-length data fields need to be sent. To address this, we will create a new schema that includes a message type with two variable-length fields. We will then investigate how the generated Java code handles these fields.

This message is defined in the following XML schema, schema-02.xml:

<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="com.shaunlaurens.pa.schema2"
                   id="1001"
                   version="1"
                   semanticVersion="pa0.1"
                   description="Schema 2 for the PA samples, version 0.1">
    <types>
        <composite name="messageHeader" 
                   description="Message identifiers and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
        <composite name="varStringEncoding"> <!-- (1) -->
            <type name="length" primitiveType="uint32" maxValue="1073741824"/>
            <type name="varData" primitiveType="uint8" length="0" 
                  characterEncoding="UTF-8"/>
        </composite>
    </types>

    <sbe:message name="MessageType2" id="1" 
                 description="A message with two var length fields">
        <field name="field1" id="1" type="int64"/>
        <data name="field2" id="2" type="varStringEncoding"/> <!-- (2) -->
        <data name="field3" id="3" type="varStringEncoding"/> <!-- (3) -->
    </sbe:message>

</sbe:messageSchema>
  1. We have added a new composite type, varStringEncoding, which includes a length field and a varData field. The length field is a uint32 that defines the length of the varData field. The varData field is a uint8 that is variable length and uses UTF-8 character encoding to store strings.
  2. We have added a new field, field2, to the MessageType2 message type. This field uses the varStringEncoding composite type we defined.
  3. We have added a new field, field3, to the MessageType2 message type. This field also uses the varStringEncoding composite type we defined.

When run through the SBE tool, this schema file will result in the following Java code being generated:

└── src
    └── main
        └── java
            └── com
                └── shaunlaurens
                    └── pa
                        └── schema2
                            ├── MessageHeaderDecoder.java
                            ├── MessageHeaderEncoder.java
                            ├── MessageType2Decoder.java
                            ├── MessageType2Encoder.java
                            ├── MetaAttribute.java
                            ├── VarStringEncodingDecoder.java
                            ├── VarStringEncodingEncoder.java
                            └── package-info.java

We will focus on the generated Message Type 2 encoder and decoder code. The VarStringEncodingEncoder and VarStringEncodingDecoder classes are also generated, but are unused. The other classes are much the same as before.

Message Type 2 Encoder and Decoder

Static header information

Both the encoder and decoder classes for the message type include fixed attributes for the header data we defined in the schema. Of particular interest is the BLOCK_LENGTH, which is now only the length of the fixed length field field1.

public static final int BLOCK_LENGTH = 8;
public static final int TEMPLATE_ID = 1;
public static final int SCHEMA_ID = 1001;
public static final int SCHEMA_VERSION = 1;
public static final String SEMANTIC_VERSION = "pa0.1";

We can see:

  • BLOCK_LENGTH is actually the total length of the fixed length message fields in bytes.
    In our case, the message type has 8 bytes for field1. Other fields are not counted.
  • TEMPLATE_ID is the unique identifier for the message type, as we defined in the schema.
  • SCHEMA_ID is the unique identifier for the schema, again, with the value set to what was provided in the schema.
  • SCHEMA_VERSION is the version of the schema.
  • SEMANTIC_VERSION is a human-readable version of the schema that we defined in the schema.

Variable-length field decoding

Message Type 2 adds complexity with two variable length string fields. Visually, one might expect that the buffer content (excluding any header information), looks like this with each block representing a byte:

[Figure: Message Type 2 - Byte layout as one might expect it: field1, followed by field2, followed by field3.]

We will correct this diagram later. Within the generated Java code, we can see that the MessageType2Encoder and MessageType2Decoder classes use fixed byte offsets for reading and writing the fixed data. This is the same as Message Type 1. We will focus on the variable length fields in the MessageType2Encoder and MessageType2Decoder classes.

public String field2()
{
    final int headerLength = 4;
    final int limit = parentMessage.limit();
    final int dataLength = (int)(buffer.getInt(limit, BYTE_ORDER) 
        & 0xFFFF_FFFFL);
    parentMessage.limit(limit + headerLength + dataLength);

    if (0 == dataLength)
    {
        return "";
    }

    final byte[] tmp = new byte[dataLength];
    buffer.getBytes(limit + headerLength, tmp, 0, dataLength);

    return new String(tmp, java.nio.charset.StandardCharsets.UTF_8);
}
...
public String field3()
{
    final int headerLength = 4;
    final int limit = parentMessage.limit();
    final int dataLength = (int)(buffer.getInt(limit, BYTE_ORDER) 
        & 0xFFFF_FFFFL);
    parentMessage.limit(limit + headerLength + dataLength);

    if (0 == dataLength)
    {
        return "";
    }

    final byte[] tmp = new byte[dataLength];
    buffer.getBytes(limit + headerLength, tmp, 0, dataLength);

    return new String(tmp, java.nio.charset.StandardCharsets.UTF_8);
}

Some things of interest in these two methods:

  • There are no more fixed offsets for reading the data. Instead, internal state called limit tracks the current position in the buffer.
  • Each method first reads the length of the data from the buffer, then reads the data itself.
  • The limit is held by the parentMessage and is advanced past the data that has just been read.
  • Excluding the state held within the limit, the reads of field2 and field3 are identical - so how could the decoder know which is which unless they are read in the order written?
  • Given the identical nature of the reads, it seems safe to assume that out of order reads will result in invalid data.

Out of order reads after correct order writes

Let's try a quick experiment to see what happens when we read the fields out of order:

private static final UnsafeBuffer BUFFER =
    new UnsafeBuffer(ByteBuffer.allocate(256));
private static final MessageType2Decoder MESSAGE_TYPE_2_DECODER =
    new MessageType2Decoder();
private static final MessageType2Encoder MESSAGE_TYPE_2_ENCODER =
    new MessageType2Encoder();
private static final MessageHeaderDecoder MESSAGE_HEADER_DECODER =
    new MessageHeaderDecoder();
private static final MessageHeaderEncoder MESSAGE_HEADER_ENCODER =
    new MessageHeaderEncoder();

@Test
public void testEncodingDecodingWrongOrderRead()
{
    final int bufferOffset = 0;

    MESSAGE_TYPE_2_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset, 
            MESSAGE_HEADER_ENCODER)
        .field1(1234L)
        .field2("this is string field 2") // (1)
        .field3("this is string field 3"); // (2)

    MESSAGE_TYPE_2_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_DECODER);
    final long field1 = MESSAGE_TYPE_2_DECODER.field1();
    final String field3 = MESSAGE_TYPE_2_DECODER.field3(); // (3)
    final String field2 = MESSAGE_TYPE_2_DECODER.field2(); // (4)

    assertEquals(1234L, field1);
    assertEquals("this is string field 3", field2); // INVALID (5)
    assertEquals("this is string field 2", field3); // INVALID (6)
}
  1. We write the fields in the correct order. String 'this is string field 2' goes to field 2
  2. We write the fields in the correct order. String 'this is string field 3' goes to field 3
  3. We read field 3 before field 2, causing the limit to be set to the wrong value
  4. We read field 2 after field 3, causing the decoder to read the wrong value
  5. Field 2 is read as field 3's value, so the test passes
  6. Field 3 is read as field 2's value, so the test passes

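The above test passes - despite the asserts marked 5 and 6 being incorrect. This is because each read simply consumes the next length-prefixed entry at the current limit: the call to field3() in 3 reads the first entry, which holds field 2's data, and advances the limit, so the call to field2() in 4 then reads the second entry.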
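
The same behaviour can be reproduced without SBE at all. The sketch below is a hypothetical miniature of the limit-based access pattern built on java.nio.ByteBuffer; LimitCursorSketch and its methods are invented for illustration, and little-endian byte order is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical miniature of the limit-based access pattern; not the generated code.
final class LimitCursorSketch
{
    private static final int HEADER_LENGTH = 4;

    private final ByteBuffer buffer = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
    private int limit;

    // Resets the cursor, much as wrapping a codec over the buffer would.
    LimitCursorSketch rewind()
    {
        limit = 0;
        return this;
    }

    // Both "fields" share one write routine and one read routine,
    // so only the call order decides which bytes each call touches.
    LimitCursorSketch field2(final String value) { return writeNext(value); }
    LimitCursorSketch field3(final String value) { return writeNext(value); }
    String field2() { return readNext(); }
    String field3() { return readNext(); }

    private LimitCursorSketch writeNext(final String value)
    {
        final byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.putInt(limit, bytes.length); // length prefix
        for (int i = 0; i < bytes.length; i++)
        {
            buffer.put(limit + HEADER_LENGTH + i, bytes[i]);
        }
        limit += HEADER_LENGTH + bytes.length; // advance the cursor past this entry
        return this;
    }

    private String readNext()
    {
        final int length = buffer.getInt(limit);
        final byte[] tmp = new byte[length];
        for (int i = 0; i < length; i++)
        {
            tmp[i] = buffer.get(limit + HEADER_LENGTH + i);
        }
        limit += HEADER_LENGTH + length;
        return new String(tmp, StandardCharsets.UTF_8);
    }

    public static void main(final String[] args)
    {
        final LimitCursorSketch codec = new LimitCursorSketch();
        codec.rewind().field2("two").field3("three"); // written in schema order
        codec.rewind();
        System.out.println("field3() returned: " + codec.field3()); // read out of order
        System.out.println("field2() returned: " + codec.field2());
    }
}
```

In this sketch, field3() returns "two" and field2() returns "three", since nothing in the buffer ties an entry to a field name.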

Variable-length field encoding

public MessageType2Encoder field2(final String value)
{
    final byte[] bytes = (null == value || value.isEmpty()) ? 
        org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY : 
        value.getBytes(java.nio.charset.StandardCharsets.UTF_8);

    final int length = bytes.length;
    if (length > 1073741824)
    {
        throw new IllegalStateException("length > maxValue for type: " + length);
    }

    final int headerLength = 4;
    final int limit = parentMessage.limit();
    parentMessage.limit(limit + headerLength + length);
    buffer.putInt(limit, length, BYTE_ORDER);
    buffer.putBytes(limit + headerLength, bytes, 0, length);

    return this;
}
...
public MessageType2Encoder field3(final String value)
{
    final byte[] bytes = (null == value || value.isEmpty()) ? 
        org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY : 
        value.getBytes(java.nio.charset.StandardCharsets.UTF_8);

    final int length = bytes.length;
    if (length > 1073741824)
    {
        throw new IllegalStateException("length > maxValue for type: " + length);
    }

    final int headerLength = 4;
    final int limit = parentMessage.limit();
    parentMessage.limit(limit + headerLength + length);
    buffer.putInt(limit, length, BYTE_ORDER);
    buffer.putBytes(limit + headerLength, bytes, 0, length);

    return this;
}

In much the same way as the decoder, the encoder uses parentMessage.limit() to track where the next write should go, and advances it past the data it writes. The writing of the data length followed by the data itself is the same for both fields.
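
To see where each variable-length entry ends up, the limit arithmetic can be traced by hand or with a small helper such as the one below. VarDataLayoutSketch is a hypothetical name, and the sketch assumes the limit starts immediately after the 8-byte fixed block of MessageType2.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper that traces the limit while variable-length entries are written.
final class VarDataLayoutSketch
{
    static final int BLOCK_LENGTH = 8;  // field1 only
    static final int HEADER_LENGTH = 4; // uint32 length prefix

    // Returns the start offset of each entry (relative to the message body),
    // followed by the final limit, for values given in the order they are written.
    static int[] entryOffsets(final String... valuesInWriteOrder)
    {
        final int[] offsets = new int[valuesInWriteOrder.length + 1];
        int limit = BLOCK_LENGTH; // assumption: the limit starts right after the fixed block

        for (int i = 0; i < valuesInWriteOrder.length; i++)
        {
            offsets[i] = limit;
            final int length = valuesInWriteOrder[i].getBytes(StandardCharsets.UTF_8).length;
            limit += HEADER_LENGTH + length;
        }

        offsets[valuesInWriteOrder.length] = limit;
        return offsets;
    }

    public static void main(final String[] args)
    {
        System.out.println(Arrays.toString(
            entryOffsets("this is string field 2", "this is string field 3")));
    }
}
```

For the two 22-byte strings used in the earlier test this gives [8, 34, 60]: the first entry starts at byte 8, the second at byte 34, and the message body ends at byte 60. The positions depend only on the order and sizes of the writes, not on which field method was called.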

Out of order writes, correct order reads

@Test
public void testEncodingDecodingWrongOrderWrite()
{
    final int bufferOffset = 0;

    MESSAGE_TYPE_2_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
            MESSAGE_HEADER_ENCODER)
        .field1(1234L)
        .field3("this is field three") // WRONG ORDER (1)
        .field2("this is field two"); // WRONG ORDER (2)

    MESSAGE_TYPE_2_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_DECODER);
    final long field1 = MESSAGE_TYPE_2_DECODER.field1();
    final String field2 = MESSAGE_TYPE_2_DECODER.field2(); // CORRECT ORDER (3)
    final String field3 = MESSAGE_TYPE_2_DECODER.field3(); // CORRECT ORDER (4)

    assertEquals(1234L, field1);
    assertEquals("this is field three", field2); // INVALID (5)
    assertEquals("this is field two", field3); // INVALID (6)
}
  1. We write the fields in the wrong order. String 'this is field three' goes to field 3. This sets the limit to the wrong value
  2. We continue to write the fields in the incorrect order. String 'this is field two' goes to field 2 after we wrote field 3
  3. We read field 2 before field 3, and the decoder's limit is correctly set
  4. We read field 3 after field 2
  5. Field 2 is incorrectly read as field 3's value, so the test passes
  6. Field 3 is incorrectly read as field 2's value, so the test passes

The above test passes - despite the asserts marked 5 and 6 being incorrect. This is because the limit is set to the wrong value by the first write. The decoder behaves correctly - it is the buffer data that is invalid.

Out of order reads and writes

In case you were wondering: if you both write and read the fields in the same wrong order, the test will pass with the correct data in the correct fields. This is despite the fact that the buffer data has the fields in the incorrect order.

@Test
public void testEncodingDecodingWrongOrderWriteAndRead()
{
    final int bufferOffset = 0;

    MESSAGE_TYPE_2_ENCODER.wrapAndApplyHeader(BUFFER, bufferOffset,
            MESSAGE_HEADER_ENCODER)
        .field1(1234L)
        .field3("this is field three") // WRONG ORDER
        .field2("this is field two"); // WRONG ORDER

    MESSAGE_TYPE_2_DECODER.wrapAndApplyHeader(BUFFER, bufferOffset,
        MESSAGE_HEADER_DECODER);
    final long field1 = MESSAGE_TYPE_2_DECODER.field1();
    final String field3 = MESSAGE_TYPE_2_DECODER.field3(); // WRONG ORDER
    final String field2 = MESSAGE_TYPE_2_DECODER.field2(); // WRONG ORDER

    assertEquals(1234L, field1);
    assertEquals("this is field two", field2); // CORRECT DATA
    assertEquals("this is field three", field3); // CORRECT DATA
}

The above test passes, though the buffer data is not in the expected order. The corrected buffer diagram for this particular message type is thus impacted by the order of writes:

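[Figure: Message Type 2 - Byte layout (corrected): field1, followed by the variable-length fields in the order in which they were written.]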

Repeating Groups with Variable Length Fields

There are cases in which a variable length data field is needed within a repeating group within a message. For example, if we are transmitting instrument information, we may have a group of instruments, each with a variable length instrument name.

We will create another schema, schema-03.xml, to demonstrate this.

<sbe:messageSchema xmlns:sbe="http://fixprotocol.io/2016/sbe"
                   package="com.shaunlaurens.pa.schema3"
                   id="1002"
                   version="1"
                   semanticVersion="pa0.1"
                   description="Schema 3 for the PA samples, version 0.1">
    <types>
        <composite name="messageHeader" 
                   description="Message identifiers and length of message root">
            <type name="blockLength" primitiveType="uint16"/>
            <type name="templateId" primitiveType="uint16"/>
            <type name="schemaId" primitiveType="uint16"/>
            <type name="version" primitiveType="uint16"/>
        </composite>
        <composite name="varStringEncoding">
            <type name="length" primitiveType="uint32" maxValue="1073741824"/>
            <type name="varData" primitiveType="uint8" length="0" 
                  characterEncoding="UTF-8"/>
        </composite>
        <composite name="groupSizeEncoding"
                   description="Repeating group dimensions.">  <!-- (1) -->
            <type name="blockLength" primitiveType="uint16"/>
            <type name="numInGroup" primitiveType="uint16"/>
        </composite>
    </types>

    <sbe:message name="MessageType3" id="3" 
                 description="Message Type with a repeating group">
        <field name="field1" id="1" type="int64"/>
        <group name="group1" id="10"
               dimensionType="groupSizeEncoding"> <!-- (2) -->
            <field name="groupField1" id="11" type="int64"/>
            <data name="groupField2" id="12" type="varStringEncoding"/>
            <data name="groupField3" id="13" type="varStringEncoding"/>
        </group>
        <data name="field2" id="2" type="varStringEncoding"/>
    </sbe:message>

</sbe:messageSchema>
  1. The groupSizeEncoding composite type provides the block length and the number of entries in the repeating group. This is a common pattern for repeating groups in SBE messages.
  2. We add a repeating group after the fixed length fields, but before the variable length fields.

We are again going to focus on the MessageType3 encoder and decoder generated by the SBE tool.

MessageType3 Encoder

public void wrap(final MutableDirectBuffer buffer, final int count)
{
    if (count < 0 || count > 65534)
    {
        throw new 
            IllegalArgumentException("count outside allowed range: count="
            + count);
    }

    if (buffer != this.buffer)
    {
        this.buffer = buffer;
    }

    index = 0;
    this.count = count;
    final int limit = parentMessage.limit();
    initialLimit = limit;
    parentMessage.limit(limit + HEADER_SIZE);
    buffer.putShort(limit + 0, (short)8, BYTE_ORDER); // block length (1)
    buffer.putShort(limit + 2, (short)count, BYTE_ORDER); // numInGroup (2)
} 
  1. The block length is set to 8 bytes for the repeating group. This is the sum of the length of the fixed length fields in the group.
  2. The number of items in the repeating group is set.

The wrap method shown here belongs to the generated group encoder and is called by the parent message encoder to set up the buffer for the repeating group. It writes the blockLength and numInGroup fields that are defined in the groupSizeEncoding composite type in the schema.
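
As a rough, self-contained sketch of what this method does to the buffer, the following code writes and reads the same 4-byte group header with java.nio.ByteBuffer. GroupHeaderSketch and its methods are hypothetical names, and little-endian byte order is assumed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of the group header written by wrap; not the generated code.
final class GroupHeaderSketch
{
    static final int HEADER_SIZE = 4;        // uint16 blockLength + uint16 numInGroup
    static final int GROUP_BLOCK_LENGTH = 8; // groupField1 is the only fixed-length field
    static final int MAX_COUNT = 65534;

    // Writes the group header at the given limit and returns the new limit.
    static int writeGroupHeader(final ByteBuffer buffer, final int limit, final int count)
    {
        if (count < 0 || count > MAX_COUNT)
        {
            throw new IllegalArgumentException("count outside allowed range: count=" + count);
        }

        buffer.putShort(limit, (short)GROUP_BLOCK_LENGTH);
        buffer.putShort(limit + 2, (short)count);
        return limit + HEADER_SIZE;
    }

    static int blockLength(final ByteBuffer buffer, final int limit)
    {
        return buffer.getShort(limit) & 0xFFFF;
    }

    static int numInGroup(final ByteBuffer buffer, final int limit)
    {
        return buffer.getShort(limit + 2) & 0xFFFF;
    }

    public static void main(final String[] args)
    {
        final ByteBuffer buffer = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        final int groupStart = 8; // the group follows the 8-byte field1 in MessageType3
        final int newLimit = writeGroupHeader(buffer, groupStart, 3);
        System.out.println("limit after header = " + newLimit);
        System.out.println("blockLength = " + blockLength(buffer, groupStart));
        System.out.println("numInGroup = " + numInGroup(buffer, groupStart));
    }
}
```

Note that the group header is written at the current limit, so it is subject to the same ordering concerns as the variable-length fields.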

Encoding the Repeating Group

public Group1Encoder groupField1(final long value)
{
    buffer.putLong(offset + 0, value, BYTE_ORDER);
    return this;
}
...
public Group1Encoder groupField2(final String value)
{
    final byte[] bytes = (null == value || value.isEmpty()) ? 
        org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY : 
        value.getBytes(java.nio.charset.StandardCharsets.UTF_8);

    final int length = bytes.length;
    if (length > 1073741824)
    {
        throw new IllegalStateException("length > maxValue for type: "
            + length);
    }

    final int headerLength = 4;
    final int limit = parentMessage.limit();
    parentMessage.limit(limit + headerLength + length);
    buffer.putInt(limit, length, BYTE_ORDER);
    buffer.putBytes(limit + headerLength, bytes, 0, length);

    return this;
}
...
public Group1Encoder groupField3(final String value)
{
    final byte[] bytes = (null == value || value.isEmpty()) ? 
        org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY : 
        value.getBytes(java.nio.charset.StandardCharsets.UTF_8);

    final int length = bytes.length;
    if (length > 1073741824)
    {
        throw new IllegalStateException("length > maxValue for type: "
            + length);
    }

    final int headerLength = 4;
    final int limit = parentMessage.limit();
    parentMessage.limit(limit + headerLength + length);
    buffer.putInt(limit, length, BYTE_ORDER);
    buffer.putBytes(limit + headerLength, bytes, 0, length);

    return this;
}

The groupField1 field is fixed length and is written relative to the offset at which the current group entry starts, consistent with the SBE pattern for encoding fixed-length fields. The groupField2 and groupField3 fields are variable length and follow the same pattern as the simple variable-length messages. They, too, rely on parentMessage.limit() to track position, and are therefore subject to the same issues when writes or reads happen out of order.
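The out-of-order problem can be reproduced in miniature without the generated code. The sketch below is a hypothetical cursor-based writer built on java.nio.ByteBuffer, assuming the same 4-byte length prefix. Both setters simply append at the current limit, so calling them in the wrong order puts the values in the wrong positions on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical miniature of the limit() pattern using ByteBuffer; not generated SBE code.
final class LimitCursorSketch
{
    private final ByteBuffer buffer = ByteBuffer.allocate(128).order(ByteOrder.LITTLE_ENDIAN);
    private int limit; // stands in for parentMessage.limit()

    // Both setters append at the current limit; the method name does not choose the position.
    LimitCursorSketch groupField2(final String value)
    {
        return putVar(value);
    }

    LimitCursorSketch groupField3(final String value)
    {
        return putVar(value);
    }

    private LimitCursorSketch putVar(final String value)
    {
        final byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.putInt(limit, bytes.length);
        System.arraycopy(bytes, 0, buffer.array(), limit + 4, bytes.length);
        limit += 4 + bytes.length;
        return this;
    }

    private String getVar()
    {
        final int length = buffer.getInt(limit);
        final String value = new String(buffer.array(), limit + 4, length, StandardCharsets.UTF_8);
        limit += 4 + length;
        return value;
    }

    // Reads the two values back in schema order: groupField2 first, then groupField3.
    String[] readInSchemaOrder()
    {
        limit = 0;
        return new String[] {getVar(), getVar()};
    }

    public static void main(final String[] args)
    {
        final String[] swapped = new LimitCursorSketch()
            .groupField3("three") // written first, so it lands where groupField2 belongs
            .groupField2("two")
            .readInSchemaOrder();
        System.out.println("groupField2=" + swapped[0] + " groupField3=" + swapped[1]);
    }
}
```

Running the sketch prints groupField2=three groupField3=two: nothing fails, but a reader following the schema order sees the two values swapped.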

Encoding the Variable Length Field After the Repeating Group
public MessageType3Encoder field2(final String value)
{
    final byte[] bytes = (null == value || value.isEmpty()) ?
        org.agrona.collections.ArrayUtil.EMPTY_BYTE_ARRAY :
        value.getBytes(java.nio.charset.StandardCharsets.UTF_8);

    final int length = bytes.length;
    if (length > 1073741824)
    {
        throw new IllegalStateException("length > maxValue for type: " +
            length);
    }

    final int headerLength = 4;
    final int limit = parentMessage.limit();
    parentMessage.limit(limit + headerLength + length);
    buffer.putInt(limit, length, BYTE_ORDER);
    buffer.putBytes(limit + headerLength, bytes, 0, length);

    return this;
}

The field2 method once again follows the same pattern of using the limit() value for state.

MessageType3 Decoder

The decoder for MessageType3 mirrors the encoder: apart from the repeating group, it follows the same pattern as the decoders we have already seen.
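Since the generated decoder is not reproduced here, the following is only a hand-rolled sketch of how the MessageType3 body could be walked in schema order. It assumes the layout implied by the encoder (little-endian, a 4-byte group header, 4-byte length prefixes) and that each entry's fixed block is skipped by advancing the limit by blockLength. The class name is hypothetical, and the group's string values are placeholders derived from groupField1.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-rolled walk over the MessageType3 body; not the generated decoder.
final class MessageType3WalkSketch
{
    private final ByteBuffer buffer; // must be a heap buffer, since array() is used below
    private int limit;               // plays the role of parentMessage.limit()

    MessageType3WalkSketch(final ByteBuffer buffer)
    {
        this.buffer = buffer.order(ByteOrder.LITTLE_ENDIAN);
    }

    private void putVar(final String value)
    {
        final byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        buffer.putInt(limit, bytes.length);
        System.arraycopy(bytes, 0, buffer.array(), limit + 4, bytes.length);
        limit += 4 + bytes.length;
    }

    private String getVar()
    {
        final int length = buffer.getInt(limit);
        final String value = new String(buffer.array(), limit + 4, length, StandardCharsets.UTF_8);
        limit += 4 + length;
        return value;
    }

    // Encodes field1, one group entry per id, then field2, strictly in schema order.
    void encode(final long field1, final long[] ids, final String field2)
    {
        buffer.putLong(0, field1);
        buffer.putShort(8, (short)8);           // blockLength
        buffer.putShort(10, (short)ids.length); // numInGroup
        limit = 12;
        for (final long id : ids)
        {
            buffer.putLong(limit, id);          // groupField1
            limit += 8;
            putVar("a" + id);                   // groupField2 (placeholder value)
            putVar("b" + id);                   // groupField3 (placeholder value)
        }
        putVar(field2);
    }

    // Decodes in the same order and renders the fields as a '|' separated summary.
    String decode()
    {
        final StringBuilder sb = new StringBuilder().append(buffer.getLong(0));
        final int blockLength = buffer.getShort(8) & 0xFFFF;
        final int numInGroup = buffer.getShort(10) & 0xFFFF;
        limit = 12;
        for (int i = 0; i < numInGroup; i++)
        {
            final int offset = limit;
            limit = offset + blockLength;       // move past the entry's fixed block
            final long groupField1 = buffer.getLong(offset);
            final String groupField2 = getVar();
            final String groupField3 = getVar();
            sb.append('|').append(groupField1).append(',').append(groupField2).append(',').append(groupField3);
        }

        return sb.append('|').append(getVar()).toString();
    }

    public static void main(final String[] args)
    {
        final MessageType3WalkSketch sketch = new MessageType3WalkSketch(ByteBuffer.allocate(256));
        sketch.encode(42L, new long[] {1L, 2L}, "end");
        System.out.println(sketch.decode()); // prints 42|1,a1,b1|2,a2,b2|end
    }
}
```

The decode side only works because it consumes the group entries and their variable-length data in exactly the order they were written; field2 can be located only after every entry has been walked.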

Recommendations

Messages composed solely of fixed-length fields tolerate any order of reads and writes, but messages containing multiple variable-length fields do not. The simplest way to avoid being caught out by this inconsistency is to always read and write fields in the order in which the schema defines them, regardless of field type.

To enforce the order in which fields are written and read, you can use the precedence checks feature of the SBE tool, which helps catch subtle ordering bugs that are otherwise hard to track down. To enable precedence checks, set sbe.generate.precedence.checks=true when running the SBE tool.

All the above examples involving an invalid order of reads or writes will fail if precedence checks are enabled. An IllegalStateException is raised with the message “Illegal field access order” and includes information about the offending field access. For more details, refer to the SBE Wiki entry on Safe Flyweight Usage.
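The idea behind such a check can be illustrated with a very small sketch. This is not what SBE generates: the class is hypothetical, it only tracks a strict sequence over the two variable-length group fields, and the text after "Illegal field access order" is invented for the example.

```java
// Hypothetical miniature of an access-order check; the checks SBE generates are more detailed.
final class AccessOrderSketch
{
    private static final String[] SCHEMA_ORDER = {"groupField2", "groupField3"};
    private int next; // index of the field that may be accessed next

    // Call before each field access; throws if the access does not follow schema order.
    void onAccess(final String field)
    {
        if (next >= SCHEMA_ORDER.length || !SCHEMA_ORDER[next].equals(field))
        {
            final String expected = next < SCHEMA_ORDER.length ? SCHEMA_ORDER[next] : "nothing";
            throw new IllegalStateException(
                "Illegal field access order: got " + field + ", expected " + expected);
        }

        next++;
    }

    public static void main(final String[] args)
    {
        final AccessOrderSketch checker = new AccessOrderSketch();
        try
        {
            checker.onAccess("groupField3"); // skips groupField2
        }
        catch (final IllegalStateException ex)
        {
            System.out.println(ex.getMessage());
        }
    }
}
```

Failing fast at the point of the bad access is the design choice that matters here: the alternative, as shown earlier, is a message that encodes without error but decodes to the wrong values.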

And with that, we have covered the basics of Simple Binary Encoding.


The colors used in the diagrams in this post are sourced from Jökulsárlón, a glacial lagoon in Iceland.