FIX Descriptor Specification

Canonicalization, SBE, and Merkle Specification for Onchain FIX Asset Descriptors

Version 1.0 · Last Updated: September 2025

🎮 First time reading this spec?

Try the interactive explorer to see each step of the transformation process in action. Visualize how FIX messages become canonical trees, SBE bytes, and Merkle commitments.

1. Overview

1.1 Problem Statement

When tokenizing securities, traditional financial systems need standardized instrument data. The Financial Information eXchange (FIX) Protocol is the de facto standard for describing financial instruments in traditional markets. However, today every blockchain integration requires custom adapters and manual data mapping between token contracts and existing financial infrastructure.

1.2 Solution

This specification defines how to embed FIX descriptors directly in token contracts using canonical SBE encoding and Merkle commitments. This enables automatic integration with existing financial infrastructure while maintaining onchain verifiability—without requiring any onchain FIX parsing.

1.3 What This Spec Covers

Converting FIX messages to canonical trees
SBE encoding rules for deterministic representation
Merkle commitment generation for efficient field verification
Onchain storage patterns (SSTORE2-based)
Verification mechanisms for proving specific fields

1.4 What This Spec Does NOT Cover

Which FIX fields to include (business policy decision)
Token standards (ERC20, ERC721, etc.)
Trading or settlement logic
Onchain FIX parsing (all parsing happens off-chain)

1.5 How It Works (High Level)

Input

A FIX message (or subset) describing a financial instrument—the "asset descriptor"—using standard FIX tags and groups. Example: Symbol, SecurityID, MaturityDate, CouponRate, Parties, etc.

Processing (Off-Chain)

Build a canonical tree (deterministic structure with sorted keys)
Encode to SBE format (efficient binary representation)
Generate Merkle root committing to every field

Output (Onchain)

Minimal descriptor struct in the token contract
SBE bytes stored via SSTORE2 (gas-efficient)
Verification function: anyone can prove any field with a Merkle proof

2. Running Example: US Treasury Bond

Throughout this specification, we'll reference this concrete example: a US Treasury Bond maturing on November 15, 2030, with a 4.25% coupon rate.

FIX Message Input

55=USTB-2030-11-15        (Symbol)
48=US91282CEZ76           (SecurityID)
22=4                      (SecurityIDSource: ISIN)
167=TBOND                 (SecurityType)
461=DBFTFR                (CFICode)
541=20301115              (MaturityDate)
223=4.250                 (CouponRate)
15=USD                    (Currency)

454=[                     (SecurityAltID group - 2 entries)
  {455=91282CEZ7, 456=1},
  {455=US91282CEZ76, 456=4}
]

453=[                     (Parties group - 2 entries)
  {448=US_TREASURY, 447=D, 452=1},
  {448=CUSTODIAN_BANK_ABC, 447=D, 452=24}
]

Canonical Tree (JSON Representation)

After parsing and canonicalization (sorting keys, removing session fields):

{
  15: "USD",
  22: "4",
  48: "US91282CEZ76",
  55: "USTB-2030-11-15",
  167: "TBOND",
  223: "4.250",
  453: [
    { 447: "D", 448: "US_TREASURY", 452: "1" },
    { 447: "D", 448: "CUSTODIAN_BANK_ABC", 452: "24" }
  ],
  454: [
    { 455: "91282CEZ7", 456: "1" },
    { 455: "US91282CEZ76", 456: "4" }
  ],
  461: "DBFTFR",
  541: "20301115"
}

SBE Encoding

This tree is encoded to SBE (Simple Binary Encoding) format:

Size: 243 bytes for this example (the fixSBELen value shown under Onchain Storage)

Format: SBE schema-driven encoding with message header and field data

Merkle Tree

Each field becomes a Merkle leaf. Example paths:

[15] → "USD": leaf = keccak256(SBE.encode([15]) || "USD")
[223] → "4.250": leaf = keccak256(SBE.encode([223]) || "4.250")
[453, 0, 448] → "US_TREASURY": leaf = keccak256(SBE.encode([453, 0, 448]) || "US_TREASURY")
[454, 1, 456] → "4": leaf = keccak256(SBE.encode([454, 1, 456]) || "4")

All leaves are sorted by their SBE-encoded path and combined into a binary Merkle tree, producing a fixRoot.

Onchain Storage

FixDescriptor {
  fixMajor: 4,
  fixMinor: 4,
  dictHash: 0x...,
  fixRoot: 0x7a3f... (Merkle root),
  fixSBEPtr: 0x123... (SSTORE2 address),
  fixSBELen: 243
}

💡 See this in the explorer

Visit the Interactive Explorer to see this exact transformation step-by-step with visualizations of the tree, SBE bytes, and Merkle structure.

3. Terminology and Notation

Descriptor

The FIX message subset describing instrument characteristics (not transport/session data). Contains only business-relevant fields like Symbol, SecurityID, MaturityDate, CouponRate, Parties, etc.

Canonical Form

A single, deterministic representation ensuring all implementations produce identical output. Achieved through: sorted map keys, consistent encoding, and removal of optional formatting.

FIX Tag

An integer field identifier defined by the FIX Protocol (e.g., 55=Symbol, 15=Currency, 541=MaturityDate).

Group

A repeating structure in FIX introduced by a "NoXXX" count tag (e.g., 454=NoSecurityAltID, 453=NoPartyIDs). Each group contains N entries where each entry is a map of fields.

Path

The location of a field in the tree, encoded as an array of integers. Scalar fields use [tag]; group fields include the group tag, zero-based entry index, and field tag (e.g., [453, 0, 448] = first Party's PartyID).

Leaf

A (path, value) pair in the Merkle tree representing a single field. Computed as: leaf = keccak256(pathSBE || valueBytes)

SBE

Simple Binary Encoding - an efficient binary encoding format used for FIX messages. Uses schema-driven encoding with a message header and field data.

SSTORE2

A gas-efficient pattern for storing data in contract bytecode rather than storage slots. Data is deployed as the runtime bytecode of a minimal contract and retrieved via EXTCODECOPY.

Merkle Proof

A list of sibling hashes proving a specific field exists in the committed descriptor. Allows efficient verification of any field without revealing the entire descriptor.

fixRoot

The Merkle root hash committing to all fields in the descriptor. Stored onchain and used to verify field proofs.

Note: Normative keywords MUST, SHOULD, MAY are used per RFC 2119.

4. Architecture Overview

Before diving into the detailed specifications, here's the big picture of how FIX descriptors flow from input to onchain storage:

FIX Message Input: Standard securities data (business fields)
Canonical Tree: Sort keys, remove session fields, preserve groups
SBE Encoding: Efficient binary format
Merkle Tree: Cryptographic commitment
Onchain Storage: FixDescriptor struct, SBE via SSTORE2, Merkle root, verification function
Offchain Retrieval: Read SBE data
Onchain Verification: Prove fields with Merkle proofs

Key Components

1. FIX Message → Canonical Tree

Parse FIX, extract business fields, build a hierarchical structure with integer keys and sorted maps.

2. Canonical Tree → SBE

Serialize to SBE (Simple Binary Encoding) - an efficient binary format designed for financial messages with schema-driven encoding.

3. Canonical Tree → Merkle Root

Enumerate all fields as (path, value) pairs, hash each to create leaves, sort, and build a binary Merkle tree.

4. Storage → Onchain

Deploy SBE data via SSTORE2, store Merkle root and metadata in a FixDescriptor struct embedded in the token contract.

5. Verification

Anyone can verify any field by providing the path, the value, and a Merkle proof. The contract hashes the leaf and walks the proof up the tree to confirm that it reaches the stored root.

Why This Design?

Canonical: Multiple implementations produce identical output
Compact: SBE is highly efficient and smaller than JSON or FIX tag=value
Verifiable: Merkle proofs allow checking any field without downloading full descriptor
Gas-efficient: SSTORE2 reduces storage costs vs traditional storage slots
No onchain parsing: All complexity happens off-chain; onchain code only verifies hashes

💡 Visualize this pipeline

The Interactive Explorer lets you step through this exact pipeline with a real Treasury bond example.

5. Descriptor Content

5.1 Included Fields

Business and instrument fields such as:

Identification: 48 (SecurityID), 22 (SecurityIDSource), 55 (Symbol), 454 (SecurityAltID group)
Classification: 167 (SecurityType), 461 (CFICode)
Economics/Terms: 15 (Currency), 541 (MaturityDate), 223 (CouponRate)
Roles: 453 (Parties group) and nested PartySubIDs if used

5.2 Excluded Fields

Transport/session mechanics (e.g., 8 (BeginString), 9 (BodyLength), 10 (CheckSum), sequence numbers, admin/session fields) MUST NOT be part of the committed descriptor.

5.3 Dictionary Binding

Implementations MUST record the FIX version and dictionary used to ensure semantic consistency across implementations:

fixMajor, fixMinor (e.g., 4, 4)
dictHash = keccak256 of the exact FIX dictionary / FIX Orchestra bytes

Example: FIX 4.4 using the FIX Trading Community dictionary would have fixMajor=4, fixMinor=4, and dictHash computed from the canonical FIX dictionary file.

6. Canonical Tree Model

The descriptor is represented as a hierarchical tree. First, let's see an example transformation, then we'll define the rules.

Example Transformation

Input FIX (simplified):

55=USTB-2030-11-15
167=TBOND
223=4.250
453=[{448=US_TREASURY,452=1}]

Canonical Tree (JSON representation):

{
  55: "USTB-2030-11-15",
  167: "TBOND",
  223: "4.250",
  453: [
    {
      448: "US_TREASURY",
      452: "1"
    }
  ]
}

Note: Keys are integers, values are strings, arrays preserve order, and map keys are sorted.

Structure Rules

Each node in the tree takes one of two forms:

Scalars

tag → value

Value is the exact FIX value bytes interpreted as UTF-8 text

Groups

groupTag → [ entry0, entry1, … ]

Each entry is a map { tag → value | nested group }

Mandatory Rules

Each map key is the integer FIX tag
Scalar values are text strings; do not convert numerics—preserve FIX string forms (e.g., "4.250", "20301115")
Group entries:
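- In the source FIX message, each entry MUST begin with the group's delimiter field (this aids human readability); in the canonical tree, the entry's map keys are still sorted by tag number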
- Optional fields MAY be omitted; absence means "no leaf"
- Array order is preserved as given by the issuer (indices 0..N-1)
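The rules above can be sketched as a small canonicalization routine. This is a minimal illustration, not the reference implementation: the FixValue and FixEntry types and the canonicalize function are hypothetical names, and the session tag list is the one given in the note in Section 11.3. A Map is used so that the key sorting step is explicit.

```typescript
// Hypothetical types for illustration; not taken from the reference library.
type FixValue = string | Map<number, FixValue>[]; // scalar text, or a repeating group
type FixEntry = Map<number, FixValue>;            // tag -> scalar or nested group

// Session tags named in Section 11.3; they never enter the canonical tree.
const SESSION_TAGS = new Set<number>([8, 9, 10, 34, 35, 49, 52, 56]);

function canonicalize(entry: FixEntry): FixEntry {
  const sorted = Array.from(entry.entries())
    .filter(([tag]) => !SESSION_TAGS.has(tag))
    .sort(([a], [b]) => a - b); // numeric tag order

  const out: FixEntry = new Map<number, FixValue>();
  for (const [tag, value] of sorted) {
    out.set(
      tag,
      typeof value === "string"
        ? value                                  // keep the FIX string form, e.g. "4.250"
        : value.map((e) => canonicalize(e))      // keep the issuer's entry order
    );
  }
  return out;
}

const input: FixEntry = new Map<number, FixValue>([
  [8, "FIX.4.4"],
  [55, "USTB-2030-11-15"],
  [223, "4.250"],
  [15, "USD"],
  [453, [new Map<number, FixValue>([[448, "US_TREASURY"], [447, "D"], [452, "1"]])]],
]);

const tree = canonicalize(input);
console.log(Array.from(tree.keys())); // [ 15, 55, 223, 453 ]
```

Scalar values pass through untouched, so "4.250" is never normalized to 4.25, and only map keys are reordered; group arrays keep their original indices.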

7. SBE Encoding

The canonical tree is serialized using SBE (Simple Binary Encoding), an efficient binary encoding format designed for financial messages:

Schema-Driven

SBE uses an XML schema to define message structure, field types, and encoding rules

Message Header

Each encoded message starts with a standard header containing blockLength, templateId, schemaId, and version

Field Mapping

FIX tag numbers map directly to SBE field IDs in the schema (e.g., FIX tag 55 → SBE field id="55")

Efficient Encoding

Fixed-length fields use native binary types; variable-length strings use length-prefixed encoding

Runtime Generation

SBE codec classes are generated from the schema and compiled at runtime for encoding/decoding

Result: SBE produces highly compact binary encoding with excellent performance characteristics, ideal for onchain storage.

8. Merkle Commitment

8.1 Path Encoding

Each leaf commits to a (path, valueBytes) pair. Let's start with examples:

[15] → Simple field 15 (Currency)
[223] → Simple field 223 (CouponRate)
[453, 0, 448] → Group 453, first entry, field 448
[454, 1, 456] → Group 454, second entry, field 456
[453, 0, 802, 2, 523] → Nested group example

Path encoding rules: Each path is an array of unsigned integers, encoded using SBE. Paths are used for both Merkle leaves and verification.

8.2 Leaf Hash

leaf = keccak256( pathSBE || valueBytes )

pathSBE: the SBE-encoded bytes of the path array
valueBytes: the exact FIX value bytes (UTF-8 string payload)

8.3 Leaf Set

Produce one leaf per present field:

Scalars: one leaf for each (tag, value)
Each group entry: one leaf per present field inside that entry (with its path including the group index)
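As a rough sketch of the leaf set rule, the walk below collects one (path, value) pair per present field, including the zero-based entry index for fields inside groups. The type and function names are hypothetical, and the sketch stops at integer paths: turning each path into pathSBE and hashing it is left out because the exact SBE layout depends on the schema.

```typescript
// Hypothetical types for illustration; not taken from the reference library.
type FixValue = string | Map<number, FixValue>[];
type FixEntry = Map<number, FixValue>;

interface LeafInput {
  path: number[]; // e.g. [453, 0, 448]
  value: string;  // exact FIX string form
}

function enumerateLeaves(entry: FixEntry, prefix: number[] = []): LeafInput[] {
  const leaves: LeafInput[] = [];
  entry.forEach((value, tag) => {
    if (typeof value === "string") {
      leaves.push({ path: [...prefix, tag], value });
    } else {
      // Group: recurse into each entry with [groupTag, index] appended to the path.
      value.forEach((groupEntry, index) => {
        leaves.push(...enumerateLeaves(groupEntry, [...prefix, tag, index]));
      });
    }
  });
  return leaves;
}

const descriptor: FixEntry = new Map<number, FixValue>([
  [15, "USD"],
  [453, [
    new Map<number, FixValue>([[448, "US_TREASURY"], [452, "1"]]),
    new Map<number, FixValue>([[448, "CUSTODIAN_BANK_ABC"]]), // 452 absent: no leaf
  ]],
]);

const leaves = enumerateLeaves(descriptor);
console.log(leaves.map((l) => l.path.join("/")));
```

The second Parties entry omits tag 452, so it contributes a single leaf; an absent field produces no leaf at all rather than an empty one.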

8.4 Root Construction

Sort all leaves by pathSBE lexicographically (byte order)
Build a standard binary Merkle tree:
- Pair adjacent leaves; each parent = keccak256(left || right)
- If an odd node remains at a level, promote it unchanged to the next level (do not duplicate it and hash the pair)
The final parent is the Merkle root (fixRoot)
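The construction steps can be illustrated with a small sketch. Two things here are assumptions made for readability: the hash is a symbolic stand-in that builds strings such as "H(a|b)" so the pairing stays visible (a real implementation would pass keccak256(left || right) over 32-byte values from a Keccak library), and the pathSBE bytes are placeholders rather than real SBE output. The function names are hypothetical.

```typescript
// Byte-wise lexicographic comparison, used to order leaves by pathSBE.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length; // a strict prefix sorts first (standard lexicographic order)
}

// Generic over the node type T so the same logic works for real hashes
// and for the symbolic strings used in this demo.
function merkleRoot<T>(leaves: T[], hashPair: (left: T, right: T) => T): T {
  if (leaves.length === 0) throw new Error("cannot build a tree with no leaves");
  let level = leaves;
  while (level.length > 1) {
    const next: T[] = [];
    for (let i = 0; i < level.length; i += 2) {
      // Pair adjacent nodes; an unpaired last node is promoted unchanged.
      next.push(i + 1 < level.length ? hashPair(level[i], level[i + 1]) : level[i]);
    }
    level = next;
  }
  return level[0];
}

// Symbolic stand-in for keccak256(left || right); NOT a real hash.
const symbolicHash = (left: string, right: string): string => `H(${left}|${right})`;

// Placeholder pathSBE bytes, deliberately out of order.
const entries = [
  { pathSBE: new Uint8Array([2]), leaf: "c" },
  { pathSBE: new Uint8Array([1, 5]), leaf: "b" },
  { pathSBE: new Uint8Array([1]), leaf: "a" },
];

const ordered = entries.slice().sort((x, y) => compareBytes(x.pathSBE, y.pathSBE));
const root = merkleRoot(ordered.map((e) => e.leaf), symbolicHash);
console.log(root); // H(H(a|b)|c)
```

With three leaves, "a" and "b" are paired first and "c" is promoted to the next level, where it is hashed with their parent.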

8.5 Proofs

A Merkle proof is the usual vector of sibling hashes from the leaf to the root. The verifier needs:

pathSBE (bytes)
valueBytes (bytes)
siblingHashes[]: bytes32[]
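directions[]: bool[] (one flag per sibling hash, indicating whether the current node is the left or right child at that level)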
fixRoot: bytes32
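A sketch of how an off-chain tool might produce siblingHashes[] and directions[] for one leaf, under the odd-node promotion rule from Section 8.4. Two assumptions are made here and are not stated by the spec: a direction flag of true means the current node is the right child, and a promoted node contributes no proof element at the level where it has no sibling. The hash is again a symbolic stand-in for keccak256, and all names are hypothetical.

```typescript
interface MerkleProof<T> {
  siblings: T[];
  directions: boolean[]; // assumed convention: true = current node is the right child
}

function buildProof<T>(
  leaves: T[],
  index: number,
  hashPair: (left: T, right: T) => T
): MerkleProof<T> {
  if (index < 0 || index >= leaves.length) throw new RangeError("leaf index out of range");
  const siblings: T[] = [];
  const directions: boolean[] = [];
  let level = leaves;
  let pos = index;
  while (level.length > 1) {
    const isRight = pos % 2 === 1;
    const siblingPos = isRight ? pos - 1 : pos + 1;
    if (siblingPos < level.length) {
      siblings.push(level[siblingPos]);
      directions.push(isRight);
    } // otherwise the node is promoted and this level adds nothing to the proof

    // Build the next level with the same pairing rule used for the root.
    const next: T[] = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(i + 1 < level.length ? hashPair(level[i], level[i + 1]) : level[i]);
    }
    level = next;
    pos = Math.floor(pos / 2);
  }
  return { siblings, directions };
}

// Symbolic stand-in for keccak256(left || right); NOT a real hash.
const symbolicHash = (left: string, right: string): string => `H(${left}|${right})`;

const proofForA = buildProof(["a", "b", "c"], 0, symbolicHash);
const proofForC = buildProof(["a", "b", "c"], 2, symbolicHash);
console.log(proofForA, proofForC);
```

In the three-leaf case the proof for "a" has two steps (siblings "b" and "c"), while the promoted leaf "c" needs only one (the parent of "a" and "b"), so proof length can differ between leaves of the same tree.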

9. Onchain Representation

9.1 Integration with Asset Contracts

The FixDescriptor MUST be embedded directly in the asset contract (ERC20, ERC721, etc.) rather than stored in a separate registry. This eliminates permissioning issues, creates an implicit mapping from asset address to descriptor, and ensures the issuer retains full control.

9.2 Descriptor Struct

struct FixDescriptor {
  uint16  fixMajor;           // e.g., 4
  uint16  fixMinor;           // e.g., 4
  bytes32 dictHash;           // FIX dictionary hash
  bytes32 fixRoot;            // Merkle root
  address fixSBEPtr;          // SSTORE2 data address
  uint32  fixSBELen;          // SBE length
  string  schemaURI;          // optional SBE schema URI
}

9.3 Standard Interface

interface IFixDescriptor {
  function getFixDescriptor() external view returns (FixDescriptor memory descriptor);
  function getFixRoot() external view returns (bytes32 root);
  function verifyField(
    bytes calldata pathSBE,
    bytes calldata value,
    bytes32[] calldata proof,
    bool[] calldata directions
  ) external view returns (bool valid);
}

9.4 SBE Storage (SSTORE2 Pattern)

The SBE data is deployed as the runtime bytecode of a minimal data contract (prefixed with a STOP byte)
Anyone can retrieve bytes via eth_getCode(fixSBEPtr)
Optionally expose a chunk retrieval function using EXTCODECOPY
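For an off-chain reader, the practical consequence of the STOP prefix is that the code returned by eth_getCode is one byte longer than the SBE payload. The helper below is a minimal sketch with a hypothetical name; it assumes that fixSBELen counts only the SBE bytes and not the prefix, which the spec does not state explicitly.

```typescript
// Convert the hex string returned by eth_getCode into the SBE payload.
// Assumption: fixSBELen excludes the leading STOP (0x00) byte.
function sbeFromRuntimeCode(codeHex: string, fixSBELen: number): Uint8Array {
  const hex = codeHex.slice(0, 2) === "0x" ? codeHex.slice(2) : codeHex;
  if (hex.length % 2 !== 0) throw new Error("odd-length hex string");

  const code = new Uint8Array(hex.length / 2);
  for (let i = 0; i < code.length; i++) {
    code[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }

  if (code.length === 0 || code[0] !== 0x00) {
    throw new Error("runtime code does not start with a STOP byte");
  }
  const data = code.slice(1); // drop the STOP prefix
  if (data.length !== fixSBELen) {
    throw new Error(`expected ${fixSBELen} SBE bytes, found ${data.length}`);
  }
  return data;
}

// Made-up bytecode: STOP prefix followed by four payload bytes.
const payload = sbeFromRuntimeCode("0x00deadbeef", 4);
console.log(payload); // Uint8Array [ 222, 173, 190, 239 ]
```

Checking the length against the descriptor guards against reading a pointer that refers to some other contract's code.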

9.5 Events and Versioning

event FixDescriptorSet(bytes32 fixRoot, bytes32 dictHash, address fixSBEPtr, uint32 fixSBELen);
event FixDescriptorUpdated(bytes32 oldRoot, bytes32 newRoot, address newPtr);

10. Onchain Verification

Library Interface

library FixMerkleVerifier {
  function verify(
      bytes32 root,
      bytes calldata pathSBE,
      bytes calldata value,
      bytes32[] calldata proof,
      bool[] calldata directions
  ) internal pure returns (bool);
}

Verification Algorithm

bytes32 leaf = keccak256(abi.encodePacked(pathSBE, value))
For each sibling in proof with corresponding direction in directions: if current node is right, parent = keccak256(sibling || current); if left, parent = keccak256(current || sibling)
Compare final parent to root
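The same walk can be mirrored off-chain, which is handy for checking a proof before submitting it. The sketch below assumes that a direction flag of true means the current node is the right child (the spec does not fix which boolean value means which side), and it uses a symbolic string hash in place of keccak256 so the intermediate values are readable. The function name is hypothetical.

```typescript
function verifyProof<T>(
  root: T,
  leaf: T,
  siblings: T[],
  directions: boolean[],
  hashPair: (left: T, right: T) => T
): boolean {
  if (siblings.length !== directions.length) return false;
  let node = leaf;
  for (let i = 0; i < siblings.length; i++) {
    // Assumed convention: true = current node is the right child.
    node = directions[i] ? hashPair(siblings[i], node) : hashPair(node, siblings[i]);
  }
  // Strict equality is enough when hashes are represented as strings (e.g. hex).
  return node === root;
}

// Symbolic stand-in for keccak256(left || right); NOT a real hash.
const symbolicHash = (left: string, right: string): string => `H(${left}|${right})`;

// Three-leaf tree [a, b, c] with c promoted: root = H(H(a|b)|c).
const demoRoot = "H(H(a|b)|c)";
const okLeft = verifyProof(demoRoot, "a", ["b", "c"], [false, false], symbolicHash);
const okPromoted = verifyProof(demoRoot, "c", ["H(a|b)"], [true], symbolicHash);
const rejected = verifyProof(demoRoot, "x", ["b", "c"], [false, false], symbolicHash);
console.log(okLeft, okPromoted, rejected); // true true false
```

In a real verifier the leaf passed in would itself be keccak256(pathSBE || value), so a proof binds both the field's location and its value.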

11. Offchain Retrieval

Once a FIX descriptor is committed onchain, participants can retrieve the SBE-encoded data and reconstruct the original descriptor. This section specifies the retrieval interface and decoding requirements.

11.1 Retrieval Interface

Token contracts that store FIX descriptors can expose a function to retrieve the SBE-encoded data. The retrieval function can support chunked access to accommodate large descriptors.

function getFixSBEChunk(uint256 start, uint256 size)
    external view returns (bytes memory);

Chunked Access: The function accepts start offset and size parameters, allowing callers to retrieve large SBE data across multiple calls and keep each call within gas limits.
Bounds Handling: Implementations should clamp the requested range to available data and return an empty bytes array if the start offset exceeds the data length.
SSTORE2 Access: Since SBE data is stored via SSTORE2 (as contract bytecode), retrieval functions should use efficient bytecode access patterns to minimize gas consumption.
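The clamping behaviour described above can be modelled off-chain in a few lines. This is only a sketch of the intended semantics with hypothetical function names: getChunk mirrors what a conforming getFixSBEChunk would return, and readAll shows how a client might loop until it receives an empty chunk.

```typescript
// Model of the bounds handling: clamp to available data, empty if start is past the end.
function getChunk(data: Uint8Array, start: number, size: number): Uint8Array {
  if (start >= data.length) return new Uint8Array(0);
  const end = Math.min(start + size, data.length);
  return data.slice(start, end);
}

// Client-side loop: keep requesting chunks until an empty one comes back.
function readAll(
  read: (start: number, size: number) => Uint8Array,
  chunkSize: number
): Uint8Array {
  const bytes: number[] = [];
  for (let offset = 0; ; offset += chunkSize) {
    const chunk = read(offset, chunkSize);
    if (chunk.length === 0) break;
    chunk.forEach((b) => bytes.push(b));
  }
  return new Uint8Array(bytes);
}

const stored = new Uint8Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]);
const tail = getChunk(stored, 8, 5);   // clamped to the last two bytes
const beyond = getChunk(stored, 10, 5); // start past the end: empty
const whole = readAll((s, n) => getChunk(stored, s, n), 4);
console.log(tail, beyond.length, whole.length);
```

Because an out-of-range start yields an empty array instead of reverting, the client loop needs no prior knowledge of fixSBELen to terminate.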

11.2 SBE Decoding Requirements

Retrieved SBE bytes are decoded using the SBE schema and generated codec classes. The SBE Lambda encoder service handles encoding and decoding operations:

SBE message header identifies the schema template and version
Field IDs in the SBE schema directly correspond to FIX tag numbers
Decoded messages reconstruct the original FIX field values
Decoders use runtime-generated codec classes for efficient decoding

11.3 FIX Message Reconstruction

After decoding SBE data, applications can reconstruct a FIX message representation. The reconstruction process:

Traverse the descriptor tree in numeric tag order (canonically sorted)
Emit scalar fields as tag=value pairs
For group nodes, emit the group count tag followed by entry fields in sequence
Optionally add FIX session headers (BeginString, BodyLength, MsgType) and trailer (CheckSum) for display purposes
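A minimal sketch of the traversal, assuming the decoded descriptor is held in the same Map-based tree used for canonicalization (the types and the function name are hypothetical). It emits body fields only; session headers and the trailer are left out. Fields are joined with "|" for display, whereas FIX on the wire separates fields with the SOH byte (0x01).

```typescript
// Hypothetical types for illustration; not taken from the reference library.
type FixValue = string | Map<number, FixValue>[];
type FixEntry = Map<number, FixValue>;

function toTagValuePairs(entry: FixEntry): string[] {
  const out: string[] = [];
  const tags = Array.from(entry.keys()).sort((a, b) => a - b); // numeric tag order
  for (const tag of tags) {
    const value = entry.get(tag) as FixValue;
    if (typeof value === "string") {
      out.push(`${tag}=${value}`);
    } else {
      out.push(`${tag}=${value.length}`); // group count tag, e.g. 453=2
      for (const groupEntry of value) {
        out.push(...toTagValuePairs(groupEntry)); // entry fields in sequence
      }
    }
  }
  return out;
}

const decoded: FixEntry = new Map<number, FixValue>([
  [55, "USTB-2030-11-15"],
  [15, "USD"],
  [453, [new Map<number, FixValue>([[448, "US_TREASURY"], [447, "D"], [452, "1"]])]],
]);

const display = toTagValuePairs(decoded).join("|");
console.log(display);
// 15=USD|55=USTB-2030-11-15|453=1|447=D|448=US_TREASURY|452=1
```

This follows the sorted order listed above. A strict FIX parser may expect each group entry to start with the group's delimiter field, so an application that feeds the output to such a parser might need to reorder fields within entries.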

Note on Session Fields

Session fields (tags 8, 9, 10, 34, 35, 49, 52, 56) are excluded from the canonical tree and SBE encoding. Reconstructed messages may include synthetic session headers for compatibility with FIX parsers, but these should not affect the Merkle root or verification.

11.4 Use Cases

Retrieving the onchain SBE data enables several important workflows:

Transparency: Any party can audit the complete descriptor data associated with a tokenized asset without relying on off-chain sources
Interoperability: Third-party contracts can read descriptor fields to make decisions (e.g., risk assessment based on maturity date)
Verification: Off-chain systems can retrieve SBE data, enumerate leaves, and generate Merkle proofs for specific fields to be verified onchain
Compliance: Regulators or auditors can independently verify that onchain descriptors match disclosed security information

14. Security Considerations

Trust Assumptions

Issuer Control: The descriptor is set by the token contract issuer. There is no external authority validating the FIX data accuracy—users must trust the issuer.
Immutability vs Updates: Contracts can be designed with fixed descriptors (immutable) or updatable descriptors (governed by issuer). Both patterns are valid; the choice is a business decision.
Dictionary Hash: The dictHash ensures all parties use the same FIX dictionary. Mismatched dictionaries can lead to semantic disagreements about field meanings.

What Merkle Proofs Guarantee

✓ Merkle proofs prove:

A specific field at a specific path has a specific value
The field is part of the canonical tree committed to
A false value cannot yield a valid proof against the stored root, assuming keccak256 remains collision-resistant

✗ Merkle proofs do NOT prove:

The accuracy of the descriptor data
The completeness of the descriptor
Real-world correspondence (e.g., ISIN validity)

15. Gas Cost Analysis

Understanding gas costs helps implementers make informed decisions about descriptor size and verification strategies.

15.1 Human-Readable Descriptor Costs

The human-readable descriptor feature introduces additional gas considerations for deployment and usage.

One-Time Deployment Costs

ERC20 Asset token deployment: ~1,324,447 gas
ERC721 Asset token deployment: ~1,649,225 gas
Factory deployment: ~2,142,209 gas
SBE data storage (SSTORE2): ~200 gas/byte + ~32k overhead
Note: Actual costs vary by descriptor size
L2 deployment can reduce costs by 10-100x

Off-chain Usage (View Calls)

Gas cost: 0 gas

View functions execute locally without transactions. Unlimited free calls for web apps, analytics, and data explorers.

On-chain Usage (Contract Calls)

Low: verify field (depth 2-3), ~12,000-14,000 gas
Medium: verify field (depth 4-6), ~15,000-20,000 gas
High: verify field (depth 8-10), ~23,000-27,000 gas
Measured: depth 0=9,706 gas, depth 2=11,975, depth 4=15,372, depth 6=19,980, depth 8=23,523, depth 10=26,849. Cost scales at ~1,700 gas per proof step.
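The per-step figure can be checked directly against the measurements. The sketch below derives the average slope from the first and last data points and then compares a simple linear model (the depth-0 cost plus 1,700 gas per step) with each measured value. The model and its function name are illustrative only.

```typescript
// Measurements quoted above: [proof depth, gas used].
const measured: Array<[number, number]> = [
  [0, 9706], [2, 11975], [4, 15372], [6, 19980], [8, 23523], [10, 26849],
];

const [firstDepth, firstGas] = measured[0];
const [lastDepth, lastGas] = measured[measured.length - 1];

// Average cost of one proof step across the measured range.
const gasPerStep = (lastGas - firstGas) / (lastDepth - firstDepth);

// Linear rule of thumb: depth-0 cost plus ~1,700 gas per proof step.
function estimateVerifyGas(depth: number): number {
  return firstGas + 1700 * depth;
}

// Largest gap between the rule of thumb and any measurement.
const worstError = Math.max(
  ...measured.map(([depth, gas]) => Math.abs(estimateVerifyGas(depth) - gas))
);

console.log(gasPerStep.toFixed(1), worstError); // 1714.3 1134
```

The measured slope of about 1,714 gas per step agrees with the ~1,700 figure, and the linear estimate stays within roughly 1,150 gas of every data point, with the largest deviations at depths 2 and 4.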

15.2 Base Operation Costs

Deployment Costs

SBE Storage (SSTORE2)

~200 gas per byte + ~32k deployment overhead
Measured: 57,859-59,131 gas for data contract deployment (via DataContractFactory)
Example: 243-byte descriptor ≈ 80k gas total
3-4x cheaper than traditional storage slots
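The 243-byte example follows from the rule of thumb above. A minimal sketch, with a hypothetical function name and the two approximate constants taken from this section:

```typescript
// Rule of thumb from this section: ~200 gas per byte plus ~32k fixed overhead.
const GAS_PER_BYTE = 200;
const DEPLOY_OVERHEAD = 32_000;

function estimateSstore2Gas(byteLength: number): number {
  return GAS_PER_BYTE * byteLength + DEPLOY_OVERHEAD;
}

const exampleGas = estimateSstore2Gas(243);
console.log(exampleGas); // 80600
```

So 243 bytes cost about 48,600 gas in per-byte charges plus the fixed overhead, which is where the ≈ 80k figure comes from. Both constants are approximations, so the result is an order-of-magnitude guide rather than a quote.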

Descriptor Operations

setFixDescriptor(): 24,844-141,874 gas (varies by initialization state)
getFixDescriptor(): ~14,825-14,896 gas (view function)
getFixRoot(): ~2,671-4,719 gas (view function)
verifyFieldProof(): ~7,259 gas (single leaf tree, varies with proof depth)

16. Implementation Guide

Given a FIX descriptor message and Orchestra XML schema, follow this implementation flow:

Load Orchestra Schema & Parse FIX

Load the Orchestra XML schema defining field types and structure. Extract only business fields from the FIX message (exclude session tags; see Section 5).

Build Canonical Tree

Map scalars directly; create array of entry maps for groups (see Section 6)

Serialize to SBE

Convert Orchestra schema to SBE schema, then encode the FIX message using SBE encoding with schema-driven format (see Section 7)

Enumerate Leaves

Compute pathSBE for each present field; collect (pathSBE, valueBytes) pairs (see Sections 8.1-8.3)

Compute Merkle Root

Sort leaves by pathSBE; build binary Merkle tree using keccak256 (see Section 8.4)

Deploy SBE

Deploy as SSTORE2-style data contract; return fixSBEPtr and fixSBELen (see Section 9.4)

Set Descriptor

Store in the asset contract (not a registry): fixMajor, fixMinor, dictHash, fixRoot, fixSBEPtr, fixSBELen, schemaURI (see Section 9.2)

Emit Event

Emit FixDescriptorSet event for indexing (see Section 9.5)

Produce Utilities

Build proof generator and reader tools for fetching SBE data and generating proofs off-chain

💡 Reference Implementation

This specification has a complete reference implementation available in two forms:

Interactive Explorer:

Try the web interface to see the transformation pipeline in action with live visualizations.

Open Source Code:

View the complete source code on GitHub, including:

TypeScript library (packages/fixdescriptorkit-typescript)
Solidity smart contracts (contracts/src)
Web application (apps/web)
Test suites and examples

FixDescriptorKit Specification v1.0

ERC-FIX by Nethermind
