FIX Descriptor Specification
Canonicalization, SBE, and Merkle Specification for Onchain FIX Asset Descriptors
Try the interactive explorer to see each step of the transformation process in action. Visualize how FIX messages become canonical trees, SBE bytes, and Merkle commitments.
1. Overview
1.1 Problem Statement
When tokenizing securities, traditional financial systems need standardized instrument data. The Financial Information eXchange (FIX) Protocol is the de facto standard for describing financial instruments in traditional markets. However, today every blockchain integration requires custom adapters and manual data mapping between token contracts and existing financial infrastructure.
1.2 Solution
This specification defines how to embed FIX descriptors directly in token contracts using canonical SBE encoding and Merkle commitments. This enables automatic integration with existing financial infrastructure while maintaining onchain verifiability—without requiring any onchain FIX parsing.
1.3 What This Spec Covers
- Converting FIX messages to canonical trees
- SBE encoding rules for deterministic representation
- Merkle commitment generation for efficient field verification
- Onchain storage patterns (SSTORE2-based)
- Verification mechanisms for proving specific fields
1.4 What This Spec Does NOT Cover
- Which FIX fields to include (business policy decision)
- Token standards (ERC20, ERC721, etc.)
- Trading or settlement logic
- Onchain FIX parsing (all parsing happens off-chain)
1.5 How It Works (High Level)
The input is a FIX message (or subset) describing a financial instrument, called the "asset descriptor", using standard FIX tags and groups. Example fields: Symbol, SecurityID, MaturityDate, CouponRate, Parties.
Off-chain, the descriptor is processed in three steps:
- Build a canonical tree (deterministic structure with sorted keys)
- Encode to SBE format (efficient binary representation)
- Generate a Merkle root committing to every field
Onchain, the token contract holds:
- A minimal descriptor struct
- The SBE bytes, stored via SSTORE2 (gas-efficient)
- A verification function: anyone can prove any field with a Merkle proof
2. Running Example: US Treasury Bond
Throughout this specification, we'll reference this concrete example: a US Treasury Bond maturing on November 15, 2030, with a 4.25% coupon rate.
FIX Message Input
55=USTB-2030-11-15 (Symbol)
48=US91282CEZ76 (SecurityID)
22=4 (SecurityIDSource: ISIN)
167=TBOND (SecurityType)
461=DBFTFR (CFICode)
541=20301115 (MaturityDate)
223=4.250 (CouponRate)
15=USD (Currency)
454=[ (SecurityAltID group - 2 entries)
{455=91282CEZ7, 456=1},
{455=US91282CEZ76, 456=4}
]
453=[ (Parties group - 2 entries)
{448=US_TREASURY, 447=D, 452=1},
{448=CUSTODIAN_BANK_ABC, 447=D, 452=24}
]
Canonical Tree (JSON Representation)
After parsing and canonicalization (sorting keys, removing session fields):
{
15: "USD",
22: "4",
48: "US91282CEZ76",
55: "USTB-2030-11-15",
167: "TBOND",
223: "4.250",
453: [
{ 447: "D", 448: "US_TREASURY", 452: "1" },
{ 447: "D", 448: "CUSTODIAN_BANK_ABC", 452: "24" }
],
454: [
{ 455: "91282CEZ7", 456: "1" },
{ 455: "US91282CEZ76", 456: "4" }
],
461: "DBFTFR",
541: "20301115"
}
SBE Encoding
This tree is encoded to SBE (Simple Binary Encoding) format:
Merkle Tree
Each field becomes a Merkle leaf. Example paths:
All leaves are sorted and combined into a binary Merkle tree, producing a fixRoot.
Onchain Storage
FixDescriptor {
fixMajor: 4,
fixMinor: 4,
dictHash: 0x...,
fixRoot: 0x7a3f... (Merkle root),
fixSBEPtr: 0x123... (SSTORE2 address),
fixSBELen: 243
}
Visit the Interactive Explorer to see this exact transformation step-by-step with visualizations of the tree, SBE bytes, and Merkle structure.
3. Terminology and Notation
4. Architecture Overview
Before diving into the detailed specifications, here's the big picture of how FIX descriptors flow from input to onchain storage:
Key Components
Why This Design?
- Canonical: Multiple implementations produce identical output
- Compact: SBE is highly efficient and smaller than JSON or FIX tag=value
- Verifiable: Merkle proofs allow checking any field without downloading full descriptor
- Gas-efficient: SSTORE2 reduces storage costs vs traditional storage slots
- No onchain parsing: All complexity happens off-chain; onchain code only verifies hashes
The Interactive Explorer lets you step through this exact pipeline with a real Treasury bond example.
5. Descriptor Content
5.1 Included Fields
Business and instrument fields such as:
- Identification: 48 (SecurityID), 22 (SecurityIDSource), 55 (Symbol), 454 (SecurityAltID group)
- Classification: 167 (SecurityType), 461 (CFICode)
- Economics/Terms: 15 (Currency), 541 (MaturityDate), 223 (CouponRate)
- Roles: 453 (Parties group) and nested PartySubIDs if used
5.2 Excluded Fields
Transport/session mechanics (e.g., 8 (BeginString), 9 (BodyLength), 10 (CheckSum), sequence numbers, admin/session fields) MUST NOT be part of the committed descriptor.
5.3 Dictionary Binding
Implementations MUST record the FIX version and dictionary used to ensure semantic consistency across implementations:
Example: FIX 4.4 using FIX Trading Community dictionary would have fixMajor=4, fixMinor=4, and dictHash computed from the canonical FIX dictionary file.
6. Canonical Tree Model
The descriptor is represented as a hierarchical tree. First, let's see an example transformation, then we'll define the rules.
Example Transformation
Input FIX (simplified):
55=USTB-2030-11-15
167=TBOND
223=4.250
453=[{448=US_TREASURY,452=1}]
Canonical Tree (JSON representation):
{
55: "USTB-2030-11-15",
167: "TBOND",
223: "4.250",
453: [
{
448: "US_TREASURY",
452: "1"
}
]
}
Note: Keys are integers, values are strings, arrays preserve order, and map keys are sorted.
Structure Rules
The descriptor is represented as a hierarchical tree:
- Scalar: the value is the exact FIX value bytes interpreted as UTF-8 text
- Group: an array of entries, where each entry is a map { tag → value | nested group }
Mandatory Rules
- Each map key is the integer FIX tag
- Scalar values are text strings; do not convert numerics—preserve FIX string forms (e.g., "4.250", "20301115")
- Group entries:
  - MUST begin with the delimiter field (for human clarity), but map keys are still sorted
  - Optional fields MAY be omitted; absence means "no leaf"
  - Array order is preserved as given by the issuer (indices 0..N-1)
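The rules above can be sketched in TypeScript as follows. This is a minimal illustration, not the reference implementation: the `Field` and `Entry` types and the `canonicalize` function are hypothetical names, the input is assumed to be already parsed into (tag, value) pairs, and duplicate tags are not handled. The session tag list is the one given in the note on session fields.

```typescript
// Hypothetical in-memory shape: a field is a tag plus either a scalar string
// or a repeating group (an array of entries). An entry is a list of fields.
type Field = [number, string | Entry[]];
type Entry = Field[];

// Session tags excluded from the committed descriptor.
const SESSION_TAGS = new Set<number>([8, 9, 10, 34, 35, 49, 52, 56]);

function canonicalize(entry: Entry): Entry {
  return entry
    // Drop transport/session fields.
    .filter(([tag]) => !SESSION_TAGS.has(tag))
    // Keep scalar strings untouched; recurse into group entries,
    // preserving the issuer's array order.
    .map(([tag, value]): Field =>
      typeof value === "string"
        ? [tag, value]
        : [tag, value.map((groupEntry) => canonicalize(groupEntry))])
    // Sort fields by integer tag.
    .sort((a, b) => a[0] - b[0]);
}

const raw: Entry = [
  [8, "FIX.4.4"],
  [55, "USTB-2030-11-15"],
  [167, "TBOND"],
  [223, "4.250"],
  [453, [[[448, "US_TREASURY"], [452, "1"]]]],
  [15, "USD"],
];
// Tags come out in the order 15, 55, 167, 223, 453; tag 8 is removed.
console.log(canonicalize(raw).map(([tag]) => tag));
```

Values such as "4.250" stay as strings throughout, so no numeric normalization can change the committed bytes.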
7. SBE Encoding
The canonical tree is serialized using SBE (Simple Binary Encoding), an efficient binary encoding format designed for financial messages:
Result: SBE produces highly compact binary encoding with excellent performance characteristics, ideal for onchain storage.
8. Merkle Commitment
8.1 Path Encoding
Each leaf commits to a (path, valueBytes) pair. Let's start with examples:
Path encoding rules: Each path is an array of unsigned integers, encoded using SBE. Paths are used for both Merkle leaves and verification.
8.2 Leaf Hash
leaf = keccak256(pathSBE || valueBytes)
- pathSBE: the SBE-encoded bytes of the path array
- valueBytes: the exact FIX value bytes (UTF-8 string payload)
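As an off-chain illustration, the leaf preimage is the plain concatenation of the two byte strings, which is what abi.encodePacked produces for two bytes arguments. The sketch below is hypothetical: `HashFn`, `concatBytes`, and `leafHash` are illustrative names, and the hash function is passed in by the caller because keccak256 is not part of the standard runtime; a real caller would supply a keccak256 implementation from a library of their choice.

```typescript
// Caller-supplied hash; for this specification it would be keccak256.
type HashFn = (data: Uint8Array) => Uint8Array;

function concatBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// pathSBE is treated as opaque bytes produced by the path encoder.
function leafHash(pathSBE: Uint8Array, value: string, hash: HashFn): Uint8Array {
  // The FIX string is hashed as its UTF-8 bytes, with no numeric conversion.
  const valueBytes = new TextEncoder().encode(value);
  return hash(concatBytes(pathSBE, valueBytes));
}
```

No length prefix or separator is placed between the path and the value in this sketch, matching the packed encoding used in the onchain verification step.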
8.3 Leaf Set
Produce one leaf per present field:
- Scalars: one leaf for each (tag, value)
- Each group entry: one leaf per present field inside that entry (with its path including the group index)
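A possible way to enumerate the leaf set is a recursive walk over the canonical tree. The sketch below assumes a path layout of [tag] for top-level scalars and [groupTag, index, fieldTag] for fields inside a group entry; that layout, the `Field`/`Entry`/`Leaf` types, and the `enumerateLeaves` name are assumptions made for illustration. The paths are left as integer arrays; SBE-encoding them is a separate step.

```typescript
// Hypothetical in-memory shape of the canonical tree.
type Field = [number, string | Entry[]];
type Entry = Field[];

interface Leaf {
  path: number[]; // unsigned integers, not yet SBE-encoded
  value: string;  // exact FIX string value
}

function enumerateLeaves(entry: Entry, prefix: number[] = []): Leaf[] {
  const leaves: Leaf[] = [];
  for (const [tag, value] of entry) {
    if (typeof value === "string") {
      // Scalar: one leaf per present field.
      leaves.push({ path: [...prefix, tag], value });
    } else {
      // Group: descend into each entry, adding the group tag and entry index.
      value.forEach((groupEntry, index) => {
        enumerateLeaves(groupEntry, [...prefix, tag, index]).forEach((l) => leaves.push(l));
      });
    }
  }
  return leaves;
}

const tree: Entry = [
  [55, "USTB-2030-11-15"],
  [453, [
    [[447, "D"], [448, "US_TREASURY"], [452, "1"]],
    [[447, "D"], [448, "CUSTODIAN_BANK_ABC"], [452, "24"]],
  ]],
];
for (const leaf of enumerateLeaves(tree)) {
  console.log(JSON.stringify(leaf.path), leaf.value);
}
```

Absent optional fields produce no leaf, because the walk only visits fields that are present in the tree.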
8.4 Root Construction
- Sort all leaves by pathSBE lexicographically (byte order)
- Build a standard binary Merkle tree:
  - Pair adjacent leaves; each parent = keccak256(left || right)
  - If an odd node remains at a level, promote it (no duplicate hashing)
- The final parent is the Merkle root (fixRoot)
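The construction can be sketched as below. This is an illustration under stated assumptions: `HashedLeaf`, `compareBytes`, and `merkleRoot` are hypothetical names, the hash function is supplied by the caller (keccak256 in this specification), and an empty leaf set is rejected because the text does not define a root for it.

```typescript
type HashFn = (data: Uint8Array) => Uint8Array;

interface HashedLeaf {
  pathSBE: Uint8Array; // SBE-encoded path, used only for ordering
  hash: Uint8Array;    // leaf hash
}

function concatBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

// Lexicographic byte order; a shorter array sorts before its extensions.
function compareBytes(a: Uint8Array, b: Uint8Array): number {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

function merkleRoot(leaves: HashedLeaf[], hash: HashFn): Uint8Array {
  if (leaves.length === 0) throw new Error("empty leaf set (assumption: not allowed)");
  let level: Uint8Array[] = [...leaves]
    .sort((a, b) => compareBytes(a.pathSBE, b.pathSBE))
    .map((leaf) => leaf.hash);
  while (level.length > 1) {
    const next: Uint8Array[] = [];
    for (let i = 0; i < level.length; i += 2) {
      if (i + 1 < level.length) {
        next.push(hash(concatBytes(level[i], level[i + 1])));
      } else {
        next.push(level[i]); // odd node: promoted unchanged, not duplicated
      }
    }
    level = next;
  }
  return level[0];
}
```

Promoting the odd node means a proof for that leaf has one fewer sibling at that level than proofs for its neighbours.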
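8.5 Proofs
A Merkle proof is the usual vector of sibling hashes from the leaf to the root. The verifier needs:
- pathSBE: the SBE-encoded path of the field being proven
- valueBytes: the claimed FIX value bytes
- proof: the sibling hashes, ordered from the leaf level up to the root
- directions: one flag per sibling indicating whether the current node is the left or right child
- fixRoot: the Merkle root stored in the descriptor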
9. Onchain Representation
9.1 Integration with Asset Contracts
The FixDescriptor MUST be embedded directly in the asset contract (ERC20, ERC721, etc.) rather than stored in a separate registry. This eliminates permissioning issues, creates an implicit mapping from asset address to descriptor, and ensures the issuer retains full control.
9.2 Descriptor Struct
struct FixDescriptor {
    uint16 fixMajor;    // e.g., 4
    uint16 fixMinor;    // e.g., 4
    bytes32 dictHash;   // FIX dictionary hash
    bytes32 fixRoot;    // Merkle root
    address fixSBEPtr;  // SSTORE2 data address
    uint32 fixSBELen;   // SBE length
    string schemaURI;   // optional SBE schema URI
}
9.3 Standard Interface
interface IFixDescriptor {
    function getFixDescriptor() external view returns (FixDescriptor memory descriptor);
    function getFixRoot() external view returns (bytes32 root);
    function verifyField(
        bytes calldata pathSBE,
        bytes calldata value,
        bytes32[] calldata proof,
        bool[] calldata directions
    ) external view returns (bool valid);
}
9.4 SBE Storage (SSTORE2 Pattern)
- The SBE data is deployed as the runtime bytecode of a minimal data contract (prefixed with a STOP byte)
- Anyone can retrieve the bytes via eth_getCode(fixSBEPtr)
- Optionally expose a chunk retrieval function using EXTCODECOPY
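For off-chain readers, the hex string returned by eth_getCode still includes the leading STOP byte (opcode 0x00), so it has to be dropped before decoding. The helper below is a hypothetical sketch: `sbeFromRuntimeCode` is an illustrative name and the JSON-RPC call itself is left to the caller's client library.

```typescript
// Convert the hex result of eth_getCode(fixSBEPtr) into the raw SBE bytes.
function sbeFromRuntimeCode(codeHex: string): Uint8Array {
  const hex = codeHex.startsWith("0x") ? codeHex.slice(2) : codeHex;
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("invalid hex string");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  // The data contract's runtime code starts with STOP (0x00).
  if (bytes.length === 0 || bytes[0] !== 0x00) {
    throw new Error("runtime code does not start with the STOP prefix");
  }
  return bytes.slice(1);
}

// Example with a made-up two-byte payload.
console.log(sbeFromRuntimeCode("0x00aabb")); // two bytes: 0xaa, 0xbb
```

A reader could additionally compare the resulting length with fixSBELen from the descriptor as a sanity check.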
9.5 Events and Versioning
10. Onchain Verification
Library Interface
library FixMerkleVerifier {
    function verify(
        bytes32 root,
        bytes calldata pathSBE,
        bytes calldata value,
        bytes32[] calldata proof,
        bool[] calldata directions
    ) internal pure returns (bool);
}
Verification Algorithm
- bytes32 leaf = keccak256(abi.encodePacked(pathSBE, value))
- For each sibling in proof with corresponding direction in directions: if current node is right, parent = keccak256(sibling || current); if left, parent = keccak256(current || sibling)
- Compare final parent to root
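The same algorithm can be mirrored off-chain, for example to check a proof before submitting it. The sketch below is hypothetical: `verifyField` here is a TypeScript function, not the contract method, the hash is supplied by the caller (keccak256 in this specification), and it assumes that a direction flag of true means the current node is the right child. The text does not state which boolean value maps to which side, so that mapping should be confirmed against the contracts.

```typescript
type HashFn = (data: Uint8Array) => Uint8Array;

function concatBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

function equalBytes(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false;
  }
  return true;
}

function verifyField(
  root: Uint8Array,
  pathSBE: Uint8Array,
  valueBytes: Uint8Array,
  proof: Uint8Array[],
  directions: boolean[], // assumption: true = current node is the right child
  hash: HashFn,
): boolean {
  if (proof.length !== directions.length) return false;
  // Leaf = hash(pathSBE || valueBytes)
  let node = hash(concatBytes(pathSBE, valueBytes));
  for (let i = 0; i < proof.length; i++) {
    node = directions[i]
      ? hash(concatBytes(proof[i], node))  // current is right: sibling || current
      : hash(concatBytes(node, proof[i])); // current is left: current || sibling
  }
  return equalBytes(node, root);
}
```

A single-leaf tree verifies with an empty proof, since the leaf hash is then the root itself.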
11. Offchain Retrieval
Once a FIX descriptor is committed onchain, participants can retrieve the SBE-encoded data and reconstruct the original descriptor. This section specifies the retrieval interface and decoding requirements.
11.1 Retrieval Interface
Token contracts that store FIX descriptors can expose a function to retrieve the SBE-encoded data. The retrieval function can support chunked access to accommodate large descriptors.
function getFixSBEChunk(uint256 start, uint256 size)
    external view returns (bytes memory);
- Chunked Access: The function accepts start offset and size parameters, allowing callers to retrieve large SBE data in multiple calls to manage gas costs.
- Bounds Handling: Implementations should clamp the requested range to available data and return an empty bytes array if the start offset exceeds the data length.
- SSTORE2 Access: Since SBE data is stored via SSTORE2 (as contract bytecode), retrieval functions should use efficient bytecode access patterns to minimize gas consumption.
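The clamping behaviour and a client-side read loop can be modelled as follows. This is a sketch of the bounds logic only: `getChunk` stands in for the contract's getFixSBEChunk, `readAll` is a hypothetical client helper, and an offset equal to the data length is treated the same as one beyond it.

```typescript
// Model of the clamping rule: never read past the end, and return
// an empty array when the start offset is at or beyond the data length.
function getChunk(data: Uint8Array, start: number, size: number): Uint8Array {
  if (start >= data.length) return new Uint8Array(0);
  const end = Math.min(start + size, data.length);
  return data.slice(start, end);
}

// Client-side loop: keep requesting fixed-size chunks until an empty one arrives.
function readAll(
  fetchChunk: (start: number, size: number) => Uint8Array,
  chunkSize: number,
): Uint8Array {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  const collected: number[] = [];
  for (let offset = 0; ; offset += chunkSize) {
    const chunk = fetchChunk(offset, chunkSize);
    if (chunk.length === 0) break;
    chunk.forEach((b) => collected.push(b));
  }
  return Uint8Array.from(collected);
}

const stored = new Uint8Array([1, 2, 3, 4, 5]);
const all = readAll((start, size) => getChunk(stored, start, size), 2);
console.log(all.length); // 5
```

A client that already knows fixSBELen from the descriptor can stop after the expected number of bytes instead of waiting for an empty chunk.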
11.2 SBE Decoding Requirements
Retrieved SBE bytes are decoded using the SBE schema and generated codec classes. The SBE Lambda encoder service handles encoding and decoding operations:
- SBE message header identifies the schema template and version
- Field IDs in the SBE schema directly correspond to FIX tag numbers
- Decoded messages reconstruct the original FIX field values
- Decoders use runtime-generated codec classes for efficient decoding
11.3 FIX Message Reconstruction
After decoding SBE data, applications can reconstruct a FIX message representation. The reconstruction process:
- Traverse the descriptor tree in numeric tag order (canonically sorted)
- Emit scalar fields as tag=value pairs
- For group nodes, emit the group count tag followed by entry fields in sequence
- Optionally add FIX session headers (BeginString, BodyLength, MsgType) and trailer (CheckSum) for display purposes
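A minimal sketch of the traversal, without the optional session header and trailer, might look like this. The `Field`/`Entry` types and the `toFixString` name are hypothetical, and the "|" separator is used only for readability; FIX tag=value on the wire uses the SOH character (0x01) as the field delimiter.

```typescript
// Hypothetical in-memory shape of the decoded descriptor tree.
type Field = [number, string | Entry[]];
type Entry = Field[];

function toFixString(tree: Entry, separator: string = "|"): string {
  const parts: string[] = [];
  const emit = (entry: Entry): void => {
    // Traverse in numeric tag order.
    const sorted = [...entry].sort((a, b) => a[0] - b[0]);
    for (const [tag, value] of sorted) {
      if (typeof value === "string") {
        parts.push(`${tag}=${value}`);
      } else {
        // Group: the group tag carries the entry count, then each entry follows in order.
        parts.push(`${tag}=${value.length}`);
        value.forEach((groupEntry) => emit(groupEntry));
      }
    }
  };
  emit(tree);
  return parts.join(separator);
}

const descriptor: Entry = [
  [55, "USTB-2030-11-15"],
  [453, [[[448, "US_TREASURY"], [452, "1"]]]],
];
console.log(toFixString(descriptor));
// 55=USTB-2030-11-15|453=1|448=US_TREASURY|452=1
```

Because group entry fields are emitted in sorted tag order, a strict FIX parser that expects each entry to start with the group's delimiter field may need the entry fields reordered for display; that reordering is outside this sketch.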
Note on Session Fields
Session fields (tags 8, 9, 10, 34, 35, 49, 52, 56) are excluded from the canonical tree and SBE encoding. Reconstructed messages may include synthetic session headers for compatibility with FIX parsers, but these should not affect the Merkle root or verification.
11.4 Use Cases
Onchain retrieval enables several important workflows:
- Transparency: Any party can audit the complete descriptor data associated with a tokenized asset without relying on off-chain sources
- Interoperability: Third-party contracts can read descriptor fields to make decisions (e.g., risk assessment based on maturity date)
- Verification: Off-chain systems can retrieve SBE data, enumerate leaves, and generate Merkle proofs for specific fields to be verified onchain
- Compliance: Regulators or auditors can independently verify that onchain descriptors match disclosed security information
14. Security Considerations
Trust Assumptions
- Issuer Control: The descriptor is set by the token contract issuer. There is no external authority validating the FIX data accuracy—users must trust the issuer.
- Immutability vs Updates: Contracts can be designed with fixed descriptors (immutable) or updatable descriptors (governed by issuer). Both patterns are valid; the choice is a business decision.
- Dictionary Hash: The dictHash ensures all parties use the same FIX dictionary. Mismatched dictionaries can lead to semantic disagreements about field meanings.
What Merkle Proofs Guarantee
✓ Merkle proofs prove:
- A specific field at a specific path has a specific value
- The field is part of the canonical tree committed to
- A different value presented for the same path fails verification against the committed root
✗ Merkle proofs do NOT prove:
- The accuracy of the descriptor data
- The completeness of the descriptor
- Real-world correspondence (e.g., ISIN validity)
15. Gas Cost Analysis
Understanding gas costs helps implementers make informed decisions about descriptor size and verification strategies.
15.1 Human-Readable Descriptor Costs
The human-readable descriptor feature introduces additional gas considerations for deployment and usage.
15.2 Base Operation Costs
Deployment Costs
- ~200 gas per byte + ~32k deployment overhead
- Measured: 57,859-59,131 gas for data contract deployment (via DataContractFactory)
- Example: 243-byte descriptor ≈ 80k gas total
- 3-4x cheaper than traditional storage slots
- setFixDescriptor(): 24,844-141,874 gas (varies by initialization state)
- getFixDescriptor(): ~14,825-14,896 gas (view function)
- getFixRoot(): ~2,671-4,719 gas (view function)
- verifyFieldProof(): ~7,259 gas (single leaf tree, varies with proof depth)
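The per-byte rule of thumb in the list above can be turned into a quick estimator. This only restates the "~200 gas per byte + ~32k deployment overhead" approximation; `estimateDataDeployGas` is a hypothetical helper and the result is not a substitute for measured figures.

```typescript
// Rough estimate from the approximation above: ~200 gas per stored byte
// plus ~32,000 gas of contract deployment overhead.
const GAS_PER_BYTE = 200;
const DEPLOY_OVERHEAD = 32_000;

function estimateDataDeployGas(byteLength: number): number {
  return GAS_PER_BYTE * byteLength + DEPLOY_OVERHEAD;
}

// The 243-byte example: 200 * 243 + 32,000 = 80,600, i.e. roughly 80k gas.
console.log(estimateDataDeployGas(243)); // 80600
```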
16. Implementation Guide
Given a FIX descriptor message and Orchestra XML schema, follow this implementation flow:
This specification has a complete reference implementation, which includes:
- TypeScript library (packages/fixdescriptorkit-typescript)
- Solidity smart contracts (contracts/src)
- Web application (apps/web)
- Test suites and examples