Ultra-fast market exchange core matching engine

Avatar for apache
Written by
4 years ago

Ultra-fast market exchange core matching engine based on LMAX Disruptor, Eclipse Collections (formerly Goldman Sachs GS Collections), and Adaptive Radix Trees.

Designed for high scalability and pauseless 24/7 operation under high load, with low-latency responses:

  • 1M users having 3M accounts in total;

  • 100K symbols (order books);

  • 1M+ order book operations per second;

  • worst-case wire-to-wire latency below 1ms.

A single order book configuration can process 5M operations per second on 10-year-old hardware (Intel® Xeon® X5690) with moderate latency degradation:

rate    50.0%   90.0%   95.0%   99.0%   99.9%   99.99%  worst
125K    0.6µs   0.9µs   1.0µs   1.4µs   4µs     24µs    41µs
250K    0.6µs   0.9µs   1.0µs   1.4µs   9µs     27µs    41µs
500K    0.6µs   0.9µs   1.0µs   1.6µs   14µs    29µs    42µs
1M      0.5µs   0.9µs   1.2µs   4µs     22µs    31µs    45µs
2M      0.5µs   1.2µs   3.9µs   10µs    30µs    39µs    60µs
3M      0.7µs   3.6µs   6.2µs   15µs    36µs    45µs    60µs
4M      1.0µs   6.0µs   9µs     25µs    45µs    55µs    70µs
5M      1.5µs   9.5µs   16µs    42µs    150µs   170µs   190µs
6M      5µs     30µs    45µs    300µs   500µs   520µs   540µs
7M      60µs    1.3ms   1.5ms   1.8ms   1.9ms   1.9ms   1.9ms

Benchmark configuration:

  • Single symbol order book.

  • 3,000,000 inbound messages are distributed as follows: 9% GTC orders, 3% IOC orders, 6% cancel commands, 82% move commands. About 6% of all messages trigger one or more trades.

  • 1,000 active user accounts.

  • On average, 1,000 limit orders are active, placed in 750 different price slots.

  • Latency results cover only risk processing and order matching. Network interface latency, IPC, and journaling are not included.

  • Test data is not bursty: commands arrive at a constant interval (0.2~8µs depending on target throughput).

  • BBO prices are not changing significantly throughout the test. No avalanche orders.

  • No coordinated omission effect in the latency benchmark: any processing delay affects the measurements of subsequent messages.

  • GC is triggered before and after every benchmark cycle (3,000,000 messages).

  • RHEL 7.5, network-latency tuned profile, dual X5690 6 cores 3.47GHz, one socket isolated and tickless, spectre/meltdown protection disabled.

  • Java version 8u192; newer Java 8 versions may have a performance bug.
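
The constant-interval pacing and the absence of coordinated omission described above can be illustrated with a minimal sketch. It is not the project's benchmark harness: the class name, method names, and the empty placeholder command are assumptions made for illustration. Latency is measured from the planned send time of each command, so a stall is charged to every command that was due during it. The interval helper reproduces the 0.2~8µs range: 125K commands per second gives 8,000ns, and 5M gives 200ns.

```java
import java.util.Arrays;

// Hypothetical sketch of a paced latency measurement; not the project's own harness.
final class PacedLatencyBenchmark {

    /** Interval between commands in nanoseconds for a target rate in commands per second. */
    static long intervalNanos(long opsPerSecond) {
        return 1_000_000_000L / opsPerSecond;
    }

    /**
     * Runs `count` commands on a fixed schedule and returns their latencies (ns), sorted ascending.
     * Each latency is taken from the planned send time rather than the actual one,
     * so a processing delay also shows up in the commands queued behind it.
     */
    static long[] run(Runnable command, int count, long opsPerSecond) {
        long interval = intervalNanos(opsPerSecond);
        long[] latencies = new long[count];
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            long planned = start + i * interval;
            while (System.nanoTime() - planned < 0) {
                // busy-spin until the planned send time
            }
            command.run();
            latencies[i] = System.nanoTime() - planned;
        }
        Arrays.sort(latencies);
        return latencies;
    }

    /** Nearest-rank percentile of an ascending-sorted array. */
    static long percentile(long[] sorted, double pct) {
        int rank = (int) Math.ceil(pct / 100.0 * sorted.length);
        return sorted[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // The empty lambda stands in for submitting one command to the engine.
        long[] sorted = run(() -> { }, 100_000, 1_000_000);
        System.out.printf("50%%=%dns 99%%=%dns worst=%dns%n",
                percentile(sorted, 50.0), percentile(sorted, 99.0), sorted[sorted.length - 1]);
    }
}
```

Measuring from the actual send time instead would hide queueing delay: after a stall, the late commands would each report only their own processing time.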

Main features

  • HFT optimized. The priority is the mean latency of the limit-order move operation (currently 0.5µs). A cancel operation takes 0.7µs, and placing a new order takes ~1.0µs.

  • In-memory working state for accounting data and order books.

  • Event sourcing: disk journaling and journal replay support, state snapshots (serialization), and restore operations.

  • Lock-free and contention-free order matching and risk control algorithms.

  • No floating-point arithmetic, so no loss of significance is possible.

  • Matching engine and risk control operations are atomic and deterministic.

  • Pipelined multi-core processing (based on LMAX Disruptor): each CPU core is responsible for a certain processing stage, user accounts shard, or symbol order books shard.

  • Two different risk processing modes (specified per symbol): direct-exchange and margin-trade.

  • Maker/taker fees (defined in quote currency units).

  • Three matching engine implementations: a simple reference implementation ("Naive"), one optimized for small order books ("Fast"), and one optimized for scalability ("Direct").

  • Testing: unit tests, integration tests, stress tests, integrity/consistency tests.

  • Low GC pressure, object pooling, single ring-buffer.

  • Thread affinity (requires JNA).

  • User suspend/resume operation (reduces memory consumption).

  • Core reports API (user balances, open interest).
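
The "no floating-point arithmetic" and "fees in quote currency units" points above can be illustrated with a minimal sketch. It does not use the engine's actual classes: the class name, method names, the per-lot fee model, and the sample numbers are all assumptions made for illustration. Prices, sizes, and fees are held as `long` counts of the smallest quote currency unit, so every sum is exact, and overflow raises an exception instead of silently wrapping.

```java
// Hypothetical sketch of integer-only accounting; names and fee model are assumptions.
final class FixedPointFees {

    /** Trade value in smallest quote currency units (price per lot times number of lots). */
    static long notional(long pricePerLot, long lots) {
        return Math.multiplyExact(pricePerLot, lots);
    }

    /** Fee in smallest quote currency units, assuming a fixed fee per lot. */
    static long fee(long feePerLot, long lots) {
        return Math.multiplyExact(feePerLot, lots);
    }

    /** Amount debited from a buyer who takes liquidity: trade value plus taker fee. */
    static long takerBuyDebit(long pricePerLot, long lots, long takerFeePerLot) {
        return Math.addExact(notional(pricePerLot, lots), fee(takerFeePerLot, lots));
    }

    /** Amount credited to a seller who provides liquidity: trade value minus maker fee. */
    static long makerSellCredit(long pricePerLot, long lots, long makerFeePerLot) {
        return Math.subtractExact(notional(pricePerLot, lots), fee(makerFeePerLot, lots));
    }

    public static void main(String[] args) {
        // Binary floating point cannot represent 0.1 or 0.2 exactly.
        System.out.println(0.1 + 0.2 == 0.3);                     // prints false
        // The same amounts as integer units (10 + 20 == 30) compare exactly.
        System.out.println(10L + 20L == 30L);                     // prints true

        // Price 15425 units per lot, 3 lots, taker fee 19 and maker fee 7 units per lot.
        System.out.println(takerBuyDebit(15_425L, 3L, 19L));      // prints 46332
        System.out.println(makerSellCredit(15_425L, 3L, 7L));     // prints 46254
    }
}
```

Because every operation is on integers, replaying the same sequence of commands yields the same balances, which fits the deterministic behavior listed above.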

TODOs

  • Market data feeds (full order log, L2 market data, BBO, trades).

  • Clearing and settlement.

  • FIX and REST API gateways.

  • More tests and benchmarks.

  • NUMA awareness.

How to run performance tests
