Wednesday, May 29, 2013

Low latency and high frequency message processing

A retail financial trading platform needs very low latency trade processing: trades have to be processed quickly (20,000+ trades/sec) because the market moves rapidly. A retail platform adds complexity because it has to do this for many people at once. The result is a large number of users, each generating trades, all of which need to be processed quickly.

Given the shift to multi-core thinking, this kind of demanding performance would naturally suggest an explicitly concurrent programming model. A simple design could be to have multiple threads read messages from a queue, process them, and update the database; a sketch of this design follows. It requires extensive application and database tuning, and unfortunately, due to lock contention and context switching, neither the desired throughput nor ordered processing can be achieved.
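As a rough sketch of that conventional design (the trade representation, the business logic and the persistence call are placeholders invented for illustration), a pool of worker threads drains a shared queue and writes each result to the database. Because any worker may pick up the next message, messages for the same instrument can complete out of order, and every message pays the cost of contended queue access plus database IO:

import java.util.concurrent.*;

public class QueueBasedProcessor {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(8);

    public void start() {
        for (int i = 0; i < 8; i++) {
            workers.submit(() -> {
                try {
                    while (!Thread.currentThread().isInterrupted()) {
                        String trade = queue.take();   // contended queue access
                        process(trade);                // business logic (placeholder)
                        writeToDatabase(trade);        // expensive IO per message (placeholder)
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
    }

    public void submit(String trade) { queue.offer(trade); }

    private void process(String trade) { /* business logic */ }
    private void writeToDatabase(String trade) { /* JDBC insert/update */ }
}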

What if the desired throughput (10-100k messages/sec) could be achieved without writing hard-to-debug concurrent code or performing expensive IO?

The alternative is to perform all processing in-memory, and that is what we explore here.
A single-threaded processor takes input messages sequentially (in the form of method invocations), runs business logic on them, and emits output events. It operates entirely in-memory; there is no database or other persistent store. Keeping all data in-memory has two important benefits. First, it is fast: there is no slow database IO to wait on, nor any transactional behavior to coordinate, since all processing happens sequentially. Second, it simplifies programming: there is no object/relational mapping to do, and all the code can be written using Java's object model without compromises to fit a database schema.
Note that a single-threaded processor also eliminates the need for locks and preserves ordered processing; a minimal sketch of such a processor follows.
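The class and method names here (TradeProcessor, onTrade) are invented for illustration, and the position-keeping logic is deliberately trivial. The point is the shape: all state lives in ordinary Java collections, and exactly one thread invokes onTrade, so no locks are needed and input order is preserved end to end:

import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical single-threaded, in-memory trade processor.
public class TradeProcessor {
    private final Map<String, Long> positionByAccount = new HashMap<>(); // plain objects, no ORM
    private final Consumer<String> outputEvents;                         // downstream publisher

    public TradeProcessor(Consumer<String> outputEvents) {
        this.outputEvents = outputEvents;
    }

    // Called by exactly one thread, in input order.
    public void onTrade(String account, long quantity) {
        long newPosition = positionByAccount.merge(account, quantity, Long::sum);
        outputEvents.accept("POSITION " + account + " " + newPosition);
    }
}

The single processing thread is the only caller of onTrade; everything else in the system simply posts messages to it.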
Using an in-memory structure has an important consequence: what happens if everything crashes? The heart of dealing with this is Event Sourcing, which means that the current state of the processor is entirely derivable by processing the input events. As long as the input event stream is kept in a durable store, you can always recreate the current state of the business logic engine by replaying the events; to that end the events are written, with replication, to a file or database.
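Continuing the sketch under the same assumptions (the journal file, the plain-text "account quantity" event format, and the reuse of the TradeProcessor above are all illustrative choices): each input event is appended to a durable journal before it is handed to the processor, and on restart the journal is replayed through the same entry point to rebuild the in-memory state:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

public class EventJournal {
    private final Path journal;

    public EventJournal(Path journal) { this.journal = journal; }

    // Append the raw input event before handing it to the processor.
    public void append(String event) throws IOException {
        Files.writeString(journal, event + System.lineSeparator(),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Replay the journal to rebuild processor state after a crash.
    public void replay(TradeProcessor processor) throws IOException {
        if (!Files.exists(journal)) return;
        List<String> events = Files.readAllLines(journal);
        for (String event : events) {
            String[] parts = event.split(" ");          // "account quantity"
            processor.onTrade(parts[0], Long.parseLong(parts[1]));
        }
    }
}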
Further, to recover quickly from failure, the system can take snapshots of the processor state at regular intervals and restore from the latest snapshot, replaying only the events that arrived after it.
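A correspondingly simple snapshot sketch, again with invented names and assuming the processor exposes its state map for this purpose: the state is serialized periodically together with the sequence number of the last event applied, so recovery restores the latest snapshot and then replays only the journaled events that came after it:

import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class SnapshotStore {
    private final Path snapshotFile;

    public SnapshotStore(Path snapshotFile) { this.snapshotFile = snapshotFile; }

    // Persist the processor state plus the sequence number of the last event applied.
    public void save(Map<String, Long> state, long lastEventSeq) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(snapshotFile))) {
            out.writeLong(lastEventSeq);
            out.writeObject(new HashMap<>(state));
        }
    }

    // Returns the sequence number so the caller knows which journaled
    // events still need to be replayed on top of the restored state.
    @SuppressWarnings("unchecked")
    public long restore(Map<String, Long> state) throws IOException, ClassNotFoundException {
        if (!Files.exists(snapshotFile)) return 0L;
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(snapshotFile))) {
            long lastEventSeq = in.readLong();
            state.putAll((Map<String, Long>) in.readObject());
            return lastEventSeq;
        }
    }
}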

Event Sourcing is valuable because it allows the processor to run entirely in-memory, but it has another considerable advantage for diagnostics: if some unexpected behavior occurs, the sequence of events can be copied to a development environment and replayed there.

The design discussed here can easily be extended to online gaming, credit card processing, or serving banner advertisements from ad servers. An important point to note is that trade processing must be strictly ordered (per stream, though not globally), whereas strict ordering is not necessarily required for these other use cases.

vertx.io is an interesting project that employs a single-threaded model for asynchronous processing.