Network latency vs. end-to-end latency
Geva Perry wrote an excellent blog post on extreme transaction processing on Wall Street:
"So basically you now have thousands and thousands of machines buying and selling stocks and other securities from other machines based on extremely complex (and automated) computer models. So it has become a latency game — low-latency, that is."
When it comes to low latency, we mostly look at the networking level and how we can push data at the speed of light. We often forget that the applications that need to consume that information are a critical piece in the information distribution food chain. It is therefore not surprising that while we have good answers for delivering data faster than ever at the networking level, enabling the applications to consume the data effectively is still a challenge.
There is a huge difference between network latency and end-to-end latency, i.e., the time that elapses from the moment the information arrives over the network until the moment a trader sees it in a desktop application. Normally, the steps involved in the process include enrichment (turning the incoming data into a more meaningful and consistent format), filtering, and distribution of the right data to the right consumer.
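To make the distinction concrete, here is a minimal Python sketch of the three steps with a timer around each one. The field names (`sym`, `px`, `arrived_at`) and the function names are hypothetical, chosen only for illustration. The point is that end-to-end latency is measured from the moment the data left the network, so it includes everything the application does afterwards.

```python
import time

def enrich(raw_tick):
    # Turn the raw feed record into a consistent format (hypothetical fields).
    return {
        "symbol": raw_tick["sym"].upper(),
        "price": float(raw_tick["px"]),
    }

def is_relevant(tick, watched_symbols):
    # Filtering: keep only the symbols this consumer cares about.
    return tick["symbol"] in watched_symbols

def process(raw_tick, watched_symbols, deliver):
    """Run enrichment, filtering and distribution; return timings in seconds."""
    timings = {}
    t0 = time.perf_counter()
    tick = enrich(raw_tick)
    t1 = time.perf_counter()
    relevant = is_relevant(tick, watched_symbols)
    t2 = time.perf_counter()
    if relevant:
        deliver(tick)  # distribution to the consumer
    t3 = time.perf_counter()

    timings["enrich"] = t1 - t0
    timings["filter"] = t2 - t1
    timings["distribute"] = t3 - t2
    # Measured from the network hand-off, not from the start of processing.
    timings["end_to_end"] = t3 - raw_tick["arrived_at"]
    return timings

delivered = []
raw = {"sym": "ibm", "px": "101.25", "arrived_at": time.perf_counter()}
timings = process(raw, {"IBM"}, delivered.append)
```

Here `arrived_at` stands in for a timestamp taken when the data comes off the wire. Any queuing between that moment and the call to `process` shows up in `end_to_end` but in none of the per-stage numbers, which is exactly the part that pure network measurements miss.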
Decreasing end-to-end latency is the bigger challenge IMO, and it can only be achieved if we provide an architecture that covers all aspects of the data distribution.
There are basically two things that we're doing at GigaSpaces to address the end-to-end latency challenge:
- Create a processing and data grid that stores the data arriving from the stream and maintains the current state of the market in real time.
- Enable efficient server-side filtering, through a continuous query approach, that ensures only the relevant data is sent to the consumer.
With this approach, low end-to-end latency is achieved because the processing is done in memory, alongside the data, which eliminates unnecessary network hops.
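The two ideas above can be sketched in a few lines of Python. This is a toy, single-process stand-in and not the GigaSpaces API: the class `MarketState` and its methods are hypothetical names. It keeps the latest quote per symbol and evaluates each subscriber's predicate where the data lives, so a consumer only ever receives quotes that match its query.

```python
class MarketState:
    """Toy in-memory stand-in for a data grid (illustrative only)."""

    def __init__(self):
        self._latest = {}         # symbol -> most recent quote
        self._subscriptions = []  # (predicate, callback) pairs

    def subscribe(self, predicate, callback):
        # Continuous query: first replay the current state that matches,
        # then keep notifying on every future matching write.
        for quote in self._latest.values():
            if predicate(quote):
                callback(quote)
        self._subscriptions.append((predicate, callback))

    def write(self, quote):
        # Update the current state of the market.
        self._latest[quote["symbol"]] = quote
        # Filter on the "server" side, before anything is sent out.
        for predicate, callback in self._subscriptions:
            if predicate(quote):
                callback(quote)

    def current(self, symbol):
        return self._latest.get(symbol)

grid = MarketState()
grid.write({"symbol": "IBM", "price": 99.0})

seen = []
grid.subscribe(lambda q: q["symbol"] == "IBM" and q["price"] > 100, seen.append)

grid.write({"symbol": "IBM", "price": 101.5})   # matches, delivered
grid.write({"symbol": "MSFT", "price": 250.0})  # wrong symbol, filtered out
grid.write({"symbol": "IBM", "price": 99.5})    # below threshold, filtered out
```

In a real deployment the predicate would be evaluated on the grid node and only the matching quotes would cross the network. The sketch collapses that into one process to show the flow.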
Another important challenge is how to keep the latency consistent during peak load events. We can scale the data and the processing at the same time through partitioning of what we refer to as Processing Units, a core piece in our Space-Based Architecture.
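As a rough illustration of scaling data and processing together, the Python sketch below routes each quote to a partition by hashing its symbol. Each partition holds its own slice of the state and does its own processing. The names `Partition` and `PartitionedGrid` are hypothetical, and hashing the symbol with CRC32 is an assumption made for the example, not a description of how Processing Units are actually routed.

```python
import zlib

class Partition:
    """One unit that owns both a slice of the data and its processing."""

    def __init__(self):
        self.latest = {}  # symbol -> latest price, for this partition only
        self.updates = 0

    def handle(self, quote):
        self.latest[quote["symbol"]] = quote["price"]
        self.updates += 1

class PartitionedGrid:
    def __init__(self, partition_count):
        self.partitions = [Partition() for _ in range(partition_count)]

    def partition_for(self, symbol):
        # A stable hash, so a symbol always maps to the same partition.
        return zlib.crc32(symbol.encode("utf-8")) % len(self.partitions)

    def write(self, quote):
        index = self.partition_for(quote["symbol"])
        self.partitions[index].handle(quote)
        return index

grid = PartitionedGrid(partition_count=4)
quotes = [
    {"symbol": "IBM", "price": 101.5},
    {"symbol": "MSFT", "price": 250.0},
    {"symbol": "IBM", "price": 101.7},
]
routed = [grid.write(q) for q in quotes]
```

Because every update for a given symbol lands on the same partition, adding partitions spreads both the stored state and the work of handling updates, which is what keeps latency from climbing as load grows.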
Since we maintain current state in a Data Grid, we're able to filter data effectively through indexing. With this approach, we can even handle slow consumers effectively, so that they do not affect the overall latency, which is what happens in many messaging-based systems.
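One way this could work is sketched below in Python. It is an assumption made for illustration, not a description of the GigaSpaces implementation, and the names `IndexedGrid` and `Consumer` are hypothetical. An index from symbol to interested consumers means a write touches only the relevant subscribers. Because the grid already holds the current state, a write only has to mark a symbol as changed for each consumer, and the consumer reads the latest value whenever it gets around to it.

```python
from collections import defaultdict

class Consumer:
    def __init__(self):
        self.pending = set()  # symbols that changed since the last drain

    def drain(self, grid):
        """Called at the consumer's own pace; returns latest values only."""
        changed, self.pending = self.pending, set()
        return {symbol: grid.latest[symbol] for symbol in changed}

class IndexedGrid:
    def __init__(self):
        self.latest = {}                    # symbol -> latest price
        self.by_symbol = defaultdict(list)  # index: symbol -> consumers

    def register(self, consumer, symbols):
        for symbol in symbols:
            self.by_symbol[symbol].append(consumer)

    def write(self, symbol, price):
        self.latest[symbol] = price
        # Index lookup: only consumers registered for this symbol are touched.
        for consumer in self.by_symbol.get(symbol, ()):
            consumer.pending.add(symbol)  # constant time, never waits

grid = IndexedGrid()
slow = Consumer()
grid.register(slow, ["IBM"])

for price in (100.0, 100.5, 101.0):
    grid.write("IBM", price)
grid.write("MSFT", 250.0)  # no registered consumer, nothing to notify

snapshot = slow.drain(grid)
```

The slow consumer skips the intermediate prices and sees only the current one, and the writer never waits on it. A per-consumer message queue, by contrast, would keep growing while the consumer falls behind.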
End-to-end latency depends on many elements beyond pure networking, and these have a greater effect on latency than the network itself. It can be addressed effectively only if we apply an architecture that addresses all of these aspects.
Space-Based Architecture was designed to provide such an architecture.