Low latency (capital markets)

From Wikipedia, the free encyclopedia

Introduction

'Low latency' is a hot topic within capital markets, where the proliferation of algorithmic trading requires firms to react to market events faster than the competition in order to increase the profitability of trades. For example, when executing arbitrage strategies the opportunity to “arb” the market may only present itself for a few milliseconds before parity is achieved. The first party to take advantage of the situation is usually the one that brings the market back into parity, and therefore the only party to profit from it. To demonstrate the value that clients place on latency, a large global investment bank has stated that every millisecond lost results in $100m per annum in lost opportunity [1].

What is considered “low” is therefore relative, but also something of a self-fulfilling prophecy. Many organisations use the term “ultra-low latency” to describe latencies of under 1 millisecond, but what is considered low today will no doubt be considered unacceptable in a few years' time.

Latency cannot be discussed without also mentioning throughput. Data rates are increasing exponentially, which bears directly on the speed at which messages can be processed; low-latency systems need not only to get a message from A to B as quickly as possible, but also to do so for millions of messages per second.

Where latency occurs

Latency from event to execution

When talking about latency in the context of capital markets, consider the round trip between event and trade:

  • Event occurs at a particular venue
  • Information about that event is placed in a message on the wire
  • Message reaches the decision making application
  • Application makes a trade decision based upon that event
  • Order sent to the trading venue
  • Venue executes the order
  • Order confirmation sent back to application

We also need to consider how latency accumulates along this chain of events:

  • Processing, the time taken to process a message (which could be as simple as a network switch routing a packet)
  • Propagation, the time taken for a bit of data to get from A to B (limited by the speed of light)
  • Serialization, the packet size divided by the bandwidth: it depends on the total message size (payload + headers), the available bandwidth, and the number of messages being sent across the link (a worked sketch of these components follows this list)
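
As a rough illustration of how these components combine, the following sketch (in Python, with purely hypothetical figures for distance, message size and bandwidth) estimates the one-way latency of a single message:

    # Back-of-the-envelope one-way latency model (hypothetical figures)
    SPEED_OF_LIGHT_FIBRE = 2.0e8  # metres/second, roughly 2/3 of c in optical fibre

    def one_way_latency(distance_m, message_bytes, bandwidth_bps, processing_s):
        propagation = distance_m / SPEED_OF_LIGHT_FIBRE      # limited by the speed of light
        serialization = (message_bytes * 8) / bandwidth_bps  # packet size divided by bandwidth
        return processing_s + propagation + serialization

    # Example: 1,000 km link, 500-byte message, 1 Gbit/s, 10 microseconds of switch processing
    print(one_way_latency(1_000_000, 500, 1e9, 10e-6))  # ~0.005014 s, dominated by propagation

Note how, over any significant distance, propagation dwarfs the other two components, which is the motivation for the proximity hosting and co-location approaches described later.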

Let's now delve deeper into the round trip and where latency can be added:

Event occurrence to being on the wire

The systems at a particular venue need to handle events, such as order placement, and get them onto the wire as quickly as possible in order to be competitive within the marketplace. Some venues offer premium services for clients needing the quickest solutions.

Exchange to Application

This is one of the areas where the most delay can be added, due to the distances involved, the amount of processing by internal routing engines, hand-offs between different networks and the sheer amount of data being sent, received and processed from various data venues. A December 2007 poll by low-latency.com indicates that 38 percent of respondents found this area to add the most latency to their systems [2].

  • Propagation between the location of the execution venue and the location of the application
  • Delays in data aggregation networks such as Reuters IDN
  • Propagation within internal networks
  • Processing within internal networks
  • Processing by internal routing systems
  • Bandwidth of extranet and internal networks
  • Message packet sizes
  • Amount of data being sent and received

Application decision making

This area isn't really put under the umbrella of “low latency”; rather, it is the ability of the trading firm to take advantage of High Performance Computing technologies. It is, however, included for completeness.

  • Processing by APIs
  • Processing by Applications
  • Propagation between internal systems
  • Network processing/bandwidth/packet size/propagation between internal systems

Sending the order to the venue

Similar to the delays between exchange and application, many trades will involve a broking firm, and the competitiveness of a broking firm is in many cases directly related to the performance of its order placement and management systems.

  • Processing by internal order management systems
  • Processing by Broker systems
  • Propagation between Application and Broker
  • Propagation between Broker and execution venue

Order execution

The amount of time it takes for the execution venue to process and match the order.

Latency Measurement

Terminology

Average Latency

Average latency is the mean time for a message to be passed from one point to another; the lower the better. Times under 1 millisecond are typical for a market data system.

Latency Jitter

There are many use cases where the predictability of latency in message delivery is as important as, if not more important than, a low average latency. This latency predictability is also referred to as low latency jitter, and describes a narrow deviation of latencies around the mean latency measurement.
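
As an illustration of how average latency and jitter can be derived from the same measurements, here is a minimal sketch (the sample values are invented):

    import statistics

    # Hypothetical one-way latency samples, in microseconds
    samples = [820, 805, 811, 4200, 808, 815, 803, 799]

    mean_latency = statistics.mean(samples)  # average latency
    jitter = statistics.stdev(samples)       # deviation around the mean
    worst_case = max(samples)                # outliers matter for predictability

    print(mean_latency, jitter, worst_case)

A single outlier barely moves the mean but dominates the jitter figure, which is why low jitter is often demanded alongside, rather than instead of, a low average.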

Throughput

Throughput refers to the number of messages being received, sent and processed by the system, and is usually measured in updates per second. Throughput correlates with latency measurements: typically, as the message rate increases, so do the latency figures. To give an indication of the number of messages involved, the Options Price Reporting Authority (OPRA) is predicting peak message rates of 907,000 updates per second (ups) on its network by July 2008 [3]. This is just a single venue; most firms will be taking updates from several venues.
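
A minimal way of measuring throughput, assuming a recorded stream of messages and a hypothetical handler function, is simply to count updates processed per second:

    import time

    def measure_throughput(messages, handler):
        # Returns updates per second for a given handler over a recorded stream
        start = time.perf_counter()
        for message in messages:
            handler(message)
        elapsed = time.perf_counter() - start
        return len(messages) / elapsed

    # e.g. measure_throughput(recorded_feed, handle_update)

In practice the interesting measurement is how the latency figures behave as this rate approaches the peak rates quoted above.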

Testing Procedure Nuances

Timestamping/Clocks

Clock accuracy is paramount when testing latency between systems. Any discrepancies will give inaccurate results. Many tests involve locating the publishing node and the receiving node on the same machine, to ensure the same clock time is being used. This isn't always possible, however, so clocks on different machines need to be kept in sync using some sort of time protocol:

  • NTP is limited to milliseconds, so is not accurate enough for today’s low-latency applications
  • CDMA time accuracy is in 10s of microseconds. It is US based only. Accuracy is affected by the distance from the transmission source.
  • GPS-based time sources are the most accurate in terms of synchronisation. They are, however, the most expensive.
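
To illustrate why this matters, consider a one-way measurement taken with timestamps from two different machines (a sketch; the timestamps are assumed to come from clocks synchronised by one of the mechanisms above):

    def one_way_latency_us(send_timestamp_us, recv_timestamp_us, clock_offset_us=0.0):
        # Any unknown offset between the two clocks appears directly in the result:
        # a 1 ms NTP error swamps a measurement of a few hundred microseconds,
        # which is why GPS- or CDMA-disciplined clocks are preferred.
        return (recv_timestamp_us - send_timestamp_us) - clock_offset_us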

Reducing Latency in the Order Chain

Reducing latency in the order chain involves attacking the problem from many angles. Amdahl's Law, commonly used to calculate performance gains from throwing more CPUs at a problem, applies more generally to improving latency: improving a portion of a system which is already fairly inconsequential (with respect to latency) will result in minimal improvement in overall performance.
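
Expressed as a formula, if a component accounts for a fraction p of total latency and is made s times faster, the overall speedup is 1 / ((1 - p) + p / s). A quick sketch with invented numbers:

    def amdahl_speedup(p, s):
        # Overall speedup when a fraction p of total latency is made s times faster
        return 1.0 / ((1.0 - p) + p / s)

    print(amdahl_speedup(0.05, 10))  # a 10x improvement to a 5% stage: ~1.05x overall
    print(amdahl_speedup(0.60, 2))   # a mere 2x improvement to a 60% stage: ~1.43x overall

Hence the improvements below target the stages that dominate the round trip, such as propagation and feed processing.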

Here are some of the approaches firms are using to reduce latency in their systems.

Direct Access to the Market

Direct Feeds

In traditional deployments, market data is received via a third party, such as Reuters, who aggregates data from different sources around the globe, normalises it and sends it via a single channel to the bank. Clearly this isn't the most efficient way of receiving market data in a low-latency application, as delays are added in the third party's networks. Many firms are now installing direct feeds: a direct connection between the source of data and the firm's premises. One issue with installing direct feeds is the non-standard way in which data is formatted by venues, and the frequency of changes made by a venue to its data stream format. To address this, many market data application vendors sell direct feed handlers, which take data from a particular source and perform some sort of normalisation on it, whilst offering software updates and management to support any changes to the feeds from the venue. Reuters Data Feed Direct, for example, is a remotely managed direct data feed service.

Direct Market Access (DMA)

Direct Market Access allows firms to place orders with a trading venue, bypassing a broking firm and, therefore, removing any delays associated with broker systems.

“Getting Closer”

Proximity Hosting

Propagation has a strictly defined limit: the speed of light. Clearly, then, the only way to reduce the time added by propagation is to move two things closer together. The venue in most (if not all) cases is fixed, so firms are looking to host their trading applications much closer to venues such as the New York Stock Exchange or the Chicago Mercantile Exchange. Real estate near stock exchanges is expensive for this reason, and firms that offer what is known as “proximity hosting” (Reuters Hosting Solution in conjunction with BT Radianz, for example) provide a premium service for clients who need to shave off those extra few milliseconds.

Co-location

Some exchanges are also offering co-location, that is, hosting for applications in the same data centres as the systems generating the message feeds and taking orders.

Tweaks

Network Tweaks

Removing firewalls

Firewalls add processing time to message delivery. Removing firewalls can reduce delays, at the expense of data security. The latest firewalls, such as those in the Cisco ASA 5580 and 6500 Series, claim latencies as low as 30 microseconds, so upgrading firewalls should be considered before removing them completely.

Protocol Tweaks

Messaging formats

Careful selection and use of message formats can reduce latency, particularly in high-throughput systems. For example, string-based messaging formats, whilst simple to code for, can be very expensive to process: if an application is only interested in the final field of a message string, it must parse the entire string to extract the information it requires. Using fixed-length fields can help, but at the expense of sending more data than is required in the form of padding.

Binary messaging formats using native data types, whilst more complex to code for, are more efficient to process.
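
As a sketch of the difference (the field layout is invented), compare extracting one field from a delimited string with reading it from a fixed binary layout:

    import struct

    # String-based format: every preceding field must be scanned to reach the last one
    text_msg = "MSFT|28.31|28.33|1200"
    size_field = text_msg.split("|")[-1]  # parses the whole string

    # Binary format (hypothetical layout): 8-byte symbol, two doubles, unsigned int,
    # packed little-endian with no alignment, so every field sits at a fixed offset
    binary_msg = struct.pack("<8sddI", b"MSFT", 28.31, 28.33, 1200)
    size_field = struct.unpack_from("<I", binary_msg, offset=24)[0]  # read the field directly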

Selective Information Transmission

Sending an entire message to update a single field is clearly inefficient, and protocols designed for low-latency/high-throughput systems should only send information that has changed since the previous send.
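
A minimal sketch of the idea (the field names are invented): publish only the fields whose values differ from the previously sent image.

    def delta(previous, current):
        # Only the fields that changed since the last published image
        return {k: v for k, v in current.items() if previous.get(k) != v}

    prev = {"bid": 28.31, "ask": 28.33, "bid_size": 1200, "ask_size": 900}
    curr = {"bid": 28.32, "ask": 28.33, "bid_size": 1200, "ask_size": 850}
    print(delta(prev, curr))  # {'bid': 28.32, 'ask_size': 850}

The receiver applies each delta to its cached image of the record, at the cost of needing a full refresh if a delta is ever lost.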

Removing encryption

Encryption and decryption can be very processor-intensive, and removing encryption from message streams can save a significant amount of processing delay. However, this opens up issues around security and compliance, particularly when such a policy is extended to things such as FIX-based order messages. See “Risk/Reward” for more discussion.

Compression

The decision to use compression or not is a trade-off between the processing time needed to compress and uncompress messages versus packet size and the bandwidth available. If the network is a bottleneck then compression could be advantageous, as long as the processing time required by the compression algorithm is less than the savings made through compression. In fact, there may be no choice where the sheer amount of data going across the network exceeds the bandwidth available, leading to dropped packets and retransmissions.
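
The trade-off can be expressed as a simple break-even test (a sketch; the figures are invented): compression pays only if the codec time is less than the transmission time it saves.

    def compression_pays(size_bytes, ratio, bandwidth_bps, compress_s, decompress_s):
        # True if compressing a message reduces total delivery time
        uncompressed_wire = size_bytes * 8 / bandwidth_bps
        compressed_wire = size_bytes * ratio * 8 / bandwidth_bps
        return compress_s + compressed_wire + decompress_s < uncompressed_wire

    # 1500-byte message, 2:1 compression, 30 microseconds of total codec time
    print(compression_pays(1500, 0.5, 100e6, 20e-6, 10e-6))  # True on a 100 Mbit/s link
    print(compression_pays(1500, 0.5, 10e9, 20e-6, 10e-6))   # False on a 10 Gbit/s link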

Technical Product Solutions

Network Fabrics

Many firms are looking to upgrade their Gigabit Ethernet interconnects to either 10Gb Ethernet or InfiniBand networks. InfiniBand is a point-to-point, bidirectional serial link that uses a switched fabric topology, as opposed to a hierarchical switched network like Ethernet. InfiniBand vendors, such as Voltaire and Cisco, have written drivers which replace the TCP/IP stack within the operating system, allowing applications developed for TCP/IP to operate transparently over InfiniBand.

WAN Technologies

Organisations such as BT Radianz offer premium networks for “Ultra low latency access for Trade Execution and Market Data”. Radianz Ultra Access, for example, is a “patent-pending design that utilizes common access and logical VLANs configured to each venue as a ‘cut-through’ virtual connection using dedicated fiber that requires minimal network equipment and no routing decisions” [4].

Server Technologies

Serially switched, point-to-point buses such as PCIe are now being used in high-end servers in place of parallel shared buses such as PCI-X. These have the advantages of higher bandwidth (parallel buses start to fail at higher speeds) and the removal of bus contention.

Processing

Multi-Core Processors

Dual- and quad-core CPUs are now the norm in server technology. Applications need to be rewritten to take advantage of these multiple cores through careful use of threading models, as sketched below. Intel offers a service to help financial institutions optimise their applications for its dual-core and quad-core processors.
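
A common threading model is to partition work, for example by instrument, across per-core worker threads so that updates for different symbols are processed in parallel. A simplified sketch (real systems would use lock-free queues and CPU pinning):

    import queue
    import threading

    NUM_WORKERS = 4  # e.g. one worker per core
    queues = [queue.Queue() for _ in range(NUM_WORKERS)]

    def process(update):
        pass  # stand-in for the real per-update handling

    def worker(q):
        while True:
            process(q.get())

    def dispatch(update):
        # Hash each symbol to a fixed worker so per-symbol ordering is preserved
        queues[hash(update["symbol"]) % NUM_WORKERS].put(update)

    for q in queues:
        threading.Thread(target=worker, args=(q,), daemon=True).start()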

FPGA

Field Programmable Gate Array (FPGA) chips, installed alongside general-purpose CPUs, are increasingly being used to develop application-specific hardware acceleration. Companies such as Exegy have achieved less than 80 milliseconds of latency at 2 million messages per second using FPGA technology. Such technology is yet to reach the mainstream market, although that can be expected to change in the near future.

GPU

Graphics Processing Units (GPUs), found in graphics cards, are increasingly being used for their vector-processing capabilities. Nvidia has developed Tesla, which gives HPC developers access to its GPUs from within their applications.

Market Data Systems

Whilst firms may wish to connect their applications directly to direct feed handlers, this doesn't scale particularly well and does not present a resilient or cost-effective solution. Most clients will instead use a market data system, which offers, amongst other things: consistent access to data through a single API, application resilience, system scalability, WAN distribution, differing qualities of service, data permissioning and reporting. The latest market data systems, such as Reuters RMDS 6, have been designed around low latency and high throughput, through the use of improved message formats that reduce the packet size of messages going through the system, and through interoperability with improved multicast protocols.

Context Aware Networking

Context-aware networking is being used to move intelligent routing from software into hardware.

Databases

Writing to and reading from databases, and hard drive-based storage in general, is notoriously slow and can be a significant cause of latency as applications wait for database operations to complete.

In-memory databases

In-memory databases do exactly as the name suggests: instead of residing on disk, the data resides in memory, which is a much faster medium to access.

Removing database transactions from the critical path

Real-time database transactions may not be critical to the operation of an application, and removing database transactions from the critical execution path (through careful use of threading, or “look-ahead”-type code, for example) means that applications don't have to wait for database operations to complete before continuing.
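
One common form of this is a write-behind queue (a minimal sketch, assuming the firm's durability requirements allow it): the critical path enqueues the record and returns immediately, while a background thread performs the slow database write.

    import queue
    import threading

    db_queue = queue.Queue()

    def write_to_database(record):
        pass  # stand-in for a real (slow) database insert

    def db_writer():
        # Background thread: drains the queue and performs the database writes
        while True:
            write_to_database(db_queue.get())

    threading.Thread(target=db_writer, daemon=True).start()

    def on_execution(record):
        db_queue.put(record)  # enqueue and continue; no database wait on the critical path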

References