You want it Fast or Super Fast? – The IB, 10GbE, GigE Benchmark

With the current global financial meltdown, the ability to compete effectively becomes essential. Faster data access and sharing are critical for business success, and beating the competition on speed translates into a need for lower latency.

With middleware (MW) products, this means the ability to push your data from the client endpoint into the clustered application server and its collocated in-memory data grid (IMDG). Once the data has been stored within the IMDG, it can be processed, saved into a database, or distributed to other interested parties.

Most enterprises today run on 100 Mbit/s and GigE networking infrastructure. This is the networking backbone, responsible for transporting data within the data center LAN.

In the past few years, new networking technologies have emerged. Two that have become particularly popular for high-performance systems are InfiniBand and 10 Gbit Ethernet.

Combining very fast networks such as InfiniBand (IB) and 10GbE with an IMDG should provide the best performance, with very low latency, for applications distributed across multiple physical machines.

So… with the great work done by Mellanox's High-Performance Enterprise Team we took IB and 10GbE for a test drive.

The focus of the testing was to measure the latency of a very basic IMDG operation: the space write operation. The client and the space run on different machines, which forces both the client and the space server to go through the network layer when communicating with each other. The benchmark measured this simple activity across different scenarios and permutations.
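The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the harness used in the benchmark: the class `WriteLatencyProbe`, its methods, and the `Consumer<byte[]>` that stands in for the remote space write are all assumptions made for the example. In a real run, the consumer would wrap the actual write call against the remote space.

```java
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical helper: times a write-like operation and summarizes the samples.
final class WriteLatencyProbe {

    // Calls `op` `iterations` times with a payload of `payloadBytes` bytes and
    // returns the per-call latencies in nanoseconds, sorted ascending.
    static long[] measure(Consumer<byte[]> op, int payloadBytes, int iterations) {
        byte[] payload = new byte[payloadBytes];
        long[] samples = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            op.accept(payload);                 // stand-in for the space write
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples;
    }

    // Mean latency in milliseconds.
    static double meanMillis(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long total = 0;
        for (long s : samples) {
            total += s;
        }
        return (total / (double) samples.length) / 1_000_000.0;
    }

    // Nearest-rank percentile (0 < p <= 100) of an ascending-sorted array.
    static long percentile(long[] sorted, double p) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        int index = Math.min(Math.max(rank - 1, 0), sorted.length - 1);
        return sorted[index];
    }
}
```

Reporting a high percentile alongside the mean is a design choice of this sketch; the post itself only quotes single latency figures.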

Write operation latency is relevant for almost every application and every vertical:
– Sending an order from the trader desk application into the order management server
– Sending the new currency value into the Forex matching server
– Sending market data from the feeder into the index calculation server
– Sending the table status from a card game client into the gaming server

The permutation matrix of the benchmark includes:
– Latency vs. object size
– Latency vs. write throughput
– Latency vs. number of concurrent users (application threads)
– Latency vs. number of IMDG partitions
– Latency vs. high-availability mode: with synchronous replication to a remote backup space, and without a backup
– Latency vs. object size when running in embedded mode (no remote calls), over GigE, over 10GbE, and over IB
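For the concurrent-users dimension of the matrix, each "user" is an application thread issuing writes. A rough sketch of how such a run could be driven is shown below. `ConcurrentUsersProbe` and its parameters are hypothetical names for this example, and the `Consumer<byte[]>` again stands in for the remote write; none of this is taken from the actual benchmark code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical driver: runs the same write-like operation from several threads.
final class ConcurrentUsersProbe {

    // Returns the mean per-call latency in nanoseconds across all user threads.
    static double meanLatencyNanos(Consumer<byte[]> op, int users,
                                   int payloadBytes, int callsPerUser) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<Long>> totals = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                Callable<Long> user = () -> {
                    byte[] payload = new byte[payloadBytes]; // one payload per thread
                    long total = 0;
                    for (int i = 0; i < callsPerUser; i++) {
                        long start = System.nanoTime();
                        op.accept(payload);                  // stand-in for the space write
                        total += System.nanoTime() - start;
                    }
                    return total;
                };
                totals.add(pool.submit(user));
            }
            long sum = 0;
            for (Future<Long> f : totals) {
                sum += f.get();                              // wait for each user to finish
            }
            return sum / (double) ((long) users * callsPerUser);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while measuring", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a user thread failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Running this for an increasing number of users and comparing the returned means gives one latency-vs-users curve; the other rows of the matrix vary the payload size or the deployment instead.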

It is important to emphasize that we did not change anything within the GigaSpaces API or the network communication protocol. We used out-of-the-box settings and a standard OS, without any JVM or network tuning. We simply bound the Java process to the IB or 10GbE IP address via the java.rmi.server.hostname system property.
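As a small illustration of that last step, the property can be passed on the command line as `-Djava.rmi.server.hostname=<address>`, or set programmatically before any RMI objects are exported. The sketch below does the latter and adds a sanity check that the address really belongs to a local interface. `BindAddress` and its methods are hypothetical names for this example, and the address in the usage note is a documentation placeholder, not one from the benchmark setup.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;

// Hypothetical helper around the standard java.rmi.server.hostname property.
final class BindAddress {

    static final String PROPERTY = "java.rmi.server.hostname";

    // True if `ip` is assigned to one of this machine's network interfaces.
    static boolean isLocalAddress(String ip) {
        try {
            return NetworkInterface.getByInetAddress(InetAddress.getByName(ip)) != null;
        } catch (SocketException | UnknownHostException e) {
            return false;
        }
    }

    // Sets the property; call this before any RMI objects are exported.
    static void bindTo(String ip) {
        System.setProperty(PROPERTY, ip);
    }
}
```

A caller might check `BindAddress.isLocalAddress("192.0.2.10")` first and only then call `BindAddress.bindTo("192.0.2.10")`, so that a typo in the IB or 10GbE address is caught early rather than surfacing as a connection problem.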

The benchmark results showed that a system using Mellanox IB or 10GbE solutions with GigaSpaces XAP can complete a transaction with a 4K data size in less than 0.4 milliseconds, at a total rate of 8000 transactions/sec, within a highly concurrent distributed environment (4 partitions) with full data replication.
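To put these headline numbers in perspective, here is a back-of-the-envelope calculation. It assumes that 4K means 4096 bytes, counts payload only (no protocol overhead and no replication traffic), and treats 0.4 ms as an upper bound on latency. The class and method names are made up for the example.

```java
// Hypothetical calculator for the headline figures quoted above.
final class HeadlineNumbers {

    // Payload bytes written per second at a given rate and object size.
    static long payloadBytesPerSecond(long txPerSecond, long objectBytes) {
        return txPerSecond * objectBytes;
    }

    // Converts bytes/s to megabits/s (1 Mbit = 1,000,000 bits).
    static double megabitsPerSecond(long bytesPerSecond) {
        return bytesPerSecond * 8 / 1_000_000.0;
    }

    // Average write rate per partition, assuming writes are spread evenly.
    static double perPartitionRate(double totalTxPerSecond, int partitions) {
        return totalTxPerSecond / partitions;
    }

    // Little's law: average number of writes in flight = rate x latency.
    static double writesInFlight(double txPerSecond, double latencySeconds) {
        return txPerSecond * latencySeconds;
    }

    public static void main(String[] args) {
        long bytes = payloadBytesPerSecond(8000, 4096);
        System.out.println("payload bytes/s:      " + bytes);
        System.out.println("payload Mbit/s:       " + megabitsPerSecond(bytes));
        System.out.println("tx/s per partition:   " + perPartitionRate(8000, 4));
        System.out.println("writes in flight (<): " + writesInFlight(8000, 0.0004));
    }
}
```

Under these assumptions the payload comes to roughly 262 Mbit/s, well below the capacity of even a GigE link, which is consistent with the benchmark's focus on latency rather than raw bandwidth.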

The GigaSpaces Benchmark summary is below:

– Both IB and 10GbE are better than GigE with small (4K) and large (64K) objects. The difference starts at 25% in simple scenarios and goes up to 100% in more complex scenarios (many partitions, many users, large objects).
– IB is better than 10GbE with both small and large objects.
– GigaSpaces can provide transaction latency below 1 ms, including synchronization with a backup, with 4K objects and a large number of concurrent users hitting the system at a high update rate.
– Embedded mode latency (0.015 ms) is 10x better than 10GbE and IB with small objects, and 30x better (0.02 ms) than 10GbE and IB (without replication). These numbers also include the object creation time.
– Additional concurrent users affect the latency only minimally (0.16 factor).

Basic comparison – 1 user, small and large objects, with and without backup – 10GbE and IB normalized to GigE:


System Scalability when adding more Users:

[Chart: users scale 1]

[Chart: users scale 2]

As we can see, adding concurrent users affects the latency only minimally (0.16 factor).

System Scalability when adding more Partitions:

[Chart: partitions scale 1]

[Chart: partitions scale 2]

As we can see, adding partitions improves the latency for both small and large objects (0.5 factor).

Benchmark System components:
– GigaSpaces XAP 6.6.0
– GigaSpaces API: Java openspaces
– Space operation measured: write
– Sun JVM 1.6
– OS: Linux RHEL 5 Update 2
– CPU: 2 x Intel Xeon QuadCore 2.5 GHz (E5420) – 8-core box
– HW vendor: Supermicro 6015TW-T
– RAM size: 16 GB
– Update rate per thread: 1K
– Client and space instances running on different physical machines.

  • flalar

    Very interesting and informative performance article. Want more on this type of topics

  • Shay Hassidim

    We will have more of these coming in the next few weeks (multi-core benchmarks, content router benchmarks, large-scale web application benchmarks). Anything in particular you are interested in?