Cassandra on ACID


How GigaSpaces and Cassandra are being combined to deliver limitless computational and data scalability

Cassandra's tunable eventual consistency offers great flexibility in trading off write performance against consistency. For those requiring very high performance, however, some level of eventual consistency (EC) is going to be a must. Unfortunately, a great many applications end up putting part of their system in a relational database: those that cannot tolerate any inconsistency (e.g. financial transactions), those that require transactions, and those that could perhaps tolerate EC and the lack of transactions but would have to be rewritten to do so. This of course means that part of their architecture remains trapped in a potentially high-cost, low-scalability dead end. Recently, the GigaSpaces XAP in-memory data/processing grid has been combined with Cassandra to bring ACID properties, along with the equivalent of native-language stored procedures and triggers, to big data. This powerful combination marries the two clustered technologies to produce a horizontally scalable, transactional, and consistent platform of enormous scale.
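The "tunable" part is the consistency level chosen per request. A standard rule for Dynamo-style replication (general background, not something specific to this article) is that a read is guaranteed to see the latest acknowledged write when the number of replicas read, R, plus the number of replicas that must acknowledge a write, W, exceeds the replication factor N. The sketch below checks that rule for a few pairings of Cassandra's ONE, QUORUM, and ALL levels, assuming a replication factor of 3 and ignoring failure scenarios; the function name is illustrative only.

```python
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """True when every set of r read replicas must overlap every set of w write replicas."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


N = 3  # assumed replication factor; QUORUM = N // 2 + 1 = 2, ALL = N
levels = {"ONE": 1, "QUORUM": N // 2 + 1, "ALL": N}

for read_cl, write_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    ok = is_strongly_consistent(N, levels[read_cl], levels[write_cl])
    print(f"read {read_cl}, write {write_cl}: consistent={ok}")
```

Writing at ONE is the fast end of the trade-off but leaves room for stale reads; writing at ALL closes that gap at the cost of waiting on every replica.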

The concept is actually simple. Client applications interact with the GigaSpaces XAP platform, which has ACID properties and stores data in memory. The XAP cluster is a distributed in-memory object "database" that provides ultra-high-performance reads and writes. Except in the unlikely case where all the data in Cassandra fits in memory, the XAP cluster runs in least recently used (LRU) eviction mode: when it needs room for new objects, XAP evicts the objects that have gone longest without being accessed. CRUD operations on the in-memory data store are written asynchronously to Cassandra as they occur (write-behind). Since client applications operate against the XAP cluster (using one of its many APIs, including native, JPA, and JDBC), they see only a fully consistent view of the underlying Cassandra data. Fully transactional (XA-compatible) interactions are also supported. Bear in mind that the separation into two clusters is purely logical; both can cohabit on the same hardware without any problem.
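The data flow described above (writes land in memory and are persisted asynchronously, misses read through to the backing store, LRU entries are evicted when full) can be sketched in a few dozen lines. This is a minimal, hypothetical illustration, not the XAP API: the class and method names are invented, and a plain dict stands in for Cassandra.

```python
import threading
from collections import OrderedDict
from queue import Queue


class WriteBehindLRUCache:
    """Hypothetical in-memory tier in front of a slower backing store (not the XAP API)."""

    def __init__(self, capacity, backing_store):
        self.capacity = capacity
        self.backing = backing_store      # stand-in for Cassandra (a plain dict)
        self.entries = OrderedDict()      # iteration order = recency, oldest first
        self.lock = threading.Lock()
        self.pending = Queue()            # write-behind queue
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        # Background writer: persist queued writes in arrival order.
        while True:
            key, value = self.pending.get()
            self.backing[key] = value
            self.pending.task_done()

    def _remember(self, key, value):
        # Caller must hold self.lock.
        self.entries[key] = value
        self.entries.move_to_end(key)             # mark as most recently used
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)      # evict least recently used

    def put(self, key, value):
        with self.lock:
            self._remember(key, value)
            self.pending.put((key, value))        # persisted asynchronously

    def get(self, key):
        with self.lock:
            if key in self.entries:
                self.entries.move_to_end(key)
                return self.entries[key]
        value = self.backing.get(key)             # miss: read through to the store
        with self.lock:
            if key in self.entries:               # a concurrent put got there first
                return self.entries[key]
            if value is not None:
                self._remember(key, value)
        return value

    def flush(self):
        self.pending.join()                       # block until the writer catches up


store = {}
cache = WriteBehindLRUCache(capacity=2, backing_store=store)
cache.put("a", 1)
cache.put("b", 2)
cache.put("c", 3)                                 # "a" is evicted from memory
cache.flush()
print(cache.get("a"))                             # read through from the backing store
```

If "a" were read after its eviction but before the background writer had persisted it, the read-through would find nothing. That window between eviction and persistence is exactly what the timing argument in the next paragraph is about.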

So if cache misses on GigaSpaces flow through to Cassandra, how does this protect read consistency? It's really just a numbers game based on timing. The lifespan of an item in the cache (assuming it isn't "touched" after the initial write) is a function of the size of the in-memory store and the rate at which new items are added. In the most pessimistic case, the data flow resembles a queue: when an item is first written, it is at the back of the eviction queue, and each additional item that is added moves it closer to eviction. So if M is the size of memory in objects and R is the rate (per second) at which new items are added, the item's lifetime is L = M/R seconds. Even a relatively small 4 GB in-memory store with a sustained write rate of 1,000 objects of 1 KB each per second has a worst-case eviction lifespan of approximately M = 4 GB / 1 KB = 4e6 objects and R = 1e3 objects/second, so L = M/R = 4e6 / 1e3 = 4e3 seconds, or 4,000 seconds (66 minutes and 40 seconds). This means that, in this case, reads are guaranteed consistent as long as Cassandra takes less than 66 minutes to sync replicas. Clearly this is a contrived example, but it is a conservative one given the high sustained write rate and the small in-memory store. Different applications and deployments will get different numbers, but given that Cassandra's replica convergence is typically measured in milliseconds, there is a lot of room for error.
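The lifetime argument above is easy to check numerically. This sketch computes L = M/R for the article's example (using decimal units, so 4 GB / 1 KB is exactly 4e6 objects) and also simulates a never-read LRU store to confirm that the first item written survives exactly M further writes. The function names are illustrative only.

```python
from collections import OrderedDict


def worst_case_lifetime_seconds(memory_bytes, object_bytes, writes_per_second):
    """L = M / R, with M expressed in objects rather than bytes."""
    m_objects = memory_bytes / object_bytes
    return m_objects / writes_per_second


def simulate_writes_until_eviction(capacity):
    """Count the writes needed to evict the first item from an LRU store that is
    never read (the pessimistic, queue-like case)."""
    lru = OrderedDict()
    first_key = 0
    lru[first_key] = None
    writes = 0
    next_key = 1
    while first_key in lru:
        lru[next_key] = None
        if len(lru) > capacity:
            lru.popitem(last=False)   # evict least recently used
        next_key += 1
        writes += 1
    return writes


# The article's example: 4 GB of 1 KB objects at 1,000 writes per second.
lifetime = worst_case_lifetime_seconds(4e9, 1e3, 1e3)
print(lifetime, round(lifetime / 60, 1))      # seconds, then minutes
print(simulate_writes_until_eviction(1000))   # writes that evict the first item
```

Using binary units instead (4 GiB / 1 KiB = 4,194,304 objects) stretches the lifetime slightly, to roughly 4,194 seconds, so the decimal figure is the more conservative of the two.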

A key factor that makes this combination compelling is XAP's ability to store data as schema-free "documents". While storing conventional, strongly typed domain objects is a long-standing XAP feature, the recent addition of schema-free storage makes end-to-end schema evolution possible and makes the integration between XAP and Cassandra more natural and robust. Another key factor is queries. XAP supports SQL queries on its in-memory store, and cache misses on queries flow to the underlying Cassandra store as CQL queries, with the results returned seamlessly to applications. Finally, XAP can export remoting services (the stored procedure equivalent), invoke listeners (the trigger equivalent), and even host the web tier (Jetty).
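To make the three ideas in the paragraph above concrete (schema-free documents, queries that fall through to the backing store on a miss, and listeners acting as triggers), here is a small hypothetical sketch. It is not the XAP document API: documents are plain dicts, queries are simple equality filters, and a stub function stands in for the CQL query that would be sent to Cassandra. Treating an empty in-memory result as a miss is a deliberate simplification.

```python
from typing import Any, Callable, Dict, List

Document = Dict[str, Any]
FallbackQuery = Callable[[str, Dict[str, Any]], List[Document]]


class DocumentSpace:
    """Hypothetical in-memory store of schema-free documents (not the XAP API)."""

    def __init__(self, fallback_query: FallbackQuery):
        self.docs: List[Document] = []
        self.listeners: List[Callable[[Document], None]] = []
        self.fallback_query = fallback_query

    def on_write(self, listener: Callable[[Document], None]) -> None:
        self.listeners.append(listener)

    def write(self, doc: Document) -> None:
        self.docs.append(doc)
        for listener in self.listeners:   # trigger-style notification
            listener(doc)

    def query(self, type_name: str, **criteria: Any) -> List[Document]:
        def matches(doc: Document) -> bool:
            return doc.get("type") == type_name and all(
                doc.get(field) == value for field, value in criteria.items()
            )

        hits = [doc for doc in self.docs if matches(doc)]
        # Simplification: an empty in-memory result counts as a miss.
        return hits if hits else self.fallback_query(type_name, criteria)


def cassandra_stub(type_name: str, criteria: Dict[str, Any]) -> List[Document]:
    # Hypothetical stand-in for a CQL query such as: SELECT * FROM trade WHERE id = 3
    return [{"type": type_name, **criteria, "source": "cassandra"}]


space = DocumentSpace(cassandra_stub)
audit: List[int] = []
space.on_write(lambda doc: audit.append(doc["id"]))
space.write({"type": "Trade", "id": 1, "amount": 100})
# A newer document adds a field; older documents need no migration.
space.write({"type": "Trade", "id": 2, "amount": 250, "currency": "EUR"})
print(space.query("Trade", currency="EUR"))   # served from memory
print(space.query("Trade", id=3))             # miss, answered by the stub
print(audit)
```

Because documents carry their own fields, the second write introduces `currency` without touching the first document, which is the kind of schema evolution the paragraph refers to.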

This marriage of two clusters provides a complete, horizontally scalable, elastic solution stack, from the web tier to big data with ultra-low-latency interaction, all while providing a fully transactional, consistent environment for applications that need it.

Don’t miss DeWayne’s lightning talk at the Cassandra Summit, titled Cassandra on ACID, and the session by Uri Cohen, VP of Product Management at GigaSpaces, titled Take Any App to the Cloud of your Choice.