And your starter for 10 is…..
Posted 5 February 2009 @ 11:53 am by Jim LiddleWhenever I speak to someone new about GigaSpaces there is always a series of questions that come from the context of the background of the person asking them. For example if it is a messaging guy then they tend to be focused around queues, interoperability, slow consumer etc. If they are a DB guy then they tend to be about integrity, partitioning, stored procedures, speed, replication etc. Why am I telling you this ? Well Nati Shalom, the GigaSpaces CTO, and I both belong to the Google Groups Cloud Computing Forum, and invariably, similar types of questions come up and they tend to get quite deep. Recently Nati answered some questions from the members that I thought would be worth reposting as they explain a lot about GigaSpaces that may well be unanswered for you. The context of the first 3 questions relate to how GigaSpaces splits large data sets, in-memory, over multiple physical machines whilst also providing HA using a Primary->Backup topology that is always kept in sync via the use of a transaction log.
Question 1) So I/O is involved at every update (Disk I/O: write to the redo log. Network roundtrip: send the redo log to the backup partition). When you say “synchronous replication”, the primary partition do a roundtrip to the backup partition on every update request ? This doesn’t seem to be very efficient.
The redo-log as an in-memory log. The write to the primary partition and redolog is done in the same process so the write to the redo-log is a pure in-memory call. It is used to address scenarios where the backup is not running. You can see the benchmark page to see the transac/sec thatcan be achieved with this model.
Another important thing to note is that this overhead is fixed i.e. it is not proportional to the size of the cluster.
Question 2) Lets say the primary (partition) finishes writing the transaction into the redo log. While sending the redo log over to the backup, it crashes. Note that the backup hasn’t applied the changes to the primary yet. Now the backup partition becomes the primary, but it has lost some updates. Correct ?
At this point this crash will throw an exception to the user as the replication to the backup is synchronous. In case of a transaction the replication will take place during commit. It will be rolled back if the transaction failed during the replication.
Question 3) How long does the background data sync typically take ? I assume this is not about syncing the delta but copying all the data from the suriving partition to the newly selected backup (which has no data at all). If the crashes happen again (to the new primary) before the background data sync complete, data lost can happen.
The time it takes for backup to synch is dependent on the amount of data that each partition holds. The load is done in parallel so its fairly fast i.e. for 1G of data it takes around 2 sec if i recall correctly. As you rightly said this process happens in the background.
Question 4) Do you plan to support (transparent?) generalized & app specific hibernate shards?
We already have a fairly advanced integration with Hibernate. The general idea is that you could use your existing data model and database and have In-memory-Data-Grid front-ending that database. Hibernate is used to map between the In-Memory data model with the one that resides in the database. See more details here.
There are few benefits that you get out of this:
1. All the mapping overhead is pushed to a background process and applies only for write/update operations, read operations are red directly from memory as Objects.
2. There are often conflicting requirements from a data-base perspective for doing aggregated reports and those that requires fast transaction processing. Having a centralized database doing the two things enforce us to come-up with some least common denominator which is often un-optimized for either cases. The decoupling of the database from the real-time part of the applications provides more flexibility on that regard i.e. optimize the memory model for transaction throughput and database for analytics.
Question 5) I’m thinking about all those “legacy” java apps that are appropriate candidates for SBA re-factoring; many face db bottlenecks, and app teams may be looking to apply data partitioning via hibernate shards (if their apps alreadyuse hibernate)
Couldn’t agree more. We conducted a fairly detailed benchmark last year taking a typical order-management applications that uses messaging to deliver business transactions, services (Session Beans) to process the incoming business transactions and obviously database to store the state. We then compared it with space-based approach which utilizes only memory resources for the messaging and data. In addition to that we partition the data/messaging and business logic into groups which are referred to as processing-units. This enabled us to reach linear scaling of the entire applications. Obviously all data
was backed in both cases in the same underlying database. Interestingly enough we used the SAME CODE for both tests and only switched the underlying implementation. The results showed that with SBA we can achieve 6 times better throughput and 10 times better latency. Its easy to see how the results would have shown even more significant difference if we would increase the scaling requirements.
You can see more details on this benchmark here . There are also few public references for some customers who already gone through this experience one of the interesting one is iPhone Activation Platform case study. As well as an online gaming reference) .
Read more...







