How Does Virtualized Middleware Infrastructure Work in the Real World? The Operational Aspect
Last week I had an interesting "incident" that proved how critical the operational aspects of the virtualized middleware platform GigaSpaces provides are for online systems. Here is the story:
During maintenance at one of our customers, one of the largest Wall Street financial services companies, the IT team had to shut down some of the machines in their private cloud and bring up a few new ones. The machines were running the GigaSpaces In-Memory Data Grid, pure business logic services, and GigaSpaces web containers serving a large number of concurrent users. The cloud was physically spread across two data centers.
The IT organization was not aware of the application internals, or of how the application was designed and implemented. The SLA of the system is 99.999% availability, which leaves roughly five minutes of downtime per year, so all of these maintenance activities had to be performed without shutting down the system.
To make this possible, GigaSpaces has to provide continuous high availability, failover, and elasticity to the application: for the data grid, the business logic services, and the web application. In fact, the team didn't have to do anything special to "survive" the massive hardware changes. This is the default SLA. Once the machines were shut down, backup spaces moved into primary mode; the primary spaces that had been running on the machines that were shut down were started again on other machines as backups; and the web application instances and the stateless services moved to the new machines and continued to operate against the primary spaces.
As part of the maintenance activity, the IT team also wanted to adjust the topology while the system was running: switch running spaces from primary mode to backup mode, and place web applications on specific machines. All they had to do was click the mouse!
Here is how you can switch the space mode (the examples below were done on the Amazon public cloud):
1. Start the GS UI and go to the Deployed Processing Unit tab
2. Right-click the primary space you want to turn into a backup
3. Select the Restart menu option
4. Confirm the operation
5. Within a few seconds the existing backup will move into primary mode and the space that was restarted will be elected as the backup.
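The role switch in the steps above can be pictured with a small, self-contained Java sketch. It is only a toy model of one primary/backup pair, written to illustrate the ordering (the backup is promoted, then the restarted instance rejoins as backup). The class and method names (PartitionSketch, restartPrimary) and the host labels are hypothetical and are not part of the GigaSpaces API.

```java
// Toy model of a single primary/backup partition.
// All names here are hypothetical; this is NOT the GigaSpaces API.
class PartitionSketch {
    enum Mode { PRIMARY, BACKUP }

    static final class Instance {
        final String host;
        Mode mode;

        Instance(String host, Mode mode) {
            this.host = host;
            this.mode = mode;
        }
    }

    private final Instance first;
    private final Instance second;

    PartitionSketch(String primaryHost, String backupHost) {
        this.first = new Instance(primaryHost, Mode.PRIMARY);
        this.second = new Instance(backupHost, Mode.BACKUP);
    }

    Instance primary() {
        return first.mode == Mode.PRIMARY ? first : second;
    }

    Instance backup() {
        return first.mode == Mode.BACKUP ? first : second;
    }

    // Restarting the primary: the surviving backup is promoted,
    // and the restarted instance comes back in backup mode.
    void restartPrimary() {
        Instance oldPrimary = primary();
        Instance oldBackup = backup();
        oldBackup.mode = Mode.PRIMARY;
        oldPrimary.mode = Mode.BACKUP;
    }

    public static void main(String[] args) {
        PartitionSketch partition = new PartitionSketch("host-a", "host-b");
        partition.restartPrimary();
        System.out.println("primary=" + partition.primary().host
                + " backup=" + partition.backup().host);
    }
}
```

In this model the data stays available throughout, because there is always exactly one instance in primary mode; that is the property the restart procedure relies on.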
Here is how you can move a web application instance from one machine to another machine:
1. Start the GS UI, go to the Hosts tab, and find the machine and GSC running the web application instance you want to move
2. Drag the web application instance from its existing location into another GSC running on the other machine.
3. Confirm the operation
4. Within a few seconds the web application will move into its new GSC
5. That's it!
If the GigaSpaces Apache load-balancer agent is running, the web layer adjusts itself by reconfiguring the Apache HTTPD load-balancer mapping (again, while the system is running) to route HTTP requests to the web application instances that were moved to a new location.
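To make the relocation and the load-balancer adjustment more concrete, here is a minimal Java sketch of the idea: a placement table maps each web application instance to the container it runs in, and the routing list is rebuilt whenever an instance is relocated. This is an illustration under my own assumptions, not the actual agent or the administration API. RelocationSketch, its methods, and the container labels are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of instance placement plus a load-balancer route list.
// All names here are hypothetical; this is NOT the GigaSpaces API.
class RelocationSketch {
    // instance id -> container label (for example "gsc-1@host-a")
    private final Map<String, String> placement = new LinkedHashMap<>();
    private List<String> routes = new ArrayList<>();

    void deploy(String instanceId, String container) {
        placement.put(instanceId, container);
        refreshRoutes();
    }

    // Move an existing instance to another container and refresh the routes,
    // the way the agent re-maps requests after an instance moves.
    void relocate(String instanceId, String targetContainer) {
        if (!placement.containsKey(instanceId)) {
            throw new IllegalArgumentException("unknown instance: " + instanceId);
        }
        placement.put(instanceId, targetContainer);
        refreshRoutes();
    }

    // Rebuild the route list from the current placement table.
    private void refreshRoutes() {
        List<String> updated = new ArrayList<>();
        for (Map.Entry<String, String> entry : placement.entrySet()) {
            updated.add(entry.getValue() + "/" + entry.getKey());
        }
        routes = updated;
    }

    List<String> routes() {
        return new ArrayList<>(routes);
    }

    public static void main(String[] args) {
        RelocationSketch cluster = new RelocationSketch();
        cluster.deploy("web-1", "gsc-1@host-a");
        cluster.deploy("web-2", "gsc-2@host-b");
        cluster.relocate("web-1", "gsc-3@host-c");
        System.out.println(cluster.routes());
    }
}
```

The point of the design is that the routing list is always derived from the placement table, so a move never leaves a stale route behind.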
With the simple procedure above, the IT team managed to upgrade the cloud hardware without any downtime or impact on the latency and response time users experience.
All of the above can also be executed in a fully automatic manner using the administration API. You can call it from your application code or from simple Groovy scripts.