When you use a compute cluster or an in-memory data grid technology such as GigaSpaces, you deal with a large number of infrastructure components. Typically there are containers (JVMs) in which your service instances run, plus one or more management components that monitor and manage those service instances.

In GigaSpaces, the service instances (referred to as Processing Unit Instances) run in Grid Service Containers. The Grid Service Manager monitors and manages the service instances in order to maintain their SLAs, and the Lookup Service acts as a registry used by every component. Each machine also runs an agent, a process monitor that restarts any failed processes.

Shown below is an example four-node GigaSpaces cluster:

[Figure: example four-node GigaSpaces cluster]

To support larger clusters you have to use more machines (bigger machines can also be used, but that defeats the purpose of a scale-out architecture). As cluster sizes grow, you will find that the operational management of cluster components is not a trivial task.

Some teams I work with maintain a separate cluster instance per development environment and per team. I have run into customers who maintain 20+ separate instances of each cluster to support their development and testing.

Maintaining and managing an environment like this is not easy. For operational efficiency in each of the cases above, it makes sense to automate the cluster lifecycle instead of operating on the clusters manually. The typical lifecycle steps are:

  1. start – Start the Grid infrastructure on all the machines and deploy the services
  2. stop – Undeploy the services and stop the Grid infrastructure on all the machines
  3. reStart – Stop and start the Grid infrastructure
  4. startOne – Start the Grid infrastructure on one machine
  5. stopOne – Stop the Grid infrastructure on one machine
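
To make the ordering inside each step concrete, here is a minimal sketch in plain Java. It is not the manageGrid code and does not call the Admin API: the `GridOps` interface, the `Step` enum, the `RecordingGridOps` stand-in, and the host names are all hypothetical, chosen only to show that stop undeploys before shutting machines down and that reStart is simply stop followed by start.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class GridLifecycle {

    /** Hypothetical operations; a real implementation would back these with grid calls. */
    interface GridOps {
        void startInfrastructure(String host);
        void stopInfrastructure(String host);
        void deployServices();
        void undeployServices();
    }

    /** The five lifecycle steps from the list above. */
    enum Step { START, STOP, RESTART, START_ONE, STOP_ONE }

    /** In-memory stand-in that only records what would have been done. */
    static class RecordingGridOps implements GridOps {
        final List<String> log = new ArrayList<>();
        @Override public void startInfrastructure(String host) { log.add("start " + host); }
        @Override public void stopInfrastructure(String host) { log.add("stop " + host); }
        @Override public void deployServices() { log.add("deploy"); }
        @Override public void undeployServices() { log.add("undeploy"); }
    }

    /** Runs one step; target is only used by START_ONE and STOP_ONE. */
    static void run(Step step, List<String> hosts, String target, GridOps ops) {
        switch (step) {
            case START:
                // Infrastructure first, then the services that run on it.
                for (String host : hosts) ops.startInfrastructure(host);
                ops.deployServices();
                break;
            case STOP:
                // Reverse order: remove the services before the infrastructure.
                ops.undeployServices();
                for (String host : hosts) ops.stopInfrastructure(host);
                break;
            case RESTART:
                run(Step.STOP, hosts, target, ops);
                run(Step.START, hosts, target, ops);
                break;
            case START_ONE:
                ops.startInfrastructure(requireKnownHost(hosts, target));
                break;
            case STOP_ONE:
                ops.stopInfrastructure(requireKnownHost(hosts, target));
                break;
        }
    }

    /** Rejects a missing target or one that is not part of the configured cluster. */
    private static String requireKnownHost(List<String> hosts, String target) {
        if (target == null || !hosts.contains(target)) {
            throw new IllegalArgumentException("Unknown host: " + target);
        }
        return target;
    }

    public static void main(String[] args) {
        RecordingGridOps ops = new RecordingGridOps();
        run(Step.RESTART, Arrays.asList("host-a", "host-b"), null, ops);
        ops.log.forEach(System.out::println);
    }
}
```

Keeping the grid calls behind an interface like this means the step ordering can be exercised without a running cluster; an implementation backed by real grid calls would replace the recording stand-in.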

Using scripts to automate the steps above is a very common approach. With the GigaSpaces Admin API, you also have the option of automating these steps in Java or Groovy code. The Admin API provides information about the currently running components and lets you manage the lifecycle of individual components.

The manageGrid project in the GigaSpaces best practices GitHub repository is an implementation of the grid lifecycle steps using the Admin API. Information about the expected components is passed in through an input configuration file, so you can easily reuse the utility across environments by passing a different configuration file.
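
The idea of driving the tool from a file of expected components can be sketched as follows. This is only an illustration: the post does not show manageGrid's actual configuration format, so the `hosts` and `gscPerHost` keys, the `ExpectedGrid` class, and the running-container counts are assumptions. In a real tool the running counts would come from the Admin API rather than a hand-built map.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

public class ExpectedGrid {
    final List<String> hosts;   // machines that should be part of the cluster
    final int gscPerHost;       // containers expected on each machine

    ExpectedGrid(List<String> hosts, int gscPerHost) {
        this.hosts = hosts;
        this.gscPerHost = gscPerHost;
    }

    /** Parses a properties-style configuration (hypothetical keys: hosts, gscPerHost). */
    static ExpectedGrid parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> hosts = new ArrayList<>();
        for (String host : props.getProperty("hosts", "").split(",")) {
            if (!host.trim().isEmpty()) hosts.add(host.trim());
        }
        int gscPerHost = Integer.parseInt(props.getProperty("gscPerHost", "1").trim());
        return new ExpectedGrid(hosts, gscPerHost);
    }

    /** Returns, per host, how many containers are missing compared to the configuration. */
    static Map<String, Integer> missingContainers(ExpectedGrid expected,
                                                  Map<String, Integer> running) {
        Map<String, Integer> missing = new LinkedHashMap<>();
        for (String host : expected.hosts) {
            int shortfall = expected.gscPerHost - running.getOrDefault(host, 0);
            if (shortfall > 0) missing.put(host, shortfall);
        }
        return missing;
    }

    public static void main(String[] args) {
        ExpectedGrid expected = parse("hosts = host-a, host-b\ngscPerHost = 2\n");
        Map<String, Integer> running = new HashMap<>();
        running.put("host-a", 2);   // host-b has no running containers in this example
        System.out.println(missingContainers(expected, running));
    }
}
```

Because the expected state lives entirely in the configuration text, pointing the same code at a different environment only requires a different file.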

It also includes an example configuration for the GigaSpaces helloworld Maven application deployed across a two-node cluster. The application includes two processing units, a processor and a feeder.

Feel free to download and experiment with it.

Programmatic Management of Clusters and Grids
sean
Solutions Architect with GigaSpaces