Archive for the 'OpenSpaces' Category
Dynamic Service Deployment
October 9th, 2008
I recently created an integration with Sun Grid Engine. This was easy to do: it required only a listener for the JMX events that our product produces. As you may know, with GigaSpaces you can create watches for the services you deploy. These watches are populated with the values returned by getter methods on those services.
For example, you could have a service that exposes a method getBacklog() that returns a long.
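As a minimal sketch of such a service (only getBacklog() comes from this post; the class body, the counter, and the two order methods are hypothetical names made up for illustration), it might look something like this:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical service bean. Only the getBacklog() getter is taken from the post;
// everything else is an assumption for illustration.
class AdaptiveOrderProcessor {
    // Number of orders received but not yet processed.
    private final AtomicLong backlog = new AtomicLong();

    // Hypothetical hook: called when an order is queued for processing.
    void orderReceived() {
        backlog.incrementAndGet();
    }

    // Hypothetical hook: called when an order has been processed.
    void orderCompleted() {
        backlog.decrementAndGet();
    }

    // Bean-style getter exposing the "backlog" property that a watch can poll.
    public long getBacklog() {
        return backlog.get();
    }
}
```

The getter follows the usual JavaBean naming, so the property name seen from the outside is "backlog".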
Then in your pu.xml you set up the watch for that property:
<os-sla:sla max-instances-per-vm="2">
    <os-sla:scale-up-policy monitor="backlog" max-instances="12" high="200" />
    <os-sla:requirements>
        <os-sla:cpu high=".65" />
        <os-sla:memory high=".75" />
    </os-sla:requirements>
    <os-sla:monitors>
        <os-sla:bean-property-monitor name="backlog"
            bean-ref="adaptiveOrderProcessor"
            property-name="backlog"
            period="2000" />
    </os-sla:monitors>
</os-sla:sla>
In this example, when the value returned from the getBacklog method is over 200, it triggers a scaling event which adds another instance of the service named adaptiveOrderProcessor to the running system.
NOTE: the scaling event affects the entire population of the ProcessingUnit of which that service is a part. To scale one service only, you must define that service alone in its own pu.xml file.
If (as in this example) the service limits the number of instances that can run in a single GSC, that limit is expressed here:
<os-sla:sla max-instances-per-vm="2">
This means that at some point the available GSCs will not be enough to host all the possible instances.
<os-sla:scale-up-policy monitor="backlog" max-instances="12" high="200" />
Here we state that we want a maximum of 12 instances, so we need 6 GSCs to host them all.
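The arithmetic behind that number is just a ceiling division of max-instances by max-instances-per-vm. A small sketch (the class and method names are made up for illustration and are not part of any GigaSpaces API):

```java
// Hypothetical helper: how many GSCs are needed to host every instance
// when each GSC may hold at most maxInstancesPerVm of them.
class GscSizing {
    static int containersNeeded(int maxInstances, int maxInstancesPerVm) {
        if (maxInstances < 0 || maxInstancesPerVm <= 0) {
            throw new IllegalArgumentException("instance counts must be positive");
        }
        // Integer ceiling division: round up when the instances do not divide evenly.
        return (maxInstances + maxInstancesPerVm - 1) / maxInstancesPerVm;
    }

    public static void main(String[] args) {
        // Values from the SLA above: max-instances="12", max-instances-per-vm="2".
        System.out.println(containersNeeded(12, 2)); // prints 6
    }
}
```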
What I did with Sun was to write a JMX listener that listened for “ProvisionerFailureEvents”, which are created by the GSM when a scaling, relocation, or failover event makes it seek a host for a service. When the GSM cannot find a suitable host because there are not enough GSCs running, it sends out a ProvisionerFailureEvent, which is what my code listens for. When the event hits, my code simply calls the API of the grid technology in question and asks for the creation of a new GSC.
In other words, the service in question says: “Help me, I must relocate”, or “Help me, I must failover”, or “Help me, I must have more of me running on the network because one of me is not enough!” The GSM says, “I will start you somewhere else...” but then the GSM says, “Oh, golly! There is nowhere else to start you!” And then the GSM says, “Help me someone!” and sends the ProvisionerFailureEvent to JMX, hoping some force in the universe will care.
Once the universe shows an interest and starts a new GSC, the GSM will retry the scaling, failover, or relocation effort and use that new resource, allowing the declared SLA to be satisfied.
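To give a feel for the shape of such a listener, here is a rough sketch using the standard javax.management API. It is not the code from the integration: the GridProvisioner interface is a hypothetical stand-in for the grid manager's API, and matching on the notification type string is an assumption, since the exact type the GSM emits is not spelled out here.

```java
import javax.management.Notification;
import javax.management.NotificationListener;

// Hypothetical hook into the grid manager's API (for example, a Sun Grid Engine client).
interface GridProvisioner {
    void startNewContainer();
}

// Reacts to provisioning failures by asking the grid for one more GSC.
class ProvisionFailureListener implements NotificationListener {
    // ASSUMPTION: the failure notification's type contains this marker.
    // Check the real notification type emitted by the GSM before relying on it.
    static final String FAILURE_MARKER = "ProvisionerFailureEvent";

    private final GridProvisioner provisioner;

    ProvisionFailureListener(GridProvisioner provisioner) {
        this.provisioner = provisioner;
    }

    @Override
    public void handleNotification(Notification notification, Object handback) {
        String type = notification.getType();
        // Ignore every notification that is not a provisioning failure.
        if (type != null && type.contains(FAILURE_MARKER)) {
            provisioner.startNewContainer();
        }
    }
}
```

A listener like this would be registered with MBeanServerConnection.addNotificationListener(objectName, listener, null, null), where objectName identifies the MBean that broadcasts the events. That name depends on the deployment and is not given in this post.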
Bottom line: it is simple to integrate GigaSpaces with any grid management solution that exposes an API and, through that integration, to enable the dynamic addition of resources so that an application can be relocated (or failed over, or scaled) on the fly.
Other watches you might set up include:
- getLocalTimeOfDay(), where the value measured causes a relocation event that could move applications to new machines in different time zones, allowing you to “follow the sun”
- getMemoryConsumption(), where the value measured causes a relocation event that moves applications to new GSCs that have more memory
- getCPULevel(), where the value measured causes a relocation event that moves applications to new GSCs with more CPU capacity
Again, the choice of scaling or relocating in response to an event is yours, but it also depends on the type of service you are affecting. If the service involved has an embedded space in the same pu.xml file, you cannot scale it using scaling events. Instead, you must relocate the processing unit, which allows you to scale up to the number of partitions you defined for that space when it was deployed.
Example: you deploy a space and worker to the system and define the space as
<os-sla:sla cluster-schema="partitioned-sync2backup" number-of-instances="24" number-of-backups="1" max-instances-per-vm="1">
This means you have 24 partitions defined. You first deploy this processing unit to 3 GSCs, each having 2 GB of RAM, so that 16 instances run in each GSC (8 partitions and 8 backups).
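The instance counts in that example follow from simple arithmetic, sketched below. The class and method names are made up for illustration; the inputs are the number-of-instances and number-of-backups values from the SLA above and the 3 starting GSCs.

```java
// Hypothetical helper for working out how a partitioned space with backups
// spreads across a given number of GSCs.
class PartitionLayout {
    // Every partition runs one primary plus its backups.
    static int totalInstances(int partitions, int backupsPerPartition) {
        return partitions * (1 + backupsPerPartition);
    }

    // Instances per GSC when spread as evenly as possible (rounded up).
    static int instancesPerContainer(int totalInstances, int containers) {
        return (totalInstances + containers - 1) / containers;
    }

    public static void main(String[] args) {
        int total = totalInstances(24, 1);
        System.out.println(total);                           // prints 48
        System.out.println(instancesPerContainer(total, 3)); // prints 16
    }
}
```

With one instance per GSC, the same 48 instances is also the upper limit on how many GSCs the deployment can spread across.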
These instances run happily until they start running low on memory. At that point a watch on one of the workers could trigger a relocation event, which asks the GSM to move one of the instances of the PU to a new GSC. This presumes there is a GSC with the necessary memory and CPU available, as defined in the following section, which specifies that the PU will be deployed only to a target GSC where no more than 25% of the CPU and memory is utilized:
<os-sla:requirements>
    <os-sla:cpu high=".25" />
    <os-sla:memory high=".25" />
</os-sla:requirements>
The GSM will relocate a PU to the new GSC, and with that relocation the information and work start to spread out. Eventually, if there are enough GSCs available, the system could span 48 GSCs each having 2 GB of RAM, so the system that started with 6 GB of RAM and maybe 6 cores could grow over time to 96 GB of RAM (48 GSCs at 2 GB each) and perhaps 96 cores!
This kind of relocation could be expanded further by moving the instances to a different class of machine. If you chose to move to a 64-bit machine or to an Azul box, the possibilities are almost endless!
In my opinion, this relocation is likely to be driven by human operators rather than automated rules, but the rules could be put in place as a last resort for the times when the humans are asleep at the helm.
Cheers,
Owen.
Edge High Performance Computing
August 27th, 2008
I ran across this excellent article on Edge HPC and by golly, it sounds like the need for both of the following is growing at a huge rate:
- Increasing the efficiency of the software solutions made to solve business problems
- Providing adaptability of software systems to changes such as increased load and failures
They call this new landscape “Edge High Performance Computing”, as opposed to “High Performance Computing”.
To my mind the major difference here between these classes of problems is the need to maintain state and the capacity to leverage less-exotic hardware and more “modern” and likely Object-Oriented software languages such as .NET, C++ and Java.
I like very much the quality of the content of the Tabor research work and look forward to future updates from them regarding these topics.
Owen.
GigaSpaces XAP 6.5/6.6 new releases
August 18th, 2008
GigaSpaces 6.5 was released at the end of June, and we are now working on the 6.6 release, with the first milestone already publicly available. These are major milestones in a series of upcoming releases all aimed at strengthening our…
Video is worth a million words. . . Latest GigaSpacesXAP demo
August 12th, 2008
With my wife and kids away, I am having a geeky bachelor time that many who read this would probably find quite thrilling. For example: I spent this last weekend producing a new Monte Carlo application example for GigaSpaces.
It is rich in the patterns it utilizes:
- SBA (Colocated spaces with workers)
- Master-worker (remote workers which scale dynamically when the workload is too high)
- Spring Remoting on top of GigaSpaces – the aggregation work is achieved utilizing our Sync Remoting
The app goes like this:
Historical information regarding several funds is used to predict the future behavior of those funds over a 20-year period, with minute-by-minute fluctuations in price being calculated and applied.
These funds are grouped in different combinations into portfolios, and each entire portfolio is assessed for its growth over the same 20-year period.
In the end, the best combination of funds within a portfolio is determined and reported and the best and worst behaving funds are also shown.
Technically, several portfolios and their associated funds are written as tasks into the system for analysis. The logic needed to process them and their historical information is sent along with them as well. Workers take the tasks, process the associated logic, and return a result showing the outcome of applying the expected variation in price for each fund in the portfolio over the designated 20-year period.
This is performed over a set of possible portfolios, each containing different variations of funds.
Each set is repeatedly simulated to avoid undue skewing and allow for better control of the randomness.
In the end, the client requests a summary of all the results. The summary is created by a service that operates in parallel across the available spaces to take all the results, sort them, and aggregate them to determine the most common, best [highest value portfolio], and worst [lowest value portfolio].
This information is returned to the client and printed out to the console.
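For readers who want a feel for the master-worker flow without the platform, here is a toy sketch in plain Java. It is not the demo's code: a thread pool stands in for the space and its workers, each fund is reduced to a made-up {mean annual return, volatility} pair, and prices move in yearly rather than minute-by-minute steps. All names and numbers are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MonteCarloSketch {
    // Outcome for one portfolio: its name and mean final value across all runs.
    static final class Result {
        final String portfolio;
        final double meanFinalValue;

        Result(String portfolio, double meanFinalValue) {
            this.portfolio = portfolio;
            this.meanFinalValue = meanFinalValue;
        }
    }

    // One task: simulate a portfolio whose funds are {meanAnnualReturn, volatility} pairs.
    static Callable<Result> task(String name, double[][] funds, int years, int runs, long seed) {
        return () -> {
            Random rng = new Random(seed); // per-task seed keeps results repeatable
            double sum = 0;
            for (int r = 0; r < runs; r++) {
                double value = 0;
                for (double[] fund : funds) {
                    double v = 1.0; // every fund starts at one unit
                    for (int y = 0; y < years; y++) {
                        // Random yearly growth factor, clamped so a value never goes negative.
                        v *= Math.max(0.0, 1.0 + fund[0] + fund[1] * rng.nextGaussian());
                    }
                    value += v;
                }
                sum += value / funds.length; // equal-weighted portfolio
            }
            return new Result(name, sum / runs);
        };
    }

    // Master: hand tasks to the workers, collect results, sort from worst to best.
    static List<Result> runAll(List<Callable<Result>> tasks, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Result> results = new ArrayList<>();
            for (Future<Result> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            results.sort(Comparator.comparingDouble(r -> r.meanFinalValue));
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<Callable<Result>> tasks = new ArrayList<>();
        tasks.add(task("cautious", new double[][] {{0.03, 0.05}, {0.04, 0.08}}, 20, 1000, 1L));
        tasks.add(task("bold", new double[][] {{0.08, 0.20}, {0.10, 0.25}}, 20, 1000, 2L));
        List<Result> sorted = runAll(tasks, 2);
        System.out.println("worst: " + sorted.get(0).portfolio);
        System.out.println("best: " + sorted.get(sorted.size() - 1).portfolio);
    }
}
```

Repeating each simulation many times and averaging is what keeps a single lucky or unlucky run from skewing the comparison between portfolios.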
This morning, I produced a nifty 8 minute flash presentation showing my Monte Carlo application being executed on the GigaSpaces platform.
NB: To keep the video short, and because I only used my laptop, I kept the size of the sets and the iterations at very low levels. Nevertheless, the application scales very nicely and provides fault-tolerance as well.
You can check out the 8 minute video using this link: THE MONTECARLO VIDEO
[You may have to wait a while for the video to load. If it stops mid-way, play it again from the beginning after it loads completely.]
I hope you enjoy watching the fruits of my labors – what could be a better way to spend a bachelor weekend eh?
Cheers,
Owen.
Bridging the gap between the clouds
August 9th, 2008
Dekel Tankel of GigaSpaces spoke recently to a hip cloud crowd regarding the risks associated with moving an application to the cloud or grid environment.
Without a GigaSpaces Space-based architecture (what I refer to as a TPC Architecture), applications running in a cloud suffer from an inability to scale, which starts to erode the benefits of the cloud. In addition, there are questions regarding the ability to maintain state reliably in a cloud, especially a pay-by-the-drink cloud where using the persistence mechanisms provided by the vendor can get expensive in terms of both $$ and time.
Dekel speaks about and demonstrates how GigaSpaces makes it possible to manage your application services, moving them from machine to machine as needed. He also addresses scaling the application by spreading it out [partitioning it while removing the traditional bottlenecks of the database and network boundaries between services].
He does a nice job – way to go Dekel!
Cheers,
Owen.







