Terabyte Elastic Cache clusters on Cisco UCS and Amazon EC2

Overview

Last week I was working on a new opportunity. The prospect needed to store 1 terabyte of data in memory to address scalability challenges and was interested in using GigaSpaces. I was tasked with creating a demonstration of this, and I want to share the experience in this blog post.

Scope

The project also needed to demonstrate high availability and the full lifecycle of a data grid used for a mission-critical application that serves millions of simultaneous users worldwide. The most important aspects were loading the cache with data, read throughput (max reads/second), write throughput (max writes/second) and automatic failover.

The first issue I had to address was hardware. The concern was that I only had about a week to deliver the demo, and I needed machines with over 1 terabyte of available memory. We have some lab machines that I could cobble together to come up with a terabyte of memory, but these are used for internal tests by our R&D and QA teams, and getting them exclusively for the demo might not have been an option. An Amazon EC2 cluster seemed the next most viable option. We also reached out to our contacts at Cisco asking for help.

The very first requirement of the demo was that it run on any deployment environment, given that where it would run was still an open question. This includes being able to develop on a laptop and then deploy to the actual demo hardware without any code changes.

Cloudify

To address the dynamic deployment issue, luckily we have Cloudify, which does exactly that. It makes any application agnostic to the underlying deployment environment. With Cloudify, the same application can be deployed to a cloud infrastructure, a non-cloud infrastructure or even a local machine without any code changes. Deploying, managing and monitoring an application in any of these environments becomes very easy, and the look and feel of the management and monitoring tools remains consistent.

Hardware

EC2

For the EC2 version, I used the stock Amazon Linux image on the largest configuration available, which is the High-Memory Quadruple Extra Large Instance. Since these instances come with 68 GB of memory, we needed 16 machines for a 1 Terabyte cluster. Our prospect also wanted to see hot backup and automatic failover, so that required another 16 machines, for 2 Terabytes in total. Running these 32 machine instances on EC2 costs $64 per hour, which is pretty cheap for a demo or development environment.
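The sizing above is simple enough to check in a few lines. This is only a back-of-the-envelope sketch: the figures come from the paragraph above, the helper name is mine, a terabyte is taken as 1024 GB, and JVM and operating system overhead are ignored.

```python
import math

# Figures quoted above; the helper name is illustrative.
INSTANCE_MEMORY_GB = 68      # High-Memory Quadruple Extra Large instance
TARGET_PRIMARY_GB = 1024     # 1 terabyte of primary data, taken as 1024 GB
HOURLY_CLUSTER_COST = 64.0   # US dollars per hour for the whole cluster


def machines_needed(target_gb, per_machine_gb):
    """Smallest number of machines whose combined memory covers target_gb."""
    return math.ceil(target_gb / per_machine_gb)


primaries = machines_needed(TARGET_PRIMARY_GB, INSTANCE_MEMORY_GB)
total_machines = primaries * 2                     # one hot backup per primary
total_memory_gb = total_machines * INSTANCE_MEMORY_GB
cost_per_instance_hour = HOURLY_CLUSTER_COST / total_machines

print(primaries, total_machines, total_memory_gb, cost_per_instance_hour)
```

Raw memory comes out slightly above 2 terabytes, which leaves a little headroom per machine for everything that is not data.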

Cisco UCS

Cisco was kind enough to lend us a C260 UCS server (referred to as UCS below) for our tests. The UCS offers large memory capacity and 40 ultra-fast cores, which makes it an ideal platform for memory-bound and CPU-intensive applications. These were exactly the conditions the demo was subject to, so this server was the hardware we needed. Since UCS machines have 1 terabyte of memory and retail for about $40,000, they are also very attractive for production environments.

Cloudify Recipe

An application deployed using Cloudify has to define Application and Service Recipes, which are Groovy scripts. The Application Recipe describes the application components, which are its services and their dependencies. The Service Recipe describes service characteristics, such as the number of service instances, lifecycle events, scripts that handle these events, monitoring configurations, scaling rules and custom alert rules. More information on Recipes and how Cloudify uses them is described here.

The Application Recipe (datagrid-application.groovy) consists of a single service:

    application {
        name="datagrid-application"

        service {
            name = "datagrid-space"
        }
    }

This service is an Elastic Cache Service (Data Grid), and the following Service Recipe describes the EC2 version of it:

    service {
        icon "icon.png"
        name "datagrid-space"

        statefulProcessingUnit {
            binaries "datagrid-space" // can be a folder, a jar or a war file

            sla {
                memoryCapacity 2048000
                maxMemoryCapacity 2048000
                highlyAvailable true
                memoryCapacityPerContainer 64000
            }
        }
    }

For the UCS version, the application code remains the same and only the memory capacity settings are adjusted to the available memory on the machine (1000000 MB).

Cloud Driver

The cloud driver acts as the specification for the new machines that Cloudify provisions. Cloudify spins up new machines when an application is deployed, when it scales out, or when a machine fails. The cloud driver for EC2 is configured as follows:

    cloud {
        provider "aws-ec2"
        user "YOUR_EC2_ACCESS_KEY_ID"
        apiKey "YOUR_EC2_SECRET_ACCESS_KEY_ID"

        // relative path to gigaspaces directory
        localDirectory "tools/cli/plugins/esc/ec2/upload"
        remoteDirectory "/home/ec2-user/gs-files"

        imageId "us-east-1/ami-1b814f72"
        machineMemoryMB "68100"
        hardwareId "m2.4xlarge"

        // Security group which has the appropriate ports configured to be open for incoming and outgoing traffic
        securityGroup "default"

        // YOUR keypair file and name of the keypair
        keyFile "cloud-demo.pem"
        keyPair "cloud-demo"

        // S3 URL location where GigaSpaces is saved. Update the access properties of this location to everyone
        cloudifyUrl "https://s3.amazonaws.com/cloudify/gigaspaces.zip"

        machineNamePrefix "gs_esm_gsa_"
        dedicatedManagementMachines true
        managementOnlyFiles ([])
        connectedToPrivateIp false
        sshLoggingLevel java.util.logging.Level.WARNING
        managementGroup "management_machine"
        numberOfManagementMachines 2
        zones (["agent"])
        reservedMemoryCapacityPerMachineInMB 1024
    }

No cloud driver was needed for the UCS version because it ran on existing machines and no hardware provisioning was needed.
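To see how the EC2 Service Recipe and the cloud driver fit together, here is a rough consistency check. The numbers are copied from the two snippets above. The interpretation is my assumption: I treat the reserved memory as unavailable to containers and require a container to fit entirely on one machine. Cloudify's actual placement logic may differ.

```python
# Values copied from the EC2 Service Recipe and cloud driver (all in MB).
MEMORY_CAPACITY_MB = 2_048_000        # sla.memoryCapacity
MEMORY_PER_CONTAINER_MB = 64_000      # sla.memoryCapacityPerContainer
MACHINE_MEMORY_MB = 68_100            # machineMemoryMB
RESERVED_PER_MACHINE_MB = 1_024       # reservedMemoryCapacityPerMachineInMB

# Number of containers needed to reach the requested capacity.
containers = MEMORY_CAPACITY_MB // MEMORY_PER_CONTAINER_MB

# Assumption: reserved memory is not available to containers.
usable_per_machine_mb = MACHINE_MEMORY_MB - RESERVED_PER_MACHINE_MB
containers_per_machine = usable_per_machine_mb // MEMORY_PER_CONTAINER_MB

# Ceiling division: machines required to host every container.
machines = -(-containers // containers_per_machine)

print(containers, containers_per_machine, machines)
```

Under these assumptions each machine hosts exactly one 64,000 MB container, so the 2,048,000 MB SLA maps to 32 agent machines, not counting the 2 management machines.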

Deployment and monitoring

Starting the Cloudify infrastructure is straightforward: log in to the Cloudify shell and run either “bootstrap-localcloud” to start a local cloud or “bootstrap-cloud ec2” to start an EC2 cloud. This starts the GigaSpaces management infrastructure, which includes a GSA, GSM, LUS, web UI and a REST service.

For EC2, this bootstrap process takes about 2-3 minutes. This includes time to provision the new machines on EC2, copy GigaSpaces software and start the processes listed above.

For UCS the bootstrapping was much faster and took less than 1 minute.

Once the management infrastructure is ready, the application can be deployed. This is done using the “install-application” command with an argument for the location of the Application Recipe folder. It’s the same command for both EC2 and UCS versions of the demo.

EC2 deployment took about 10 minutes. In this time, 32 new machines were provisioned on EC2, GigaSpaces software was copied, GigaSpaces agent processes were started, GSCs were started and the application was deployed across all the machines.

Voilà, a 2TB cluster was up and running!

Deployment of the UCS version was just as easy, and the cluster was up and running even faster as no machine provisioning is involved here.

Performance

Another objective of the demo was to show performance numbers that meet the application requirements. This was the easy part.

We were easily able to demonstrate the peak load requirements of the application and show that either cluster can keep up with the expected loads.

In both environments (Cisco UCS and EC2) we had very good results. The initial load task on EC2 managed to load 500,000 objects per second (1 KB each). During the initial load, all the machines consumed 99% of their CPU capacity. Based on the throughput we saw when loading 320 million objects, it can be projected that 1 billion objects can be loaded into the cluster in around 36 minutes (if the cluster had enough memory to hold these objects, as pointed out by Michal Frajt in the comments below). Objects were loaded into both the primary and the backup, which was running on a different VM (on EC2, also on a different machine). The data was synthetic account data generated during the load (not loaded from a database).
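The load projection can be sanity-checked with a little arithmetic. This is only a sketch: the post does not state the sustained rate behind the 36-minute figure, so the code derives the rate that figure would imply, and it takes "1 KB" as 1,000 bytes.

```python
PEAK_LOAD_RATE = 500_000           # objects per second, as measured on EC2
TARGET_OBJECTS = 1_000_000_000     # the 1 billion object projection
OBJECT_SIZE_BYTES = 1_000          # assumption: "1 KB" taken as 1,000 bytes
PROJECTED_MINUTES = 36             # projection quoted in the post


def load_minutes(objects, rate_per_second):
    """Minutes needed to load `objects` at a constant rate."""
    return objects / rate_per_second / 60


def implied_rate(objects, minutes):
    """Constant rate (objects/second) that loads `objects` in `minutes`."""
    return objects / (minutes * 60)


minutes_at_peak = load_minutes(TARGET_OBJECTS, PEAK_LOAD_RATE)
sustained_rate = implied_rate(TARGET_OBJECTS, PROJECTED_MINUTES)
raw_payload_tb = TARGET_OBJECTS * OBJECT_SIZE_BYTES / 1e12

print(round(minutes_at_peak, 1), int(sustained_rate), raw_payload_tb)
```

At the 500,000 objects per second peak, 1 billion objects would take a bit over 33 minutes, so the 36-minute projection corresponds to a sustained rate somewhat below the peak. The raw payload alone is about 1 terabyte before any backup copy or per-object overhead, which is why the memory caveat matters.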

In the read test we had a single client with 50 threads, each performing read operations based on a random key. The data grid handled 10,000 reads per second when the client slept for 4 milliseconds after each read, and over 100,000 reads per second without any sleep (2,000 reads per second per client thread). During the read tests we were also running a writer client that created new objects at a rate of 2,000 objects per second. Grid nodes on EC2 consumed about 5% CPU during this test, so we can project that the grid capacity is about 2 million reads per second from remote clients.
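Here is how the read numbers hang together. The inputs are the figures from the read test; the projection assumes that grid CPU usage scales linearly with read throughput, which is a simplification rather than something we measured.

```python
CLIENT_THREADS = 50
SLEEP_MS = 4                    # client-side sleep after each read
READS_WITH_SLEEP = 10_000       # total reads per second with the sleep
READS_NO_SLEEP = 100_000        # total reads per second without the sleep
GRID_CPU_PERCENT = 5            # grid node CPU usage during the test


def per_thread_rate(total_reads_per_second, threads):
    return total_reads_per_second / threads


def ms_per_read(rate_per_thread, sleep_ms=0):
    """Time spent on the read itself within each thread's cycle."""
    return 1000 / rate_per_thread - sleep_ms


throttled_ms = ms_per_read(
    per_thread_rate(READS_WITH_SLEEP, CLIENT_THREADS), SLEEP_MS)
unthrottled_ms = ms_per_read(
    per_thread_rate(READS_NO_SLEEP, CLIENT_THREADS))

# Assumption: CPU usage grows linearly with throughput up to 100%.
projected_capacity = READS_NO_SLEEP * 100 / GRID_CPU_PERCENT

print(throttled_ms, unthrottled_ms, projected_capacity)
```

Each thread spends roughly 1 ms per read with the sleep and 0.5 ms without it, and scaling the 5% CPU reading up to 100% gives the 2 million reads per second projection.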

We also tested reads using a local cache (aka near cache). With the local cache enabled, the test client managed to read data at a rate of 5 million objects per second (with a local cache size of 1 million objects). Because the client caches recently read data in its local JVM, it avoids remote calls, improving performance dramatically. During the local cache tests, the client machine consumed 80% CPU, as the data was being served from the local cache.
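To illustrate why a near cache helps, here is a minimal sketch of the idea in Python. It is not the GigaSpaces local cache API: the class and parameter names are hypothetical, eviction is plain least-recently-used, and it ignores the update propagation a real near cache needs.

```python
from collections import OrderedDict


class NearCache:
    """Client-side cache that only goes remote on a miss (illustrative)."""

    def __init__(self, remote_read, max_entries):
        self._remote_read = remote_read      # callable: key -> value
        self._max_entries = max_entries
        self._entries = OrderedDict()        # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self._entries:
            self._entries.move_to_end(key)   # mark as recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._remote_read(key)       # the expensive remote call
        self._entries[key] = value
        if len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value


# A dict stands in for the remote data grid.
remote_grid = {f"account-{i}": {"balance": i * 100} for i in range(10)}
cache = NearCache(remote_grid.__getitem__, max_entries=2)
for key in ["account-0", "account-0", "account-1", "account-2", "account-0"]:
    cache.read(key)
print(cache.hits, cache.misses)
```

Repeated reads of a hot key never leave the client process, which is where the jump from hundreds of thousands to millions of reads per second comes from.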

Failover

GigaSpaces + Cloudify make automatic failover in cloud environments a reality. GigaSpaces detects machine failures and automatically provisions new machines to meet the application SLA (which can be defined in terms of available memory or CPU cores).

To demonstrate this, we simulated a machine failure using the AWS console (“Terminate” function) and then watched as the application automatically recovered from this event by spinning up a new machine. This all occurred transparently and with no performance impact to the clients.
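The recovery sequence can be pictured with a toy model. This is not how GigaSpaces implements failover internally; it is a hypothetical sketch of the behavior we observed: surviving backups take over as primaries, and a freshly provisioned machine hosts the replacement copies.

```python
def recover(placement, failed_machine, new_machine):
    """Return a new placement after `failed_machine` is lost.

    placement maps partition id -> {"primary": machine, "backup": machine}.
    """
    recovered = {}
    for partition, roles in placement.items():
        primary, backup = roles["primary"], roles["backup"]
        if primary == failed_machine:
            # Promote the surviving backup, rebuild a backup on the new machine.
            primary, backup = backup, new_machine
        elif backup == failed_machine:
            # Primary is untouched, only the backup copy must be rebuilt.
            backup = new_machine
        recovered[partition] = {"primary": primary, "backup": backup}
    return recovered


before = {
    0: {"primary": "m1", "backup": "m2"},
    1: {"primary": "m2", "backup": "m1"},
    2: {"primary": "m3", "backup": "m2"},
}
after = recover(before, failed_machine="m1", new_machine="m4")
print(after)
```

Clients keep talking to a primary throughout, because every partition always has at least one live copy on a machine other than the one that failed.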

Conclusion

As applications have to manage and manipulate more data (thanks to Big Data and the analytics that can be unearthed out of larger and larger datasets), using in-memory access greatly helps to speed things up. Using GigaSpaces you can manage Terabytes of data across any number of machines, and on any platform.

For applications that have to work with heavy, constant loads, using a Cisco UCS Server infrastructure is a perfect fit.

For applications that only have to work with these large data sets occasionally, using an Amazon EC2 infrastructure (or any other cloud provider, like RackSpace) is a really good option.

You can download the datagrid recipes I used from our github repository.

Please contact me using comments if you are looking for the demo source code.

Updated 1/5/12 – Performance section was updated to clarify.

Updated 2/6/12 – Cloudify Recipes moved to github repository.

Posted in Benchmarks, Big Data, Caching, Cloud, Data Grid, GigaSpaces, Web UI

Architecting Massively-Scalable Near-Real-Time Risk Analysis Solutions

Recently I held a webinar on architecting scalable, near-real-time risk analysis solutions, based on the experience gathered with our Financial Services customers. In the webinar I also had the honor of hosting Mr. Larry Mitchel, a leading expert in the Financial Services industry, who provided background on the Risk Management domain. Following the general interest in the webinar, I decided to dedicate a post to the subject.

What goes on in the Risk Management domain?

The Finance world continually undergoes changes driven for the most part by the lessons learned from the 2008 financial crash, in an attempt to prevent such catastrophes from recurring. Regulations such as Dodd-Frank, EMIR, and Basel III have further formalized this, imposing tighter control and supervision. We see financial institutions addressing these conformance goals by assigning dedicated projects with dedicated budgets (which means more work for solutions architects, lucky me). One aspect of this conformance is reducing risk by shortening settlement cycles to near-real-time, as seen in initiatives such as Straight-Through Processing.

Traditional architectures, new challenges

Conforming to the new regulations mandates an entirely different approach to risk analysis. The old systems, which relied on overnight batch risk calculations and predefined queries, can no longer suffice, and a more real-time approach to risk calculation, with on-the-fly queries, is needed.

From a solution architecture point of view, Risk Analysis is both a compute-intensive and a data-intensive process. Looking at our customers’ systems, we see on the one hand ever-increasing volumes (number of calculated positions and assets, number of re-calculations, data affinity, etc.) and on the other hand an ever-increasing demand to reduce response time, whether to conform with the regulations or to gain a competitive edge. That makes it a classic Big Data analytics problem.

From a technology point of view, risk analysis solutions traditionally relied on designated compute grid products for the calculations and on relational databases as the data store. That was fine for overnight batch processing, but with the introduction of the new real-time demands databases tend to become bottlenecks under the load, due to the disk and network resources.

Risk Analysis solution architecture revisited

Our experience with such solutions shows that the effective architecture to meet these challenges is a Big Data multi-tiered architecture, in which intraday data is cached in-memory for low-latency response, while historical data is kept in a database for more extensive data mining and reporting. Simple caching solutions cannot provide the scalability of the intraday data under such write-intensive flows (streaming market data, calculation results, and such), and it is therefore an In-Memory Data Grid that has become the standard technology in modern solutions for storing intraday data. Intelligent data grids such as GigaSpaces XAP also provide on-the-fly SQL querying capabilities, which overcome the limitation of predefined queries in traditional architectures.  As for historical data, we see a clear shift from relational databases to NoSQL databases, which perform much better for mining these volumes of semi-structured data.

A piece of the architecture that is often overlooked in initial architecture discussions is system orchestration. Surprisingly, many of the customers I visit tend to think of risk analysis solutions as the mere sum of a Compute Grid product (for computation scalability) and a Data Grid product (for data scalability). They neglect to consider the orchestration logic that handles the intersection between the data grid and the compute grid: avoiding duplicate calculations, handling cancellation of calculations, monitoring the state of ongoing calculations, feeding ticks and updates to the client UI, and more. All this amounts to a significant orchestration layer that is traditionally developed in-house.
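To give a feel for what that orchestration layer has to track, here is a deliberately small, single-process sketch. The class, the state names and the calculation keys are all hypothetical; a real implementation would be distributed and transactional.

```python
class CalculationRegistry:
    """Tracks calculation requests so identical work is not repeated."""

    ACTIVE = ("PENDING", "RUNNING")

    def __init__(self):
        self._states = {}   # calculation key -> state

    def submit(self, calc_key):
        """Register a calculation; False means it is already known."""
        if self._states.get(calc_key) in self.ACTIVE + ("DONE",):
            return False            # duplicate: reuse the existing one
        self._states[calc_key] = "PENDING"
        return True

    def start(self, calc_key):
        if self._states.get(calc_key) == "PENDING":
            self._states[calc_key] = "RUNNING"

    def cancel(self, calc_key):
        """Cancel a calculation that has not finished yet."""
        if self._states.get(calc_key) in self.ACTIVE:
            self._states[calc_key] = "CANCELLED"
            return True
        return False

    def state(self, calc_key):
        return self._states.get(calc_key, "UNKNOWN")


registry = CalculationRegistry()
key = ("portfolio-42", "VaR", "2012-01-05")
first = registry.submit(key)
duplicate = registry.submit(key)     # same request arrives again
registry.start(key)
cancelled = registry.cancel(key)
print(first, duplicate, cancelled, registry.state(key))
```

Even this toy version shows why the logic belongs next to the data: every decision depends on shared, up-to-date state about what is already being computed.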

A much more effective architecture is to embed the orchestration logic together with the data grid within one platform, thereby abstracting the complexities from the clients and removing the need for clients to interact with anything but the unified platform. GigaSpaces XAP offers co-location of processing and messaging together with the data, which makes implementing such architectures quite easy. This also enables pre- and post-processing of the data, such as data formatting prior to processing and result aggregation after calculations, which are requirements often seen in such solutions.

Event-Driven Architecture is highly useful for streaming calculation results to the awaiting clients as they arrive and for streaming ticks and other updates to the UI. With GigaSpaces XAP, implementing such an architecture is made simple by leveraging the Asynchronous API and the messaging layer, which can treat each data mutation as an event.
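The "every data mutation is an event" idea can be sketched in a few lines. This is a generic illustration, not the XAP Asynchronous API: the class and method names are made up for the example, and delivery here is synchronous and in-process.

```python
class EventedStore:
    """Key/value store that notifies matching subscribers on every write."""

    def __init__(self):
        self._data = {}
        self._subscriptions = []   # list of (predicate, callback)

    def subscribe(self, predicate, callback):
        """Call `callback(key, value)` for each write where predicate(value)."""
        self._subscriptions.append((predicate, callback))

    def write(self, key, value):
        self._data[key] = value
        for predicate, callback in self._subscriptions:
            if predicate(value):
                callback(key, value)


store = EventedStore()
ui_feed = []   # stands in for the client UI

# The UI only cares about finished calculation results, not raw ticks.
store.subscribe(lambda v: v["type"] == "result",
                lambda k, v: ui_feed.append((k, v["value"])))

store.write("tick-1", {"type": "tick", "value": 101.5})
store.write("calc-7", {"type": "result", "value": 0.031})
print(ui_feed)
```

The writer never knows who is listening, which is what lets results stream to clients as soon as they land in the grid.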

Addressing the real-time analytics challenge on the end-to-end Big Data architecture, across both the intraday data (which resides in memory within the data grid) and the historical data (which resides within a relational/NoSQL database), requires a holistic view of the multi-tier architecture. Intraday data changes at an extremely high rate with frequent event feeds, whereas historical data can be written in a more relaxed manner, using a write-behind (write-back) caching approach. Queries are then consolidated across the data stores, making them appear as one unified source for query purposes. Such consolidation is traditionally achieved by combining various products, but GigaSpaces offers a Real-Time Analytics solution, enabling you to focus on your business logic and leave the rest to the platform.
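As a rough picture of what consolidating queries across the two tiers means, here is a minimal sketch. Plain dicts stand in for the data grid and the historical database, and the rule that intraday values override historical ones for the same key is my assumption for the example.

```python
def consolidated_query(intraday, historical, predicate):
    """Query both tiers as if they were a single source.

    intraday and historical map key -> record.
    """
    merged = dict(historical)
    merged.update(intraday)          # fresher in-memory records win
    return {key: record for key, record in merged.items()
            if predicate(record)}


historical_db = {
    "pos-1": {"desk": "rates", "exposure": 120},
    "pos-2": {"desk": "fx", "exposure": 80},
}
data_grid = {
    "pos-2": {"desk": "fx", "exposure": 95},     # updated today
    "pos-3": {"desk": "rates", "exposure": 40},  # opened today
}

large_exposures = consolidated_query(
    data_grid, historical_db, lambda r: r["exposure"] >= 90)
print(large_exposures)
```

The caller issues one query and gets one answer, without needing to know which tier each record came from.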

Future directions

There’s more to discuss in such architectures, such as multi-site deployments over WAN, support for cloud bursting, and more, all of which should be considered when approaching such solutions. I will not get into these concerns in this post, but you can see coverage of future directions in my webinar.

To get more information on the domain and its challenges, and to hear more on the suggested architecture for Big Data risk analysis solutions I’d recommend watching the full webinar.


Posted in architecture, Big Data, Financial Services, GigaSpaces, Market Analytics, Real Time Analytics, Risk Management, Scalability, syndicated

Moving away from Mainframe to Commodity – How?

Mainframe (z/OS) based systems running COBOL programs are legacy systems in many organizations. Many of these organizations plan to replace them with low-cost commodity servers running Java or .NET based systems, saving the cost of expensive mainframe MIPS and COBOL-based development.

Using GigaSpaces XAP can simplify the migration effort from mainframe based systems and reduce the cost of the legacy applications. In addition, having GigaSpaces XAP act as a front-end layer for mainframe based systems may boost system performance and improve the overall response time under peak load.

GigaSpaces' ability to deploy, manage and scale services along with the data (which can be partitioned and replicated across multiple commodity machines) enables your mainframe applications to access GigaSpaces XAP's In-Memory Data Grid (IMDG) with minimal refactoring of existing application code. No additional third-party products are needed, which dramatically reduces implementation times and minimizes incremental costs in software licenses and hardware.

GigaSpaces Intelligent Mainframe Front-end Architecture

GigaSpaces XAP provides an extremely flexible persistence layer (known as the mirror service) that enables transparent communication between the GigaSpaces IMDG and virtually any type of back-end application or database system.

When used with a database, the Mirror service is one of the primary reasons GigaSpaces XAP can overcome the database locking issues experienced during peak load periods. The Mirror service offloads database access: the IMDG operates as the primary interface to the application data, while persistence is handled as a durable, ordered back-end activity that delegates in-memory transactions to the database running on the mainframe.

[Figure: mainframe integration]

Data is accessed primarily through the IMDG, using one of the standard interfaces GigaSpaces XAP supports (POJO/Spring, JPA, JDBC, key/value, or Document APIs). If the desired data item cannot be found within the IMDG, it is retrieved from the database running on the mainframe, loaded into the IMDG to be reused by subsequent transactions, and passed back to the client application. This approach avoids accessing the mainframe on every application data access by using an in-memory layer that can scale on demand.
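The access pattern described here is commonly called read-through. Below is a minimal sketch of it, with a plain function standing in for the mainframe database; the names are hypothetical and this is not the XAP API.

```python
class ReadThroughGrid:
    """In-memory layer that falls back to the back-end store on a miss."""

    def __init__(self, load_from_backend):
        self._load_from_backend = load_from_backend   # callable: key -> value
        self._entries = {}
        self.backend_reads = 0

    def read(self, key):
        if key not in self._entries:
            # Miss: fetch once from the back end and keep it in memory.
            self.backend_reads += 1
            self._entries[key] = self._load_from_backend(key)
        return self._entries[key]


def mainframe_lookup(key):
    """Stand-in for a costly query against the mainframe database."""
    return {"id": key, "status": "ACTIVE"}


grid = ReadThroughGrid(mainframe_lookup)
for _ in range(1000):
    grid.read("customer-17")
print(grid.backend_reads)
```

A thousand application reads cost a single mainframe access, which is the MIPS saving in miniature.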

Controlled, Reliable, and Optimized Mainframe Access

XAP's Mirror service acts as a central coordinator for all back-end store updates, enabling you to batch data and persist in-memory transactions through continuous background access to the mainframe at a pre-configured frequency. This allows the system to minimize the number of mainframe connections and interactions, reducing MIPS consumption, while controlling the consistency level and synchronization between the in-memory representation of the data and the copy on the mainframe.

Many mainframe-based applications perform nightly batch jobs that drive a large number of data updates to back-end stores. In this context, GigaSpaces' inherent ability to maintain transactional integrity is critical. In-memory transactions can be fully committed and preserved in multiple physical locations using GigaSpaces' high-availability mechanism, and ultimately persisted to the database with zero risk of the mainframe and GigaSpaces being out of sync for a long duration.
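The batching behavior can be sketched as a simple write-behind buffer. This is an illustration of the pattern only, not the Mirror service itself: the names are hypothetical, and where the real service flushes at a configured frequency, this sketch flushes when a batch fills up or when `flush()` is called (which a timer would do).

```python
class WriteBehindBuffer:
    """Collects updates in order and persists them in batches."""

    def __init__(self, persist_batch, batch_size):
        self._persist_batch = persist_batch   # callable: list of (key, value)
        self._batch_size = batch_size
        self._pending = []                    # preserves update order

    def record(self, key, value):
        """Called after an in-memory update has been committed."""
        self._pending.append((key, value))
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        """Persist everything pending in a single back-end interaction."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._persist_batch(batch)


backend_calls = []   # each element is one interaction with the back end
buffer = WriteBehindBuffer(backend_calls.append, batch_size=2)
for i in range(5):
    buffer.record(f"account-{i}", {"balance": i})
buffer.flush()       # what a periodic timer would trigger
print(len(backend_calls))
```

Five updates reach the back end in three interactions instead of five, and the application never waits on any of them.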

For more details see the Mainframe Integration Best Practice.

 
Shay Hassidim

Posted in Application Architecture, Caching, Cloud, Data Grid, Development, GigaSpaces, sba, SOA, space-based architecture, Spring Framework

2012 Cloud, PaaS, NoSQL Predictions

2011 is coming to an end, and now is a good time to start planning for 2012. I thought that a good start would be to look at my 2011 predictions and see whether my previous (and first) attempt to predict something in that turbulent environment held any water. So, here is a quick recap of 2011.

Recap of 2011

Private vs. Public Cloud - As I noted in my recent post Public vs. Private Clouds, I felt that during 2011 the debate around public vs. private cloud would become less interesting, as most of the industry has started to accept the fact that there is a need for both environments, and the important issue would become how to make them work well together. The most interesting development in that regard was Rackspace’s recent announcement about their plan to support OpenStack based private clouds, which shows that even public cloud providers have fully embraced this idea.

OpenStack is evolving from a movement into a viable reality - the momentum around OpenStack has gone through ups and downs throughout the year, as happens with every new technology. However, looking back, it appears that 2011 was a fairly successful year for OpenStack, with its first public cloud already available in the market. Dell and HP have started to offer OpenStack based clouds to their customers, as has Citrix. Rackspace announced their plan to provide official support, including for those who want to build their own OpenStack environment. That's quite big considering the short time since the technology was first introduced. There is still a long way to go, but the future looks promising - check out this survey in that regard.

PaaS adoption has been happening at a slower pace than expected, despite the fact that the trend remains consistent. For PaaS startups 2011 was a fairly significant year, with the acquisition of Heroku by Salesforce. Amazon, Red Hat and VMware joined the PaaS arena: Amazon with Elastic Beanstalk, Red Hat with their OpenShift initiative, and VMware with CloudFoundry, adding to its previous acquisition of SpringSource vFabric. This was a fairly significant year for us at GigaSpaces as well, as we launched a new product in this same domain that aims to completely change the way PaaS is thought of today (stay tuned…).

Google has no doubt been the disappointment of the year, effectively killing Google App Engine (GAE) as we knew it (amongst many other things) with their new pricing model.

Big Data has gone real time. Facebook made a big announcement on how they moved their batch-oriented analytics system to real time analytics (see my previous posts on this subject here and here). Twitter announced the launch of a new Real Time Analytics dashboard. Both join Google and Yahoo, who have already started to make this shift; Google has also been transforming their web analytics framework into real time. As I noted in my 2011 predictions, the entire debate around NoSQL and SQL didn't make sense, and indeed we’ve seen quite a few announcements from both Cassandra and Couchbase on SQL-like query support.

In Memory Data Grids have also taken a similar approach. With GigaSpaces, we’ve launched our JPA support, and other Data Grid implementations such as Infinispan and Gemfire seem to be heading in that same direction, each adding different levels of SQL support. The interesting development in this regard is that we were able to prove that you could actually mix and match Document/Schemaless APIs with SQL APIs and have the flexibility to choose the right language for the job (see online demo Same Data Any API).

All in all I think that I came fairly close - don't you think…?

Ok that gives me enough confidence to try the same thing for 2012. 


2012 predictions

Cloud

iCloud everywhere - IMO the biggest shift in Cloud is the fact that it’s going to become pretty much invisible to many of the end users as new mobile devices, operating systems and applications start to be designed with cloud support in mind. Apple iCloud and DropBox mark the beginning of this trend. Using cloud for collaboration and synchronization is definitely a killer app for many of the consumer based apps. I expect that in 2012 we’re going to continue to see a big push of many SaaS-based offerings in that space toward rich client support that uses the cloud as a backend and leverages the power of the new generation of advanced mobile devices. The difference is that those clients won’t be just another frontend for the same web UI, but something that will run almost entirely on the mobile device and will use more generic cloud services for synchronization and collaboration. This will create the need for more generic cloud services such as database as a service and other middleware services that can interact directly from mobile applications.

Moving from Amazon-centric clouds to Cloud Mashups – In 2011 we started to see new kinds of clouds pop up. Literally every hardware vendor (IBM, Dell, HP, ...), telco (ATT, Verizon, KT), and software provider (Oracle, Microsoft) is either developing or already offering something in this space. Each one tries to maintain a unique position to compete with Amazon, either through SLAs, locality, security, or being more open through the support of OpenStack. In 2012, this movement is going to become even stronger, as many of the players that made the investment during 2011 will come out full speed ahead.

Microsoft finally gets it with Azure - Microsoft has been around for a while with Azure, with somewhat marginal success mostly around its .NET user base, an approach that is too narrow a play when it comes to cloud. Their cloud strategy is coming into focus with the offering of a more ubiquitous cloud supporting technologies that were previously unheard of on a MSFT cloud platform, such as Java and PHP, and it wouldn't be too far-fetched to assume that they will be supporting Linux applications in the cloud as well.

Cost-driven Application Management - One of the things that is still fairly hard to measure in the cloud is cost, and more specifically how each component of our application and architecture contributes to cost. This is especially true during current market conditions, which are going to put even more pressure on cost savings. Cost-driven application design patterns will start to emerge, and will become an integral part of any design for cloud applications, just as scalability and performance are today. A new form of Cost Driven Application Management (CDA) will start to emerge to provide better insight into how our application behaves from a cost analysis perspective - Newvem is a new startup in that space that has already launched its private beta.

Mission Critical Apps move into the Cloud - As the industry matures there is no reason why we should draw the line for cloud adoption at simple apps. The challenge will be mostly around performance, latency, and ensuring continuous availability. A new class of middleware and application platforms that are designed specifically for cloud environments will become more popular to help in that transition. On the other hand, Java and JEE specifically will finally become more cloud ready as I noted in an earlier post - Java and the Center Stage.

Network Gets into Cloud API Stack - While compute and storage have become virtualized to fit into the cloud, we haven't seen much advancement on the network layer. Many of the networking providers are now launching APIs to enable better control over the cloud network. Alcatel recently announced an interesting cloud proposition in this domain specifically targeting telcos.  The idea is to use the network as a vehicle for making distributed data centers look like one big cloud, making it possible to better leverage existing assets and offer SLA driven compute resources based on latency, location etc. Other cloud providers are also starting to open their network APIs starting from the Load Balancer down to the core switch. This opens up a new set of opportunities for integrating these network APIs with the upper layer of the application stack.

More OpenStack Clouds - 2011 was just the beginning of that trend. 2012 will see more public and private cloud providers offering support for OpenStack APIs, with RackSpace, Dell, and HP already making public announcements in this area. The interesting question in this regard is how Citrix will play out its CloudStack acquisition alongside its OpenStack strategy.

 

PaaS

DevOps and PaaS Converge into App DevOps PaaS - One of the topics that drew a lot of my personal interest last year was the DevOps movement. For odd reasons, most of that movement was driven by Ops and less by Devs. In 2012, we’ll see many of the DevOps tools such as Chef and Puppet integrated into application platforms, making it easier to deploy complex applications onto the cloud. In the same way, we’re going to see more Application Platforms adopting the automation and recipe model from the DevOps world. The latter has the potential to transform the opinionated PaaS offerings as we know them today, with Heroku and GAE leading that trend, into more open PaaS offerings that better fit the way users develop apps today and give more freedom to choose your own stack, cloud, and application blueprint.

Beyond Google App Engine, Heroku - Heroku established itself as one of the early PaaS providers in the market and is now expanding their offering to Java. CloudFoundry, DotCloud and others are slightly different but still follow the same path. In 2012 you should expect more choices of completely different PaaS platforms, starting with JEE PaaS offerings from Red Hat, IBM, and Oracle, to private PaaS offerings which are essentially frameworks to build your own PaaS, DevOps PaaS offerings (see note above), as well as vertical PaaS for specific industries. Magento announced their plan to provide PaaS for eComm, and it wouldn’t be crazy to assume that others will follow that same path.

BigData

Not only Hadoop Centric - During 2010 and to a lesser degree 2011 Big Data discussions were pretty much centered around Hadoop.  NoSQL solutions such as Cassandra and Mongo are gaining fast adoption mainly due to the operational and development complexity that comes with Hadoop. That movement is going to continue at an even greater pace, as Hadoop gets fragmented between many vendors and frameworks such as EMC, MapR, Cloudera, Yahoo, IBM each claiming to own their own Hadoop distro. With new funding in the hands of many of the NoSQL startups I’d expect to see more complete solution stacks targeted at Big Data.

In Memory Data-Grid and NoSQL Become Integrated - During the early days of the NoSQL movement it wasn't clear how the two technologies fit together. As I noted in my previous post here, it actually makes more sense to integrate the two technologies in the context of real time analytics or real time data processing for Big Data. Indeed, during 2011 I started to see more case studies showing the use of the two technologies together, as with Facebook and Twitter. MemBase is also a good example of that approach, with their announcement earlier this year about their integration of Memcached and CouchDB into a single product. At GigaSpaces we added built-in integration for Cassandra and MongoDB, as noted here, and plan to invest more in that direction during 2012.

A New Class of Big Data Application Platforms will Address the Development and Operational Complexity of Big Data Applications - As Big Data applications become more mainstream we start to hit the next level of complexity: development and operational complexity. Clearly, plugging NoSQL into your architecture may address your scalability requirements, but at the same time it’s going to make your development and management experience more complex. This is not because the products themselves are complex, but mostly because it is less obvious how to build and design the application around these new technologies. As in previous years, the goal of application platforms is to ease that task by putting together an integrated stack that makes it easier to develop Big Data applications, as I noted in my post on Big Data Application Platforms.

Posted in Cloud, Cloudify, Data Grid, GigaSpaces, syndicated

Making Cloud Portability a Practical Reality

In one of my previous posts Five Misconceptions On Cloud Portability I argued that:

The term "cloud portability" is often considered a synonym for "Cloud API portability," which implies a series of misconceptions. If we break away from dogma, we can find that what we are really looking for in cloud portability is Application portability between clouds, which can be a vastly simpler requirement, as we can achieve application portability without settling on a common Cloud API. ..

The following presentation shows how I could use the ideas from this post and provide a practical cloud portability solution today using Cloudify and JClouds.