Building a dynamic high-speed data-hub with multi-semantic querying capabilities

In Memory Computing, Hadoop and the “Lambda Architecture”

Facebook’s vs Twitter’s Approach to Real-Time Analytics

Industry Survey Shows: Organizations Turn to Stream Processing for their Big Data Needs


GigaSpaces is excited to announce a joint partnership with SanDisk today!

Big-Data-Real-Time-Performance – Enjoying both worlds with one Architecture!

  • DevOps Meets PaaS from NY Meetup with Chef

      DevOps Meets PaaS – NY Meetup with Chef (OpsCode) If you missed out on the live version, catch Uri Cohen's presentation with OpsCode at the DevOps PaaS Infusion Meetup NY on June 19th. Session abstract: The concept of DevOps and recipes can go well beyond setup, to actually accelerate the entire lifecycle of your applications, from setup, to monitoring, through maintaining high availability, and auto-scaling when required. Cloudify ties things together from an application perspective and prepares everything so…
    Read More

  • Live Streaming: Upcoming Meetup – Big Data on OpenStack

    Hi All, for those who can't make it out to our meetup on Thursday in San Francisco – Big Data on OpenStack, we will be streaming it live here in this post (barring any technical difficulties), so stay tuned.

  • Join us for a stellar line-up of events!

    DeployCon 2012 | June 13th, 2012, 4:15 PM ET, Jacob K. Javits Convention Center, 655 West 34th Street, New York, NY (Map). Ask us about free passes! We still have some available. Join Guy Korland, our VP R&D, at DeployCon 2012, where he will be participating in the Future of PaaS Panel alongside other industry experts from the PaaS and DevOps camps. During this session, the panel of cloud thought leaders will debate these two approaches for managing applications on the cloud. To learn more click here.
    Read More

  • Lessons from Amazon RDS on Bringing Existing Apps to the Cloud

    It's a common belief that the cloud is good for greenfield apps. There are many reasons for this, in particular the fact that the cloud forces a different kind of thinking about how to run apps. Native cloud apps were…

  • WEBINAR: Big Data in the Cloud

      Big Data TV | Episode I: Big Data in the Cloud. Join us for a webinar today! No one can deny that big data and cloud simply complement each other. Big data requires clusters of servers in order to process and analyze data, which is something the cloud can easily provide. In this session, we'll learn how you can build your big data "database on-demand" using Cassandra's big data solution, as well as manage your big data application using Cloudify.
    Read More

  • Cloud Deployment: The True Story

    Everyone wants to be in the cloud. Organizations have internalized the notion and have plans in place to migrate their applications to the cloud in the immediate future. According to Cisco’s recent global cloud survey: Presently, only 5 percent of … Continue reading

  • OpenStack is Coming to Israel

    I’m very excited to announce our first OpenStack Israel event on Wednesday the 30th of May in Petach Tikva, in collaboration with the IGT Cloud and Rackspace. Since Avner Algom and I started to work on the event a few…

  • Bare-Metal PaaS

    The Rise of Bare-Metal Clouds: Cloud and virtualization are not mandatory, and the number of cloud providers that support bare-metal clouds is growing, as David Linthicum pointed out in his article "Going native: The move to bare-metal cloud services". It…

  • How to Take Any App to the Cloud meetup on May 17th

    GigaSpaces is hosting a cloud computing meetup on how to take any app to the cloud with an all-star panel of CTOs and technical evangelists. Join us for this free, high-energy event on May 17th to hear real-world use cases from Microsoft, Aditi, Cisco, GigaSpaces, C24, and a Fortune 100 financial institution, and learn how cloud computing is opening a whole new world of possibilities for enterprises and ISVs. Seating is limited, register now! When: Thursday, May 17th, 9:30
    Read More

  • XAP 9.0 – Geared for Real-Time BigData Stream Processing

    XAP 9.0 is out the door, and I thought it would be a good opportunity to share some of the things we’ve been up to lately.
    Traditionally, one of XAP’s primary use cases has been large-scale event processing, more recently referred to as big data stream processing or real-time big data analytics. Some of our users are reliably and transactionally processing up to 200K events per second, in clusters as large as a few hundred nodes.
    In a sense it’s taking Map/Reduce concepts and applying them to an online stream of events, analyzing events as they arrive rather than waiting for all the data to be available offline and only then triggering the Map/Reduce jobs.
    There are many use cases that are applicable here, such as web analytics, financial trading, online index calculation, fraud detection, homeland security, guidance systems and essentially any use case that requires immediate feedback on a massive stream of events, typically tens or even hundreds of thousands of events per second.
    In the last few months we’ve heard of numerous frameworks that claim to be a real time analytics silver bullet (some rightfully so, some less…), so I wanted to recap what we’ve learned here at GigaSpaces in the past few years from dealing with large scale event processing systems, and what we’ve done in XAP 9.0 to support these scenarios even better.
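
    To make the idea concrete, here is a minimal, framework-free sketch (plain Python, not the XAP API) of applying Map/Reduce concepts to an online stream: a word count where the "map" and "reduce" steps run incrementally per arriving event, instead of as an offline batch job over the full dataset.

```python
from collections import Counter

def process_stream(events):
    """Incrementally aggregate a stream of text events into word counts,
    updating the running result as each event arrives (no offline batch)."""
    counts = Counter()
    for event in events:                 # "map": tokenize each event as it arrives
        for word in event.lower().split():
            counts[word] += 1            # "reduce": fold into the running state
    return counts

stream = ["big data stream", "stream processing", "big stream"]
print(process_stream(stream).most_common(2))  # → [('stream', 3), ('big', 2)]
```

    The running `counts` state plays the role of the in-memory data described below: results are queryable at any point in the stream, not only after a batch job completes.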

    What It Takes to Implement Massive Scale Event Processing

    There are a number of key attributes that are important for big data event processing systems to support, all of which XAP supports at its core:
    • Partitioning and distribution: Perhaps the single most important trait when it comes to scaling your system. If you can efficiently and evenly distribute the load of events across many event processing nodes (and even increase the number of nodes as your system grows), you will ensure consistent response times and avoid accumulating a backlog of unprocessed events.
      XAP allows for seamless content based routing, based on properties of your events. It also supports adding more nodes on the fly if needed.
    • In memory data: in many cases, you need to access and even update some reference data to process incoming events. For example, if you’re calculating a financial index you may need to know if a certain event represents a change in price of a security which is part of the index, and if so you may want to update some index related information. Hitting the network and the disk for every such event is not practical at the scale we’re talking about, so storing everything in memory (the events themselves, the reference data and even calculation results) makes much more sense. An in memory data grid allows you to achieve that by implementing memory based indexing and querying. It helps you to seamlessly distribute data across any number of nodes, and takes care of high availability for you. The more powerful your in-memory indexing and querying capabilities are, the faster you can perform sophisticated calculations on the fly, without ever hitting a backend store or accessing the network. XAP’s data grid provides high availability, sophisticated indexing and querying, and a multitude of APIs and data models to choose from.
    • Data and processing co-location: Once you’ve stored data in memory across multiple nodes, it’s important to achieve locality of data and processing. If you process an incoming event on one node, but need to read and update data on other nodes, your processing latency and scalability will be very limited. Achieving locality requires a solid routing mechanism that will allow you to send your events to the node most relevant to them, i.e. the one that contains the data needed for their processing. With XAP’s event containers, you can easily deploy and trigger event processors that are collocated with the data, and make sure that you never have to cross the boundaries of a single partition when processing events.
    • Fault tolerance: Event processing is in many cases a multi-step process. For example, even the simplest use case of counting words on Twitter entails at least 3 steps (tokenizing, filtering and aggregating). When one of your nodes fails, and it will at some point, you want your application to continue from the point at which it stopped, and not go through the whole processing flow again for each “in-flight” event (or even worse, completely lose track of some events). Replaying the entire event flow can cause numerous problems: if your event processing code is not idempotent, your processors will fail, and under high loads this can create a backlog of events to process, which will make your system less real time and less resilient to throughput spikes. XAP’s data grid in general, and event containers in particular, are fully transactional (you can even use Spring’s declarative transaction API for that if you’d like). Each partition is deployed with one synchronous backup by default, and transactions are only reported committed once all updates reach the backup. When a primary fails, the backup takes over in a matter of seconds and continues the processing from the last committed point. In addition, XAP has a “self healing” mechanism, which can automatically redeploy a failed instance on existing or even new servers.
    • Integration with batch-oriented big data backends: Real time event processing is just part of the picture. There are some insights and patterns you can only discover through intensive batch processing and long-running Map/Reduce jobs. It’s important to be able to easily push data to a big data backend such as Hadoop HDFS or a NoSQL database, which, unlike relational databases, can deal with massive amounts of write operations. It’s also important to be able to extract data from it when needed. For example, in an ATM fraud detection system you want to push all transaction records to the backend, but also extract calculated “patterns” for each user, so you can compare his or her transactions to the generated pattern and detect fraud in real time. You can use numerous adapters to save data from XAP to NoSQL data stores, and XAP’s open persistence interface allows for easy integration with most of these systems.
    • Manageability & Cloud Readiness: Big data apps can become quite complex. You have the real time tier, the Map/Reduce or NoSQL tier, a web based front end, and maybe other components as well. Managing all of this consistently, and more so on the cloud, which makes for a great foundation for big data apps, can easily become a nightmare. You need a way to manage all those nodes, scale when needed, and recover from failures when they happen. Starting from XAP 9.0, XAP users can leverage all the benefits of Cloudify to deploy and manage their big data apps in their own data center or on the cloud, with benefits like automatic machine provisioning for any cloud, consistent cluster- and application-aware monitoring, and automatic scaling for the entire application stack, not just the XAP components.
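
    The first three attributes above can be sketched in a few lines (hypothetical Python, not XAP’s actual routing implementation): each event carries a routing property, a hash of that property picks the owning partition, and the handler runs against reference data held in that partition’s own memory, so processing never crosses partition boundaries.

```python
class Partition:
    """One node holding a shard of in-memory reference data, with the
    processing logic co-located next to it."""
    def __init__(self):
        self.data = {}       # reference data local to this partition
        self.processed = 0

    def handle(self, event):
        # Data/processing co-location: the read and the update both stay
        # inside this partition; no network hop, no backend store.
        price = self.data.get(event["symbol"], 0.0)
        self.data[event["symbol"]] = price + event["delta"]
        self.processed += 1

def route(event, partitions):
    """Content-based routing: hashing the routing property guarantees that
    all events for the same key land on the partition owning that key."""
    idx = hash(event["symbol"]) % len(partitions)
    partitions[idx].handle(event)

partitions = [Partition() for _ in range(4)]
for e in [{"symbol": "ACME", "delta": 1.5}, {"symbol": "ACME", "delta": -0.5}]:
    route(e, partitions)
```

    Because both events share the routing key "ACME", they are processed in order on one partition, which ends up holding the running value 1.0 for that symbol; events with other keys would spread across the remaining partitions in parallel.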

    XAP 9.0 for Big Data Event Processing

    Now that I’ve covered what XAP already had to offer for big data analytics, I’d like to delve a bit into the new capabilities of XAP 9.0, our newest release, which nicely complement the existing ones as far as big data stream processing is concerned:
    • FIFO groups (AKA Virtual Queues): this feature is quite unique to XAP. It allows you to group your events based on the value of a certain property, and while across groups you can process any number of events in parallel, within the same group you can only access one event at a time, ordered by time of arrival. Think of a homeland security system with multiple sensors – in many cases you want to process readings from the same sensor in order, so you can tell for example if a certain suspicious car is moving from one place to another, and not vice versa, but across sensors you want to process as many events as possible.
    • Storage types: Most real time analytics systems rely heavily on CPU and memory, so using them efficiently is always important. With XAP 9.0 we’ve introduced a new mechanism that allows users to annotate object properties (an object can represent both events and data) with one of 3 modes: native (the property is saved on the heap as a native object), binary (the property is serialized and is only deserialized when an actual client reads it) and compressed (same as binary, with gzip compression). This allows for fine-grained control over how memory is utilized and saves your application from doing unnecessary serialization and deserialization when accessing the data grid.
    • Transaction-aware durable notifications: Pub/sub notifications are important in scenarios where you want a certain event to trigger multiple flows, or to be processed in parallel on multiple servers. They are also useful for propagating processing results downstream to other applications. With XAP 9.0 we’ve enhanced our pub/sub capabilities to be durable (i.e. even if a client disconnects and reconnects, it will not miss an event) and to provide once-and-only-once semantics. In addition, notifications for data grid updates (e.g. event objects written or removed, other data updated) maintain transaction boundaries. That means that if multiple events were written, or multiple pieces of data updated, the subscribed clients will be notified about all of them or none at all.
    • Reliable, transaction-aware local view: Another interesting use case in event processing is when you want your event processor to be located outside of the data grid. This gives you the benefit of scaling your event processors separately from the data grid, at the expense of accessing the data over the network. However, the local view feature allows you to cache locally, within the processor’s process, a predefined subset of the data that you expect to be relevant to your processing logic. The local view mechanism makes sure this cache stays consistent and up to date, and that you never miss a data update even after disconnecting and reconnecting.
    • Web Based Data Grid Console: Understanding what’s going on with your events, what types of events are queued, and what data resides in the data grid is essential to the operation of every event processing system. XAP’s new data grid console allows you to monitor everything within the data grid from your browser using an intuitive HTML5 interface. You can view event and data counters, submit SQL queries to the data grid, and do a lot more.
    • Cloud Enablement: XAP 9.0 comes with Cloudify, our new open source PaaS stack, built in, which allows you to manage all of the components of your big data app, including the backend Hadoop or NoSQL database and the web front end.
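
    The FIFO groups semantics described above can be sketched in a few lines (hypothetical Python, not the XAP API): each group key gets its own queue, different groups can be drained independently and in parallel, but within a single group events are only handed out in arrival order.

```python
from collections import defaultdict, deque

class FifoGroups:
    """Toy model of FIFO groups ("virtual queues"): ordering is enforced
    per group key, while different groups are independent of each other."""
    def __init__(self):
        self.groups = defaultdict(deque)

    def write(self, group, event):
        self.groups[group].append(event)

    def take(self, group):
        # Only the head of a group's queue is ever visible, so events
        # within a group are consumed strictly in arrival order.
        q = self.groups[group]
        return q.popleft() if q else None

vq = FifoGroups()
for sensor, reading in [("s1", 10), ("s2", 99), ("s1", 11)]:
    vq.write(sensor, reading)
print(vq.take("s1"), vq.take("s2"), vq.take("s1"))  # → 10 99 11
```

    In the homeland security example, each sensor ID would be the group key: readings from one sensor are processed in order, while readings from different sensors are processed concurrently.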

    See It in Action!

    You can see all of this in action with our new Twitter word count example, whose code is available on GitHub.

    What’s Next?

    There are a few other cool features in XAP 9.0 that you can learn about here.

    We’re planning a lot more interesting features around big data analytics, so keep your ears open :)