Insights into In-Memory Computing and Real-time Analytics
Crossing the Chasm from Insights to Action
The modern enterprise is digital and insight-driven.
When it comes to implementing real-time analytics, we often see a disconnect between insights and action. This disconnect makes it hard to optimize business processes in real time and to improve the customer experience in ways that open up new revenue streams.
The chasm between analytics and action:
1. Complex Architectures:
Implementing the lambda architecture is no simple task. It requires stitching together three different distributed systems (Hadoop, a streaming engine and an MPP database), which means more software, more hardware and a higher TCO, not to mention the difficulty of maintaining and debugging the result.
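To make the stitching problem concrete, here is a minimal Python sketch of the three layers, assuming a per-key sum is the only query. The function and class names (`batch_view`, `SpeedLayer`, `serve`) are hypothetical and each stands in for a whole distributed system; the point is that every query must merge two separately computed views, and the hand-off between the batch and speed layers has to be kept consistent by hand.

```python
from collections import defaultdict


def batch_view(master_dataset):
    """Batch layer (a stand-in for Hadoop): recompute per-key totals from the full dataset."""
    totals = defaultdict(int)
    for key, value in master_dataset:
        totals[key] += value
    return dict(totals)


class SpeedLayer:
    """Speed layer (a stand-in for a streaming engine): totals for events since the last batch run."""

    def __init__(self):
        self.totals = defaultdict(int)

    def on_event(self, key, value):
        self.totals[key] += value

    def reset(self):
        # Must be called exactly when a new batch view covering these events is published;
        # getting this hand-off wrong double-counts or drops events.
        self.totals.clear()


def serve(batch, speed, key):
    """Serving layer (a stand-in for the MPP database): every query merges both views."""
    return batch.get(key, 0) + speed.totals.get(key, 0)


master = [("sensor-1", 5), ("sensor-2", 3), ("sensor-1", 2)]
batch = batch_view(master)
speed = SpeedLayer()
speed.on_event("sensor-1", 4)  # arrives after the batch run
print(serve(batch, speed, "sensor-1"))  # 11
```

Even in this toy, three components with three different update paths must agree on which events each one has seen; in production each of them is a separate cluster to deploy, monitor and debug.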
2. Slow Performance:
Most big data frameworks have been designed for high throughput rather than low latency. For use cases in finance, telco and IoT, low latency is a critical requirement for fast data analytics.
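The throughput/latency tension is easiest to see in a toy cost model of micro-batching, one common way throughput-oriented engines process streams. The sketch assumes a fixed per-batch overhead, a constant per-record cost and uniformly arriving events; the `profile` function and all figures are illustrative assumptions, not measurements of any framework.

```python
from dataclasses import dataclass


@dataclass
class BatchProfile:
    interval_s: float
    avg_latency_s: float
    cost_per_record_us: float
    keeps_up: bool


def profile(interval_s, fixed_overhead_s, per_record_s, arrival_rate_per_s):
    """Estimate latency and per-record cost for one micro-batch interval (toy model)."""
    records = arrival_rate_per_s * interval_s
    processing_s = fixed_overhead_s + records * per_record_s
    return BatchProfile(
        interval_s=interval_s,
        # A record waits on average half an interval for its batch to close, then for processing.
        avg_latency_s=interval_s / 2 + processing_s,
        # The fixed overhead is amortized over more records as the batch grows.
        cost_per_record_us=processing_s / records * 1e6,
        # The pipeline falls behind if a batch takes longer to process than to collect.
        keeps_up=processing_s <= interval_s,
    )


for interval in (1.0, 0.1, 0.01):
    p = profile(interval, fixed_overhead_s=0.05, per_record_s=10e-6, arrival_rate_per_s=10_000)
    print(f"{p.interval_s:>5}s  latency={p.avg_latency_s:.3f}s  "
          f"cost/record={p.cost_per_record_us:.0f}us  keeps_up={p.keeps_up}")
```

With these assumed figures, a 1 s interval amortizes the overhead to about 15 µs per record but adds roughly 0.65 s of average latency, and a 0.1 s interval brings latency down to about 0.11 s at 60 µs per record. At 10 ms, each batch takes about 51 ms to process, longer than it takes to collect, so the backlog grows without bound.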
3. Slow Feedback Loop:
Traditionally, systems that run the business (OLTP) have been separated from the systems that manage it and generate insight (OLAP and the Hadoop ecosystem). In addition, there is an impedance mismatch in development velocity between fast innovation in the application space (microservices, continuous integration, etc.) and slow-moving data preparation and analytics in the data warehousing and big data world. The result is a stale data architecture in which analytics becomes an afterthought rather than an immediate optimizer of transactional applications.
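A deliberately tiny Python sketch of the stale-data problem, assuming an operational store whose contents are copied into an analytics store by a periodic ETL job. All class and function names are hypothetical. Any insight computed on the copy lags the transactions it is meant to optimize, whereas analytics run against the live operational data sees every write.

```python
class OperationalStore:
    """Hypothetical system of record (OLTP) that applications write to."""

    def __init__(self):
        self.orders = {}

    def place_order(self, order_id, amount):
        self.orders[order_id] = amount


class Warehouse:
    """Hypothetical analytics copy (OLAP) refreshed by a periodic ETL job."""

    def __init__(self):
        self.snapshot = {}

    def etl_refresh(self, source):
        self.snapshot = dict(source.orders)  # a copy, not a live view

    def revenue(self):
        return sum(self.snapshot.values())


def live_revenue(store):
    """Analytics computed directly on the operational data."""
    return sum(store.orders.values())


store = OperationalStore()
warehouse = Warehouse()
store.place_order("o1", 100)
warehouse.etl_refresh(store)   # e.g. a nightly ETL run
store.place_order("o2", 250)   # arrives after the ETL window
print(warehouse.revenue())     # 100: the warehouse is stale
print(live_revenue(store))     # 350: sees every write
```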
InsightEdge Solution: Apache Spark + In-Memory Data Grid
Our approach to closing these gaps combines a distributed, scale-out in-memory data fabric (an in-memory data grid) with a unified big data framework such as Apache Spark. We believe this combination, InsightEdge, will be one of the enablers of hybrid transactional/analytical processing (HTAP), connecting systems of record to systems of engagement in real time.
The key premise behind InsightEdge is simplification and performance. We achieve this by bringing the agility, performance and enterprise readiness of in-memory data grids to the exciting new world of Apache Spark. An in-memory data grid is elastic, scale-out in-memory storage built for low latency and high throughput. While data grids focus primarily on RAM as the primary storage tier, they also support a multi-tiered storage model in which SSD, Flash and storage-class memory devices can be used to expand the footprint.
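The multi-tiered idea can be sketched as a two-tier key-value store in which a bounded RAM tier spills to a larger, slower tier. This is a minimal illustration, not how any particular data grid is implemented: the `TieredStore` class is hypothetical, a plain dict stands in for SSD/Flash/storage-class memory, and least-recently-used eviction is an assumed policy.

```python
from collections import OrderedDict


class TieredStore:
    """Two-tier key-value store: a bounded RAM tier backed by a larger 'flash' tier."""

    def __init__(self, ram_capacity):
        self.ram_capacity = ram_capacity
        self.ram = OrderedDict()  # hot tier, kept in least-recently-used order
        self.flash = {}           # stand-in for SSD/Flash/storage-class memory

    def put(self, key, value):
        self.flash.pop(key, None)  # a key lives in exactly one tier
        self.ram[key] = value
        self.ram.move_to_end(key)
        self._evict()

    def get(self, key):
        if key in self.ram:
            self.ram.move_to_end(key)  # mark as recently used
            return self.ram[key]
        value = self.flash.pop(key)    # raises KeyError if the key is in neither tier
        self.put(key, value)           # promote hot data back to RAM
        return value

    def _evict(self):
        # Spill the least recently used entries to the slower tier until RAM fits.
        while len(self.ram) > self.ram_capacity:
            old_key, old_value = self.ram.popitem(last=False)
            self.flash[old_key] = old_value


store = TieredStore(ram_capacity=2)
for k, v in [("a", 1), ("b", 2), ("c", 3)]:
    store.put(k, v)
print(sorted(store.ram), sorted(store.flash))  # ['b', 'c'] ['a']
print(store.get("a"))                          # 1
print(sorted(store.ram), sorted(store.flash))  # ['a', 'c'] ['b']
```

Reading `"a"` promotes it back to RAM and pushes the now least-recently-used `"b"` down a tier, so the working set stays in memory while the total footprint can exceed RAM.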
On top of this data grid runs a Spark distribution optimized to co-locate microservices and transactional applications with Spark-based workloads (RDDs, DataFrames, machine learning, etc.). We also provide mission-critical capabilities such as high availability, security and resource management. Upstream data ingestion is handled by standard Spark jobs (RDD/DataFrame transformations, streaming jobs, etc.), while downstream data persistence and pre-loading are handled by the data grid's pluggable data source API.
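The general shape of a pluggable data source can be sketched in Python. This is not InsightEdge's actual API: the `DataSource` interface, its `initial_load` and `persist` methods, the `Grid` class and the batched write-behind `flush` are all hypothetical names chosen to show the two responsibilities mentioned above, pre-loading the grid and persisting its changes downstream.

```python
from abc import ABC, abstractmethod


class DataSource(ABC):
    """Hypothetical plug-in contract between the grid and an external store."""

    @abstractmethod
    def initial_load(self):
        """Return an iterable of (key, value) pairs used to pre-load the grid."""

    @abstractmethod
    def persist(self, changes):
        """Write a dict of changed entries to the external store."""


class DictBackedSource(DataSource):
    """Toy external store standing in for an RDBMS, NoSQL database or HDFS."""

    def __init__(self, rows):
        self.rows = dict(rows)

    def initial_load(self):
        return list(self.rows.items())

    def persist(self, changes):
        self.rows.update(changes)


class Grid:
    """Minimal grid that pre-loads from a DataSource and persists writes in batches."""

    def __init__(self, source):
        self.source = source
        self.data = dict(source.initial_load())  # pre-load on startup
        self.pending = {}                         # writes not yet persisted

    def write(self, key, value):
        self.data[key] = value  # applications see the write immediately
        self.pending[key] = value

    def flush(self):
        # Write-behind (an assumed mode): push accumulated changes downstream in one batch.
        if self.pending:
            self.source.persist(self.pending)
            self.pending = {}


db = DictBackedSource({"p1": "laptop"})
grid = Grid(db)
grid.write("p2", "phone")
print(sorted(db.rows))  # ['p1']: not yet persisted
grid.flush()
print(sorted(db.rows))  # ['p1', 'p2']
```

Keeping persistence behind an interface means the grid does not need to know whether the backing store is a relational database or a file system; swapping stores means swapping the plug-in.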
Where can you demo InsightEdge?
There's an easy way to demo InsightEdge: stand up a cluster on a single machine and explore it through the Apache Zeppelin notebook.
One example is the InsightEdge Basics demo, which gives an overview of data modeling and of the ways to write Spark workloads, and shows how to run analytics queries against your existing data sources.
Ran into any trouble? Visit our forum for help.