The digital universe is estimated to see a 50-fold data increase over the 2010-2020 decade. Gartner expects 6.4 billion connected things to be in use worldwide in 2016, up 30% from 2015, reaching 20.8 billion by 2020.

According to IHS forecasts, the Internet of Things (IoT) market will grow from an installed base of 15.4 billion devices in 2015 to 30.7 billion devices in 2020 and 75.4 billion in 2025. McKinsey’s Chris Ip estimates the total IoT market size in 2015 was up to $900M, growing to $3.7B in 2020, a 32.6% CAGR.

In every respect, big data is bigger than you can imagine; more importantly, it’s accelerating.

When it comes to the IoT, this involves an increasing number of complex projects encompassing hundreds of suppliers, devices, and technologies. Michele Pelino and Frank E. Gillett from Forrester predict that fleet management in transportation, security and surveillance applications in government, inventory and warehouse management applications in retail, and industrial asset management in primary manufacturing will be the hottest areas for IoT growth.

As the amount of data increases, so does the velocity at which we have to ingest that data, perform analysis, and filter out the relevant information. With a stream of millions of events per second coming in from IoT devices, organizations must equip themselves with flexible, comprehensive, and cost-effective solutions for their IoT needs.

At GigaSpaces, we’ve come to the realization that the solution to this growing need is not radically changing an existing architecture, but rather extending it through in-memory computing to enable fast analytics and control over fast data. The combination of low-latency streaming analytics and transactional workflow triggers enables acting on IoT data in the moment. This includes predictive maintenance and anomaly detection against millions of sensor data points.

InsightEdge Fuels Magic’s Predictive Engines for All IoT Needs

The Challenge

Magic Software Enterprises, a global provider of enterprise-grade application development and business process integration software solutions and a vendor of a broad range of software and IT services, has been leveraging GigaSpaces XAP for years.

Magic was looking to expand the IoT offering of its xpi Integration Platform for more complex scenarios, specifically a data aggregation solution forming an IoT hub in front of Magic xpi. The solution needed to be flexible enough to serve a variety of applications regardless of their data and velocity requirements.

In the age of fast data, the xpi platform, although proven for operational interoperability, still faces the challenge common to many existing platforms: it was not built for fast data ingestion scenarios. Magic was looking for a POC that could be implemented as quickly as possible while delivering fast results.

The Solution

InsightEdge was the perfect choice to help Magic’s IoT solutions handle the difficult data transformation challenges, allowing customers to concentrate on designing the best processes and flows to support their business goals. The solution needed to be flexible and open to any type of data input, regardless of the type and structure of the data or its velocity, and to run in-memory. That’s where we came in. During our meeting, we suggested a simple solution based on Kafka and InsightEdge to handle data velocity and variety in IoT use cases.

By integrating InsightEdge in-memory streaming technology, incoming sensor data is analyzed through a multitude of predefined filters and rules and aggregated by InsightEdge. The aggregated data is easily compared, correlated, and merged, and is transferred in batches to Magic xpi, where a prediction engine predicts when IoT equipment failure might occur so that the failure can be prevented by performing maintenance. Monitoring for future failure allows maintenance to be planned before the failure occurs.

Benefits

InsightEdge provides Magic with a few key benefits:

  1. Performance: Ability to ingest fast data from multiple IoT sensors.
  2. Data Aggregation: InsightEdge is able to handle streaming sensor data at high throughput and aggregate it in time windows that are relevant to each sensor’s notification rhythm.
  3. Fast Data Storage: The streamed data then becomes structured into a semantically-rich data model that can be queried from any application.
  4. Simplification of Big Data Architecture: InsightEdge easily enables Magic to combine the power of Apache Spark and Fast Data analytics without the need for large-scale data source integration or data replication (ETL).

Results

Using InsightEdge, Magic is able to provide its customers with fast data streaming and the ability to perform aggregations and calculations on the in-memory grid. Using the XAP data grid makes the streaming process that much faster, eliminating the need for Hadoop.

InsightEdge addresses Magic’s customers’ needs for IoT deployments with predictive manufacturing and maintenance, enabling them to receive real-time, fast, data-driven events from their systems.

InsightEdge Use Case: Car Telemetry Ingestion and Data Prediction using Magic’s xpi

A live InsightEdge use case is car telemetry ingestion and data prediction using Magic’s xpi. In car telemetry, it is very hard to predict in advance which data will be useful. For data prediction, we need to consider not only device telemetry but also diagnostic telemetry.

Predictive car maintenance requires car telemetry ingestion and data prediction. Magic’s solution stack needed one more component to fully support fast data and scalable scenarios: the right puzzle piece to complete the architecture.

In this use case, we will cover everything post-data-collection (assuming CSV files, though it could just as well have been streaming) up until the data is sent to Magic’s xpi Integration Platform.

How we built it

Kafka

Apache Kafka is a distributed streaming platform: in essence, a reliable message broker on steroids, though not limited to that. It enables building real-time streaming data pipelines that reliably move data between systems or applications, and real-time streaming applications that transform or react to streams of data.

We’ll be using version kafka_2.10-0.9.0.0 to run our tests, although newer Kafka versions are available. You can download Kafka here or download the specific version we used for this use case.

Kafdrop

Kafdrop is a simple UI monitoring tool for message brokers. In this case, we will use it with Kafka to monitor the topics and message content during development. Download and install Kafdrop following the instructions on its Git page.

InsightEdge

InsightEdge is a high-performance Spark distribution designed for low latency workloads and extreme analytics processing in one unified solution. With a robust analytics capacity and virtually no latency, InsightEdge provides immediate results.
GigaSpaces’ Spark distribution eliminates the dependency on the Hadoop Distributed File System (HDFS) to break through the embedded performance “glass ceiling” of the standard Spark offering. To this, GigaSpaces has added enterprise-grade features such as high availability and security. The result is a hardened Spark distribution that is thirty times faster than standard Spark.

Download InsightEdge here. No installation is needed; simply unzip the file to the desired location.

Code Deep-Dive

Admin work

First we need to start Kafka and InsightEdge, so we’ll use the following two scripts:

  1. start-kafka.sh

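The script itself was shown as an image in the original post; a minimal sketch of what start-kafka.sh might contain, assuming Kafka 0.9 is unpacked under $KAFKA_HOME and using the standard launch scripts that ship with Kafka (the topic name car-events is an assumption), could look like this:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of start-kafka.sh (the original was not reproduced).
KAFKA_HOME=${KAFKA_HOME:-~/kafka_2.10-0.9.0.0}

# Start ZooKeeper (required by Kafka 0.9), then the broker itself
"$KAFKA_HOME/bin/zookeeper-server-start.sh" -daemon "$KAFKA_HOME/config/zookeeper.properties"
"$KAFKA_HOME/bin/kafka-server-start.sh" -daemon "$KAFKA_HOME/config/server.properties"

# Create the topic used in this use case ("car-events" is an assumption)
"$KAFKA_HOME/bin/kafka-topics.sh" --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 --topic car-events
```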

  2. start-insightedge.sh

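Again, the original script appeared only as an image; a sketch of start-insightedge.sh, assuming the zipped distribution was extracted to $INSIGHTEDGE_HOME and using the demo mode of the InsightEdge 1.x launcher (which starts a local Spark master, a worker, and a data grid in one step), might be:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of start-insightedge.sh (the original was not reproduced).
INSIGHTEDGE_HOME=${INSIGHTEDGE_HOME:-~/insightedge}

# "demo" mode brings up a local Spark master, worker and data grid
"$INSIGHTEDGE_HOME/sbin/insightedge.sh" --mode demo
```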

  3. Next, we’ll start Kafdrop so we can have a UI on our Kafka broker. To run Kafdrop, browse to the target directory and run:

For example, if ZooKeeper is running locally on its default port 2181, use the following:
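A hypothetical invocation (the exact jar name depends on your Kafdrop build; Kafdrop’s UI listens on port 9000 by default):

```shell
# Run Kafdrop from its build's target directory, pointing it at ZooKeeper
java -jar target/kafdrop-*.jar --zookeeper.connect=localhost:2181
```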

Browse to the local instance to make sure it works: http://localhost:9000


  4. Last but not least, we need to start an HTTP server stub (to be replaced later by another integration endpoint); run start-http-server.sh:
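The stub script was not reproduced in the post; a throwaway sketch, assuming Python 2 is available and using an arbitrary port (8080), could be as simple as:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of start-http-server.sh: a placeholder HTTP endpoint
# standing in for the real integration endpoint during development.
# The port (8080) is an assumption.
python -m SimpleHTTPServer 8080
```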

Model Class

Now we’ll have to build our model, so let’s see how it should look:
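The original model class appeared as an image; a minimal sketch in the style of the InsightEdge 1.x Scala API (the class and field names here are assumptions, not Magic’s actual schema) might look like:

```scala
// Hypothetical grid model class; field names are illustrative assumptions.
package com.gigaspaces.cars

import org.insightedge.scala.annotation._
import scala.beans.BeanProperty

case class CarTelemetry(
  @BeanProperty @SpaceId(autoGenerate = true) var id: String,
  @BeanProperty var vin: String,        // vehicle identifier
  @BeanProperty var engineTemp: Double, // diagnostic telemetry
  @BeanProperty var speed: Double,      // device telemetry
  @BeanProperty var timestamp: Long
) {
  // No-arg constructor required by the data grid
  def this() = this(null, null, -1, -1, -1)
}
```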

Next, we write our event class (to handle incoming events):
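The event class was likewise shown only as an image; a sketch of one plausible shape, assuming raw Kafka messages arrive as CSV lines (field names and the CSV layout are assumptions):

```scala
// Hypothetical incoming-event class; the CSV layout is an assumption.
package com.gigaspaces.cars

case class CarEvent(vin: String, metric: String, value: Double, timestamp: Long)

object CarEvent {
  // Parse one CSV line, e.g. "VIN123,engineTemp,92.5,1499424000000"
  def fromCsv(line: String): CarEvent = {
    val Array(vin, metric, value, ts) = line.split(",")
    CarEvent(vin, metric, value.toDouble, ts.toLong)
  }
}
```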

Code

Now that we have our model and event model, we can write the code we want to deploy to Spark (which will read from Kafka into Spark and persist to the grid):

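The deployed code was not reproduced in the post; a sketch of such a job using Spark Streaming’s Kafka receiver and the InsightEdge 1.x Scala implicits (the space name, lookup settings, topic name, and the CarTelemetry model with its fromCsv helper are all assumptions) might look like:

```scala
// Hypothetical Kafka-to-grid streaming job; names are assumptions.
package com.gigaspaces.cars

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils
import org.insightedge.spark.context.InsightEdgeConfig
import org.insightedge.spark.implicits.all._

object CarStreamJob extends App {
  // Connect Spark to the InsightEdge data grid
  val ieConfig = InsightEdgeConfig("insightedge-space", Some("insightedge"), Some("localhost:4174"))
  val conf = new SparkConf().setAppName("car-telemetry-stream").setInsightEdgeConfig(ieConfig)
  val ssc = new StreamingContext(conf, Seconds(1))

  // Read raw CSV lines from the "car-events" Kafka topic
  val stream = KafkaUtils.createStream(ssc, "localhost:2181", "car-group", Map("car-events" -> 1))

  // Parse each line into the grid model and persist it to the space
  stream.map { case (_, line) => CarTelemetry.fromCsv(line) }
    .foreachRDD(rdd => rdd.saveToGrid())

  ssc.start()
  ssc.awaitTermination()
}
```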

Now, we have three options of running logic:

  1. Spark job
  2. Grid job (event-driven)
  3. External job against the Grid (which temporarily “holds” the data for Spark)

We chose the third option due to scalability and growth considerations: we need to account for dozens of external processes running rather than one very long event on the grid. It’s a simple push/pull decision, and we decided to pull (if you wish to implement a Processing Unit (PU), see Appendix 1).


Executing an external job against the grid is an interesting choice altogether, because it can be initiated from any integration endpoint, from a simple jar file to any custom-made code or integration platform:
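The job code itself was shown as an image; a sketch of an external Spark job pulling data from the grid via the InsightEdge 1.x `gridRdd` API (the space name, master URL, CarTelemetry model, and maintenance threshold are assumptions) might be:

```scala
// Hypothetical external job run against the grid; names are assumptions.
package com.gigaspaces.cars

import org.apache.spark.{SparkConf, SparkContext}
import org.insightedge.spark.context.InsightEdgeConfig
import org.insightedge.spark.implicits.all._

object PredictiveMaintenanceJob extends App {
  // Connect to the grid that temporarily "holds" the data for Spark
  val ieConfig = InsightEdgeConfig("insightedge-space", Some("insightedge"), Some("localhost:4174"))
  val conf = new SparkConf()
    .setAppName("predictive-maintenance")
    .setMaster("spark://localhost:7077")
    .setInsightEdgeConfig(ieConfig)
  val sc = new SparkContext(conf)

  // Pull the telemetry currently on the grid and compute the
  // average engine temperature per vehicle
  val avgTemp = sc.gridRdd[CarTelemetry]()
    .map(t => (t.vin, (t.engineTemp, 1)))
    .reduceByKey { case ((s1, c1), (s2, c2)) => (s1 + s2, c1 + c2) }
    .mapValues { case (sum, count) => sum / count }

  // Flag vehicles above a purely illustrative threshold for maintenance
  avgTemp.filter { case (_, temp) => temp > 90.0 }
    .collect()
    .foreach { case (vin, temp) => println(s"maintenance candidate: $vin (avg $temp)") }

  sc.stopInsightEdgeContext()
}
```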

To run the above code, we’ll use a simple script that will call the generated jar file:
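The launcher script was not reproduced; a hypothetical version, assuming the job was packaged into a jar and submitted through InsightEdge’s Spark submit wrapper (class, jar, and master URL are assumptions), could be:

```shell
#!/usr/bin/env bash
# Hypothetical launcher for the external grid job; names are assumptions.
INSIGHTEDGE_HOME=${INSIGHTEDGE_HOME:-~/insightedge}

"$INSIGHTEDGE_HOME/bin/insightedge-submit" \
  --class com.gigaspaces.cars.PredictiveMaintenanceJob \
  --master spark://localhost:7077 \
  target/car-analytics.jar
```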

As a bonus, here’s code to remove all the car events from the space (that is, the grid):
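The cleanup snippet was shown only as an image; a sketch using the XAP `GigaSpace` client API (the space name and the CarTelemetry model are assumptions; passing an empty template matches all entries of that type):

```scala
// Hypothetical cleanup utility; space and class names are assumptions.
package com.gigaspaces.cars

import org.openspaces.core.GigaSpaceConfigurer
import org.openspaces.core.space.UrlSpaceConfigurer

object ClearCarEvents extends App {
  // Connect to the running space and clear all CarTelemetry entries;
  // an empty template object matches everything of that type
  val space = new GigaSpaceConfigurer(
    new UrlSpaceConfigurer("jini://*/*/insightedge-space")).gigaSpace()
  space.clear(new CarTelemetry())
}
```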

So you might be asking yourself what the CSVProducer is. Well, here it is:

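The producer appeared as an image in the post; a sketch of a CSV-to-Kafka producer using the standard Kafka 0.9 producer API (the topic name and CSV path handling are assumptions) might look like:

```scala
// Hypothetical CSV producer; topic name and argument layout are assumptions.
package com.gigaspaces.cars

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import scala.io.Source

object CSVProducer extends App {
  // args(0): path to the CSV file with car telemetry
  val props = new Properties()
  props.put("bootstrap.servers", "localhost:9092")
  props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  val producer = new KafkaProducer[String, String](props)

  // Send each CSV line as one Kafka message on the "car-events" topic
  for (line <- Source.fromFile(args(0)).getLines()) {
    producer.send(new ProducerRecord[String, String]("car-events", line))
  }
  producer.close()
}
```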

Run it by using the following script:
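The run script itself was not reproduced; a hypothetical version, assuming the producer was packaged as an assembly ("fat") jar so the Kafka client classes are on the classpath, and an illustrative sample file path:

```shell
#!/usr/bin/env bash
# Hypothetical producer launcher; jar, class and file names are assumptions.
java -cp target/csv-producer-assembly.jar \
  com.gigaspaces.cars.CSVProducer data/car-telemetry.csv
```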

Appendix 1

This is a general template of how to implement a PU.
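The template was shown as an image; a minimal sketch using XAP’s polling-container annotations (the class name and the assumed CarTelemetry model are illustrative, and the processing body is left as a stub):

```scala
// Hypothetical Processing Unit template; names are assumptions.
package com.gigaspaces.cars

import org.openspaces.events.EventDriven
import org.openspaces.events.EventTemplate
import org.openspaces.events.adapter.SpaceDataEvent
import org.openspaces.events.polling.Polling

@EventDriven
@Polling
class CarEventProcessor {

  // Template: which objects this PU should take from the space
  @EventTemplate
  def unprocessed: CarTelemetry = new CarTelemetry()

  // Called for each matching object; the return value is written back
  @SpaceDataEvent
  def process(event: CarTelemetry): CarTelemetry = {
    // ... event-driven processing logic goes here ...
    event
  }
}
```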

Magic’s IoT Platform

Magic xpi is an integration platform that connects IT systems, enabling you to orchestrate data flows that support your business goals. It supports a wide range of business ecosystems, implementing out-of-the-box certified and optimized connectors and adaptors to extend the capabilities of leading ERP, CRM, finance, and other enterprise systems.

Magic xpi acts as the orchestration engine between all relevant parts: knowledge bases, machine learning, asset management, and service cases. Magic xpi orchestrates and connects data based on services, using an HTTP trigger (XML) and an OData provider, and interacts with dedicated ecosystems using Magic xpi connectors.

Magic xpi Diagram with InsightEdge

Final Thoughts

It is estimated that by 2017, 60% of global manufacturers will use analytics to sense and analyze data from connected products and manufacturing, and to optimize increasingly complex portfolios of products. By 2018, the proliferation of advanced, purpose-built analytic applications aligned with the IoT will result in 15% productivity improvements for manufacturers in innovation delivery and supply chain performance.

The flexibility of combining transactional and analytics functionality provided by InsightEdge and XAP is what separates GigaSpaces from the rest. With Magic’s use case, we are enabling IoT applications at scale through open source components at the center, edge, and cloud.

GigaSpaces’ newest data product, InsightEdge, offers an Apache Spark-powered analytics platform to help facilitate full-spectrum analytics (streaming, machine learning, graph processing) in IoT use cases. We are happy to integrate into Magic’s solution stack, which required full compliance with fast data and scalable scenarios.

For more information, RSVP to our joint webinar with Magic on July 26th.

Converge Transactional and Predictive Analytics to Effectively Scale IoT
Tal Doron
Director, Solution Architecture EMEA & APAC @ GigaSpaces Technologies
Pre Sales & Business Development (EMEA/APAC) engaging with all levels of decision makers from architects to strategic dialogue with C-level executives. Working at GigaSpaces specializing in mission critical applications with focus on Enterprise Big Data Solutions, RT Analytics, In-Memory Computing, Distributed Processing, High Scalability, Lean/Agile Engineering, and Cloud Computing. Prior to joining GigaSpaces, he served in technical roles at companies including Dooblo, Enix, Experis BI and Oracle.