Today we announced we’ve joined forces with Intel to simplify artificial intelligence (AI) through an integration between GigaSpaces’ InsightEdge platform and Intel’s BigDL open source deep learning library for Apache Spark. The combined solution forms an enhanced insight platform based on Apache Spark, offering a distributed deep learning framework that empowers insight-driven organizations.

Adoption of AI innovations like deep learning is growing rapidly across industries such as financial services, healthcare, transportation, and retail, where GigaSpaces has a strong track record of delivering high-performance solutions. In addition, the company has expanded its analytics portfolio over the past year to incorporate full-stack analytics (SQL, streaming, machine learning) through Apache Spark. Intel’s BigDL and AI portfolio provides an infrastructure-optimized solution for deep learning workloads that leverages Intel® Xeon® Scalable processors. Together, the technologies fill a critical market gap by creating an intelligent insight platform that makes it easy to innovate on real-time advanced analytics applications with low risk and low total cost of ownership (TCO).

Key benefits of the integration include:

    • Cost savings: BigDL eliminates the need for dense, specialized hardware for deep learning; low-cost compute infrastructure built on Intel Xeon Scalable processors can train and run large-scale deep learning workloads without relying on GPUs.
    • Simplicity: Deep learning scenarios are complex and require advanced training workflows. InsightEdge’s simplified analytics stack, leveraging BigDL and Apache Spark (open source and widely adopted), eliminates cluster and component sprawl, radically reducing the number of moving parts while capitalizing on existing Spark competency.
    • Scalability: The integration allows organizations to innovate on text mining, image recognition, and advanced predictive analytics workflows from a handful of machines to thousands of nodes in the cloud or on-premises, using the same application assets and deployment lifecycle.

“Harnessing the business value of artificial intelligence is often challenged by the lack of mature compute infrastructure and technology complexity, leading to inefficiency and slower time-to-analytics,” said Ali Hodroj, Vice President of Products and Strategy at GigaSpaces. “Our integration with BigDL helps enterprises deploy, manage, and optimize a simplified and comprehensive AI technology stack for automated intelligence without the need for expensive, specialized hardware or complex big data solutions.”

The solution will be demonstrated at Intel’s booth at the Strata Data Conference, September 26-28, 2017, in New York, NY, and at the Intel booth at Microsoft Ignite, September 25-29, 2017, in Orlando, Florida. The demo will feature:

  • AI-driven customer experience analytics through natural language processing
  • Unified NLP, deep learning, and search in one simplified Spark distribution

During the demo, to illustrate an enhanced customer experience, customers will speak in their own words to a company’s interactive voice response (IVR) system. The IVR will quickly understand what the customer needs and solve their problem through machine learning.

Scalable Deep Learning Innovation with Intel's BigDL

“BigDL’s efficient large-scale distributed deep learning framework, built on Apache Spark*, expands the accessibility of deep learning to a broader range of big data users and data scientists,” said Michael Greene, Vice President, Software and Services Group, General Manager, System Technologies and Optimization, Intel Corporation. “The integration with GigaSpaces’ in-memory insight platform, InsightEdge, unifies fast-data analytics, artificial intelligence, and real-time applications in one simplified, affordable, and efficient analytics stack.”

In the demo below we will show you how to combine real-time speech recognition with real-time speech classification based on Intel’s BigDL library and InsightEdge.

What is BigDL?

BigDL is a distributed deep learning library for Apache Spark. You can learn more about deep learning and neural networks on Coursera.

With BigDL, it’s possible to write deep learning applications as standard Spark programs, allowing you to leverage Spark for model training, prediction, and tuning. High performance and throughput are achieved with the Intel Math Kernel Library. Read more about BigDL here.
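
To give a sense of what this looks like in practice, here is a minimal sketch of a BigDL training job written as a standard Spark program in Scala. It is not the demo’s actual code: the layer sizes, learning parameters, and the trainSamples RDD are illustrative placeholders, and the exact API may differ slightly between BigDL versions.

    import com.intel.analytics.bigdl.dataset.Sample
    import com.intel.analytics.bigdl.nn.{ClassNLLCriterion, Linear, LogSoftMax, ReLU, Sequential}
    import com.intel.analytics.bigdl.numeric.NumericFloat
    import com.intel.analytics.bigdl.optim.{Adagrad, Optimizer, Trigger}
    import com.intel.analytics.bigdl.utils.Engine
    import org.apache.spark.SparkContext
    import org.apache.spark.rdd.RDD

    object TrainingSketch {
      def main(args: Array[String]): Unit = {
        // BigDL supplies a SparkConf pre-tuned for its engine (MKL threads, executor cores).
        val sc = new SparkContext(Engine.createSparkConf().setAppName("bigdl-training-sketch"))
        Engine.init

        // Placeholder: an RDD of labeled feature vectors (e.g. embedded call transcripts),
        // already converted to BigDL Sample objects by your preprocessing code.
        val trainSamples: RDD[Sample[Float]] = ???

        // A small feed-forward classifier; the real demo trains a text-classification model.
        val model = Sequential[Float]()
          .add(Linear[Float](200, 128))
          .add(ReLU[Float]())
          .add(Linear[Float](128, 5))
          .add(LogSoftMax[Float]())

        // The Optimizer distributes training across ordinary Spark executors.
        Optimizer(
          model = model,
          sampleRDD = trainSamples,
          criterion = ClassNLLCriterion[Float](),
          batchSize = 128)
          .setOptimMethod(new Adagrad[Float](learningRate = 0.01))
          .setEndWhen(Trigger.maxEpoch(10))
          .optimize()
      }
    }

Because the Optimizer runs on ordinary Spark executors, the same job can scale from a single machine to a large cluster of Xeon nodes without code changes.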

Motivation

As an example, consider a big company with a huge client base that needs to operate call centers. In order to serve a client correctly, it’s vital to determine which specialist the call should be directed to. This demo takes advantage of cutting-edge technologies to resolve such tasks effectively, in less than 100 ms. Here is the general workflow:

Application flow

Architecture

Let’s take a helicopter view of the application components.

How to run it

Software used:

  • scala v2.10.4
  • java 1.8.x
  • kafka v0.8.2.2
  • insightedge v1.0.0
  • BigDL v0.2.0
  • sbt
  • maven v3.x

Prerequisites:

  • Download and extract the data (first three steps) as described here
  • Set the INSIGHTEDGE_HOME and KAFKA_HOME environment variables
  • Make sure you have Scala installed: scala -version
  • Adjust the variables in runModelTrainingJob.sh, runTextPredictionJob.sh, and runKafkaProducer.sh to your environment

Running the demo is divided into three parts:

  1. Build project and start components
    • Clone this repo
    • Go to insightedge directory: cd BigDL/insightedge
    • Build the project: sh build.sh
    • Start zookeeper and kafka server: sh kafka-start.sh
    • Create the Kafka topic: sh kafka-create-topic.sh. To verify that the topic was created, run sh kafka-topics.sh
    • Start InsightEdge in demo mode: sh ie-demo.sh
    • Deploy processor-0.2.0-jar-with-dependencies.jar in the GS UI.
  2. Train BigDL model
    • Train text classifier model: sh runModelTrainingJob.sh
  3. Run Spark streaming job with trained BigDL classification model
    • In a separate terminal tab, start the Spark Streaming prediction job: sh runTextClassificationJob.sh (a sketch of what this job does appears after this list).
    • Start the web server: cd BigDL/web and sh runWeb.sh.
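
For orientation, here is a rough sketch of what the Spark Streaming prediction job in part 3 does: it consumes transcribed speech from Kafka, loads the trained BigDL model, and classifies each message. The broker address, topic name, model path, and the toSample preprocessing helper are illustrative placeholders rather than the demo’s actual code.

    import com.intel.analytics.bigdl.dataset.Sample
    import com.intel.analytics.bigdl.nn.Module
    import com.intel.analytics.bigdl.numeric.NumericFloat
    import com.intel.analytics.bigdl.utils.Engine
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkContext
    import org.apache.spark.streaming.kafka.KafkaUtils
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object PredictionSketch {
      // Placeholder for the real preprocessing: tokenize the transcript and map it
      // to a tensor of word embeddings wrapped in a BigDL Sample.
      def toSample(text: String): Sample[Float] = ???

      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(Engine.createSparkConf().setAppName("prediction-sketch"))
        Engine.init
        val ssc = new StreamingContext(sc, Seconds(1))

        // Load the model saved by the training job (path is illustrative).
        val model = Module.load[Float]("/tmp/text-classifier.bigdl")

        // Read transcribed speech from Kafka (broker address and topic name are placeholders).
        val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
        val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("call-transcripts"))

        messages.map(_._2).foreachRDD { rdd =>
          if (!rdd.isEmpty()) {
            // predictClass returns the index of the most likely category for each sample.
            val categories = model.predictClass(rdd.map(toSample))
            categories.collect().foreach(c => println(s"predicted category: $c"))
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }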

Now go to https://localhost:9443:

  1. Click the microphone button and start talking. Click the microphone button again to stop recording and send the speech to Kafka.
  2. Shortly you will see a new record in the “In-process calls” table, which means that the call is currently being processed.
  3. After a while, the row will move from the “In-process calls” table to the “Call sessions” table. The “Category” column shows which category the BigDL model assigned to the speech, and the “Time” column shows how many milliseconds the classification took.

Shutting down

  • Stop Kafka: sh kafka-stop.sh
  • Stop InsightEdge: sh ie-shutdown.sh

We’ll be demoing our InsightEdge Platform at the Strata Data Conference, September 26-28, in NYC, so you can see in real time what real-time analytics can do. To see the demo, be sure to stop by our booth #809 or the Intel booth #121. If you want to set up a meeting with us, you can do so here.

GigaSpaces Integrates InsightEdge Platform with Intel’s BigDL for Scalable Deep Learning Innovation
Rajiv Shah
Senior Solutions Architect @ GigaSpaces
Rajiv has 10+ years of experience with distributed computing and in-memory technologies. At GigaSpaces he works with the R&D and Product Management departments to convey customer needs. Rajiv has vast knowledge of the Java, Scala, Spark, big data, and .NET ecosystems, and experience in building both high-level and detailed architectures for complex systems.