Originally developed by Google, Kubernetes is an open source container cluster manager. Who could have believed back in June 2014, when Kubernetes was first released, that it would become the de facto standard for container orchestration, and the foundation of choice for production cloud-native solutions?

Kubernetes is the largest and fastest-growing open source project focused on democratizing distributed system patterns, and it continues to flourish in the open source community at an unprecedented rate. To illustrate:

  • There were 388,100 comments on Kubernetes on GitHub during the past year, making it the most-discussed repository by a wide margin; the second most-discussed repository (Red Hat’s OpenShift Origin) came in at 91,100 comments.
  • There were 680 reviews of Kubernetes on GitHub, making it the second-most reviewed project during the past year.
  • Kubernetes is used by 48% of companies that have more than 5,000 employees.

It’s no surprise, as Kubernetes provides enterprise application development teams with many benefits. These include:

  • Auto-deployment of data services and frameworks (e.g. Apache Spark)
  • Orchestration automation with cloud-native solutions (such as auto scale and self-healing)

Moreover, it offers:

  • A simplified process for building and deploying reliable, scalable distributed applications,
  • A faster rate of feature delivery while maintaining a highly available service,
  • High scalability through a decoupled architecture,
  • Increased efficiency through collocation,
  • And much more.

Basically, Kubernetes brings a cloud-native platform-as-a-service experience for auto-deployment and orchestration automation. Using Helm, developers can simply go to a marketplace, click on a button, and even the most complex applications are deployed automatically, easily, and seamlessly. Developers get all the scaling, auto-recovery, and self-healing they need without the support of a team of DevOps experts. It is self-contained and offers unprecedented ease of use.

Kubernetes Deployment with GigaSpaces InsightEdge

Kubernetes synergizes very well with InsightEdge, our in-memory, real-time analytics platform that brings together enterprise-class advanced analytics, machine learning, and extreme data processing.

Both InsightEdge and GigaSpaces XAP support Kubernetes orchestration, utilizing some of its key features, including:

  • Cloud-native orchestration automation with self-healing,
  • Cooperative multi-tenancy,
  • RBAC authorization,
  • And auto-scaling.


InsightEdge and XAP leverage Kubernetes’ anti-affinity rules to ensure that primary and backup instances are always on separate Kubernetes nodes (i.e. on separate physical machines). This high-availability design, combined with self-healing, load-balancing, and fast-load mechanisms, assures zero downtime and no data loss. Additionally, rolling upgrades can be automated and implemented pod by pod using Kubernetes StatefulSets. This allows for a smooth upgrade process with no downtime.
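As a rough illustration of the mechanism (a conceptual sketch only, not the exact spec that the GigaSpaces charts render), a hard anti-affinity rule on a Data Pod tells the Kubernetes scheduler never to place two pods carrying the same label on the same node:

    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: insightedge-pu               # illustrative label; the chart defines its own
          topologyKey: kubernetes.io/hostname   # i.e. at most one such pod per node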

Automatic scaling is supported, using predefined metrics along with CPU and memory utilization metrics to signal to Kubernetes when to scale up or down. Scaling rules can be customized according to the application’s SLA and production needs, effectively balancing the allocated resources against the application’s requirements.

All of this is achievable because StatefulSets manage pods that are based on an identical container specification. A StatefulSet maintains a sticky identity for each of its pods; the pods are created from the same specification, but they are not interchangeable. Each has a persistent identifier that is maintained across any rescheduling.

Through the persistent volume driver, the platform’s intelligent MemoryXtend multi-tier storage offering lets customers configure data prioritization according to the application’s business logic. This ensures that the most relevant data resides in the fastest data storage tier for optimized TCO.

One Click Is All It Takes

Deploying InsightEdge in a Kubernetes environment provides a seamless, automated, cloud-native experience. With just one click the platform is installed, deployed, and up and running.  All you need to define is the desired Kubernetes cluster and how much data capacity is needed. That’s it.

If this is your first time using Helm charts from GigaSpaces, add our repo:
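(The “gigaspaces” alias below is simply the local name you choose for the repo; the URL is the GigaSpaces Helm charts repository.)

    helm repo add gigaspaces https://resources.gigaspaces.com/helm-charts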

Next, install the InsightEdge chart:
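(The release name “demo” is illustrative; with Helm 2 the release is set via --name, while Helm 3 takes it as the first positional argument.)

    helm install gigaspaces/insightedge --name demo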

Let’s take a look at what makes this happen and how you too can make it happen, easily and seamlessly.

Pods

One of the main principles of a Kubernetes deployment is the ‘pods’ approach. A Kubernetes pod represents a running process on the cluster, and is considered the smallest and simplest unit in the Kubernetes object model that can be created or deployed.

There are several pods involved in the Kubernetes deployment of InsightEdge, including:

  • The Management Pod: contains the platform’s management components, namely the Lookup Service (LUS) used by services to discover each other, the REST Manager for remotely managing the environment from any platform, and Apache ZooKeeper for Space leader election.
  • The Data Pod: is analogous to the Processing Unit instance in the platform. Each Data Pod contains a single Processing Unit instance that provides cloud-native support using the Kubernetes built-in controllers, such as auto-scaling and self-healing.
  • The Driver Pod: contains the Spark Driver, which creates Spark Executors, connects to them, and executes the required application code. When the application completes, the Driver Pod persists the logs and shuts down, remaining in a completed state.
  • The Executor Pod: contains the Spark Executor, which runs the Spark job on the data in the co-located Data Pod. When the application completes and the Spark jobs are no longer required, the Executor Pod terminates.
  • The Zeppelin Pod: contains the Apache Zeppelin web-based notebook. It enables data-driven, interactive data analytics on the platform.

InsightEdge Architecture on a Kubernetes Cluster

Pods In An InsightEdge Deployment

In a Kubernetes deployment of InsightEdge, everything is based on Kubernetes pods: the Data Pods; the Management Pods, which include the different management elements (e.g. ZooKeeper and the REST Manager); the Spark Driver and Executor Pods; and the Zeppelin Pod.

Each pod is standalone and is monitored by Kubernetes, making it easy to recover from a failure without dependencies.

Moreover, through the Kubernetes dashboard, DevOps teams have access to a holistic view of all the available pods, and can easily monitor the consumption of the different resources, e.g. CPU and memory.

Helm

Helm, the de facto Kubernetes package manager, is used for installing InsightEdge in the Kubernetes environment. Among the main advantages of using Helm are that it makes deploying complex applications more portable, it supports automatic rollbacks, and it is a familiar, easy-to-understand pattern for developers.

Moreover, since Helm is open source, there are many community charts available with standard configurations for common application services. If you are looking to download and amend open source Kubernetes charts for your own organization, you can do so from the Kubeapps Hub.

Helm Charts for InsightEdge

In our installation, a Helm chart is used to describe all the components for deployment, e.g. the manager, data, and Zeppelin components.

The Helm chart can be used in a variety of formats and locations: packaged, unpackaged, or accessed via a remote URL or a chart repository.

The XAP and InsightEdge Helm charts are published in the GigaSpaces Helm charts repository at https://resources.gigaspaces.com/helm-charts. You can install charts directly from this repo, but you may find it easier to instead add the GigaSpaces Helm chart repo to the Helm repo list:
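(This is the same helm repo add command shown earlier; the “gigaspaces” alias is your choice.)

    helm repo add gigaspaces https://resources.gigaspaces.com/helm-charts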

Once you’ve added the GigaSpaces Helm chart repo, Helm can locate the GigaSpaces charts, so you can install them as follows:
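(Again, “demo” is just an illustrative release name.)

    helm install gigaspaces/insightedge --name demo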

Another option is to fetch the GigaSpaces Helm chart and unpack it locally, so you don’t have to repeat the repo and version in each command, which also keeps the commands shorter. For example, if you fetch and unpack the Helm chart using the following command:
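(helm fetch with the --untar flag downloads the chart and unpacks it into the current directory; in Helm 3 the equivalent command is helm pull.)

    helm fetch gigaspaces/insightedge --untar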

The chart will be unpacked into a local folder called insightedge, and then you can install it by simply typing:
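(“demo” is again an illustrative release name; here the chart is referenced by its local folder.)

    helm install insightedge --name demo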

All of the commands described here assume that you’ve fetched the Helm chart, and they should be executed from that folder; however, you can use any of Helm’s install options (remote location, repo reference, etc.).

It should be noted that before beginning to work with XAP or InsightEdge, the following must be installed on the local machine or a VM: a Kubernetes cluster (Minikube works well for local evaluation), the kubectl command-line tool, and the Helm client.

Once you take these steps, all you need to define are the resources, number of nodes, whether the installation should be HA-enabled (high availability), and whether the installation/deployment is for a production, testing, or staging environment.

Creating Helm Charts for an InsightEdge Deployment

Deploying InsightEdge via Helm

The main insightedge Helm chart is dynamically linked to its subcharts (management, data, Zeppelin, etc.) as follows:

– insightedge
  – insightedge-manager
  – insightedge-pu
  – insightedge-zeppelin

Installing the InsightEdge In-memory Real-Time Analytics Platform

To install the InsightEdge Platform as a partitioned cluster of two partitions, each with a high availability backup Space for each primary Space of the same partition, define the values.yaml chart with the following parameters set in the space section:
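(A sketch of the relevant settings, assuming the chart exposes partitions and ha keys under the space section; verify the exact key names against the chart’s values.yaml for your version.)

    space:
      partitions: 2   # two primary partitions
      ha: true        # one backup Space per primary partition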

The following Helm command allocates 512MiB of memory for each Data Pod, and caps the maximum Java on-heap memory at a 75% threshold:
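(A sketch of such a command; the value keys shown here are assumptions based on common chart conventions, so check them against the chart’s values.yaml.)

    helm install insightedge --name demo \
      --set pu.resources.limits.memory=512Mi \
      --set pu.java.heapPercentage=75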

A Note on High Availability

High Availability is a must for mission-critical applications. It can be assured by configuring a minimum of one backup Data Pod for each primary Data Pod, and by deploying three Management Pods instead of one, so that a quorum of Platform Managers is always available to manage the Data Pods.

Both the Data Pods and the Management Pods should have the Pod anti-affinity property set to ‘true,’ so that the primary/backup sets and the managers are deployed on different nodes. This enables successful failover if a node gets disrupted.

The following Helm command deploys a cluster in a High Availability topology, with anti-affinity enabled:
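(A sketch of such a command, again with illustrative value keys for HA and anti-affinity; the actual keys are defined in the chart’s values.yaml.)

    helm install insightedge --name demo \
      --set manager.ha=true \
      --set pu.ha=true \
      --set pu.partitions=2 \
      --set pu.antiAffinity.enabled=true \
      --set manager.antiAffinity.enabled=true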

The Benefits of the Cloud & Open-Source At Your Fingertips

Supporting Kubernetes empowers insight-driven organizations to build innovative real-time solutions, with the speed and confidence required to support their business initiatives while optimizing their TCO.

Bringing our enterprise-class real-time analytics, machine learning, and extreme data processing to Kubernetes provides enterprises with the benefits of being cloud-native and open-source driven with the added benefits of:

  • A one-click installation process with Helm
  • Deploying functions with Spark ML, running collocated with the data
  • Deploying pre-trained deep learning models
  • Interactive queries and data visualization with the Apache Zeppelin notebook
  • Resource monitoring
  • Self-healing and auto-recovery on pod failure

Stay tuned for the next blog post on how to deploy machine learning jobs with Spark and visualize the results with Apache Zeppelin on a cloud-native Kubernetes deployment.

To learn more about how to achieve a seamless, automated, and cloud-native installation and deployment of InsightEdge, we invite you to register for our webinar dedicated to Kubernetes.

Real-Time Analytics Meets Kubernetes – One Click, Any Cloud, Always-On
Yoav Einav
VP Product @ GigaSpaces
Yoav drives product management, technology vision, and go-to-market activities for GigaSpaces. Prior to joining GigaSpaces, Yoav held various leading product management roles at Iguazio and Qwilt, mapping the product strategy and roadmap while providing technical leadership regarding architecture and implementation. Yoav brings with him more than 12 years of industry knowledge in product management and software engineering from high-growth software companies. An entrepreneur at heart, Yoav drives innovation and product excellence and successfully aligns them with market trends and business needs. Yoav holds a B.Sc. in Computer Science and Business, magna cum laude, from Tel Aviv University, and an MBA in Finance from the Leon Recanati School at Tel Aviv University.