AI, fast data, machine learning, GDPR, Spark, real-time analytics – all of your favorite buzzwords discussed and analyzed at one super event! We’re talking about the upcoming Strata Data Conference, happening September 11–13 in New York City. For those of you not familiar with the conference, it’s basically the place to be if you work in a data-driven business. No other conference unites leaders, innovators, strategists, developers and technologists from across the business world to discuss the intersection of business and “cutting-edge” data science.
We’ve been looking over the conference schedule, and here are our top picks for the hottest sessions to attend at Strata Data Conference:
1. Your 5 billion rides are arriving now – scaling Apache Spark for data pipelines and intelligent systems at Uber
- Felix Cheung (Uber)
- 11:20am–12:00pm Wednesday, 09/12/2018
- Location: 1A 10
- Level: Intermediate
This talk promises to dive into how the data stack at Uber has evolved to keep up with explosive growth over the last few years, and to review current internal service and tooling offerings, including a few pipeline-as-a-service products. Specifically, the session will examine the role that Apache Spark, a very popular open source distributed Big Data platform that is also experiencing rapid growth, has played in all of this over the years. It will also analyze a few unique challenges with reliability, resource utilization, and observability at high volume and scale, and explore how to juggle business reality with the idealism of free and open-source software (FOSS), including overcoming the hurdles of engaging the open source community.
2. Next Generation Cybersecurity via Data Fusion, AI and BigData: Pragmatic Lessons from the Front Lines in Financial Services
- Location: Expo Hall
This session will cover the important issue of cybersecurity in financial services. Today, the threat to users of the Internet, whether they are individuals or companies, is increasing in both quantity and quality. There are many drivers for this increase, but the ease of being a cybercriminal, with easy access to online tools and no need to travel to commit this faceless crime, makes cybercrime almost risk-free and very profitable. Building tech tools and platforms is a necessary component of today’s cyber-frontier. At its core, a global financial institution must be trusted, because the modern customer living in a hyper-connected world needs to know, and will demand, that their most sensitive personal information (identity, address, salary, mortgage, credit card spending, pension, travel, and shopping habits) is kept safe. It’ll be interesting to learn how Barclays has rebuilt their Global Information Security division to be strategic, intelligence-led, and future-proof by implementing new capabilities and developing a new “fusion cell” concept that can utilize big data, AI, and machine learning.
- Rajiv Shah (GigaSpaces)
- 5:30pm–5:45pm Wednesday, 09/12/2018 and 12:00pm–12:15pm Thursday, 09/13/2018
- Location: Intel theater at booth #717
- Level: Intermediate
Real-time applications and business systems require instant data processing, advanced analytics and the ability to leverage insight instantly for immediate action. In this session, you will learn how InsightEdge, an open source platform by GigaSpaces, simplifies the operationalization of AI together with Intel BigDL & Spark, Optane, Xeon Scalable processors, and MKL. The in-memory platform powers event-driven, real-time analytics on streaming data combined with historical data, for insight-driven organizations looking to address time-sensitive business decisions to enhance business operations and customer experience. A simplified, cost-effective architecture using an intelligent multi-tier storage module will be presented to demonstrate how organizations can access and act on mission-critical data in milliseconds. You will hear about case studies showing how Machine Learning and BigDL can be operationalized for price optimizations, fraud detection, predictive maintenance, risk calculations, and operational business intelligence, along with a live demo of automated call center routing.
- Yaroslav Tkachenko (Activision)
- 1:15pm–1:55pm Wednesday, 09/12/2018
- Location: 1A 23/24
- Level: Intermediate
What can be easier than building a data pipeline nowadays? You add a few Apache Kafka clusters, some way to ingest data (probably over HTTP), design a way to route your data streams, add a few stream processors and consumers, integrate with a data warehouse, store the data properly… wait, it does start to look like A LOT of things, doesn’t it? And you probably want to make it highly scalable and available in the end, correct? In this presentation, Activision will review how they learned to develop and scale their data pipeline at Demonware/Activision, not only in terms of the messages per second it can handle but also in terms of supporting more games and more use cases.
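To make the "route your data streams" step a little more concrete, here is a minimal in-memory sketch in Python. It is not Activision's design and it does not use Kafka; the `StreamRouter` class, the `game` and `event` message fields, and the dead-letter list are all hypothetical names chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Message = Dict[str, str]
Handler = Callable[[Message], None]


class StreamRouter:
    """Toy router: delivers each message to the handlers subscribed to its topic."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)
        # Messages nobody subscribed to are kept instead of being silently dropped.
        self.dead_letters: List[Message] = []

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._handlers[topic].append(handler)

    def publish(self, message: Message) -> int:
        """Route one message and return how many handlers received it."""
        # Assumption: the topic is derived from two fields of the message.
        topic = f"{message['game']}.{message['event']}"
        handlers = self._handlers.get(topic)
        if not handlers:
            self.dead_letters.append(message)
            return 0
        for handler in handlers:
            handler(message)
        return len(handlers)


router = StreamRouter()
received: List[Message] = []
router.subscribe("game_a.match_end", received.append)
router.publish({"game": "game_a", "event": "match_end"})  # delivered
router.publish({"game": "game_b", "event": "login"})      # goes to dead letters
```

Supporting a new game in a setup like this means adding subscriptions rather than changing the router, which is one way to read the "more games and more use cases" part of the abstract.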
Airbnb’s data-driven products present a wide variety of unique ML problems ranging from traditional models built on structured data to state-of-the-art models that leverage unstructured data, such as user reviews, messages, and images. The ability to build, iterate on, and maintain healthy machine learning models is critical to Airbnb’s success. An end-to-end solution typically needs to cover data collection, feature engineering, training, deploying, serving, and monitoring. Presently, few platforms are capable of doing all of the above in a user-friendly way. Moreover, the heterogeneous nature of ML problems and the requirement of scalability pose challenges to fast iteration and productionisation.
Airbnb will review Bighead, their solution for tackling these business and technical challenges, built on Python, Spark, and Kubernetes. The components include a lifecycle management service, an offline training and inference engine, an online inference service, a prototyping environment, and a Docker image customization tool. Each component can be used individually. In addition, Bighead includes a unified model building API that smoothly integrates popular libraries including TensorFlow, XGBoost, and PyTorch. Each model is reproducible and iterable through standardization of data collection and transformation, model training environments, and production deployment. This talk covers the architecture, the problems that each individual component and the overall system aim to solve, and a vision for the future of machine learning infrastructure.
6. Executive Briefing: Enhance your Data Lake with comprehensive Data Governance to improve adoption and meet compliance needs
- Sanjeev Mohan (Gartner)
- Location: 1E 14
In this talk, Gartner will discuss how organizations have been struggling to bring their data lakes into the mainstream because many of them lack adequate governance. This limits the use of data lakes to a sandbox for data science workloads. To effectively deliver a broad set of use cases, however, organizations have to make sure that their data assets are properly governed, secured and trustworthy. A new raft of regulations, such as the EU GDPR, is providing the required catalyst to improve data hygiene. According to Gartner, data governance is no longer optional. It never was. In 2018, the sexiest job now involves data governance. In this session, Gartner will attempt to answer the question of how we can make this task impactful and easier to accomplish, and will walk you through an end-to-end architectural blueprint for information governance and best practices for helping organizations understand, secure, and govern diverse types of data in enterprise data lakes.
- Moty Fania (Intel)
- 5:25pm–6:05pm Wednesday, 09/12/2018
- Location: 1A 15/16
Recent years have seen significant advances in deep learning and AI capabilities. AI solutions can augment or replace mundane tasks, increase workforce productivity, and relieve human bottlenecks. Unlike traditional automation, these solutions include cognitive aspects that used to require human decision making. In some cases, deep learning has proven to be even more accurate than humans in identifying patterns and therefore can be effectively used to enable various kinds of automated, real-time decision making.
The advanced analytics team at Intel IT recently implemented an internal visual inference platform, a high-performance system for deep learning inference designed for production environments. This system enables easy deployment of many DL models in production while enabling a closed feedback loop where data flows in and decisions are returned through a fast REST API. To enable stream analytics at scale, the system was built on a modern microservices architecture using technologies such as TensorFlow, TensorFlow Serving, Redis, Flask and more. It is optimized to be easily deployed with Docker and Kubernetes and cuts down time-to-market for deploying a DL solution. By supporting different kinds of models and various inputs, including images and video streams, this system can enable deployment of smart visual inspection solutions with real-time decision making.
In this presentation, Intel will explain how they implemented the platform and share lessons learned along the way, including how they identified the set of characteristics and needs that are common to AI scenarios and made them available in the platform. The talk will also cover the architecture, related technologies, and potential use cases that can leverage deep learning visual inference to provide meaningful insights.
8. Case Study: A Spark-based Distributed Simulation Optimization Architecture for Portfolio Optimization in Retail Banking
- Kaushik Deka (Novantas), Ingrid Liu (Novantas)
- 1:10pm–1:50pm Thursday, 09/13/2018
- Location: 1A 23/24
- Level: Intermediate
In retail banking, product managers have to regularly optimize their consumer portfolio across products, markets, customer segments, and other dimensions for a range of objective functions. These range from maximizing total revenue over N months across the entire portfolio with the least interest expense to adjusting front and back book pricing to narrowly defined regional and product-level targets. In all use cases, the unit of optimization is the most granular pricing cell where rate is a variable, and the optimization scope can easily involve hundreds of thousands of such pricing cells across multiple geographies, products, and channels. What makes it even more complicated are real-world constraints on those pricing cells that make them interdependent (such as price ordering, lock-step behavior, “frozen” cells, and more). In this session, Novantas will review the three main challenges they faced in designing a solution to this problem and how they overcame each hurdle, focusing on a case study about building a Spark-based distributed optimization architecture.
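The constraint types named in the abstract can be illustrated with a small Python sketch that checks whether a proposed set of rates is feasible. This is only one reading of those terms, not Novantas's method: here "price ordering" means one cell's rate may not exceed another's, "lock-step" means two cells must move by the same amount, and "frozen" means a cell may not change. All names (`PricingConstraints`, `find_violations`, the cell ids) are hypothetical, and rates are held as integer basis points to keep comparisons exact.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Rates = Dict[str, int]  # pricing cell id -> rate in basis points


@dataclass
class PricingConstraints:
    ordering: List[Tuple[str, str]] = field(default_factory=list)   # (low, high): low <= high
    lock_step: List[Tuple[str, str]] = field(default_factory=list)  # pairs that move together
    frozen: Set[str] = field(default_factory=set)                   # cells that may not change


def find_violations(current: Rates, proposed: Rates, rules: PricingConstraints) -> List[str]:
    """Return a description of every constraint the proposed rates break."""
    problems: List[str] = []
    for cell in sorted(rules.frozen):
        if proposed[cell] != current[cell]:
            problems.append(f"frozen cell {cell} changed")
    for low, high in rules.ordering:
        if proposed[low] > proposed[high]:
            problems.append(f"ordering violated: {low} > {high}")
    for a, b in rules.lock_step:
        if proposed[a] - current[a] != proposed[b] - current[b]:
            problems.append(f"lock-step violated: {a} and {b}")
    return problems


current = {"a": 100, "b": 150, "c": 200}
rules = PricingConstraints(ordering=[("a", "b")], lock_step=[("b", "c")], frozen={"a"})
feasible = find_violations(current, {"a": 100, "b": 160, "c": 210}, rules)
infeasible = find_violations(current, {"a": 170, "b": 160, "c": 200}, rules)
```

A check like this also shows why the cells are interdependent: changing the rate of `b` alone breaks the lock-step rule with `c`, so an optimizer cannot treat each cell as an independent variable.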
- Swetha Machanavajhala (Microsoft), Xiaoyong Zhu (Microsoft)
- 4:20pm–5:00pm Thursday, 09/13/2018
- Location: 1A 15/16
- Level: Beginner
Microsoft will discuss the great demand for machine learning and artificial intelligence applications in the audio domain, such as home surveillance (detecting glass breaks and alarm events), security (detecting explosions and gunshots), self-driving cars (providing more security based on sound event detection), predictive maintenance (predicting machine failures via vibrations in the manufacturing sector), emphasizing emotions in real-time translation, and music synthesis. In this talk, they will focus on how to train a deep learning model on Microsoft Azure for sound event detection using the Urban sounds dataset and provide an overview of how to work with audio data, along with references to Data Science Virtual Machine (DSVM) notebooks.
GigaSpaces will be participating in the Strata Data Conference in New York City this fall. Drop by our booth #1412 to chat about In-Memory Computing, Real-time Analytics, AI, Deep Learning and more! Plus, we’ll be demoing our InsightEdge Platform so you can see in real-time what real-time analytics can do. We’ll also be speaking about deep learning and presenting our demo at the Intel theater at booth #717 on September 12th and 13th. Don’t miss it!
For more information about the event click here.