MLOps World; Machine Learning in Production 2021


Event Information


Online event

Refund policy

Refunds up to 30 days before event

Event description
Please register here:

About this event

A virtual event dedicated exclusively to ML in Production. 

4 days - Join when you can - all recordings provided. *PLEASE NOTE: Workshop spots are limited and some will be full. Recordings will be provided for all ticket-purchasers*

(see updated workshops/sessions with abstracts here)

Each ticket includes:

  • Access to 50+ hours of live-streamed content (incl. recordings)
  • Workshops to help build and deploy (SageMaker, MLflow, Kubeflow, etc.)
  • Network and connect through our event app
  • Q+A with speakers
  • Channels to share your work with the community
  • Run your own chat groups and virtual gatherings!


Too few companies have effective AI leaders and an effective AI strategy. 

Taken from the real-life experiences of our community, the Steering Committee has selected the top applications, achievements and knowledge-areas to highlight across 4 days.

Virtual format:

    • Case Studies
    • Executive Track – Business Alignment
    • Advanced Technical Research
    • Workshops
    • Virtual networking tools allow you to *see all attendees* and message them directly
    • Interact with speakers live, asking questions during the talks

We believe these events should be as accessible as possible and price our ticket passes accordingly.

MLOps World is an international community group of practitioners trying to better understand the science of deploying ML models into live production environments.

Created initially by the Toronto Machine Learning Society (TMLS), this initiative is intended to unite and support the wider AI ecosystem: the companies, practitioners, academics, and open-source contributors operating within it.

With an explorative approach, our initiatives address the unique needs of our community of over 9,000 ML researchers, professionals, entrepreneurs, and engineers, empowering its members and propelling productionized ML. Our community gatherings and events attempt to re-imagine what it means to have a connected community, offering support, growth, and inclusion for all participants.

What to expect at MLOps World; Machine Learning in Production 2021:

Business Leaders, including C-level executives and non-tech leaders, will explore immediate opportunities, and define clear next steps for building their business advantage around their data.


Q: What are the technical requirements to be able to participate?

A laptop or personal computer and a strong, reliable WiFi connection. Google Chrome is recommended to run the Virtual Conference platform.

Q: Can I watch the live stream sessions on my phone or tablet computer?

Yes, the Virtual Conference is accessible via a smartphone or tablet.

Q: Which sessions are going to be recorded? When will the recordings be available and do I have access to them?

All sessions will be recorded during the event (subject to speaker permission), made available to attendees approximately 2-4 weeks after the event, and will remain available for 12 months after release.

Q: Are there ID or minimum age requirements to enter the event? There are not. Everyone is welcome.

Q: How can I contact the organizer with any questions? Please email

Q: What's the refund policy? Tickets are refundable up to 30 days before the event.

Q: Why should I attend? Developments are happening fast - it's important to stay on top.

For business leaders, you will have direct contact with the people who matter most: consultants and experts, potential new hires, and potential new clients. For data practitioners, you'll have an opportunity to fast-track your learning with access to relevant use cases and top-quality speakers and instructors, making lasting connections while building your network.

The event is casual and tickets are priced to remove all barriers to entry. Space, however, is limited. 

Q: Who will attend? The event will have three tracks: one for business, one for advanced practitioners/researchers, and one for applied use cases (focusing on various industries). Attendees include business executives, PhD researchers, engineers, and practitioners ranging from beginner to advanced. See attendee demographics and a list of attendee titles from our past event here.

Q: Can I speak at the event?

Yes, you can submit an abstract here.

*Content is non-commercial and speaking spots cannot be purchased. 

Q: Will you give out the attendee list? No, we do our best to ensure attendees are not inundated with messages. We allow attendees to stay in contact through our Slack channel and follow-up monthly socials.

Q: Can my company have a display? Yes, there will be spaces for company displays. You can inquire at faraz at mlopsworld dot com

Hands-on Workshop: How to Build Pipelines with Kubeflow

Mohamed Sabri, Senior Consultant in MLOps, Rocket Science

Abstract: This workshop is a hands-on session where we will discover Kubeflow Pipelines. We will learn how to create an environment with Kubeflow on Kubernetes, then get familiar with the environment. After that, we will create a couple of pipelines, upload them to Kubeflow, and run schedules for retraining. Session length: 4 to 5 hours.

Requirements: a personal AWS account with a credit card on file. Background: knowledge of Docker/Kubernetes/virtualization and experience with Python.

What You'll Learn: Become more comfortable with tools like Kubeflow

Technical Level of the Talk: 5/5

Kubeflow and Feast Machine Learning Use Case End-to-End Production

Aniruddha Choudhury, Senior Data Scientist, Publicis Sapient

Abstract: We will first set up Kubeflow and Feast with Kafka, then frame a machine learning classification problem with Feast, build the pipeline with Kubernetes and Docker on GCP, host the pipeline in Kubeflow, and manage features with the Feast feature store. We will serve the endpoint with KFServing, check model performance with Grafana, and make API calls to the prediction endpoint for both real-time Kafka data and batch data.

What You'll Learn: The audience will learn how to use Kubeflow and its various services, including model serving, training pipelines, and the Feast low-latency feature store for real-time Kafka or batch workloads, along with exposure to Google Cloud with Kubeflow.

Technical Level of the Talk: 5/5

Fixing Data Quality at Scale with Data Observability

Barr Moses, CEO & Co-Founder and Lior Gavish, CTO, Monte Carlo

Abstract: Do your product dashboards look funky? Do your ML models keep drifting? Are you sick and tired of running a SQL query only to discover that the dataset you’re using is broken or just plain wrong? These errors are highly costly and affect almost every team, yet they’re typically only addressed on an ad hoc basis and in a reactive manner.

Technical level: (3/5) 

What you will learn:  Learn how to minimize data downtime and increase observability into your data ecosystem. You’ll explore the concept of data downtime and see how to measure it to determine the quality and health of your data using SQL, a sample data table, and a Jupyter notebook. From there, you’ll apply software engineering principles of observability to your data through five key pillars of data health—volume, schema, lineage, freshness, and distribution—as you set service-level objectives for data observability in your data and implement basic data observability checks.

What is unique about this, which can't be found online? It is an interactive course and walks through best practices and use cases that cannot be found online.
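
The five pillars described above can be made concrete with very simple checks. The sketch below is illustrative, stdlib-only Python, not Monte Carlo's product or API; the table metadata and thresholds are hypothetical:

```python
from datetime import datetime, timedelta

# Hypothetical expected schema for one table; in practice this metadata
# would come from your warehouse's information schema.
EXPECTED_SCHEMA = {"user_id": "int", "amount": "float", "created_at": "timestamp"}

def check_freshness(last_loaded_at, now, max_staleness_hours=24):
    """Freshness: has the table received new data recently enough?"""
    return (now - last_loaded_at) <= timedelta(hours=max_staleness_hours)

def check_volume(row_count, recent_counts, tolerance=0.5):
    """Volume: is today's row count within tolerance of the recent average?"""
    avg = sum(recent_counts) / len(recent_counts)
    return abs(row_count - avg) <= tolerance * avg

def check_schema(observed_schema):
    """Schema: did any expected column disappear or change type?"""
    return observed_schema == EXPECTED_SCHEMA
```

Checks like these, run on a schedule against each table, are the starting point for measuring data downtime.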

Hands-on Workshop: Advanced Model Deployment with Seldon Deploy: Deploying a Customer Segmentation Model; Configuring Drift, Outlier Detection & Model Explainability

Tom Farrand, Lead Solutions Engineer, Seldon

Abstract: The workshop will focus on the creation, deployment, monitoring, and management of a machine learning model for performing customer segmentation. The workshop will cover:

  • Exploring a subset of an e-commerce dataset detailing actual purchases from ~4,000 customers over a period of one year
  • Training several models on a pre-processed dataset
  • Deploying trained model artifacts with Seldon Deploy
  • Training an anchor tabular model and an outlier detector from Seldon Alibi
  • Updating Seldon Deployments with explainers and outlier detectors

Technical level: (5/5) 

What you will learn:  Pragmatic understanding of ML model deployment, for a customer segmentation use case.

MLOps Orchestration: Your Highway to Accelerating Deployment of AI

Yaron Haviv, Co-Founder and CTO, Iguazio

Abstract: MLOps holds the key to accelerating the development and deployment of AI so that enterprises can derive continuous business value, deploy and monitor a growing number of AI applications in production. But in our journey to create continuous development and delivery (CI/CD) of data- and ML-intensive applications, we often need to integrate many tools to make deployment of AI simpler and more efficient and to account for a growing set of use cases of growing complexity. This is a difficult, time- and labor-intensive task. Is there finally an open-source tool that can orchestrate all of these other tools, to make the process more user-friendly, and allow for almost any use case, no matter how complex?

What you will learn:  In this workshop, we will explore the concept of MLOps Orchestration and how it can simplify the process of getting data science to production in any environment (multi-cloud, on-prem, hybrid), harnessing any data type or source (real-time / streaming, historic, structured, unstructured, etc.), while drastically cutting down the time needed to get data science to production. We'll show how to map a business problem into an automated ML production pipeline and identify the right tools for the job, and ultimately how to run AI models in production at scale to accelerate business value with AI – all using open source technologies. The session will include a live demo and real customer case studies across use cases such as fraud prediction, real-time recommendation engines, and predictive maintenance.

MLOps with PyCaret

Moez Ali, Founder and Author, PyCaret

Abstract: PyCaret is an open-source, low-code machine learning library in Python that allows you to go from preparing your data to deploying your model within minutes in your choice of environment. This talk is a practical demo of using PyCaret in your existing workflows to supercharge your data science team's productivity.

What you will learn:  Machine Learning + MLOps

Hands-on Workshop: Complete ML Lifecycle with MLflow: Learn its Four Components

Jules S. Damji, Senior Developer Advocate, Databricks, Inc.

Abstract:  ML development brings many new complexities beyond the traditional software development lifecycle. Unlike in traditional software development, ML developers want to try multiple algorithms, tools, and parameters to get the best results, and they need to track this information to reproduce work. In addition, developers need to use many distinct systems to productionize models. To solve these challenges, MLflow, an open-source project, simplifies the entire ML lifecycle. MLflow introduces simple abstractions to package reproducible projects, track results, and encapsulate models that can be used with many existing tools, accelerating the ML lifecycle for organizations of any size.

Technical level: (5/5) 

What you will learn:  ML is becoming more pervasive in development and deployment across all verticals. With it, there has been a demand for tools that manage end-to-end ML model lifecycle management. We are still in the nascent days of MLOps, but having open-source modular options that allow extensibility and incorporate larger ML ecosystem frameworks is better than closed and monolithic options.
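
To make the tracking component concrete: the bookkeeping that MLflow Tracking automates boils down to recording the parameters and metrics of each run so experiments stay reproducible and comparable. Below is a stdlib-only sketch of that idea; it is not the MLflow API, and the `RunTracker` name is invented for illustration:

```python
import uuid

class RunTracker:
    """Toy stand-in for experiment tracking: record the params and
    metrics of each training run so results can be compared later."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        """Record one training run and return its id."""
        run = {"run_id": uuid.uuid4().hex, "params": params, "metrics": metrics}
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric, maximize=True):
        """Return the run with the best value of the given metric."""
        pick = max if maximize else min
        return pick(self.runs, key=lambda r: r["metrics"][metric])
```

A real tracking server adds persistence, artifacts, and a UI on top of exactly this kind of record.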

From Concept to Production: Template for the Entire ML Journey

Chanchal Chatterjee - Artificial Intelligence Leader; Elvin Zhu - AI Engineer; Allen Gour - AI Engineer, Google

Abstract: We created an open-source template in Python for the entire ML journey from concept to production. The workshop offers a two-part hands-on tutorial; each part runs 2 hours. Part 1 starts with an example use case and builds the ML components such as data prep, hyperparameter tuning, model training, model deployment, and online/batch prediction. These components are unit tested in a Python notebook. Part 2 shows how to deploy these components in a Kubeflow pipeline with orchestration for training and prediction, leaving the entire end-to-end ML pipeline ready for deployment. At the end of this tutorial, you will have hands-on experience building a model from concept to a final production-ready ML pipeline. The tutorial will be implemented on the Google Cloud Platform; models can include XGBoost and TensorFlow models.

What you will learn:  At the end of this tutorial, you will have hands-on experience building a model from concept to a final production-ready ML pipeline.

Building a Model Life Cycle: The First Step in Operationalizing AI Models

Jim Olsen, CTO, ModelOp

Abstract:  Data Scientists have their choice of model development tools, platforms, and factories for developing AI models. But to operationalize a model, you need to have the model life cycle established for the models that you want to put into production. The model life cycle is what defines the operational steps from time of deployment to retirement for all models in your organization.

In this session, Jim Olsen will show you how to design and build a model life cycle, including how to incorporate industry best practices. Jim will discuss the considerations for creating the model life cycle, who should be involved, and the types of issues that must be considered.

What you will learn: The workshop will cover four distinct areas:

  • Basics of a model life cycle: what makes up a model life cycle and how do you design one
  • Governance: developing a governance workflow that meets your company's needs
  • Monitoring: how to monitor models post-deployment in a flexible manner
  • Remediation: creating remediation workflows that track and accelerate time to resolution

Technical level: (5/5) 
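
A model life cycle of this kind is essentially a state machine: a set of operational states and the transitions allowed between them. The sketch below is a minimal illustration; the state names and transitions are invented for this example and are not ModelOp's actual workflow:

```python
# Allowed transitions between hypothetical life-cycle states.
ALLOWED = {
    "registered":  {"deployed"},
    "deployed":    {"monitored"},
    "monitored":   {"remediation", "retired"},
    "remediation": {"deployed", "retired"},
    "retired":     set(),
}

class ModelLifecycle:
    """Track a model's operational state from deployment to retirement."""

    def __init__(self):
        self.state = "registered"
        self.history = [self.state]  # audit trail for governance

    def transition(self, new_state):
        """Move to a new state, rejecting transitions the life cycle forbids."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition: {self.state} -> {new_state}")
        self.state = new_state
        self.history.append(new_state)
```

Encoding the life cycle explicitly is what lets governance and remediation workflows hook into well-defined points rather than ad hoc scripts.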

Essential Workshop on Exploratory Data Analysis and Feature Engineering

Vladimir Rybakov, Head of Data Science & Aleksandr Mester, Data Scientist, WaveAccess

Abstract: Most experienced data scientists would agree that data processing takes most of the time when undertaking machine learning projects. Both data pre-processing and feature engineering quality are crucial for model performance. However, neither is typically easy to do. Dealing with real data, you are likely to encounter problems such as noise, missing values, excessive information, etc. Building a good feature vector turns out to be just as hard. In this workshop, you will learn some simple but effective ways of handling these problems using a public Google Play Store dataset as an example.

What You'll Learn: Hands-on experience with EDA and feature engineering

Technical Level of the Talk: 5/5
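
Two of the most common feature-engineering transforms mentioned here can be sketched in stdlib Python. The "Free"/"Paid" column below is a hypothetical categorical feature in the spirit of the Google Play Store dataset, not the workshop's actual code:

```python
def one_hot(values):
    """One-hot encode a categorical column; returns rows plus the
    category order used, so the same encoding can be reapplied later."""
    categories = sorted(set(values))
    rows = [[int(v == c) for c in categories] for v in values]
    return rows, categories

def min_max_scale(values):
    """Rescale a numeric column to [0, 1] so features share a range."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]
```

Keeping the fitted parameters (category order, min/max) is the detail that matters in production: the same transform must be replayed on serving data.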

Good, Fast, Cheap: How to Do Data Science with Missing Data

Matthew Brems, Managing Partner & Principal Data Scientist, BetaVector

Abstract: If you've never heard of the "good, fast, cheap" dilemma, it goes something like this: You can have something good and fast, but it won't be cheap. You can have something good and cheap, but it won't be fast. You can have something fast and cheap, but it won't be good. In short, you can pick two of the three but you can't have all three.

If you've done a data science problem before, I can all but guarantee that you've run into missing data. How do we handle it? Well, we can avoid, ignore, or try to account for missing data. The problem is, none of these strategies are good, fast, *and* cheap.

We'll start by visualizing missing data and identifying the three different types of missing data, which will allow us to see how they affect whether we should avoid, ignore, or account for the missing data. We will walk through the advantages and disadvantages of each approach, as well as how to visualize and implement each one. We'll wrap up with practical tips for working with missing data and recommendations for integrating them into your workflow!

What You'll Learn:

By the end of the session, you should be able to:

  • Quantify the impact of missing data using simulations.
  • Identify best practices for avoiding and ignoring missing data.
  • Define unit missingness and item missingness, missing completely at random, missing at random, and missing not at random.
  • Describe techniques for doing data science with unit missingness, including class weight adjustments.
  • Describe techniques for doing data science with item missingness and their advantages and disadvantages, including deductive imputation, single and multiple model-based imputations, and the pattern submodel method.
  • Describe techniques for understanding whether data is missing completely at random, missing at random, or missing not at random.
  • Use a flexible but consistent workflow for applying statistical and data science techniques to missing data.

Technical Level of the Talk: 5/5
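
The first bullet, quantifying the impact of missing data via simulation, can be sketched in stdlib Python: knock values out completely at random (MCAR), impute, and compare a statistic. The example below uses mean imputation, which predictably shrinks variance; it is an illustration of the simulation idea, not the session's own code:

```python
import random

def simulate_mcar(values, missing_rate, seed=0):
    """Knock out entries completely at random (MCAR)."""
    rng = random.Random(seed)
    return [None if rng.random() < missing_rate else v for v in values]

def mean_impute(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def variance(values):
    """Population variance, used here to measure imputation's distortion."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)
```

Comparing `variance(imputed)` against the variance of the full data quantifies one cost of the "fast and cheap" approach.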

Long Live Models: A Tutorial on Model Monitoring in Production

Emeli Dral, CTO / Co-founder & Elena Samuylova, CEO / Co-Founder, Evidently AI

Abstract: No ML model lives forever. As soon as it is deployed in production, it starts to degrade. To make sure it continues to deliver business value, we need to closely keep an eye on its performance and decide when to intervene.

In this tutorial with code, I will show this with a practical example using open-source tools:

  • How model decay happens and how to analyse it in production
  • Which metrics to look at when interpreting model performance
  • How to do early monitoring when you do not have immediate feedback or ground-truth labels

What You'll Learn: That ML models degrade with time. They need maintenance and monitoring for successful operations. Even if you do not have immediate feedback, you can perform early monitoring by checking for data and prediction drift.

Technical Level of the Talk: 4/5
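
Early monitoring without ground-truth labels usually means comparing the distribution of current model scores against a reference window. One common drift metric for this is the Population Stability Index (PSI); the stdlib sketch below is illustrative and not the tutorial's own code. A rule of thumb: PSI below 0.1 is stable, 0.1-0.25 moderate drift, above 0.25 major drift.

```python
import math

def psi(expected, actual, bins=10, lo=0.0, hi=1.0, eps=1e-6):
    """Population Stability Index between two samples of scores in [lo, hi)."""
    def hist(xs):
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1)
            counts[i] += 1
        # eps-smoothing avoids log(0) for empty bins
        return [(c + eps) / (len(xs) + bins * eps) for c in counts]
    p, q = hist(expected), hist(actual)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))
```

Run against each day's prediction scores, a check like this flags drift long before delayed labels arrive.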

Building an ML Platform from Scratch

Alon Gubkin, VP R&D, Aporia

Abstract: In this workshop, you’ll learn how to set up an ML platform using open-source tools like Cookiecutter, DVC, MLFlow, KFServing, Pulumi, GitHub Actions, and more. We'll explain each tool in an intuitive way and the problem they solve, in order to build a useful platform that combines all of them. All code will be available on GitHub after the workshop, so you'll be able to easily integrate it into your existing work environment. There’s no “one size fits all” ML Platform. Each organization has its own needs and requires a customizable and flexible solution. In this workshop, you’ll learn how to create the right solution for your own organization. The workshop is intended for data scientists and ML engineers from all industries – from small startups to large corporations and academic institutions.

What You'll Learn: Building an ML platform doesn’t have to be time-consuming and difficult. And it doesn’t need to be a huge platform in order to be effective. With the right tools, you can easily build your own customizable platform that will increase your productivity and the quality of your models.

Technical Level of the Talk: 5/5

Closing the Production Gap with MLOps

Asger Pedersen, ML Worldwide Technical Lead, DataRobot

Abstract: This session will explore and demonstrate how DataRobot's MLOps can speed up deployment, monitor drift and accuracy, ensure governance, and manage the ongoing model life cycle, including how to do automatic retraining and run challenger models in production. This session will also cover how to deploy and monitor models built outside of DataRobot.

What You'll Learn: How DataRobot's MLOps can speed up deployment, monitor drift and accuracy, ensure governance, and manage the ongoing model life cycle, including how to do automatic retraining and run challenger models in production.

Technical Level of the Talk: 4/5

Deploying an E2E ML Pipeline with AWS SageMaker - What Amazon Didn't Tell You

Kollol Das, ML Research Lead and Fred Caroli, ML Engineer, Sensibill

Abstract: In this workshop, we will walk you through our perilous journey of setting up an ML pipeline on AWS SageMaker. From GroundTruth to Training Jobs to Endpoints, we will talk about the surprises that lay in wait for unsuspecting engineers and help you navigate the vast underbelly of the AWS ecosystem.

What You'll Learn: The workshop walks the audience through a successful deployment of an end-to-end pipeline that supports our inference structure. There are several frustrating issues on SageMaker stemming from gaps in the documentation, and we show the audience how to overcome them to use an otherwise great system.

Technical Level of the Talk: 5/5

MLOps with Dataiku: Considerations for Model Deployment & Monitoring

Christina Hsiao, Sr. Product Marketing Manager, Dataiku

Abstract: MLOps is a hot topic as organizations tackle the challenges of implementing models at scale. In this session, Christina Hsiao, Sr. Product Marketing Manager at Dataiku, will discuss how governed and repeatable model operations can make the difference between success and failure in your AI pursuits.

What You'll Learn: Christina will walk through key elements of MLOps, considerations for model deployment and monitoring strategies, and how you can leverage Dataiku's framework to streamline your MLOps processes and ensure models continue performing at their best.

Some topics covered during the session include:

  • The types of concerns different roles have when it comes to models in production and which KPIs to measure
  • Common causes for data drift or model degradation and approaches for detecting these shifts
  • What factors to consider when determining appropriate model intervention and retraining strategies

5 Governance Capabilities You Need in MLOps

Trey Morrow and Dwayne Dreakford, Solutions Engineers, Algorithmia

Abstract: Algorithmia is machine learning operations software that manages all stages of the ML lifecycle within existing operational processes. Join us for an overview of the Algorithmia platform, and understand how to put models into production quickly, securely, and learn about new and exciting features.

This session is made for data practitioners and leaders alike - learn how to assess your organization's current maturity and chart your course to full MLOps maturity. During this session, you will learn methods to effectively implement components of ML governance to achieve a level of control and visibility into how models operate in production.

Is Your ML Model Trustworthy?

María Grandury, Machine Learning Research Engineer, Neurocat GmbH

Abstract: In recent years, Machine Learning models and architectures have become increasingly complex. This growing complexity makes it more difficult to deliver high quality in terms of model performance, robustness and explainability. The introduction of automated evaluations of the trustworthiness of a model is one solution to guarantee that your clients can rely on your model's predictions. This talk will guide you through the different AI quality pillars, their importance and how to evaluate them.

What You'll Learn:

  • Why is it important to assess the quality and trustworthiness of your ML model
  • Which are the three AI quality pillars: performance, robustness & explainability
  • How can you evaluate these qualities and add this step to your MLOps workflow

Technical Level of the Talk: 4/5
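
One of the quality pillars above, robustness, can be probed very simply: perturb inputs slightly and measure how often the predicted label changes. This stdlib sketch is a crude illustration of the idea, not neurocat's evaluation methodology:

```python
import random

def robustness_score(predict, inputs, noise=0.1, trials=20, seed=0):
    """Fraction of (input, perturbation) pairs where the predicted
    label is unchanged under small uniform noise per feature."""
    rng = random.Random(seed)
    stable = total = 0
    for x in inputs:
        base = predict(x)
        for _ in range(trials):
            perturbed = [v + rng.uniform(-noise, noise) for v in x]
            stable += predict(perturbed) == base
            total += 1
    return stable / total
```

A score near 1.0 means predictions are locally stable; points near the decision boundary will pull it down, which is exactly the behavior such an automated check is meant to surface.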

Integrating Multiple MLOps Tools Together on Google Cloud Platform

Mefta Sadat, Senior ML Engineer, Loblaw Digital

Abstract: As one of Canada's largest grocery store chains, our team runs several ML systems in production. We wanted to take one such system and integrate it with several MLOps tools, including MLflow, Seldon Core, and Feast, on Google Cloud Platform. The goal is to reduce the overall development time of a model from EDA to deployment, as well as enable end-to-end tracking of these ML pipelines.

In this talk, we will share our experience on setting up a recommender system using the tools mentioned. We will also talk about our overall platform architecture and how the MLOps tools fit into the end-to-end ML pipeline.

What You'll Learn: How to put an ML system into production using some of the most commonly used MLOps tools, and how MLOps fits into the end-to-end ML pipeline.

Technical Level of the Talk: 5/5

Connecting Data Scientists to Production with the Tempo Python SDK

Clive Cox, CTO, Seldon

Abstract: There is generally a gap between the data scientists who create machine learning models and the existing DevOps tools and processes to put those models into production. This can lead to models not being properly validated and isolated from appropriate business logic and associated monitoring for their proper use. Many models remain as research outputs and do not make it to production.

In this talk we will show how, using Tempo (an open-source Python SDK), data scientists can easily prepare and test their machine learning models locally and then deploy them either directly or as part of CI/CD and GitOps processes. Data scientists can be involved in the inference logic needed to call their models correctly, and can easily add other required inference components, such as outlier detectors and model explainers, as part of their work.

We will illustrate deploying simple models through to more advanced outlier detection, model explanation, and multi-model ensembles with custom business logic, showing models tested locally and then pushed to production on Kubernetes with Seldon or Kubeflow's KFServing. Data scientists will learn how they can get their work to production faster while ensuring their model is used correctly when it reaches production.

What You'll Learn: How data scientists can prepare their models for production and ensure the models they create are correctly deployed and utilized as expected. We will introduce Seldon's open-source Tempo Python SDK to provide this functionality.

Technical Level of the Talk: 5/5

Data Scientist or ML Engineer: Who Do We Need Now?

Marcin Mizianty, VP of Data Science, AltaML

Abstract: Data Scientist has become the "sexiest" job title of the 21st century. However, we see significant challenges, especially in traditional industries, around model adoption and deploying ML to production. This is because ML lived mostly in academia for the last 70 years, and little thought went into the operationalization of ML.

A lot of data scientists are traditionally focused on the scientific approach (scripting and testing models) without considering its operational aspects. We need to realize there must be more engineering rigor in developing ML models than just calling model.fit(). It's also not as simple as taking everything from software development.

When software development initially came from academia, there were no proper development frameworks and early adopters borrowed traditional engineering processes (waterfall!). It took a long time to invent agile. I believe we have a similar situation with ML where people think we can just apply software development practices to ML development, whereas not everything is directly applicable. But we can take the best practices and build upon them.

In this talk, we will take a look at those challenges and explore potential solutions: shifting model development into the hands of ML Developers/Engineers who are focused on developing machine learning software systems from scratch, starting at the data source, and keeping in mind that the result will be deployed in production.

What You'll Learn: We need to realize there must be more engineering rigor in developing ML models for them to be used in the industry.

Technical Level of the Talk: 3/5

Breaking the Monolithic ML Pipeline with a Feature Store

Jim Dowling, CEO, Logical Clocks

Abstract: How can a feature store help with MLOps? In this talk, we introduce how a Feature Store for Machine Learning (ML) can decompose end-to-end ML pipelines into (1) a feature pipeline that takes raw data from data warehouses, a data lake, and operational data stores and transforms it into features that are cached in the feature store, and (2) model training/validation/deployment pipelines. Both pipelines have different requirements, often favor different technologies (Spark for feature pipelines, Python for model training), can run at different cadences, and are even managed by different teams (data engineering, and data scientists, or ML engineers). The benefits of the Feature Store for ML architecture will be elucidated throughout the talk.

What You'll Learn: Feature Stores for Machine Learning help manage your data through the entire AI Data Lifecycle, enabling Enterprise data to be available for training and serving models, and ensuring governance that will be needed when AI becomes legally regulated.

Technical Level of the Talk: 4/5
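
The decomposition above, a feature pipeline that writes and training/serving pipelines that read, can be illustrated with a toy in-memory store. Real feature stores (Feast, Hopsworks, etc.) add versioning, point-in-time-correct joins, and online/offline storage tiers; this sketch only shows the interface split:

```python
class FeatureStore:
    """Toy in-memory feature store separating writers from readers."""

    def __init__(self):
        self._store = {}  # (entity_id, feature_name) -> value

    def ingest(self, entity_id, features):
        """Feature pipeline side: cache computed features for an entity."""
        for name, value in features.items():
            self._store[(entity_id, name)] = value

    def get_features(self, entity_id, names):
        """Training/serving side: read a feature vector by entity id.
        Missing features come back as None."""
        return [self._store.get((entity_id, n)) for n in names]
```

The key property is that both pipelines agree only on entity ids and feature names, so they can run on different schedules, technologies, and teams.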

Machine Learning Tools: Skyline and RL-Scope

Gennady Pekhimenko, Assistant Professor, Faculty Member, University of Toronto / Vector Institute

Abstract: In this talk, I will present two recently developed tools for ML optimization: Skyline and RL-Scope. Skyline provides interactive in-editor computational performance profiling, visualization, and debugging for PyTorch deep neural networks. RL-Scope is a cross-stack profiler for deep reinforcement learning workloads.

What You'll Learn: The availability of tools for efficient optimization for DNN and RL models

Technical Level of the Talk: 5/5

From 12 Months to 30 Days to AI Deployment: An MLOps Journey

Yaron Haviv - Co-Founder and CTO, Iguazio; David Aronchik - Partner, Product Manager, Azure Innovations Group in the Office of the CTO, Microsoft; Greg Hayes - Data Science Director, Ecolab

Abstract: In this talk we’ll share Ecolab’s journey with MLOps, and how this global hygiene technologies provider built a cutting-edge, cloud data science architecture which enables them to deploy new AI services every 30-90 days, to address risks before they occur.

Building scalable AI applications that generate value in real business environments requires not just advanced technologies, but also better processes for data science, engineering, and DevOps teams to collaborate effectively.

What You'll Learn: In this session we'll touch upon the technology, and also the processes that need to be put in place to break down the silos and efficiently implement MLOps across the enterprise.

Technical Level of the Talk:

Developing a Data-Centric NLP Machine Learning Pipeline

Diego Castaneda - Data Scientist; Jennifer Bader - Content Strategist, Shopify

Abstract: The number of components and level of sophistication in end-to-end ML pipelines can vary from problem to problem, but there's one common element that is the key to making the whole system great and useful: your training data. The more time you spend developing the training dataset in your ML pipeline, the more positive results you'll get. In this talk, I'll present the use case of a text classification pipeline we developed from scratch to integrate with one of our products. I'll show details of how we designed an appropriate classification taxonomy and a consistent annotated training dataset, and how the end-to-end pipeline was pieced together to deploy a BERT-based model in a low-latency real-time text classification system.

What You'll Learn: Attendees will learn strategies to develop a great training dataset, particularly in the NLP domain, from scratch and how to connect it with the rest of the ML pipeline to train, monitor, and deploy models for production use.

Technical Level of the Talk: 4/5

FLAML: Fast and Lightweight AutoML

Chi Wang, Principal Researcher and Qingyun Wu, Postdoc Researcher, Microsoft Research

Abstract: Hyperparameter optimization is a ubiquitous task, mostly treated like an expensive black-box optimization problem with high resource consumption. This talk will introduce MSR technologies to perform low-cost and effective hyperparameter optimization. We build FLAML: a fast and lightweight AutoML library to automatically find accurate machine learning models at a low cost. It significantly outperforms top-ranked AutoML libraries on a large open-source AutoML benchmark under equal, or sometimes orders-of-magnitude smaller, budget constraints.

What You'll Learn: FLAML is a lightweight Python library that finds accurate machine learning models automatically, efficiently, and economically, freeing users from selecting learners and hyperparameters for each learner. Its simple and lightweight design makes it easy to extend, for example with customized learners or metrics. FLAML is powered by a new, cost-effective hyperparameter optimization and learner selection method invented by Microsoft Research.

Technical Level of the Talk: 2/5

Productionizing Machine Learning at Scale with MLflow

Matei Zaharia, Chief Technologist, Databricks, and Assistant Professor, Stanford

Abstract: Building and operating ML applications require different infrastructure from traditional software, which has led to the development of “ML platforms” specifically designed to build and manage ML applications. In this talk, I’ll discuss some of the common challenges in productionizing ML based on experience building MLflow, an open-source ML platform started at Databricks. MLflow is now the most widely used open-source project in this area, with over 4 million downloads per month and integrations with dozens of other products. I’ll also highlight some interesting problems users face at scale, such as the need for versioning and reproducibility on petabyte-scale datasets, the impact of privacy and interpretability regulations on ML, and “hands-free” ML use cases that can train thousands of models without direct tuning from the ML developer. I'll show how ongoing work in MLflow and the open-source ecosystem around it (Delta Lake, PyTorch, PyCaret, Apache Spark, etc) is helping to tackle these problems.

What You'll Learn: Challenges running ML in production, based on our experience at thousands of enterprise customers, and features available in MLflow and its open-source integrations.

Technical Level of the Talk: 5/5

Model Monitoring: What, Why, and How

 Manasi Vartak, CEO, Verta Inc

Abstract: For any organization whose core product or business depends on ML models (think Slack search, Twitter feed ranking, or Tesla Autopilot), ensuring that production ML models are performing with high efficacy is crucial. In fact, according to the McKinsey report on model risk, defective models have led to revenue losses of hundreds of millions of dollars in the financial sector alone. However, in spite of the significant harms of defective models, tools to detect and remedy model performance issues for production ML models are missing.

In this talk, we discuss why model monitoring matters, what we mean by model monitoring, and considerations when setting up a model monitoring system.

What You'll Learn: Why model monitoring matters, what we mean by model monitoring, and considerations when setting up a model monitoring system.

Technical Level of the Talk: 4/5

Design Patterns for MLOps

 Sara Robinson, Senior Developer Advocate, Google Cloud

Abstract: Released last year, the O'Reilly book Machine Learning Design Patterns captures best practices and solutions to recurring problems in machine learning. The authors, three Google engineers, catalog proven methods to help data scientists tackle common problems throughout the ML process. These design patterns codify the experience of hundreds of experts into straightforward, approachable advice. In this talk, Sara will dive into three patterns from the book focused on MLOps.

What You'll Learn: After this talk, attendees will have a better understanding of how to approach MLOps for their own ML tasks.

Technical Level of the Talk: 5/5

Systematic Approaches and Creativity; Building DoorDash's ML Platform During the Pandemic

Hien Luu, Sr. Engineering Manager, and Dawn Lu, Senior Data Scientist, DoorDash

Abstract: What is it like to build an ML platform during the pandemic, with a new team, at a new company? DoorDash’s mission is to grow and empower local economies. As DoorDash's business grows, it is essential to establish a centralized ML platform to accelerate the ML development process and to power the numerous ML use cases. This presentation will detail DoorDash's ML platform journey during the pandemic: how we established a close collaboration and relationship with the Data Science community, how we intentionally set guardrails in the early days to enable us to make progress, the principled approach we took to building out the ML platform while meeting the needs of the Data Science community, and finally, the technology stack and architecture that powers billions of predictions per day and supports a diverse set of ML use cases.

What You'll Learn: Building an ML platform during the pandemic is possible, but it requires a systematic approach and creative ideas.

Technical Level of the Talk: 3/5

Data and Process Governance for Responsible and Ethical AI

Krishna Gade, CEO & Co-founder, Fiddler AI

Abstract: As more businesses adopt AI, upcoming AI regulations are forcing companies to build responsible and ethical AI. Previously, many practitioners thought validating a model with Explainable AI before deployment was enough. Not anymore. With more AI solutions in place, firms are putting governance at the forefront by enforcing it throughout the AI lifecycle to ensure responsible and ethical AI development. In this session, we introduce a Model Performance Management framework, discuss its advantages across the machine learning life cycle, and show how a continuous feedback loop helps establish a responsible AI practice.

What You'll Learn: Using Model Performance Management to improve models more efficiently and how AI explainability enables MPM at the core.

The AI Captain; A Study of ML at the Edge

Rob High, Vice President and CTO, IBM Networking and Edge Computing, IBM

Abstract: In this talk, we will examine the Mayflower project — its goals; its breakthroughs; its technology — and highlight it as an example of how AI is being brought to the edge.

What You'll Learn: Edge Computing

Technical Level of the Talk: 5/5

Machine Learning on Dynamic Graphs

Emanuele Rossi, Machine Learning Researcher, Twitter

Abstract: Graph neural networks (GNNs) research has surged to become one of the hottest topics in machine learning in recent years. GNNs have seen a series of recent successes in problems from the fields of biology, chemistry, social science, physics, and many others. So far, GNN models have been primarily developed for static graphs that do not change over time. However, many interesting real-world graphs are dynamic and evolve over time, with prominent examples including social networks, financial transactions, and recommender systems. In many cases, it is the dynamic behavior of such systems that conveys important insights, otherwise lost if one considers only a static graph. This talk will discuss Temporal Graph Networks, a recent and general approach for machine learning over dynamic graphs.

What You'll Learn: What dynamic graphs are, that they are ubiquitous in applications, and how to learn on dynamic graphs

Technical Level of the Talk: 5/5

MLOps Platform Architecture for E2E ML Pipelines

Joao Da Silva,  Lead Data Engineer, Avast

Abstract: At Avast we complete over 17 million phishing detections a day, providing crucial online protection against this type of attack. There are many challenges that data scientists frequently face, such as inconsistent environments, transitioning from research to production, access management, and model deployment. In this talk, Joao Da Silva will present Avast's MLOps maturity journey, tooling, and cultural shift, which have increased velocity, improved collaboration, and brought structure to productizing machine learning, while integrating model tracking, storage, cross-system orchestration, and E2E model deployments for complete and modern machine learning pipelines. It is a presentation that anyone in any organization can relate to: a real journey of MLOps tooling, mindset, and adoption in a large organization.

What You'll Learn: This talk will guide attendees through Avast's MLOps maturity journey, tooling, and the cultural shift that has increased our velocity in delivering ML/AI projects.

Technical Level of the Talk: 4/5

Catch Me If You Can: Keeping Up With ML Models in Production

Shreya Shankar, ML Engineer and Ph.D. Student

Abstract: Advances in machine learning and big data are disrupting every industry. However, even when companies deploy models to production, they face significant challenges in staying there, with performance degrading significantly from offline benchmarks over time, a phenomenon known as performance drift. Models deployed over an extended period of time often experience performance drift due to changing data distributions. In this talk, we discuss approaches to mitigate the effects of performance drift, illustrating our methods on a sample prediction task. Leveraging my experience at a startup deploying and monitoring production-grade ML pipelines for predictive maintenance, we also address several aspects of machine learning often overlooked in academia, such as the incorporation of non-technical collaborators and the integration of machine learning in an agile framework.
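For intuition, the performance-drift phenomenon the abstract describes can be sketched in a few lines. This is a hedged toy example, not the speaker's demo: the data, the nearest-centroid model, and the injected shift are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Train a nearest-centroid classifier on two well-separated Gaussian classes.
X_train = np.concatenate([rng.normal(0, 1, (500, 2)), rng.normal(3, 1, (500, 2))])
y_train = np.array([0] * 500 + [1] * 500)
centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(X):
    # Assign each point to the nearest class centroid.
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

def accuracy(X, y):
    return float((predict(X) == y).mean())

# Held-out data from the training distribution (same label layout as below).
X_test = np.concatenate([rng.normal(0, 1, (200, 2)), rng.normal(3, 1, (200, 2))])
y_test = np.array([0] * 200 + [1] * 200)

# "Production" data after covariate shift: class 0 has drifted toward class 1.
X_prod = np.concatenate([rng.normal(2, 1, (200, 2)), rng.normal(3, 1, (200, 2))])

print(accuracy(X_test, y_test))  # high offline accuracy
print(accuracy(X_prod, y_test))  # noticeably degraded accuracy after drift
```

The offline benchmark looks excellent, yet the same frozen model degrades sharply once the input distribution moves, which is exactly why production monitoring is needed.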

What You'll Learn: This talk will consist of:

- Using Python and open-source datasets to demonstrate an example of training and validating a model in an offline setting that subsequently experiences performance degradation when it is deployed.
- Using Prometheus, Grafana, and mltrace to show how one can build tools to monitor production pipelines and enable teams of different stakeholders to quickly identify performance degradation using the right metrics.

This talk will be a slideshow presentation accompanied by a Python notebook demo. It is aimed at engineers who deploy and debug models in production, but may be of broader interest to people building machine learning-based products, and requires familiarity with machine learning basics (train/test sets, decision trees).

Technical Level of the Talk: 5/5

Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

Ihab Ilyas, Professor, University of Waterloo

Abstract: The lifecycle of a machine learning model only begins once it's in production. In this talk, we provide a practical deep dive into best practices, principles, patterns, and techniques around the production monitoring of machine learning models. We will cover standard microservice monitoring techniques applied to deployed machine learning models, as well as more advanced paradigms for monitoring machine learning models through concept drift, outlier detection, and explainability.

We'll dive into a hands-on example, where we will train an image classification machine learning model from scratch, deploy it as a microservice in Kubernetes, and introduce advanced monitoring components as architectural patterns with hands-on examples. These monitoring techniques will include AI Explainers, Outlier Detectors, Concept Drift detectors, and Adversarial Detectors. We will also be understanding high-level architectural patterns that abstract these complex and advanced monitoring techniques into infrastructural components that will enable for scale, introducing the standardized interfaces required for us to enable monitoring across hundreds or thousands of heterogeneous machine learning models.

What You'll Learn: We can generate private data samples that are as useful as the original sensitive data via private structured learning.

Technical Level of the Talk: 5/5

The Critical Missing Component in the Production ML Stack

Alessya Visnjic, CEO and Co-founder, WhyLabs

Abstract: The day an ML application is deployed to production and begins facing the real world invariably begins in triumph and ends in frustration for the model builder. The joy of seeing accurate predictions is quickly overshadowed by a myriad of operational challenges that arise, from debugging to troubleshooting to monitoring. In DevOps, analogous software operations have long been refined into an art form. Sophisticated tools enable engineers to quickly identify and resolve issues, continuously improving software stability and robustness. By contrast, in the ML world, operations are still done with Jupyter notebooks and shell scripts.

One of the cornerstones of the DevOps toolchain is logging. Traces and metrics are built on top of logs, thus enabling monitoring and feedback loops. What would a good logging tool look like in an ML system? In this talk, we present a powerful new tool, a logging library, that enables data logging for AI applications. We discuss how this solution enables testing, monitoring, and debugging of both an AI application and its upstream data pipeline(s).

We offer a deep dive into some of the key properties of the logging library that enable it to handle TBs of data, run with a constrained memory footprint, and produce statistically accurate log profiles of structured and unstructured data. Attendees will leave the talk equipped with best practices to supercharge MLOps in their teams.
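To make the data-logging idea concrete, here is a minimal sketch of the kind of profile such a library can produce. This is not the library presented in the talk; the `ColumnProfile` class and its fields are invented for illustration. The key property is that the logger keeps only counts and running moments per column (Welford's algorithm), never the raw rows, so memory stays constant however large the stream grows.

```python
import math

class ColumnProfile:
    """Constant-memory statistical profile of one column: track counts and
    running moments instead of raw values, in the spirit of data logging."""

    def __init__(self):
        self.count = 0
        self.min = math.inf
        self.max = -math.inf
        self._mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations (Welford)

    def track(self, value):
        self.count += 1
        self.min = min(self.min, value)
        self.max = max(self.max, value)
        delta = value - self._mean
        self._mean += delta / self.count
        self._m2 += delta * (value - self._mean)

    @property
    def mean(self):
        return self._mean

    @property
    def stddev(self):
        return math.sqrt(self._m2 / self.count) if self.count else 0.0

profile = ColumnProfile()
for v in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    profile.track(v)

print(profile.count, profile.min, profile.max, profile.mean, profile.stddev)
# 8 values, min 2.0, max 9.0, mean 5.0, population stddev 2.0
```

Profiles like this can be merged across batches and compared over time, which is what turns lightweight logging into testing and monitoring signals.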

What You'll Learn: How to enable logging in data & ml applications in order to unlock MLOps activities such as testing, debugging, and monitoring.

Technical Level of the Talk: 4/5

Security Audits for Machine Learning Attacks

Navdeep Gill - Lead Data Scientist & Team Lead, Responsible AI, & Michelle Tanco - Customer Data Scientist, H2O.ai

Abstract: There are several known attacks against ML models that can lead to altered, harmful model outcomes or exposure of sensitive training data. Unfortunately, traditional model assessment measures don’t tell us much about whether a model is secure. In addition to other debugging steps, it may be prudent to add some or all of the known ML attacks into any white-hat hacking exercises or red-team audits your organization is already conducting. This talk will go over common machine learning security attacks and the remediation steps an organization can take to deter these pitfalls.

What You'll Learn: They will learn about different types of security vulnerabilities when it comes to machine learning and how to handle such issues.

Technical Level of the Talk: 5/5

Marius: Machine Learning Over Billion-Edge Graphs 10x Faster and 5x Cheaper

Theo Rekatsinas, Assistant Professor, University of Wisconsin-Madison

Abstract: This talk describes Marius, a software system that aims to make scaling of modern AI models over billion-edge graphs dramatically easier. Marius focuses on a key bottleneck in the development of machine learning systems over large-scale graph data: data movement during training. Marius addresses this bottleneck with a novel data flow architecture that maximizes resource utilization of the entire memory hierarchy (including disk, CPU, and GPU memory). Marius’ no-code paradigm allows users to simply define a model and enjoy resource-optimized training out of the box. This talk will describe how Marius can train deep learning models over graphs with more than a billion edges and 550GB of total parameters 10x faster and 5x cheaper than competing industrial systems.

What You'll Learn: 1) How we can enable training of large-scale machine learning models over sparse structured data in a resource-optimized manner; 2) Marius and its functionalities.

Technical Level of the Talk: 5/5

Challenges for ML Operations in a Fast Growing Company

Gulsen Kutluoglu, Director of Engineering, and Sam Cohan, Principal ML Engineer, Udemy

Abstract: At Udemy, multi-faceted growth created different challenges in terms of ML platform and tooling as well as the related processes. One such challenge was to come up with a scalable platform to train and execute different types of ML models in real-time or batch. We overcame this particular challenge by building generic components that increased reuse and led to faster delivery and lower maintenance costs. Another challenge was to find a good way to efficiently serve the needs of the different parts of the organization despite their varying requirements. For instance, distributed applications cannot be avoided for many of our use cases that deal with web-scale data, but there are still a significant number of cases that can be more efficiently handled without the complexities of distributed applications. To overcome this challenge, we had to unify the frameworks (and tooling whenever possible) for both cases, mainly to decrease the learning curve for data scientists and increase maintainability. Another type of challenge we faced was the need for increased focus on developer and data science ergonomics as the organization grew. In this talk, I will present an overview of the main ML-related challenges at Udemy as the company experienced explosive growth in users, product complexity, and organization size. I will also present the best practices we developed so far and the outstanding problems in our current state.

What You'll Learn: The challenges that arise in terms of ML platform, tooling, and processes as a company grows.

Technical Level of the Talk: 4/5

MLOps vs. ModelOps – What’s the Difference and Why You Should Care

Jim Olsen, CTO, ModelOp

Abstract: MLOps and ModelOps are different. In this session, we will cover how ModelOps not only encompasses the basic model monitoring and tuning capabilities of MLOps, but also includes a continuous feedback loop and a 360-degree view of your models throughout the enterprise, providing reproducibility, compliance, and auditability of all your business-critical models. You will learn about good practices for:

- Continuous model monitoring for early problem detection
- Automated remediation for accelerating time to resolution
- Establishing a continuous feedback loop for model improvement

What's unique about this talk? We will cover model operational practices that you can apply to AI models along with other types of analytical models.

What you will learn: 

Technical Level of the Talk:

Quick Deploy Model Serving in Ranking Systems

Talal Riaz, Software Engineer, Yelp Inc.

Abstract: At Yelp, we use Elasticsearch (ES) to power most of our search. However, the process for updating or replacing a model was slow and error-prone; engineers needed to spend time implementing any new feature transformations for both training and serving, as well as ensuring parity between the two. We improved this situation by building an ES plugin that integrates neatly into Yelp's Model Platform. Spark pipelines trained and stored as MLeap bundles in MLflow via the Model Platform can now be deployed directly to ES as MLeap pipelines. These Spark/MLeap pipelines encapsulate not only the ML model itself but also its feature engineering pipeline. Subsequently, this allows the ES plugin to swap one modeling pipeline for another as long as the base features for their pipelines are available through the ES index!

In this talk, we will discuss this ES plugin, as well as lessons learnt in making model pipelines performant.

What you will learn: The workshop will cover 4 distinct areas:

- Basics of a model life cycle: what makes up a model life cycle and how do you design one
- Governance: developing a governance workflow that meets your company's needs
- Monitoring: how to monitor models post-deployment in a flexible manner
- Remediation: creating remediation workflows that track and accelerate time to resolution

Technical Level of the Talk: 4/5

MLSecOps and Shift-left your security gears in Model Lifecycle

Arun Prabhakar, Senior Consultant, Security Compass

Abstract: Security has often been an afterthought and a separate function in many projects, but with the advent of DevSecOps, security professionals can effectively collaborate with development, testing, and operations teams. This helps build in the required information assurance to continuously integrate and deliver secure solutions. Many machine learning solutions are implemented using a DevOps methodology too. However, incorporating security functions and capabilities into the model lifecycle has not been done rigorously, exposing many security risks at a later time.

MLSecOps is a new and upcoming approach proposed by security professionals to inject security into the model lifecycle, so as to address identified threats even before the model is in production. This includes, but is not limited to, addressing security vulnerabilities, privacy issues, and non-compliance with standards and mandatory regulations. During the session, participants will see the vital elements of MLSecOps and how they can be practiced; the concept of “shift-left security” in model building, incorporating security requirements in the early stages of the lifecycle, and the benefits that can be reaped; and finally the best practices of security, from a practitioner's standpoint, for effective risk management in your machine learning projects.

What you will learn:  Data Scientists and Machine Learning Engineers will learn about MLSecOps, the concept of "shift-left security" in MLOps and securing the pipeline before the models are deployed. Importantly, understand the concept of codifying security as they build the machine learning models.

Technical Level of the Talk: 5/5

How Not to Let Your Data and Model Drift Away Silently

Chengyin Eng,  Data Science Consultant, Databricks

Abstract: Deploying machine learning models has become a relatively frictionless process. However, properly deploying a model with a robust testing and monitoring framework is a vastly more complex task. There is no one-size-fits-all solution when it comes to productionizing ML models, oftentimes requiring custom implementations utilizing multiple libraries and tools. There are, however, a set of core statistical tests and metrics one should have in place to detect phenomena such as data and concept drift, to prevent models from becoming unknowingly stale and detrimental to the business. This talk introduces a suite of useful tests and open-source package options, such as MLflow, SciPy, and statsmodels, to test model and data validity in production.
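As an illustrative sketch of one such core statistical test, a two-sample Kolmogorov-Smirnov check with SciPy can flag when a feature's production distribution no longer matches the training distribution. This is a hedged toy example, not code from the talk; the data, window sizes, and `alpha` threshold are invented.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Feature values seen at training time vs. two production windows.
training = rng.normal(loc=0.0, scale=1.0, size=2000)
prod_same = rng.normal(loc=0.0, scale=1.0, size=2000)   # no drift injected
prod_shift = rng.normal(loc=0.5, scale=1.0, size=2000)  # the mean has drifted

def drifted(reference, live, alpha=0.01):
    # Two-sample Kolmogorov-Smirnov test: reject H0 ("same distribution")
    # when the p-value falls below alpha.
    return stats.ks_2samp(reference, live).pvalue < alpha

print(drifted(training, prod_shift))                    # True: drift detected
print(stats.ks_2samp(training, prod_same).pvalue)       # typically large: no alarm
```

In practice the same check would run per feature on sliding windows, with `alpha` tuned to balance alert fatigue against detection delay.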

What You'll Learn: ML deployment is not the end -- it is where ML models start to materialize impact and value to the business and people. We need monitoring and testing to ensure ML models behave as expected.

Technical Level of the Talk: 2/5

Production Machine Learning Monitoring: Outliers, Drift, Explainers & Statistical Performance

 Alejandro Saucedo, Engineering Director, Seldon Technologies

Abstract: The lifecycle of a machine learning model only begins once it's in production. In this talk, we provide a practical deep dive into best practices, principles, patterns, and techniques around production monitoring of machine learning models. We will cover standard microservice monitoring techniques applied to deployed machine learning models, as well as more advanced paradigms for monitoring machine learning models through concept drift, outlier detection, and explainability. We'll dive into a hands-on example, where we will train an image classification machine learning model from scratch, deploy it as a microservice in Kubernetes, and introduce advanced monitoring components as architectural patterns with hands-on examples. These monitoring techniques will include AI Explainers, Outlier Detectors, Concept Drift detectors, and Adversarial Detectors. We will also be understanding high-level architectural patterns that abstract these complex and advanced monitoring techniques into infrastructural components that will enable scale, introducing the standardized interfaces required for us to enable monitoring across hundreds or thousands of heterogeneous machine learning models.
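For intuition, one of the simplest possible outlier-detector components can be sketched as follows. This is a hedged toy example, not the implementation covered in the talk; the `ZScoreOutlierDetector` class and its threshold are invented. It records per-feature training statistics and flags production inputs that fall far outside them.

```python
import numpy as np

class ZScoreOutlierDetector:
    """Flag model inputs whose features deviate far from training statistics."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, X):
        # Record per-feature mean and standard deviation of the training data.
        self.mean_ = X.mean(axis=0)
        self.std_ = X.std(axis=0) + 1e-12  # guard against zero variance
        return self

    def is_outlier(self, X):
        # An input is an outlier if any feature's z-score exceeds the threshold.
        z = np.abs((X - self.mean_) / self.std_)
        return (z > self.threshold).any(axis=1)

rng = np.random.default_rng(42)
detector = ZScoreOutlierDetector().fit(rng.normal(0, 1, (1000, 3)))

normal_batch = np.zeros((5, 3))      # in-distribution inputs
weird_batch = np.full((5, 3), 10.0)  # far outside the training range

print(detector.is_outlier(normal_batch))  # all False
print(detector.is_outlier(weird_batch))   # all True
```

Wrapped behind a standardized interface, a component like this can run alongside the model microservice and emit metrics for each request batch.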

What You'll Learn: A practical deep dive on production monitoring architectures for machine learning at scale using real-time metrics, outlier detectors, drift detectors, metrics servers and explainers.

Technical Level of the Talk: 4/5

Machine Learning Optimizations and Strategy for Post-Covid Era

 Sharmistha Chatterjee, Senior Manager Data Sciences, Publicis Sapient

Abstract: The objective of this talk is to understand the threats to machine learning models in production after the start of the pandemic. The talk will encompass the ML risk mitigation techniques to deploy, monitor, and recalibrate ML models that are essential for industry domains such as supply chain, retail, travel and hospitality, and automotive - all of which have been impacted, positively or negatively, by Covid. In this context, the talk illustrates a few industry use-cases with examples from traditional machine learning as well as reinforcement learning. As the talk highlights different strategic business actions, it also offers insights on the best ethical practices that need to be embedded in MLOps. Through an understanding of intelligent, automated AI/ML techniques, the audience will be able to develop ethical automated models by addressing the technical and business challenges of the post-Covid world.

What You'll Learn: How to create a long-term, sustainable business that can rely on machine learning models in production and suitably adapt to any adverse situation, be it technical or business.

Technical Level of the Talk: 4/5

Breaking Down Scotiabank's New Global AI Platform

Pooja Bhojwani, Senior Data Scientist, and Min Li, Director, IB AIML, Scotiabank

Abstract: Advanced customer analytics is one of the key success factors in Scotiabank's transformation into a digital banking leader and insight-driven organization. Recently, Scotiabank launched a new Global AI platform that enables the automation of machine learning and the delivery of analytics assets with advanced technology such as Airflow and Kubernetes. At International Banking, our team's (AIML) main objective is to leverage this platform for analytics experiments and model training, and for operationalizing them. With this modernized technology stack, in the Bank's Colombia operations, analytics application deployments saw a 5X increase in model iterations and an 8X increase in processing frequency, and were 10X faster due to automated processes. In this talk, we will share our journey through the machine learning product development life cycle and MLOps practices at Scotiabank.

What You'll Learn: Scotiabank's machine learning product development life cycle and the MLOps practices behind its new Global AI platform.

Technical Level of the Talk: 3/5

MLOps at Acerta - Automation of the Machine Learning Life Cycle for Manufacturing

Amit Jain & Harika Gaggara, Acerta Analytics, Director Machine Learning

Abstract: Acerta Analytics builds machine learning solutions for automotive manufacturers that help increase assembly/production-line efficiency, provide actionable insights, and reduce operational costs.

What You'll Learn: Best practices for building end-to-end production-grade machine learning solutions.

Technical Level of the Talk: 5/5

Taking AIM at Racial and Language Bias in AI Models; How 10 Companies Learned to Audit, Investigate, and Mitigate Bias in AI Models

Shingai Manjengwa - Director of Professional Development, Vector Institute & Godwin Liu - Industrial Technology Advisor (ITA) / Industrial Research Assistance Program (IRAP Ontario), National Research Council Canada / Government of Canada

Abstract: While diversity, inclusion, and representation remain areas of focus in the AI community, there is one other way we are uniquely positioned to reduce racism and bias in the world. Vector Institute developed a Bias in AI program to ‘Audit, Investigate & Mitigate’, taking AIM at racial and language bias in AI models. We will share how we got 10 small-to-medium enterprises to rethink their models and product offerings to address bias, and how you can reduce bias in AI.

What You'll Learn: Audit, Investigate, and Mitigate bias in AI models.

Technical Level of the Talk: 4/5

What's Next for MLOps?

Tristan Spaulding, Senior Director, Product Management, DataRobot

Abstract: In this forward-leaning session we will discuss recent trends in MLOps and our predictions for the next 2-3 years. Hear what our experts think is in store for MLOps and production-grade AI as we move towards a more intelligent tomorrow.

What You'll Learn: The future for MLOps

Building Reusable and Scalable ML Services to Enable Rapid Development in our Health and Wellness Marketplace

Genna Gliner & Brandon Davis, Machine Learning Engineers, Mindbody

Abstract: We build machine learning products to support discovery and automation within the fitness, health, and wellness sector. Our products range from recommender systems that enable consumers to discover products from our customers within our fitness marketplace, to natural language techniques that enable our customers to create automated marketing emails to delight their own customers. In this talk, we will present our solution for training and deploying machine learning models into our production environment. We will talk about how our pipeline has evolved with open-source tools like dbt, Airflow, and MLflow to address various pain points in building and scaling the data pipelines that support our machine learning solutions across the breadth of our wellness and beauty product ecosystem. Through this, we were able to reduce our product release time by 85%.

What You'll Learn: Strategies and open-source tools for building a centralized ML Platform that's easily scalable, reusable, and maintainable.

Technical Level of the Talk: 3/5

Multi-Armed Bandits in Production at Stitch Fix

Brian Amadio,  Data Platform Engineer, Stitch Fix

Abstract: Multi-armed bandits have become a popular method for online experimentation which can often out-perform traditional A/B tests. In this talk, Brian Amadio will explain the challenges to scaling multi-armed bandits, and how he solved them for the Stitch Fix experimentation platform. His solution allows Data Scientists to build and integrate sophisticated contextual bandit reward models and includes an entirely new method for efficient, deterministic Thompson sampling.
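Stitch Fix's exact deterministic method is not detailed in the abstract, but a hedged sketch of seeded Beta-Bernoulli Thompson sampling conveys the idea: seeding the sampler (say, from a request ID) makes each draw repeatable while still exploring across requests. All names and numbers below are invented for illustration.

```python
import random

def thompson_sample(successes, failures, seed):
    """Pick an arm by sampling from each arm's Beta posterior.

    Seeding the RNG per request makes the draw repeatable, a simple
    stand-in for the deterministic sampling the talk describes."""
    rng = random.Random(seed)
    draws = [
        rng.betavariate(s + 1, f + 1)  # Beta(1, 1) prior on each arm
        for s, f in zip(successes, failures)
    ]
    return max(range(len(draws)), key=draws.__getitem__)

# Observed rewards per arm: arm 1 has the best empirical success rate.
successes = [10, 60, 30]
failures = [90, 40, 70]

# The same seed always yields the same choice; across many seeds, the
# better arm is chosen most often while the others are still explored.
counts = [0, 0, 0]
for seed in range(1000):
    counts[thompson_sample(successes, failures, seed)] += 1
print(counts)  # arm 1 dominates
```

Because the arm choice is a pure function of the posterior counts and the seed, replaying a request reproduces its assignment exactly, which is what makes the approach auditable at experimentation-platform scale.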

What You'll Learn: The audience will learn about the main challenges to scaling multi-armed bandits, and the methods I've successfully used to solve these challenges at Stitch Fix. This includes an architecture that allows Data Scientists to independently deploy and update bandit reward models, as well as a novel method for Thompson sampling which is deterministic, repeatable, and very fast. They'll also learn how they can use the library I wrote in order to put multi-armed bandits into production at their own organizations.

Technical Level of the Talk: 5/5

How to Succeed With Machine Learning at Scale - Fireside Chat

Sandhya Ramu, Vice President, Chief Technical Advisor to CTO - Microsoft & Fatima Kardar,  Sr. Director, Site Reliability Engineering, LinkedIn.

What You'll Learn: Learn how experts at Microsoft empower AI and machine learning at scale

Training Models at Scale on the Cloud with Grid.AI

William Falcon,  Founder & CEO, & Creator of PyTorch Lightning

Abstract: This session is about how to train models on the cloud without changing a single line of code. From model development to hyperparameter sweeps, Grid automates all the engineering so you can focus on machine learning instead of getting bogged down in infrastructure bottlenecks. In this session, William will walk you through the general research workflow, from forming a hypothesis about a model in JupyterLab or over SSH to training a large-scale network on multiple GPUs simultaneously, without having to change a single line of code.

What You'll Learn: William is a seasoned researcher. Not only will you learn how to train models at scale in the cloud faster, but you will see it from a researcher's point of view. In addition, you will have a unique opportunity to pick William's brain about all things AI and deep learning, and to understand why he founded Grid.

Technical Level of the Talk: 5/5

Taming the Long Tail of Industrial ML Applications

Savin Goyal,  Software Engineer,  Netflix

Abstract: Data Science usage at Netflix goes well beyond our eponymous recommendation systems. It touches almost all aspects of our business - from optimizing content delivery and informing buying decisions to fighting fraud. Our unique culture affords our data scientists extraordinary freedom of choice in ML tools and libraries, which results in an ever-expanding set of interesting problem statements and a diverse set of ML approaches to tackle them. At the same time, our data scientists are expected to build, deploy, and operate complex ML workloads autonomously, without needing significant experience in systems or data engineering. In this talk, I will discuss some of the challenges involved in improving the development and deployment experience for ML workloads. I will focus on Metaflow, our ML platform, which offers useful abstractions for managing the model's lifecycle end-to-end, and how a focus on human-centric design positively affects our data scientists' velocity.

What You'll Learn: How should people think about building/adopting ML platforms within their organizations

Technical Level of the Talk: 4/5

Iterative Development Workflows for Building AI Applications

Vincent Sunn Chen,  Founding Engineer, Leading ML Engineering, Snorkel AI

Abstract: Modern AI application development is changing. Rather than focusing solely on models trained over static datasets, practitioners are thinking more holistically about their pipelines, with a renewed emphasis on the training data. In this talk, we describe key interfaces and patterns for iteratively building high-quality AI applications using the Snorkel framework. We discuss how guided error analysis tools help developers prioritize the highest-impact next step for improving quality, whether that is correcting supervision or fine-tuning models. Finally, we'll outline how these development workflows support collaboration with subject matter experts, who leverage domain expertise to impact end-to-end application quality.

What You'll Learn: In the context of the Snorkel framework, we discuss specific analysis tools and workflows that AI/ML engineers can use to prioritize the most actionable next step for improving end-to-end quality.

Technical Level of the Talk: 5/5
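The programmatic-labeling pattern at the heart of Snorkel can be illustrated with a hand-rolled sketch. This is plain Python, not the Snorkel API, and where Snorkel combines votes with a learned label model that estimates each function's accuracy, this toy uses a simple majority vote; the spam heuristics are invented for the example:

```python
# Each labeling function encodes a heuristic and votes SPAM (1),
# NOT_SPAM (0), or ABSTAIN (-1) on each example. Votes are combined
# into a (noisy) training label, here by majority vote.
ABSTAIN, NOT_SPAM, SPAM = -1, 0, 1

def lf_contains_link(text):
    return SPAM if "http" in text else ABSTAIN

def lf_all_caps(text):
    return SPAM if text.isupper() else ABSTAIN

def lf_greeting(text):
    return NOT_SPAM if text.lower().startswith(("hi", "hello")) else ABSTAIN

LFS = [lf_contains_link, lf_all_caps, lf_greeting]

def majority_label(text):
    votes = [lf(text) for lf in LFS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    return max(set(votes), key=votes.count)

labels = [majority_label(t) for t in [
    "CLICK NOW http://example.com",
    "hello, lunch tomorrow?",
    "quarterly report attached",
]]
print(labels)  # [1, 0, -1]
```

Iterating on the labeling functions themselves, guided by error analysis, is the development loop the talk describes.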

Diagnosing Failure Modes in Your ML Organization

Jason Sleight,  Group Technical Lead: ML Platform, and Daniel Yao, Director, Applied Machine Learning at Yelp

Abstract: Over the past decade, Yelp has scaled its reliance on ML from fringe usage by a handful of enthusiastic developers to a core competency leveraged by many teams of dedicated experts, delivering tens of millions of dollars in incremental revenue. In that time we've experimented with several organizational structures and ML processes, and observed several categories of pitfalls. In this talk, we'll discuss how to diagnose these pitfalls, especially as they relate to ML project velocity, ML practitioners' happiness and retention, and scaling ML adoption across more of your business objectives.

What You'll Learn: How to recognize and act upon common failure modes in your ML organization and process structure. Failure modes will be illustrated by several case studies from Yelp’s ML journey.

Technical Level of the Talk: 2/5

PANAMA: In-network Aggregation for Shared Machine Learning Clusters

Nadeen Gebara, Ph.D. Student, Imperial College London

Abstract: We present PANAMA, a novel in-network aggregation framework for distributed machine learning (ML) training on shared clusters serving a variety of jobs. PANAMA comprises two key components: (i) a custom in-network hardware accelerator that can support floating-point gradient aggregation at line rate without compromising accuracy; and (ii) a lightweight load-balancing and congestion control protocol that exploits the unique communication patterns of ML data-parallel jobs to enable fair sharing of network resources across different jobs, while ensuring high throughput for long-running jobs and low latency for short jobs and other latency-sensitive traffic. We evaluate the feasibility of PANAMA using an FPGA-based prototype with 10 Gbps transceivers and via large-scale simulations. Our large-scale simulations demonstrate that PANAMA decreases the average training time of large jobs by up to a factor of 1.34. More importantly, by drastically decreasing the load placed on the network by large data-parallel jobs, PANAMA provides significant benefits to non-aggregation flows, especially latency-sensitive short flows, reducing their 99th-percentile completion time by up to 4.5x.

What You'll Learn:

  • Challenges in scaling distributed training on shared ML clusters.
  • Limitations of previously proposed solutions for scaling distributed data-parallel training.
  • How co-designing the in-network aggregation logic, host software, and congestion control protocol not only benefits data-parallel ML jobs but provides even greater benefits to non-aggregation, latency-sensitive short flows.

Technical Level of the Talk: 5/5
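Conceptually, in-network aggregation moves the gradient summation of data-parallel training out of the workers and into the network fabric: a switch-resident aggregator sums each worker's gradient vector once and sends the result back to everyone. A toy software simulation of that idea (PANAMA itself does this in FPGA hardware at line rate; the data here is made up) might look like:

```python
# Toy simulation of in-network gradient aggregation: instead of workers
# exchanging gradients pairwise (all-reduce), an aggregator in the network
# accumulates each arriving gradient vector element-wise, then "multicasts"
# the summed vector back to every worker.
def in_network_aggregate(worker_gradients):
    n = len(worker_gradients[0])
    summed = [0.0] * n
    for grad in worker_gradients:   # gradient packets arriving at the switch
        for i, g in enumerate(grad):
            summed[i] += g          # element-wise accumulation
    # every worker receives the same aggregated gradient
    return [list(summed) for _ in worker_gradients]

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
results = in_network_aggregate(grads)
print(results[0])  # [9.0, 12.0]
```

The hard parts the talk covers, floating-point accumulation at line rate and fair sharing of the aggregator across jobs, are precisely what this sketch glosses over.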

Scaling AI in Production with PyTorch

Geeta Chauhan, AI/PyTorch Partner Engineering Head, Facebook AI

Abstract: Deploying AI models in production and scaling the ML services is still a big challenge. In this talk, we will cover how to deploy your AI models, best practices for common deployment scenarios, and techniques for performance optimization and scaling the ML services. Come join us to learn how you can jumpstart the journey of taking your PyTorch models from research to production.

What You'll Learn: Scalability challenges and solutions for deploying PyTorch models in Production

Technical Level of the Talk: 5/5

Shopping Recommendations at Pinterest

Sai Xiao, Machine Learning Engineer, Pinterest

Abstract: Millions of people across the world come to Pinterest to find new ideas every day. Shopping is at the core of Pinterest's mission to help people create a life they love. This talk will introduce how Pinterest's shopping team builds related-product recommendation systems, including engagement-based and embedding-based candidate generation, indexing and serving methods that support multiple types of recommenders, and its deep neural network ranking models.

What You'll Learn:

1. How Pinterest built its shopping recommendations.

2. Engagement-based and embedding-based candidate generators.

3. Indexing and serving methods to support filter-based retrieval.

4. Multi-head deep neural network ranking in shopping.

Technical Level of the Talk: 5/5
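Embedding-based candidate generation, as mentioned above, can be sketched as nearest-neighbor retrieval over item embeddings. This is a toy cosine-similarity example with made-up items and vectors, not Pinterest's system, which relies on learned embeddings and approximate nearest-neighbor indexes at catalog scale:

```python
import math

# Retrieve the catalog items whose embeddings are most cosine-similar
# to the query item's embedding.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

catalog = {
    "running_shoes": [0.9, 0.1, 0.0],
    "trail_shoes":   [0.8, 0.2, 0.1],
    "yoga_mat":      [0.1, 0.9, 0.2],
    "water_bottle":  [0.2, 0.5, 0.8],
}

def candidates(query_item, k=2):
    query = catalog[query_item]
    scored = [(item, cosine(query, emb))
              for item, emb in catalog.items() if item != query_item]
    scored.sort(key=lambda pair: -pair[1])
    return [item for item, _ in scored[:k]]

print(candidates("running_shoes"))  # trail_shoes should rank first
```

Exhaustive scoring like this is O(catalog size) per query; the indexing and serving methods in the talk exist to avoid exactly that.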

Scaling AI in Production with PyTorch

Kenny Daniel, Co-founder and Chief Technology Officer, Algorithmia

Abstract: After models are ready for production, the next challenge is building the infrastructure to operationalize them at scale. With a growing number of models and users, and maturing ML lifecycle processes, things become even more challenging. Serving immutable versions of models via CI/CD, seamlessly rolling out new versions, handling pre/post-processing of model input/output, and processing inference loads without scaling up infinitely constitute the basics. Efficiently utilizing compute instances to keep costs down, while maintaining observability for troubleshooting and performance, is also essential to a production-grade MLOps platform. This talk is a speedrun of building such an MLOps system from scratch by following a realistic story, starting from a model in a Jupyter Notebook and gradually stepping up the requirements to serve it in production.

What You'll Learn: What it means to build a stable, observable and cost-efficiently scalable MLOps platform from scratch, and the often underestimated challenges that arise as needs gradually increase.

Unlock the Power of MLflow with a Multi-Cloud Compute Infrastructure

Jitendra Pandey, Co-Founder & CTO, Infinstor Inc

Abstract: MLflow is rapidly gaining popularity in the data science community. ML training and deployment at scale require a computing infrastructure that is closely integrated with the MLflow service for effective tracking, management, and monitoring of runs. In this talk, we present our experience running an MLflow service in a multi-cloud setup, outline how we integrated our compute infrastructure with MLflow, and highlight the architectural considerations for a truly multi-cloud compute engine.

What You'll Learn: The audience will learn about a multi-cloud MLflow service, and how to integrate compute infrastructure with the MLflow service to make tracking and managing ML runs effective at large scale.

Technical Level of the Talk: 5/5

We Protect, They Attack: Adversarial ML Ops with Elastic Security

Jessica David,  Senior Data Engineer, Elastic

Abstract: Protecting the world’s data from attack isn’t easy, especially with an ever-changing threat landscape. From malware to ransomware to unknown attack vectors, staying one step ahead of adversaries can be challenging. With Elastic Security, we use machine learning techniques to create top-tier protection software that detects & prevents threats on endpoints. But how can we keep users protected in a timely manner? In this talk, discover how we can get efficiently from model to endpoint with our various data pipelines and operational workflows.

What You'll Learn: The audience will learn how we train models on a monthly basis and can get them out to customers with just a few clicks, which allows us to stay on top of new security threats. They will also learn about the design choices we made to mitigate common ML security problems.

Technical Level of the Talk: 5/5


Organizer: MLOps World
