$10 – $200

Massive data analysis with Spark [online, DAT202]

Location

Online event

Refund policy

No Refunds

Event description
Learn how to use Apache Spark with Python to analyze large amounts of data.

About this event

Apache Spark is one of the most important free software tools for data processing and analysis. This workshop will teach you how to use Apache Spark with Python (PySpark) to analyze data sets that are too large to be processed on a single computer.

With PySpark, you will learn how to import your data, how to use its functions to transform, reduce, and compile your data, and how to write parallel algorithms that can run on Calcul Québec and Compute Canada clusters.

Registration

  • Academic participant: $10
  • Non-academic participant: $200

Prerequisites

Good knowledge of the Unix command line (see the workshop UNX101, Unix command line) and of how to write functions in Python.

Course plan

  • 1. Introduction to big data and Map-Reduce;
  • 2. Presentation of Apache Spark;
  • 3. Import data with PySpark;
  • 4. Sort data by key/value;
  • 5. Work with structured data (PySpark SQL);
  • 6. Develop parallel algorithms.
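
To give a feel for the map and reduce-by-key pattern named in the course plan, here is a minimal sketch in plain Python. It is not taken from the workshop material and does not use Spark; the helper names `map_phase` and `reduce_by_key` and the sample sentences are hypothetical. PySpark expresses comparable steps with RDD methods such as `map` and `reduceByKey`, spread across many machines.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit one (word, 1) pair per word, like a map step."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def reduce_by_key(pairs, combine):
    """Group pairs by key, then fold each group's values with `combine`."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(combine, values) for key, values in groups.items()}


# Hypothetical sample input; a real job would read files on a cluster.
lines = ["spark makes big data simple", "big data needs big tools"]
counts = reduce_by_key(map_phase(lines), lambda a, b: a + b)

# Show the two most frequent words (ties broken alphabetically).
print(sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:2])
```

Here everything runs in a single process. The idea behind running the same pattern on a cluster is that the map step can be applied to each chunk of data independently, and only the grouped values need to be combined afterwards.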

Instructor

Lucas Nogueira, analyst in advanced research computing at Calcul Québec.

Language

English

Technical prerequisites

We will use the Zoom platform. Because this event is a practical workshop, it is very useful to have a second screen, so that you can keep the instructor's window on one screen and your own window on your main screen.

We will use the JupyterLab interface. Make sure you have a modern web browser such as Google Chrome, Firefox, Edge or Safari.

Contact

For any questions, please write to training@calculquebec.ca.

Organizer Calcul Québec

Calcul Québec is a consortium of Quebec universities brought together around advanced research computing (ARC). We offer training sessions and lunchtime talks on a range of topics, from introductory programming to data analysis and parallel programming.

A regional partner of Compute Canada, Calcul Québec receives financial support from the Canada Foundation for Innovation, the Ministère de l'Économie et de l'Innovation and the Fonds de recherche du Québec.
