Reproducible machine learning and science with Python

Prabhant Singh

Tuesday 13:00 in A05-A06

Type/Track: Tutorial (pydata-pydata-scientific-libraries-stack)

This tutorial is aimed at a broad audience, from beginners to advanced users: scientists, developers, and data scientists. It introduces reproducibility, explains why it is important, and shows how we can ensure it. We will work with OpenML (https://new.openml.org/), a tool widely used in the machine learning community, and use the openml-python API to keep the tutorial completely Pythonic.

This tutorial is meant for students, developers, and data scientists with basic knowledge of Python.

Students will learn:

How to create a machine learning workflow
What the components of a reproducible workflow are
How to create benchmarks and run models on benchmarks

The tutorial will also help developers better understand scientific concepts and help data scientists design experiments more responsibly.
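The components named above can be illustrated without any library. The following is a minimal sketch, not OpenML's implementation: it treats an experiment as a dataset description, a task with seeded cross-validation splits, and a flow (model plus hyperparameters), and hashes the whole record so that two experiments can be compared. All names (`make_splits`, `fingerprint`, `describe_experiment`, the toy dataset) are hypothetical.

```python
import hashlib
import json
import random


def make_splits(n_samples, n_folds, seed):
    """Return deterministic cross-validation folds as lists of test indices."""
    rng = random.Random(seed)  # local generator, leaves the global RNG untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds disjoint folds.
    return [sorted(indices[i::n_folds]) for i in range(n_folds)]


def fingerprint(record):
    """Hash a JSON-serialisable record so two experiments can be compared."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def describe_experiment(seed):
    """Build a hypothetical experiment record: dataset, task, and flow."""
    return {
        "dataset": {"name": "toy-data", "n_samples": 20},
        "task": {"n_folds": 5, "seed": seed, "splits": make_splits(20, 5, seed)},
        "flow": {"model": "majority-class", "hyperparameters": {}},
    }


first = fingerprint(describe_experiment(seed=0))
second = fingerprint(describe_experiment(seed=0))
other = fingerprint(describe_experiment(seed=1))

print("same seed, same fingerprint:", first == second)
print("different seed, same fingerprint:", first == other)
```

Because the splits and the model description are stored together with the seed, anyone holding the record can regenerate exactly the same evaluation setup, which is the property a shared task and flow definition is meant to provide.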

Outline:

Introduction to reproducibility and Open Science
Introduction to OpenML (30 minutes)
- Introduction to datasets
- How datasets interact with tasks
- How models are stored in a flow definition
Hands-on reproducibility with openml-python (30 minutes)
- Using an OpenML dataset
- Making a task for the OpenML Dataset
- Running a flow on the task
- Uploading the flow result as run to OpenML via openml-python
Benchmarks and studies (30 minutes)
- Usage of OpenML Benchmarks in science
- Using multiple datasets and tasks to create a benchmark
- Applying multiple models on this benchmark and ensuring reproducibility
- Creating your own scientific study
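
The hands-on and benchmark steps in the outline could look roughly like this with openml-python and scikit-learn. This is a sketch under assumptions, not the tutorial's actual notebook: the function names `run_on_task` and `run_on_suite` are hypothetical, the random forest is an arbitrary model choice that assumes numeric features without missing values, and it assumes an openml-python installation whose scikit-learn extension is available. Task and suite IDs must be supplied by the caller, and uploading requires a personal OpenML API key.

```python
def run_on_task(task_id, publish=False, api_key=None):
    """Evaluate a scikit-learn model on one OpenML task; optionally upload the run."""
    # Imported inside the function so that defining it needs neither the
    # packages nor a network connection; calling it needs both.
    import openml
    from sklearn.ensemble import RandomForestClassifier

    # A task bundles a dataset, a target attribute, and fixed evaluation splits.
    task = openml.tasks.get_task(task_id)
    dataset = openml.datasets.get_dataset(task.dataset_id)
    print(f"Task {task_id} is defined on dataset '{dataset.name}'")

    # Fixing random_state keeps the model itself deterministic.
    model = RandomForestClassifier(n_estimators=100, random_state=0)

    # openml-python describes the model as a flow and evaluates it on the task's splits.
    run = openml.runs.run_model_on_task(model, task, avoid_duplicate_runs=False)

    if publish:
        openml.config.apikey = api_key  # uploading requires a personal API key
        run.publish()
    return run


def run_on_suite(suite_id, max_tasks=3):
    """Apply the same model to the first few tasks of a benchmark suite."""
    import openml

    suite = openml.study.get_suite(suite_id)  # a suite is a curated list of task IDs
    return [run_on_task(task_id) for task_id in suite.tasks[:max_tasks]]
```

Calling `run_on_task` with the ID of a suitable classification task returns a run object that stays local unless `publish=True` is passed together with a valid key. `run_on_suite` reuses the same model configuration on several tasks, which is the basic pattern behind a benchmark.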

I will provide participants with Colab notebooks, slides, and documentation, as well as support during the hands-on sessions. The documentation for the tools is available here:

More resources regarding OpenML

OpenML paper: https://arxiv.org/pdf/1407.7722.pdf
OpenML Benchmarks paper: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/c7e1249ffc03eb9ded908c236bd1996d-Paper-round2.pdf
OpenML Python paper: https://www.jmlr.org/papers/volume22/19-920/19-920.pdf

Tags: Best Practice, Community, Science

Level: Domain Expertise: some, Python Skill Level: some

Prabhant Singh

Affiliation: TU Eindhoven/OpenML

I am a research engineer at TU Eindhoven and OpenML. My goal is to make machine learning reproducible again.

Visit the speaker at: GitHub