This tutorial is aimed at a broad audience, from beginners to advanced users, including scientists, developers, and data scientists. It introduces reproducibility, explains why it matters, and shows how to ensure it in practice. We will work with OpenML (https://new.openml.org/), a tool widely used in the machine learning community, through the openml-python API, so the whole tutorial stays in Python.
This talk is meant for students, developers, and data scientists with basic knowledge of Python.
Students will learn:
- How to create a machine learning workflow
- What are the components of a reproducible workflow
- How to create benchmarks and run models on benchmarks
It will also help developers gain a better understanding of the scientific concepts involved, and help data scientists design experiments in a more responsible way.
- Introduction to reproducibility and Open Science
- Introduction to OpenML (30 minutes)
  - Introduction to datasets
  - How datasets interact with tasks
  - How models are stored in a flow definition
- Hands-on reproducibility with openml-python (30 minutes)
  - Using an OpenML dataset
  - Making a task for the OpenML dataset
  - Running a flow on the task
  - Uploading the flow result as a run to OpenML via openml-python
- Benchmarks and studies (30 minutes)
  - Usage of OpenML benchmarks in science
  - Using multiple datasets and tasks to create a benchmark
  - Applying multiple models on this benchmark and ensuring reproducibility
  - Creating your own scientific study
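As a rough preview of the hands-on part, the dataset, task, flow, and run steps from the outline might look like the sketch below. It is not taken from the tutorial materials. It assumes that openml-python and scikit-learn are installed, that task ID 31 is a small classification task on the OpenML server, and that a decision tree is an acceptable stand-in model. The helper name `run_on_task` is hypothetical.

```python
def run_on_task(task_id=31, seed=0, publish=False):
    """Fetch an OpenML task, run a scikit-learn model on it, and return the run.

    The default task ID and the choice of model are illustrative assumptions.
    """
    # Imported lazily so that defining this helper needs no third-party packages.
    import openml
    from sklearn.tree import DecisionTreeClassifier

    # A task ties a dataset to a target and to fixed train/test splits.
    task = openml.tasks.get_task(task_id)
    dataset = task.get_dataset()
    print(f"Task {task_id} uses dataset {dataset.name!r}")

    # The model and its hyperparameters are what OpenML records as a flow.
    model = DecisionTreeClassifier(random_state=seed)

    # Evaluate the model on the task's predefined splits.
    run = openml.runs.run_model_on_task(model, task, avoid_duplicate_runs=False)

    if publish:
        # Assumption: openml.config.apikey has been set to a valid key beforehand.
        run.publish()
    return run
```

Calling `run_on_task()` needs network access to download the task. Uploading is off by default, so nothing is sent to the server unless `publish=True` is passed.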
I'll provide participants with Colab notebooks, slides, documentation, and support during the hands-on sessions. The documentation for the tools available is here:
More resources on OpenML:
- OpenML paper: https://arxiv.org/pdf/1407.7722.pdf
- OpenML Benchmarks paper: https://datasets-benchmarks-proceedings.neurips.cc/paper/2021/file/c7e1249ffc03eb9ded908c236bd1996d-Paper-round2.pdf
- OpenML Python paper: https://www.jmlr.org/papers/volume22/19-920/19-920.pdf
Affiliation: TU Eindhoven/OpenML
I am a research engineer at TU Eindhoven and OpenML. My goal is to make machine learning reproducible again.
Visit the speaker at: GitHub