Data Science at Scale with Dask
Richard Pelgrim
An introduction to distributed computing:
- When, why and how should you leverage distributed computing?
- Introduction to Dask, an OSS Python library for distributed computing
How to parallelise your Python code with Dask:
- Why parallelise your code?
- Using dask.delayed() to parallelise custom code
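As a rough illustration of the dask.delayed() topic above, here is a minimal sketch. The helper functions inc and add are hypothetical stand-ins for custom code and are not taken from the tutorial. Wrapping a function call in delayed() records it in a task graph instead of running it, and compute() then executes the graph, running independent tasks in parallel where possible.

```python
from dask import delayed


def inc(x):
    # Hypothetical stand-in for an expensive custom function.
    return x + 1


def add(x, y):
    # Hypothetical stand-in for a step that combines earlier results.
    return x + y


# Nothing runs yet: each call only adds a node to the task graph.
a = delayed(inc)(1)
b = delayed(inc)(2)
total = delayed(add)(a, b)

# compute() executes the graph; the two inc calls do not depend on
# each other, so the scheduler is free to run them in parallel.
result = total.compute()
print(result)
```

For functions this cheap the scheduling overhead outweighs any gain; the pattern pays off when each call does substantial work.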
Scaling your NumPy and pandas workflows:
- How do you scale your NumPy and pandas workflows to larger-than-memory datasets?
- Dask Collections: Bags, Arrays and DataFrames
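To make the idea of a Dask collection concrete, the sketch below uses a Dask Array as one example. The array shape and chunk size are arbitrary values chosen for illustration and do not come from the tutorial. The array is split into chunks, operations build a lazy task graph with a NumPy-like API, and only the final, small result is materialised in memory.

```python
import dask.array as da

# A 1000 x 1000 array of ones, split into 16 chunks of 250 x 250.
# The sizes are illustrative; real workloads would use chunks large
# enough to keep scheduling overhead low.
x = da.ones((1000, 1000), chunks=(250, 250))

# NumPy-style operations are lazy: this only builds a task graph.
col_means = x.mean(axis=0)

# compute() runs the graph chunk by chunk and returns a NumPy array.
result = col_means.compute()
print(result.shape)
```

Dask DataFrames and Bags follow the same pattern: partitioned data, a familiar API, and lazy evaluation until compute() is called.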
Distributed Machine Learning with Dask:
- How to build distributed ML models
- Bursting to the cloud to transcend local compute resources
Richard Pelgrim
Affiliation: Coiled
Richard Pelgrim is a data scientist with a passion for communicating technical content in creative and compelling ways that increase engagement. He currently does so as a Developer Advocate at Coiled.io, the leading company built around the open-source Dask library for distributed computing in Python. Richard is regularly invited to give Dask tutorials at meet-ups and conferences and has a treasure chest of expert tips for anyone looking to take their distributed computing to the next level.