Introducing the Dask Active Memory Manager

Guido Imperiale

PyConDE & PyData Berlin 2022 conference

Monday 15:00 in B05-B06

Type/Track: Talk, PyData: PyData & Scientific Libraries Stack

Historically, the Dask scheduler did not implement any particular logic to manage distributed data once it had been created. This can lead to imbalances in memory allocation across the cluster, excessive memory consumption, and counter-intuitive out-of-memory issues.

This talk introduces a new feature of Dask, the Active Memory Manager daemon, which aims to resolve these long-standing issues by removing unnecessary replicas and moving the remaining data around to even out the memory load among workers. The same system also allows for more robust worker retirement, adaptive downscaling in the middle of a computation, and a redesign of the out-of-memory (OOM) worker pause.
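
The idea of removing unnecessary replicas can be illustrated with a toy sketch. This is not Dask's implementation: the function name `reduce_replicas`, the `who_has` / `nbytes` dictionaries, and the "drop from the most loaded worker first" rule are assumptions made for illustration only.

```python
from collections import defaultdict


def reduce_replicas(who_has, nbytes, desired=1):
    """Toy replica reduction (hypothetical helper, not part of Dask).

    who_has: dict mapping key -> set of worker names holding a replica
    nbytes:  dict mapping key -> size of that key in bytes
    Returns (drops, memory): a list of (key, worker) drop suggestions
    and the per-worker memory that would remain after applying them.
    """
    # Memory currently held on each worker, derived from the replica map.
    memory = defaultdict(int)
    for key, workers in who_has.items():
        for worker in workers:
            memory[worker] += nbytes[key]

    drops = []
    for key in sorted(who_has):
        holders = set(who_has[key])  # copy, so the input is not mutated
        while len(holders) > desired:
            # Assumed rule: drop from the most loaded holder first;
            # ties are broken by worker name to keep the result deterministic.
            victim = max(holders, key=lambda w: (memory[w], w))
            holders.remove(victim)
            memory[victim] -= nbytes[key]
            drops.append((key, victim))
    return drops, dict(memory)


who_has = {"x": {"a", "b"}, "y": {"a"}, "z": {"b", "c"}}
nbytes = {"x": 100, "y": 50, "z": 30}
drops, memory = reduce_replicas(who_has, nbytes)
print(drops)   # [('x', 'a'), ('z', 'b')]
print(memory)  # {'a': 50, 'b': 100, 'c': 30}
```

In this example worker `a` starts at 150 bytes and `b` at 130, so the extra copy of `x` is dropped from `a` and the extra copy of `z` from `b`. In Dask itself, the Active Memory Manager is switched on through the `distributed.scheduler.active-memory-manager.start` configuration key or with `client.amm.start()`.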

Tags: Algorithms, Architecture, Backend, Cloud, Data Engineering, Distributed Computing, Parallel Programming / Async

Level: Domain Expertise: expert; Python Skill Level: some

Guido Imperiale

Affiliation: Coiled

I worked for 10 years on the technical infrastructure underlying Monte Carlo simulations for finance. I currently work full time on improving the dask and dask.distributed open source packages.

Visit the speaker at: GitHub