
Your data analysis pipeline works. Nice.
Could it be faster? Probably.
Do you need to parallelize? Not yet.

We'll go through optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time and costs. This walkthrough shows tools and strategies to identify and mitigate bottlenecks and demonstrates them on an example. The five steps cover:

  • Identifying bottlenecks: Profiling
  • Efficient IO
  • Memory & Precision Tradeoffs
  • Vectorization
  • JIT-ing with numba
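
To make the first step concrete, here is a minimal profiling sketch using only Python's standard library (cProfile and pstats). The functions load_values, slow_sum_of_squares, and pipeline are hypothetical stand-ins invented for this illustration; they are not taken from the talk's repository.

```python
import cProfile
import io
import pstats


def load_values(n):
    # Hypothetical stand-in for the IO step of a pipeline.
    return list(range(n))


def slow_sum_of_squares(values):
    # Deliberately naive: builds an intermediate list element by element.
    squares = []
    for v in values:
        squares.append(v * v)
    return sum(squares)


def pipeline(n):
    # Toy pipeline: "load" the data, then run the analysis step.
    values = load_values(n)
    return slow_sum_of_squares(values)


def profile_pipeline(n=200_000):
    # Record timing for every function call made while the pipeline runs.
    profiler = cProfile.Profile()
    profiler.enable()
    result = pipeline(n)
    profiler.disable()

    # Render the ten most expensive entries, sorted by cumulative time.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


if __name__ == "__main__":
    result, report = profile_pipeline()
    print(report)
```

In the report, the functions with the largest cumulative time are the first candidates for the later steps (faster IO, lower precision, vectorization, or JIT compilation).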

This talk is suited for data scientists at a beginner or intermediate level, typically working with a numpy/scipy/… stack or similar. It offers strategies and concrete suggestions for speeding up an existing analysis pipeline, demonstrated on a practical example that shows the speed improvement gained at each step.

The code and slides from the presentation can be found at https://github.com/jstriebel/data-analysis-speedup.

Jonathan Striebel

Affiliation: scalable minds GmbH

Jonathan is a software engineer and consultant at scalable minds. He works on machine-learning pipelines for biological and medical image analysis, ensuring scalability and maintainability. In Berlin he's waiting for the return of his social life, looking forward to playing music with his friends.

Visit the speaker at: GitHub, Homepage