5 Steps to Speed Up Your Data-Analysis on a Single Core
Jonathan Striebel
Your data analysis pipeline works. Nice.
Could it be faster? Probably.
Do you need to parallelize? Not yet.
We'll go through optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time and costs. This walkthrough shows tools and strategies to identify and mitigate bottlenecks, and demonstrates them on an example. The 5 steps cover:
- Identifying bottlenecks: Profiling
- Efficient IO
- Memory & Precision Tradeoffs
- Vectorization
- JIT compilation with numba
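As a rough illustration of the first step, here is a minimal profiling sketch using only Python's standard library (`cProfile` and `pstats`). The functions `load_values`, `slow_normalize`, and `pipeline` are hypothetical stand-ins invented for this example; they are not taken from the talk's repository. The idea is simply to measure first, so that later optimization effort goes to the function that actually dominates the runtime.

```python
import cProfile
import io
import pstats


def load_values(n):
    # Hypothetical stand-in for an IO step: build the input data in memory.
    return [float(i) for i in range(n)]


def slow_normalize(values):
    # Deliberately wasteful: the mean is recomputed for every element.
    return [v - sum(values) / len(values) for v in values]


def pipeline(n=2000):
    # Hypothetical two-stage pipeline: load, then process.
    values = load_values(n)
    return slow_normalize(values)


def profile_pipeline():
    # Collect timing data only while the pipeline is running.
    profiler = cProfile.Profile()
    profiler.enable()
    result = pipeline()
    profiler.disable()

    # Render the five most expensive entries by cumulative time into a string.
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)
    return result, stream.getvalue()


result, report = profile_pipeline()
print(report)
```

In the printed report, `slow_normalize` should appear near the top of the cumulative-time ranking, which marks it as the first candidate for the later steps. For a whole script, running `python -m cProfile -s cumulative your_script.py` gives a similar overview without changing the code.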
This talk is suited for data scientists at a beginner or intermediate level, typically working with a numpy/scipy/… stack or similar. It gives strategies and concrete suggestions for speeding up an existing analysis pipeline, demonstrated on a practical example that shows the speed improvement gained at each step.
The code and slides from the presentation can be found at https://github.com/jstriebel/data-analysis-speedup.
Jonathan Striebel
Affiliation: scalable minds GmbH
Jonathan is a software engineer and consultant at scalable minds. He works on machine-learning pipelines for biological and medical image analysis, ensuring scalability and maintainability. In Berlin he's waiting for the return of his social life, looking forward to playing music with his friends.