5 Steps to Speed Up Your Data Analysis on a Single Core

Jonathan Striebel

Wednesday 14:00 in B07-B08

Type/Track: Talk, PyData: PyData & Scientific Libraries Stack

Your data analysis pipeline works. Nice.
Could it be faster? Probably.
Do you need to parallelize? Not yet.

We'll go through optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time & costs. This walkthrough shows tools and strategies to identify and mitigate bottlenecks and demonstrates them on an example. The 5 steps cover:

Identifying bottlenecks: Profiling
Efficient IO
Memory & Precision Tradeoffs
Vectorization
JIT compilation with Numba
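For the profiling step, here is a minimal sketch using the standard library's `cProfile` and `pstats` modules (not necessarily the profiler used in the talk). The functions `load_data`, `slow_stat` and `pipeline` are hypothetical stand-ins for real pipeline stages. Sorting by cumulative time surfaces the call chains that dominate the runtime.

```python
import cProfile
import io
import pstats


def load_data(n):
    # Hypothetical stand-in for a loading stage: builds a list of floats.
    return [float(i % 100) for i in range(n)]


def slow_stat(values):
    # Hypothetical stand-in for an analysis stage: a pure-Python loop.
    total = 0.0
    for v in values:
        total += v * v
    return total / len(values)


def pipeline(n=200_000):
    data = load_data(n)
    return slow_stat(data)


def profile_pipeline(n=200_000, top=5):
    """Run the pipeline under cProfile and return (result, report text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = pipeline(n)
    profiler.disable()

    stream = io.StringIO()
    # Sort by cumulative time so the most expensive call chains come first.
    stats = pstats.Stats(profiler, stream=stream).sort_stats("cumulative")
    stats.print_stats(top)  # keep only the `top` most expensive entries
    return result, stream.getvalue()


if __name__ == "__main__":
    _, report = profile_pipeline()
    print(report)
```

Once the report points at a hotspot, the remaining steps can be applied to that function specifically instead of to the whole pipeline.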
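For the I/O step, one common change is replacing a text format with a binary one. The sketch below assumes NumPy and a purely numeric 2-D array, and compares a CSV round trip (`np.savetxt`/`np.loadtxt`) with NumPy's binary `.npy` format (`np.save`/`np.load`). The helper name `roundtrip_seconds` is hypothetical, and actual timings depend on data size and storage.

```python
import os
import tempfile
import time

import numpy as np


def roundtrip_seconds(data, directory):
    """Write and read `data` as CSV text and as binary .npy; return results."""
    csv_path = os.path.join(directory, "data.csv")
    npy_path = os.path.join(directory, "data.npy")

    # Text format: every float is formatted to a string and parsed back.
    start = time.perf_counter()
    np.savetxt(csv_path, data, delimiter=",")
    from_csv = np.loadtxt(csv_path, delimiter=",")
    csv_seconds = time.perf_counter() - start

    # Binary format: raw array bytes plus a small header, no parsing.
    start = time.perf_counter()
    np.save(npy_path, data)
    from_npy = np.load(npy_path)
    npy_seconds = time.perf_counter() - start

    sizes = {"csv": os.path.getsize(csv_path), "npy": os.path.getsize(npy_path)}
    return from_csv, from_npy, csv_seconds, npy_seconds, sizes


if __name__ == "__main__":
    data = np.random.default_rng(0).random((100_000, 10))
    with tempfile.TemporaryDirectory() as tmp:
        _, _, t_csv, t_npy, sizes = roundtrip_seconds(data, tmp)
    print(f"csv: {t_csv:.3f} s, {sizes['csv']} bytes")
    print(f"npy: {t_npy:.3f} s, {sizes['npy']} bytes")
```

For tabular data with mixed column types, a columnar binary format plays the same role; the principle of avoiding repeated text parsing is what this sketch illustrates.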
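For the memory & precision step, a sketch assuming NumPy: storing data as `float32` instead of the default `float64` halves the memory footprint, at the cost of roughly 7 instead of 16 significant decimal digits. Whether that is acceptable depends on the analysis. The function `downcast_report` is hypothetical, and accumulating the `float32` mean in `float64` is one choice made here to keep summation error from dominating.

```python
import numpy as np


def downcast_report(n=1_000_000, seed=0):
    """Compare memory use and numerical error of float64 vs float32 storage."""
    rng = np.random.default_rng(seed)
    x64 = rng.normal(loc=100.0, scale=1.0, size=n)  # float64 by default
    x32 = x64.astype(np.float32)                     # 4 bytes per element

    # Accumulate both means in float64 so only the storage precision differs.
    mean64 = x64.mean()
    mean32 = x32.mean(dtype=np.float64)

    return {
        "bytes64": x64.nbytes,
        "bytes32": x32.nbytes,
        "abs_error_of_mean": float(abs(mean64 - mean32)),
        # x32 is upcast to float64 in the subtraction, exposing rounding error.
        "max_elementwise_error": float(np.max(np.abs(x64 - x32))),
    }


if __name__ == "__main__":
    for key, value in downcast_report().items():
        print(f"{key}: {value}")
```

Smaller dtypes also mean more elements fit in CPU caches, which is often where part of the speedup comes from on a single core.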
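For the vectorization step, a sketch assuming NumPy: pairwise Euclidean distances computed with a Python double loop versus with broadcasting, where the loop over pairs moves into compiled NumPy code. Both function names are hypothetical illustrations, not code from the talk.

```python
import numpy as np


def pairwise_dist_loop(points):
    """Pure-Python double loop over all pairs of points (slow reference)."""
    n = len(points)
    out = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            diff = points[i] - points[j]
            out[i, j] = np.sqrt(np.sum(diff * diff))
    return out


def pairwise_dist_vectorized(points):
    """Same result via broadcasting: (n, 1, d) - (1, n, d) -> (n, n, d)."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt(np.sum(diff * diff, axis=-1))


if __name__ == "__main__":
    pts = np.random.default_rng(1).random((300, 3))
    assert np.allclose(pairwise_dist_loop(pts), pairwise_dist_vectorized(pts))
```

The broadcast version allocates an `n × n × d` temporary, so vectorization trades memory for speed; for large `n` this interacts with the memory step and may call for chunking.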
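For the JIT step, a sketch assuming Numba's `njit` decorator: an exponentially weighted moving average, where each output depends on the previous one, so the recursion is not a single elementwise NumPy operation. Loops like this are the typical target for Numba. The function `ewma` is hypothetical, and the `ImportError` fallback (which leaves the function as plain Python) is added here only so the sketch runs without Numba installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback added for this sketch: supports @njit and @njit(...) and
    # returns the function unchanged, so it runs as ordinary Python.
    def njit(*args, **kwargs):
        if len(args) == 1 and callable(args[0]) and not kwargs:
            return args[0]
        return lambda func: func


@njit
def ewma(x, alpha):
    """Exponentially weighted moving average: y[i] = a*x[i] + (1-a)*y[i-1]."""
    y = np.empty_like(x)
    y[0] = x[0]
    for i in range(1, x.shape[0]):
        # Each step depends on the previous output, hence the explicit loop.
        y[i] = alpha * x[i] + (1.0 - alpha) * y[i - 1]
    return y


if __name__ == "__main__":
    series = np.random.default_rng(2).random(1_000_000)
    smoothed = ewma(series, 0.1)  # first call with Numba includes compilation
    print(smoothed[-5:])
```

With Numba, the first call for a given argument type includes compilation time, so benchmarks should time a second call.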

This talk is suited for data scientists at a beginner or intermediate level, typically working with a numpy/scipy/… stack or similar. It gives strategies & concrete suggestions on how to speed up an existing analysis pipeline, demonstrated on a practical example that shows the speed improvement gained by each step.
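Measuring the gain of each step needs a consistent timing method. A minimal sketch using only the standard library's `timeit`: it reports the best of several runs and the speedup relative to the first variant. The helpers `best_time` and `compare_steps`, and the sum-of-squares example, are hypothetical illustrations, not the talk's benchmark.

```python
import timeit


def best_time(func, *args, repeat=5, number=10):
    """Fastest per-call wall-clock time (seconds) of func(*args) over repeats."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    return min(timings) / number


def compare_steps(steps, *args):
    """Time each (name, func) pair; speedup is relative to the first entry."""
    baseline = None
    report = []
    for name, func in steps:
        seconds = best_time(func, *args)
        if baseline is None:
            baseline = seconds
        speedup = baseline / seconds if seconds > 0 else float("inf")
        report.append((name, seconds, speedup))
    return report


def loop_sum_squares(n):
    # Baseline: explicit Python loop.
    total = 0
    for i in range(n):
        total += i * i
    return total


def formula_sum_squares(n):
    # Closed form for 0^2 + 1^2 + ... + (n-1)^2.
    return (n - 1) * n * (2 * n - 1) // 6


if __name__ == "__main__":
    steps = [("loop", loop_sum_squares), ("formula", formula_sum_squares)]
    for name, seconds, speedup in compare_steps(steps, 10_000):
        print(f"{name:>8}: {seconds * 1e3:8.4f} ms ({speedup:6.1f}x)")
```

Taking the minimum rather than the mean reduces noise from other processes, which matters when comparing single-core optimizations.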

The code and slides from the presentation can be found at https://github.com/jstriebel/data-analysis-speedup.

Tags: Data Engineering, Performance

Domain Expertise: some; Python Skill Level: some

Jonathan Striebel

Affiliation: scalable minds GmbH

Jonathan is a software engineer and consultant at scalable minds. He works on machine-learning pipelines for biological and medical image analysis, ensuring scalability and maintainability. In Berlin he's waiting for the return of his social life, looking forward to playing music with his friends.
