Fast native data structures: C/C++ from Python, by Stefan Behnel (PyConDE & PyData Berlin 2022 conference)

Wednesday 10:05 in B05-B06

Type/Track: Talk, PyCon: Programming & Software Engineering

Python has very fast and thoroughly optimised data structures: lists, dicts, sets and the collections module make it easy to write simple code that performs well. The flip side shows when processing very large amounts of simple data, especially numbers or strings. Here, Python's per-object overhead is very large compared to the low-level languages C and C++, which benefit directly from bare-metal CPU performance as well as from GIL-free multi-threading and parallel computation.
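To make the overhead concrete, here is a minimal sketch that is not taken from the talk. It compares a list of Python float objects with the standard library's `array` type, which stores packed C doubles. It assumes CPython, where `sys.getsizeof` is available; the helper name `list_footprint` is made up for this illustration, and the exact byte counts vary by platform and Python version.

```python
import sys
from array import array


def list_footprint(values):
    """Approximate bytes used by a list plus the objects it references."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


n = 100_000
as_list = [float(i) for i in range(n)]   # n separate Python float objects
as_array = array("d", as_list)           # n packed C doubles in one buffer

list_bytes = list_footprint(as_list)
array_bytes = sys.getsizeof(as_array)

print(f"list of floats: {list_bytes / n:.1f} bytes per value")
print(f"array('d'):     {array_bytes / n:.1f} bytes per value")
```

The list pays for a pointer per element plus a full float object behind each pointer, while the array pays only for the raw 8-byte values.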

This talk will show how compiling your Python code with Cython (https://cython.org/) enables you to make direct use of fast and memory-efficient native data types and data structures. Cython provides very efficient ways to access the internals of Python data structures, process data from NumPy arrays, and use data structures from native C libraries or the C++ standard library (STL) as replacements for the high-level Python collections.

You will learn how to implement high-level Python interfaces that enable fast data processing underneath without sacrificing integration with regular Python features and libraries, so that the data remains easy to manipulate directly from Python code.
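As a rough illustration of that idea, the sketch below wraps packed native storage in a class that behaves like an ordinary Python sequence. It is written in plain Python with the standard library's `array` as a stand-in for the Cython extension types and C++ containers the talk covers. The class `FloatColumn` and its methods are hypothetical names chosen for this example, not code from the presentation.

```python
from array import array
from collections.abc import Sequence


class FloatColumn(Sequence):
    """Hypothetical container: packed C doubles behind a normal sequence API."""

    def __init__(self, values=()):
        # array("d") stores raw 8-byte doubles, not Python float objects.
        self._data = array("d", values)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices stay in the compact representation.
            return FloatColumn(self._data[index])
        return self._data[index]

    def append(self, value):
        self._data.append(value)

    def total(self):
        # A compiled implementation could run this loop on C doubles directly.
        return sum(self._data)


col = FloatColumn([1.5, 2.5, 4.0])
col.append(8.0)
print(len(col), col.total(), max(col), 2.5 in col)
```

Because the class implements `__len__` and `__getitem__`, the `Sequence` base class supplies iteration, `in`, `index()` and `count()`, so built-ins such as `max()` and `sorted()` work on it without copying the data into a list first.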

The notebook used in the presentation is available at http://consulting.behnel.de/notebook/Fast_Native_Data_Structures_PyCon-DE_2022.ipynb

Tags: Big Data; Parallel Programming / Async; Python - PyPy, Cython, Anaconda

Level: Domain Expertise: none; Python Skill Level: some

Stefan Behnel

Stefan is a long-time Python user and core developer of the well-known OSS projects Cython [1], lxml [2] and CPython [3]. He gives lectures and training courses on Python, Cython and High-Performance Computing topics.

[1] https://cython.org/
[2] https://lxml.de/
[3] https://python.org/