PyData | PyConDE & PyData Berlin 2022

Tutorial pydata-pydata-scientific-libraries-stack

(Serious) Time for Time Series

Marysia Winkels, James Hayward

Time Series

From inventory to website visitors, resource planning to financial data, time-series data is all around us. Knowing what comes next is key to success in this dynamically changing world. So join us and learn about time series analysis and seasonality modelling.

Talk pydata-pydata-scientific-libraries-stack

5 Steps to Speed Up Your Data-Analysis on a Single Core

Jonathan Striebel

Data Engineering, Performance

Your data analysis pipeline works. Nice. Could it be faster? Probably. Do you need to parallelize? Not yet. Discover optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time & costs.

Talk pydata-machine-learning-stats

`python-m5p` - M5 Prime regression trees in python, compliant with scikit-learn

Sylvain Marié

Algorithms, Predictive Modelling, Science

`python-m5p` is an implementation of the M5P algorithm compliant with scikit-learn.

Talk pydata-computer-vision

Building a Sign-to-Speech prototype with TensorFlow, Pytorch and DeepStack: How it happened & What I learned

Steven Kolawole

Computer Vision, Neural Networks / Deep Learning

Building an end-to-end (E2E) working prototype that detects the meaning of sign language in images and videos, and generates the equivalent spoken words in real time, can't be completed in a day's work. Here I explain how it happened and what I learned in the process.

Talk pydata-computer-vision

Can you Read This? (Or: how I Improved Text Readability on the Web for the Visually Impaired)

Asya Frumkin

Algorithms, Computer Vision, Neural Networks / Deep Learning

I will explain my approach to detecting text over image backgrounds that is unreadable to people with visual impairments. I will explain the challenges I encountered when using different OCR architectures for this task and talk about the solution I came up with.

Talk pydata-computer-vision

Challenge Accepted - How to Escape the Quicksand While Engineering a Computer Vision Application

Bettina Heinlein

Computer Vision

Leveraging problem-solving strategies for challenges in building Computer Vision applications and beyond, illustrated with a recent Computer Vision project.

Talk pydata-pydata-scientific-libraries-stack

Creating 3D Maps using Python

Martin Christen

GIS / Geo-Analytics

Create 3D maps anywhere on the planet using Python and open data.

Talk pydata-pydata-scientific-libraries-stack

Data APIs: Standardization of N-dimensional arrays and dataframes

Stephannie Jimenez Gacha

APIs

An introduction to the Data APIs consortium, in which we present our motivation, our objectives, and the progress of the standardization process after one year of activity.

Tutorial pydata-pydata-scientific-libraries-stack

Data Science at Scale with Dask

Richard Pelgrim

APIs, Big Data, Cloud

A hands-on introduction to methods for scaling your data science and machine learning with Dask.

Talk pydata-natural-language-processing

deepdoctection - An open source package for document intelligence

Janis Meyer

Computer Vision, Natural Language Processing

deepdoctection is a Python package that enables document analysis pipelines to be built using deep learning models.

Talk pydata-machine-learning-stats

Detecting drift: how to evaluate and explore data drift in machine learning systems

Emeli Dral

Best Practice, Data Visualization, Statistics

When an ML model is in production, you might encounter data and prediction drift. How exactly can you detect and evaluate it? I'll explain in this talk.

Tutorial pydata-visualization

Easily build interactive plots and apps with hvPlot

Philipp Rudiger, Maxime Liquet

Data Visualization, Jupyter, Science

Do you use the .plot() API of pandas or xarray? Do you ever wish it was easier to try out different combinations of the parameters in your data-processing pipeline? Follow this tutorial to learn how to easily build interactive plots and apps with hvPlot.

Talk pydata-pydata-scientific-libraries-stack

Easy and flexible imaging with the Core Imaging Library

Vaggelis Papoutsellis, Dr. Jakob Sauer Jørgensen

Algorithms, Big Data, Math

Core Imaging Library is an open-source, object-oriented Python library for inverse problems in imaging developed by the UK academic network CCPi.

Talk pydata-natural-language-processing

Efficient data labelling with weak supervision

Maria Mestre

Data Engineering, Data Visualization, Natural Language Processing

Data labelling should not be a waterfall task. Label your data significantly faster with weak supervision (https://github.com/dataqa/dataqa)

Talk pydata-deep-learning

Financial Portfolio Management with Deep Reinforcement Learning

T-Berger

Neural Networks / Deep Learning, Simulation, Time Series

intelligent_portfolio_optimization_with_deep_reinforcement_learning

Talk pydata-pydata-scientific-libraries-stack

Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit

Antoine Toubhans

Best Practice, Computer Vision, Data Engineering, Data Visualization, Development Methods, Reproducibility

Talk pydata-data-handling

Fundamentals of relational databases

Katharina Rasch

Databases

Somewhat comfortable using SQL to access data, but curious to know what happens behind the scenes when you send off your query?

Talk pydata-computer-vision

Grokking LIME: How can we explain why an image classifier "knows" what’s in a photo without looking inside the model?

Kilian Kluge

Computer Vision, Neural Networks / Deep Learning, Transparency / Interpretability

How can LIME explain machine-learning models without peeking inside? Let's find out!

Talk pydata-machine-learning-stats

Honey, I shrunk the target variable! Common pitfalls when transforming the target variable and how to exploit transformations.

Florian Wilhelm

Math, Predictive Modelling, Statistics

Sponsored pydata-visualization

How a simple Streamlit dashboard will help put your machine learning model into production

Daniël Willemsen, Welmoet Verbaan

Best Practice, Data Visualization, Predictive Modelling

Have you struggled to get your valuable machine learning model into the hands of users? A simple Streamlit monitoring dashboard can help!

Talk pydata-deep-learning

How to Trust Your Deep Learning Code

Tilman Krokotsch

Best Practice, Neural Networks / Deep Learning

Write unit tests and learn to trust your Deep Learning code again.

Tutorial pydata-machine-learning-stats

Inspect and try to interpret your scikit-learn machine-learning models

Guillaume Lemaitre

Predictive Modelling, Statistics, Transparency / Interpretability

Talk pydata-pydata-scientific-libraries-stack

Introducing the Dask Active Memory Manager

Guido Imperiale

Algorithms, Architecture, Backend, Cloud, Data Engineering, Distributed Computing, Parallel Programming / Async

The Active Memory Manager is a new experimental feature of Dask that aims to reduce the memory footprint of the cluster, prevent hard-to-debug out-of-memory issues, and make worker retirement more robust.

Talk pydata-machine-learning-stats

Introduction to Uplift Modeling

Dr. Juan Orduz

Algorithms, Predictive Modelling, Statistics

In this talk we introduce uplift modelling, a method to estimate conditional average treatment effects (CATE) using machine learning estimators.

Talk pydata-jupyter

JupyterLite: Jupyter ❤️ WebAssembly ❤️ Python

Jeremy Tuloup

Jupyter, Reproducibility, Use Case

JupyterLite is a Jupyter distribution that runs entirely in the web browser, backed by in-browser language kernels such as the WebAssembly-powered Pyodide kernel. JupyterLite enables data science and interactive computing with the PyData scientific stack, directly in the browser.

Talk pydata-machine-learning-stats

Machine Learning Testing Ecosystem of Python

Yunus Emrah Bulut

Computer Vision, Ethics (Privacy, Fairness,… ), Governance, Natural Language Processing, Neural Networks / Deep Learning, Security

Machine learning testing has become an indispensable part of MLOps, and Python offers a great ecosystem for this purpose.

Tutorial pydata-machine-learning-stats

Making MLOps uncool again

David

Best Practice, Development Methods, Reproducibility

In this workshop, we will learn what it means and how to build an "MLOps workflow" by extending the power of Git and GitHub with open-source tools.

Sponsored pydata-machine-learning-stats

My forecast is better than yours! What does that even mean?

Illia Babounikau

Statistics, Time Series

Established forecast evaluation procedures often turn out to be inappropriate and biased for modern time series forecasting. I will present a number of forecast evaluation issues and their resolutions, based on real demand forecasting use cases developed within BlueYonder.

Talk pydata-pydata-scientific-libraries-stack

On Blocks, Copies and Views: updating pandas' internals

Joris Van den Bossche

APIs, Data Structures

As a pandas user, have you ever run into the SettingWithCopyWarning? Quite likely, and it is one of the more confusing aspects of pandas. But it doesn’t have to be this way! Check out my proposal to simplify this aspect of pandas.

Sponsored pydata-deep-learning

Optimize your network inference time with OpenVINO

Adrian Boguszewski

Jupyter, Neural Networks / Deep Learning, Performance

Learn how to automatically convert your model using Model Optimizer and how to run inference with OpenVINO Runtime to get low latency on the CPU and iGPU you already have. All the magic takes only a few lines of code.

Sponsored pydata-jupyter

Overcoming 5 Hurdles to Using Jupyter Notebooks for Data Science, by the JetBrains Datalore Team

Alena Guzharina

Data Visualization, Jupyter, Reproducibility

Join our talk to discuss setting up environments, working with data, writing code without IDE support, and sharing results, as well as collaboration and reproducibility.

Talk pydata-natural-language-processing

Performing Content: Can NLP and Deep Learning algorithms predict reader preferences?

Sebastian Cattes

Natural Language Processing, Neural Networks / Deep Learning, Statistics

Can AI understand what drives user engagement? Join our talk "Performing Content: Can NLP and Deep Learning algorithms predict reader preferences?" to find out what NLP can bring to the editorial table.

Tutorial pydata-data-handling

PPML: Machine Learning on Data you cannot see

Valerio Maggio

Neural Networks / Deep Learning, Security

Have you ever wondered how to train your PyTorch model on private data you cannot see? If you want to know how, this is the workshop for you!

Tutorial pydata-deep-learning

Practical graph neural networks in Python with TensorFlow and Spektral

Aleksander Molak

Graphs, Neural Networks / Deep Learning

Practical Graph Neural Networks (GNNs) with Spektral & TensorFlow 🤩

Talk pydata-machine-learning-stats

Predictive Maintenance and Anomaly Detection for Wind Energy

Tobias Hoinka

Predictive Modelling, Statistics, Time Series

This talk will describe predictive modeling applications in wind turbine maintenance, the challenges of anomaly detection and ways to move to more automatic diagnoses by modeling past documented defects.

Talk pydata-data-handling

Processing Open Street Map Data with Python and PostgreSQL

Travis Hathaway

Data Engineering, Databases, GIS / Geo-Analytics

Open Street Map is a large, community-supported dataset covering the entire world. Learn how to process this data with Python and PostgreSQL as I walk you through creating projects of your own. Along the way, you will learn how OSM data is structured and how you can use it yourself.

Tutorial pydata-pydata-scientific-libraries-stack

Reproducible machine learning and science with python

Prabhant Singh

Best Practice, Community, Science

Learn how to create reproducible workflows, benchmarks and studies with openml-python

Talk pydata-visualization

Sankey Plots with Python

Daniel Ringler

Data Visualization, Jupyter, Python fundamentals

Sankey Plots in Python? Get an introduction to how and when to use them.

Talk pydata-machine-learning-stats

Secure ML: Automated Security Best Practices in Machine Learning

Alejandro Saucedo

Best Practice, Data Engineering, Security

As data science capabilities scale, security becomes increasingly critical. In this talk, we provide an overview of the challenges, solutions, and best practices for introducing security into the ML lifecycle.

Sponsored pydata-visualization

Seeing the needle AND the haystack: single-datapoint selection for billion-point datasets

Jean-Luc Stevens

Big Data, Data Visualization, Jupyter

Building simple custom interactive web dashboards that display millions or billions of samples while giving access to each individual sample.

Tutorial pydata-pydata-scientific-libraries-stack

sktime - python toolbox for time series: advanced forecasting - probabilistic, global and hierarchical

Franz Kiraly

Algorithms, Predictive Modelling, Time Series

The forecasting module of sktime provides a unified, sklearn-compatible, and composable interface. This tutorial covers advanced topics in forecasting using sktime: probabilistic forecasting, and forecasting with panel data, including global/hierarchical forecasting.

Talk pydata-data-handling

Squirrel - Efficient Data Loading for Large-Scale Deep Learning

Dr. Thomas Wollmann

Distributed Computing, Neural Networks / Deep Learning, Parallel Programming / Async

Learn why we built and open sourced a data infrastructure library for deep learning.

Talk pydata-machine-learning-stats

The secret sauce of data science management

Shir Meir Lador

Best Practice, Big Data, Career & Freelancing, Corporate

In this talk, we will discuss lessons learned on how to build a DS team that prospers while addressing the unique challenges of leading a DS team.

Talk pydata-natural-language-processing

Transformer-based clustering: Identifying product clusters for E-commerce

Sebastian Wanner, Christopher Lennan

Natural Language Processing, Neural Networks / Deep Learning, Use Case

Transformer-based clustering with Sentence-Transformers and Facebook Faiss for an e-commerce use case, in which we clustered offers to automatically generate new products.

Sponsored pydata-machine-learning-stats

Unsupervised shallow learning for fraud detection on marketplaces

Andreu Mora

Algorithms, Best Practice, Predictive Modelling

Tune in to learn how Adyen uses ML and open-source Python tools to combat fraud and wrongdoing on large marketplaces such as GoFundMe or eBay.

Sponsored pydata-data-handling

Using a database in a data science project - Lessons learned in production

Jacopo Farina

Data Engineering, Databases

Lessons learned over 4 years of using Postgres in a machine learning project.

Talk pydata-data-handling

What are data unit tests and why we need them

Theodore Meynard

Best Practice, Data Engineering

This talk will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products.

Talk pydata-natural-language-processing

XAI meets Natural Language Processing

Larissa Haas

Data Visualization, Ethics (Privacy, Fairness,… ), Transparency / Interpretability

XAI meets NLP - approaches, workarounds and lessons learned while making an NLP project explainable

Talk pydata-machine-learning-stats

You shall not share!

Gönül Aycı

Ethics (Privacy, Fairness,… ), Natural Language Processing

Are you ready to have an agent help preserve your privacy in online social networks? "You shall not share!" will be presented by Gönül Aycı ⚡️

Talk pydata-visualization

Your data, your insights: creating personal data projects to (re-)own the data you share

Paula Gonzalez Avalos

Data Visualization, Predictive Modelling

Your data, your insights: 3 examples that illustrate how we can combine common data science libraries with data shared via mobile apps or collected manually to build small data visualization projects that provide unique, contextual and intimate insights.

Filter