PyData Session List
(Serious) Time for Time Series
Marysia Winkels, James Hayward
Time Series
From inventory to website visitors, resource planning to financial data, time-series data is all around us. Knowing what comes next is key to success in this dynamically changing world. So join us and learn about time series analysis and seasonality modelling.
5 Steps to Speed Up Your Data-Analysis on a Single Core
Jonathan Striebel
Data Engineering, Performance
Your data analysis pipeline works. Nice. Could it be faster? Probably. Do you need to parallelize? Not yet. Discover optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time & costs.
`python-m5p` - M5 Prime regression trees in python, compliant with scikit-learn
Sylvain Marié
Algorithms, Predictive Modelling, Science
`python-m5p` is an implementation of the M5P algorithm compliant with scikit-learn.
Building a Sign-to-Speech prototype with TensorFlow, Pytorch and DeepStack: How it happened & What I learned
Steven Kolawole
Computer Vision, Neural Networks / Deep Learning
Building an end-to-end working prototype that detects sign language meanings in images and videos and generates the equivalent spoken words in real time is not a day's work. Here I'll explain how it happened and what I learned in the process.
Can you Read This? (Or: how I Improved Text Readability on the Web for the Visually Impaired)
Asya Frumkin
Algorithms, Computer Vision, Neural Networks / Deep Learning
I will explain my approach to detecting text on top of an image background that is unreadable to people with visual impairment. I will explain the challenges I encountered when using different OCR architectures for this task and talk about the solution I came up with.
Challenge Accepted - How to Escape the Quicksand While Engineering a Computer Vision Application
Bettina Heinlein
Computer Vision
Leveraging problem-solving strategies for challenges in building Computer Vision applications and beyond, illustrated with a recent Computer Vision project.
Creating 3D Maps using Python
Martin Christen
GIS / Geo-Analytics
Create 3D maps anywhere on the planet using Python and open data.
Data APIs: Standardization of N-dimensional arrays and dataframes
Stephannie Jimenez Gacha
APIs
An introduction to the consortium of Data APIs, presenting the motivation, objectives and progress of the standardization process after one year of activity.
Data Science at Scale with Dask
Richard Pelgrim
APIs, Big Data, Cloud
A hands-on introduction to methods for scaling your data science and machine learning with Dask.
deepdoctection - An open source package for document intelligence
Janis Meyer
Computer Vision, Natural Language Processing
deepdoctection is a Python package that enables document analysis pipelines to be built using deep learning models.
Detecting drift: how to evaluate and explore data drift in machine learning systems
Emeli Dral
Best Practice, Data Visualization, Statistics
When an ML model is in production, you might encounter data and prediction drift. How exactly do you detect and evaluate it? I'll share in this talk.
Easily build interactive plots and apps with hvPlot
Philipp Rudiger, Maxime Liquet
Data Visualization, Jupyter, Science
Do you use the .plot() API of pandas or xarray? Do you ever wish it was easier to try out different combinations of the parameters in your data-processing pipeline? Follow this tutorial to learn how to easily build interactive plots and apps with hvPlot.
Easy and flexible imaging with the Core Imaging Library
Vaggelis Papoutsellis, Dr. Jakob Sauer Jørgensen
Algorithms, Big Data, Math
The Core Imaging Library is an open-source, object-oriented Python library for inverse problems in imaging developed by the UK academic network CCPi.
Efficient data labelling with weak supervision
Maria Mestre
Data Engineering, Data Visualization, Natural Language Processing
Data labelling should not be a waterfall task. Label your data significantly faster with weak supervision (https://github.com/dataqa/dataqa)
Financial Portfolio Management with Deep Reinforcement Learning
T-Berger
Neural Networks / Deep Learning, Simulation, Time Series
Intelligent portfolio optimization with deep reinforcement learning
Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit
Antoine Toubhans
Best Practice, Computer Vision, Data Engineering, Data Visualization, Development Methods, Reproducibility
Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit
Fundamentals of relational databases
Katharina Rasch
Databases
Somewhat comfortable with using SQL to access data, but curious to know what happens behind the scenes when you send off your query?
Grokking LIME: How can we explain why an image classifier "knows" what’s in a photo without looking inside the model?
Kilian Kluge
Computer Vision, Neural Networks / Deep Learning, Transparency / Interpretability
How can LIME explain machine-learning models without peeking inside? Let's find out!
Honey, I shrunk the target variable! Common pitfalls when transforming the target variable and how to exploit transformations.
Florian Wilhelm
Math, Predictive Modelling, Statistics
Honey, I shrunk the target variable! Common pitfalls when transforming the target variable and how to exploit transformations.
How a simple Streamlit dashboard will help to put your machine learning model in production
Daniël Willemsen, Welmoet Verbaan
Best Practice, Data Visualization, Predictive Modelling
Have you struggled to get your valuable machine learning model into the hands of users? A simple Streamlit monitoring dashboard can help!
How to Trust Your Deep Learning Code
Tilman Krokotsch
Best Practice, Neural Networks / Deep Learning
Write unit tests and learn to trust your Deep Learning code again.
Inspect and try to interpret your scikit-learn machine-learning models
Guillaume Lemaitre
Predictive Modelling, Statistics, Transparency / Interpretability
Inspect and try to interpret your scikit-learn machine-learning models
Introducing the Dask Active Memory Manager
Guido Imperiale
Algorithms, Architecture, Backend, Cloud, Data Engineering, Distributed Computing, Parallel Programming / Async
The Active Memory Manager is a new experimental feature of Dask which aims to reduce the memory footprint of the cluster, prevent hard-to-debug out-of-memory issues, and make worker retirement more robust.
Introduction to Uplift Modeling
Dr. Juan Orduz
Algorithms, Predictive Modelling, Statistics
In this talk we introduce uplift modelling, a method to estimate conditional average treatment effects (CATE) using machine learning estimators.
JupyterLite: Jupyter ❤️ WebAssembly ❤️ Python
Jeremy Tuloup
Jupyter, Reproducibility, Use Case
JupyterLite is a Jupyter distribution that runs entirely in the web browser, backed by in-browser language kernels such as the WebAssembly-powered Pyodide kernel. JupyterLite enables data science and interactive computing with the PyData scientific stack, directly in the browser.
Machine Learning Testing Ecosystem of Python
Yunus Emrah Bulut
Computer Vision, Ethics (Privacy, Fairness,… ), Governance, Natural Language Processing, Neural Networks / Deep Learning, Security
Machine learning testing is becoming an indispensable part of MLOps, and Python offers a great ecosystem for this purpose.
Making MLOps uncool again
David
Best Practice, Development Methods, Reproducibility
In this workshop, we will learn what an "MLOps workflow" means and how to build one by extending the power of Git and GitHub with open-source tools.
My forecast is better than yours! What does that even mean?
Illia Babounikau
Statistics, Time Series
Established forecast evaluation procedures often turn out to be inappropriate and biased for modern time series forecasting. I will present a number of forecast evaluation issues and their resolutions, based on real demand forecasting use cases developed at BlueYonder.
On Blocks, Copies and Views: updating pandas' internals
Joris Van den Bossche
APIs, Data Structures
As a pandas user, did you ever run into the SettingWithCopyWarning? Quite likely, and this is one of the more confusing aspects of pandas. But it doesn’t have to be this way! Check my proposal to simplify this aspect of pandas.
Optimize your network inference time with OpenVINO
Adrian Boguszewski
Jupyter, Neural Networks / Deep Learning, Performance
Learn how to automatically convert your model using Model Optimizer and how to run inference with OpenVINO Runtime at low latency on the CPU and iGPU you already have, all with only a few lines of code.
Overcoming 5 Hurdles to Using Jupyter Notebooks for Data Science, by the JetBrains Datalore Team
Alena Guzharina
Data Visualization, Jupyter, Reproducibility
Overcoming 5 Hurdles to Using Jupyter Notebooks for Data Science, by the JetBrains @Datalore Team. Join our talk to discuss setting up environments, working with data, writing code without IDE support, and sharing results, as well as collaboration and reproducibility.
Performing Content: Can NLP and Deep Learning algorithms predict reader preferences?
Sebastian Cattes
Natural Language Processing, Neural Networks / Deep Learning, Statistics
Can AI understand what drives user engagement? Join our talk "Performing Content: Can NLP and Deep Learning algorithms predict reader preferences?" to find out what NLP can bring to the editorial table.
PPML: Machine Learning on Data you cannot see
Valerio Maggio
Neural Networks / Deep Learning, Security
Have you ever wondered how to train your @PyTorch model on private data you cannot see? If you want to know how, this is the workshop for you! #PPML cc/ @openminedorg
Practical graph neural networks in Python with TensorFlow and Spektral
Aleksander Molak
Graphs, Neural Networks / Deep Learning
Practical Graph Neural Networks (GNNs) with Spektral & TensorFlow 🤩
Predictive Maintenance and Anomaly Detection for Wind Energy
Tobias Hoinka
Predictive Modelling, Statistics, Time Series
This talk will describe predictive modeling applications in wind turbine maintenance, the challenges of anomaly detection and ways to move to more automatic diagnoses by modeling past documented defects.
Processing Open Street Map Data with Python and PostgreSQL
Travis Hathaway
Data Engineering, Databases, GIS / Geo-Analytics
Open Street Map is a large, community-supported data set covering the entire world. Learn how to process this data with Python and PostgreSQL as I walk you through creating projects of your own. Along the way, we will learn how OSM data is structured and how you can use it yourself.
Reproducible machine learning and science with python
Prabhant Singh
Best Practice, Community, Science
Learn how to create reproducible workflows, benchmarks and studies with openml-python.
Sankey Plots with Python
Daniel Ringler
Data Visualization, Jupyter, Python fundamentals
Sankey Plots in Python? Get an introduction on how and when to use them.
Secure ML: Automated Security Best Practices in Machine Learning
Alejandro Saucedo
Best Practice, Data Engineering, Security
As data science capabilities scale, the core concept of security becomes increasingly critical. In this talk we provide an overview of challenges, solutions and best practices to introduce security into the ML lifecycle.
Seeing the needle AND the haystack: single-datapoint selection for billion-point datasets
Jean-Luc Stevens
Big Data, Data Visualization, Jupyter
Building simple custom interactive web dashboards that display millions or billions of samples while giving access to each individual sample.
sktime - python toolbox for time series: advanced forecasting - probabilistic, global and hierarchical
Franz Kiraly
Algorithms, Predictive Modelling, Time Series
The forecasting module of sktime provides a unified, sklearn-compatible, and composable interface. This tutorial covers advanced topics in forecasting using sktime: probabilistic forecasting, and forecasting with panel data, including global/hierarchical forecasting.
Squirrel - Efficient Data Loading for Large-Scale Deep Learning
Dr. Thomas Wollmann
Distributed Computing, Neural Networks / Deep Learning, Parallel Programming / Async
Learn why we built and open-sourced a data infrastructure library for deep learning.
The secret sauce of data science management
Shir Meir Lador
Best Practice, Big Data, Career & Freelancing, Corporate
In this talk, we will discuss lessons learned on how to build a DS team that prospers while addressing the unique challenges of leading a DS team.
Transformer based clustering: Identifying product clusters for E-commerce
Sebastian Wanner, Christopher Lennan
Natural Language Processing, Neural Networks / Deep Learning, Use Case
Transformer-based clustering with Sentence-Transformers and Facebook Faiss for an E-commerce use case where we clustered offers to automatically generate new products.
Unsupervised shallow learning for fraud detection on marketplaces
Andreu Mora
Algorithms, Best Practice, Predictive Modelling
Tune in to learn how @adyen uses ML and open-source Python tools to combat fraud and wrongdoing on large marketplaces such as @gofundme or @eBay.
Using a database in a data science project - Lessons learned in production
Jacopo Farina
Data Engineering, Databases
Lessons learned in 4 years using Postgres in a machine learning project
What are data unit tests and why we need them
Theodore Meynard
Best Practice, Data Engineering
This talk will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products.
XAI meets Natural Language Processing
Larissa Haas
Data Visualization, Ethics (Privacy, Fairness,… ), Transparency / Interpretability
XAI meets NLP - approaches, workarounds and lessons learned while making an NLP project explainable
You shall not share!
Gönül Aycı
Ethics (Privacy, Fairness,… ), Natural Language Processing
Are you ready for an agent that helps preserve your privacy in online social networks? "You shall not share!" will be presented by @gonul_ayci ⚡️
Your data, your insights: creating personal data projects to (re-)own the data you share
Paula Gonzalez Avalos
Data Visualization, Predictive Modelling
Your data, your insights: 3 examples to illustrate how we can apply common data science libraries together with data shared via mobile apps or collected manually to build little data visualization projects that provide unique, contextual and intimate insights.