Jonathan Striebel | Data Engineering, Performance
Your data analysis pipeline works. Nice. Could it be faster? Probably. Do you need to parallelize? Not yet. Discover optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time & costs.
Jannis Grönberg | Architecture, Data Engineering, DevOps
Struggling to choose the right #orchestration tool in #Python? Join this #PyCon talk about when it's best to use #Kubeflow, #Airflow or #Prefect and learn how to automate your #data #pipelines and #ML workflows. #DataScience #dataengineering #DevOps #MLOps
Gatha | Data Engineering, Ethics (Privacy, Fairness,… ), Governance
Synthetic data not only addresses privacy needs but also offers a workaround for unprecedented situations. This talk introduces its different types, the options for generating it, and how you don't need to be a mad scientist to make realistic synthetic data.
Maria Mestre | Data Engineering, Data Visualization, Natural Language Processing
Data labelling should not be a waterfall task. Label your data significantly faster with weak supervision (https://github.com/dataqa/dataqa)
Antoine Toubhans | Best Practice, Computer Vision, Data Engineering, Data Visualization, Development Methods, Reproducibility
Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit
Guido Imperiale | Algorithms, Architecture, Backend, Cloud, Data Engineering, Distributed Computing, Parallel Programming / Async
The Active Memory Manager is a new experimental feature of Dask that aims to reduce the memory footprint of the cluster, prevent hard-to-debug out-of-memory issues, and make worker retirement more robust.
Jan-Benedikt Jagusch, Christian Bourjau | Data Engineering, DevOps, Packaging
In this session, you will learn how to use ONNX for your machine learning model deployments, which can reduce your single-row inference time by up to 99% while also drastically simplifying your model management.
Travis Hathaway | Data Engineering, Databases, GIS / Geo-Analytics
OpenStreetMap is a large, community-supported data set covering the entire world. Learn how to process this data with Python and PostgreSQL as I walk you through creating projects of your own. Along the way, we learn how OSM data is structured and how you can use it yourself.
Alejandro Saucedo | Best Practice, Data Engineering, Security
As data science capabilities scale, the core concept of security becomes increasingly critical. In this talk we provide an overview of challenges, solutions, and best practices for introducing security into the ML lifecycle.
Tobias Heintz | Data Engineering, Development Methods, DevOps
How alcemy uses DevOps techniques to streamline and accelerate our daily development. Let's look at a number of real-world examples and best practices taken straight from the pipelines we use to release code several times a day.
Jacopo Farina | Data Engineering, Databases
Lessons learned from 4 years of using Postgres in a machine learning project
Theodore Meynard | Best Practice, Data Engineering
This talk will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products.
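To make the idea of a data unit test concrete, here is a minimal sketch in plain Python. It is not taken from the talk: the record fields `user_id` and `age`, the age bounds, and the function name `check_records` are all hypothetical, chosen only to illustrate checking data the way a unit test checks code.

```python
def check_records(records):
    """Return a list of human-readable problems found in the records.

    Hypothetical rules for illustration only:
    - every row needs a unique, non-missing user_id
    - age must be an integer between 0 and 120
    """
    problems = []
    seen_ids = set()
    for i, row in enumerate(records):
        user_id = row.get("user_id")
        age = row.get("age")
        if user_id is None:
            problems.append(f"row {i}: missing user_id")
        elif user_id in seen_ids:
            problems.append(f"row {i}: duplicate user_id {user_id}")
        else:
            seen_ids.add(user_id)
        if not isinstance(age, int) or not 0 <= age <= 120:
            problems.append(f"row {i}: age out of range: {age!r}")
    return problems


# Made-up sample data: one clean batch, one batch with two violations.
good = [{"user_id": 1, "age": 34}, {"user_id": 2, "age": 58}]
bad = [{"user_id": 1, "age": 34}, {"user_id": 1, "age": -5}]

assert check_records(good) == []   # a clean batch passes silently
for problem in check_records(bad): # a bad batch reports each violation
    print(problem)
```

In a real pipeline, a check like this would typically run before the data reaches a model or report, so that a failing batch stops the pipeline instead of silently producing wrong results.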