Data Engineering Session List
5 Steps to Speed Up Your Data-Analysis on a Single Core
Jonathan Striebel
Data Engineering, Performance
Your data analysis pipeline works. Nice. Could it be faster? Probably. Do you need to parallelize? Not yet. Discover optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time and costs.
Battle of Pipelines - who will win python orchestration in 2022?
Jannis Grönberg
Architecture, Data Engineering, DevOps
Struggling to choose the right orchestration tool in Python? Join this PyCon talk about when it's best to use Kubeflow, Airflow or Prefect, and learn how to automate your data pipelines and ML workflows.
Do I need to be Dr. Frankenstein to create real-ish synthetic data?
Gatha
Data Engineering, Ethics (Privacy, Fairness, …), Governance
Synthetic data not only addresses privacy needs but also offers a workaround for unprecedented situations. This talk introduces the different types of synthetic data and the options for generating them, and shows why you don't need to be a mad scientist to make realistic synthetic data.
Efficient data labelling with weak supervision
Maria Mestre
Data Engineering, Data Visualization, Natural Language Processing
Data labelling should not be a waterfall task. Label your data significantly faster with weak supervision (https://github.com/dataqa/dataqa).
Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit
Antoine Toubhans
Best Practice, Computer Vision, Data Engineering, Data Visualization, Development Methods, Reproducibility
Introducing the Dask Active Memory Manager
Guido Imperiale
Algorithms, Architecture, Backend, Cloud, Data Engineering, Distributed Computing, Parallel Programming / Async
The Active Memory Manager is a new experimental feature of Dask that aims to reduce the memory footprint of the cluster, prevent hard-to-debug out-of-memory issues, and make worker retirement more robust.
Making Machine Learning Applications Fast and Simple with ONNX
Jan-Benedikt Jagusch, Christian Bourjau
Data Engineering, DevOps, Packaging
In this session, you will learn how to use ONNX for your machine learning model deployments, which can reduce your single-row inference time by up to 99% while also drastically simplifying your model management.
Processing Open Street Map Data with Python and PostgreSQL
Travis Hathaway
Data Engineering, Databases, GIS / Geo-Analytics
Open Street Map is a large, community-supported data set covering the entire world. Learn how to process this data with Python and PostgreSQL as I walk you through creating projects of your own. Along the way, you will learn how OSM data is structured and how you can use it yourself.
Secure ML: Automated Security Best Practices in Machine Learning
Alejandro Saucedo
Best Practice, Data Engineering, Security
As data science capabilities scale, security becomes increasingly critical. In this talk we provide an overview of the challenges, solutions and best practices for introducing security into the ML lifecycle.
The state of DevOps for Python projects
Tobias Heintz
Data Engineering, Development Methods, DevOps
How we at alcemy use DevOps techniques to streamline and accelerate our daily development. Let's look at a number of real-world examples and best practices taken straight from the pipelines we use to release code several times a day.
Using a database in a data science project - Lessons learned in production
Jacopo Farina
Data Engineering, Databases
Lessons learned in 4 years of using Postgres in a machine learning project.
What are data unit tests and why we need them
Theodore Meynard
Best Practice, Data Engineering
This talk will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products.