5 Steps to Speed Up Your Data-Analysis on a Single Core
Jonathan Striebel
Data Engineering, Performance

Your data analysis pipeline works. Nice. Could it be faster? Probably. Do you need to parallelize? Not yet. Discover optimization steps that boost the performance of your data analysis pipeline on a single core, reducing time & costs.

Battle of Pipelines - who will win python orchestration in 2022?
Jannis Grönberg
Architecture, Data Engineering, DevOps

Struggling to choose the right #orchestration tool in #Python? Join this #PyCon talk about when it's best to use #Kubeflow, #Airflow or #Prefect and learn how to automate your #data #pipelines and #ML workflows. #DataScience #dataengineering #DevOps #MLOps

Do I need to be Dr. Frankenstein to create real-ish synthetic data?
Gatha
Data Engineering, Ethics (Privacy, Fairness,… ), Governance

Synthetic data not only address privacy needs but also offer a workaround for unprecedented situations. This talk introduces the different types of synthetic data and the options for generating them, and shows that you don't need to be a mad scientist to make realistic synthetic data.

Efficient data labelling with weak supervision
Maria Mestre
Data Engineering, Data Visualization, Natural Language Processing

Data labelling should not be a waterfall task. Label your data significantly faster with weak supervision (https://github.com/dataqa/dataqa).

Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit
Antoine Toubhans
Best Practice, Computer Vision, Data Engineering, Data Visualization, Development Methods, Reproducibility

Flexible ML Experiment Tracking System for Python Coders with DVC and Streamlit

Introducing the Dask Active Memory Manager
Guido Imperiale
Algorithms, Architecture, Backend, Cloud, Data Engineering, Distributed Computing, Parallel Programming / Async

The Active Memory Manager is a new experimental feature of Dask that aims to reduce the memory footprint of the cluster, prevent hard-to-debug out-of-memory issues, and make worker retirement more robust.

Making Machine Learning Applications Fast and Simple with ONNX
Jan-Benedikt Jagusch, Christian Bourjau
Data Engineering, DevOps, Packaging

In this session, you will learn how to use ONNX for your machine learning model deployments, which can reduce your single-row inference time by up to 99% while also drastically simplifying your model management.

Processing Open Street Map Data with Python and PostgreSQL
Travis Hathaway
Data Engineering, Databases, GIS / Geo-Analytics

Open Street Map is a large, community-supported data set covering the entire world. Learn how to process this data with Python and PostgreSQL as I walk you through creating projects of your own. Along the way, you will learn how OSM data is structured and how you can use it yourself.

Secure ML: Automated Security Best Practices in Machine Learning
Alejandro Saucedo
Best Practice, Data Engineering, Security

As data science capabilities scale, security becomes increasingly critical. In this talk, we provide an overview of the challenges, solutions, and best practices for introducing security into the ML lifecycle.

The state of DevOps for Python projects
Tobias Heintz
Data Engineering, Development Methods, DevOps

How alcemy uses DevOps techniques to streamline and accelerate our daily development. Let's look at a number of real-world examples and best practices taken straight from the pipelines we use to release code several times a day.

Using a database in a data science project - Lessons learned in production
Jacopo Farina
Data Engineering, Databases

Lessons learned from four years of using Postgres in a machine learning project.

What are data unit tests and why we need them
Theodore Meynard
Best Practice, Data Engineering

This talk will introduce the concept of data unit tests and why they are important in the workflow of data scientists when building data products.
