Using a database in a data science project - Lessons learned in production

Jacopo Farina

Tuesday 10:50 in B09

Type/Track Sponsored pydata-data-handling

For four years we have been working on a machine learning project in production, using Postgres as the database. We share the problems we encountered and suggest possible solutions: how to keep track of database usage by different components of a large codebase and detect bottlenecks, how to systematically profile query duration, and how to reduce downtime when the database is upgraded. We'll also see a few simple ways to handle schema changes when ingesting data from outside, and to cache data using files.
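As a rough illustration of the per-component profiling idea mentioned in the abstract (not the approach presented in the talk), the sketch below times database calls and groups the durations by the component that issued them. The names `timed_query`, `query_durations` and `slowest_components` and the `rides` table are hypothetical, and `sqlite3` stands in for Postgres only so that the example runs without a server.

```python
import sqlite3
import time
from collections import defaultdict
from contextlib import contextmanager

# Hypothetical registry: component name -> list of query durations in seconds.
query_durations = defaultdict(list)


@contextmanager
def timed_query(component):
    """Record how long the wrapped database work takes, grouped by component."""
    start = time.perf_counter()
    try:
        yield
    finally:
        query_durations[component].append(time.perf_counter() - start)


def slowest_components(durations):
    """Return (component, total_seconds, call_count), slowest total first."""
    totals = [(name, sum(times), len(times)) for name, times in durations.items()]
    return sorted(totals, key=lambda row: row[1], reverse=True)


# Assumption: sqlite3 is a stand-in for Postgres; the table is made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER PRIMARY KEY, seats INTEGER)")
conn.executemany(
    "INSERT INTO rides (seats) VALUES (?)", [(n % 50,) for n in range(1000)]
)

with timed_query("training"):
    conn.execute("SELECT AVG(seats) FROM rides").fetchone()

with timed_query("reporting"):
    conn.execute("SELECT COUNT(*) FROM rides WHERE seats > 10").fetchone()

for name, total, calls in slowest_components(query_durations):
    print(f"{name}: {total:.6f}s over {calls} call(s)")
```

With a real Postgres server, a complementary option is to set the `application_name` connection parameter per component, so that each component's sessions can be told apart in `pg_stat_activity`.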

Tags Data Engineering Databases

Level: Domain Expertise: some, Python Skill Level: none

Jacopo Farina

Affiliation: Flixbus

I'm a developer from Milan, Italy, living in Germany, and I have been working as a Data Engineer at Flixbus since 2018. My team applies machine learning to the problem of predicting the demand for bus rides across the whole network.

I also work as a teacher at Data Science Retreat in Berlin, where I teach topics like Linux and containers.

I am interested in NLP and cartography, languages, and biking.
