Using a database in a data science project - Lessons learned in production
Jacopo Farina
For four years we have been running a machine learning project in production, using Postgres as the database. We share the problems we encountered and suggest possible solutions: how to keep track of database usage by the different components of a large codebase and detect bottlenecks, how to systematically profile query durations, and how to reduce downtime when the database is upgraded. We'll also see a few simple ways to handle schema changes when ingesting external data, and to cache using files.
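To give a flavour of what per-component query profiling could look like, here is a minimal sketch. It is not the approach presented in the talk: the names `timed_query`, `query_durations` and `slowest_components` are hypothetical, and `sqlite3` stands in for Postgres only so the example runs without a server.

```python
import sqlite3
import time
from collections import defaultdict
from contextlib import contextmanager

# Durations in seconds, grouped by a free-form component label (hypothetical).
query_durations = defaultdict(list)


@contextmanager
def timed_query(component):
    """Record how long the wrapped block takes under the given component label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        query_durations[component].append(time.perf_counter() - start)


def slowest_components(durations):
    """Return (component, total_seconds, call_count) tuples, slowest first."""
    totals = [(name, sum(times), len(times)) for name, times in durations.items()]
    return sorted(totals, key=lambda row: row[1], reverse=True)


# Assumption: sqlite3 replaces Postgres here purely to keep the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rides (id INTEGER PRIMARY KEY, seats INTEGER)")
conn.executemany("INSERT INTO rides (seats) VALUES (?)", [(n,) for n in range(100)])

with timed_query("feature_extraction"):
    rows = conn.execute("SELECT COUNT(*) FROM rides").fetchone()

with timed_query("reporting"):
    total = conn.execute("SELECT SUM(seats) FROM rides").fetchone()

# One row per component, ordered by total time spent in the database.
report = slowest_components(query_durations)
```

With Postgres, a similar label could also be passed as the `application_name` connection parameter, so that each component is identifiable in `pg_stat_activity`.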
Jacopo Farina
Affiliation: Flixbus
I'm a developer from Milan, Italy, living in Germany, and I have been working as a Data Engineer at Flixbus since 2018. My team applies machine learning to the problem of predicting the demand for bus rides across the whole network.
I also work as a teacher at Data Science Retreat in Berlin, where I teach topics like Linux and containers.
I am interested in NLP, cartography, languages, and biking.