Honey, I shrunk the target variable! Common pitfalls when transforming the target variable and how to exploit transformations.
Florian Wilhelm
The talk addresses the consequences of transforming the target variable on both a conceptual and a mathematical level. The emphasis is on conveying the intuition behind the interplay between your chosen error measure and the transformation of your target variable, so that you get some practical gain from it. Everything will therefore also be demonstrated on a use case in a Jupyter notebook.
Motivation
- Real Data Science projects are not like Kaggle
- Choosing the right error measure
Mathematical Recap
- Distribution of the target variable
- Linear model and other assumptions
Analysis of the Residual Distribution
- What is the residual distribution and why should I care?
- What are we getting if we optimize the l1 or the l2 norm and why?
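
As a rough illustration of the l1 versus l2 question (a sketch, not taken from the talk's notebook): for a constant prediction, minimizing the l2 norm of the residuals leads to the mean of the target, while minimizing the l1 norm leads to the median. The log-normal sample, the grid search, and the helper name `best_constant` are assumptions chosen for the example.

```python
import numpy as np


def best_constant(y, loss, n_grid=10001):
    """Grid-search the constant prediction that minimizes `loss` of the residuals."""
    candidates = np.linspace(y.min(), y.max(), n_grid)
    losses = [loss(y - c) for c in candidates]
    return candidates[int(np.argmin(losses))]


# Hypothetical right-skewed target; an odd sample size keeps the median unique.
rng = np.random.default_rng(0)
y = rng.lognormal(mean=0.0, sigma=1.0, size=1001)

l2_opt = best_constant(y, lambda r: np.sum(r ** 2))     # squared error
l1_opt = best_constant(y, lambda r: np.sum(np.abs(r)))  # absolute error

print(f"l2 optimum: {l2_opt:.3f}  sample mean:   {y.mean():.3f}")
print(f"l1 optimum: {l1_opt:.3f}  sample median: {np.median(y):.3f}")
```

On a skewed target the two optima differ noticeably, which is why the choice of error measure already decides which statistic of the residual distribution a model is estimating.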
Shrinking the Target Variable
- Why does it change our error measure?
- Practical demonstration of what can go wrong
- How can it be used to our advantage?
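
One way the pitfall can show up, sketched under simplifying assumptions (a log transformation, a constant model, and synthetic log-normal data; none of this is taken from the talk's notebook): fitting with the l2 norm in log space and naively back-transforming yields the geometric mean, which for this data sits near the median rather than the mean. Measured by RMSE on the original scale, that prediction is worse than the plain mean.

```python
import numpy as np


def rmse(y_true, pred):
    """Root mean squared error of a constant prediction on the original scale."""
    return np.sqrt(np.mean((y_true - pred) ** 2))


# Hypothetical right-skewed target.
rng = np.random.default_rng(42)
y = rng.lognormal(mean=2.0, sigma=1.0, size=10_000)

log_pred = np.mean(np.log(y))  # l2-optimal constant in log space
naive_pred = np.exp(log_pred)  # naive back-transform: geometric mean of y
mean_pred = np.mean(y)         # l2-optimal constant on the original scale

print(f"back-transformed prediction: {naive_pred:.2f}")
print(f"sample median:               {np.median(y):.2f}")
print(f"sample mean:                 {mean_pred:.2f}")
print(f"RMSE naive: {rmse(y, naive_pred):.2f}  RMSE mean: {rmse(y, mean_pred):.2f}")
```

Whether this is a pitfall or an advantage depends on the error measure you actually care about: if the goal is the median or a relative error, the shrunken target may be exactly what you want.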
Summary
- What did we learn?
- Conclusion
Florian Wilhelm
Affiliation: inovex GmbH
Data Scientist and Python developer with a strong mathematical background. Always looking to apply mathematics to real-world problems and enthusiastic about everything math.
As Head of Data Science at inovex GmbH, I enjoy working on innovative Data Science & Data Engineering projects with experts every day. In my spare time, I like to contribute to several OSS projects in the PyData stack and started the PyScaffold project to foster and establish best practices and clean coding within the Python ecosystem.