Honey, I shrunk the target variable! Common pitfalls when transforming the target variable and how to exploit transformations.

Florian Wilhelm

Tuesday 14:50 in B05-B06

Type/Track Talk pydata-machine-learning-stats

The talk addresses the consequences of transforming the target variable on both a conceptual and a mathematical level. The emphasis is on conveying the intuition behind the interplay between your chosen error measure and the transformation of your target variable, so that you get some practical gain from it. Everything will therefore also be demonstrated on a use case in a Jupyter notebook.

Motivation

Real Data Science projects are not like Kaggle
Choosing the right error measure

Mathematical Recap

Distribution of the target variable
Linear model and other assumptions

Analysis of the Residual Distribution

What is the residual distribution and why should I care?
What are we getting if we optimize the l1 or the l2 norm, and why?
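
As a small illustration of the last question, here is a minimal sketch (not taken from the talk's notebook) that fits the simplest possible model, a single constant, to a skewed sample. The sample values, the candidate grid and all variable names are assumptions chosen for illustration. Minimizing the mean squared error (l2) recovers the arithmetic mean, while minimizing the mean absolute error (l1) recovers the median.

```python
import numpy as np

# Hypothetical, deliberately skewed sample: one large outlier.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Candidate constant predictions on a fine grid (step 0.01).
candidates = np.linspace(0.0, 100.0, 10_001)

# Residuals of every candidate against every observation,
# shape (n_candidates, n_observations).
residuals = y[None, :] - candidates[:, None]

l2_loss = (residuals ** 2).mean(axis=1)   # mean squared error
l1_loss = np.abs(residuals).mean(axis=1)  # mean absolute error

best_l2 = candidates[np.argmin(l2_loss)]
best_l1 = candidates[np.argmin(l1_loss)]

print("l2 minimizer:", best_l2, "| sample mean:  ", y.mean())
print("l1 minimizer:", best_l1, "| sample median:", np.median(y))
```

The outlier pulls the l2 solution far away from the bulk of the data, whereas the l1 solution stays at the middle observation. Which of the two is "right" depends on the error measure the use case actually cares about.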

Shrinking the Target Variable

Why does it change our error measure?
Practical demonstration of what can go wrong
How can it be used to our advantage?
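
The first two points can be sketched with a constant model on synthetic data. This is an assumption-laden illustration, not the talk's own demonstration: the log-normal sample, the seed and the helper `rmse` are hypothetical. Minimizing the squared error on log(y) and transforming back with exp yields the geometric mean, which for right-skewed data lies below the arithmetic mean, so the back-transformed prediction is no longer optimal with respect to the squared error in the original units.

```python
import numpy as np

# Synthetic, right-skewed target (assumed log-normal for illustration).
rng = np.random.default_rng(42)
y = rng.lognormal(mean=1.0, sigma=1.0, size=50_000)

# Best constant under squared error on the raw target: the arithmetic mean.
pred_raw = y.mean()

# Best constant under squared error on log(y), transformed back:
# exp(mean(log y)), i.e. the geometric mean.
pred_log = np.exp(np.log(y).mean())


def rmse(y_true, pred):
    """Root mean squared error of a constant prediction (hypothetical helper)."""
    return float(np.sqrt(np.mean((y_true - pred) ** 2)))


rmse_raw = rmse(y, pred_raw)
rmse_log = rmse(y, pred_log)

print("prediction fitted on raw target:", pred_raw, "RMSE:", rmse_raw)
print("prediction fitted on log target:", pred_log, "RMSE:", rmse_log)
print("sample median for comparison:   ", np.median(y))
```

On log-normal data the back-transformed prediction sits close to the median rather than the mean. That is a pitfall if the squared error on the original scale matters, and an advantage if a median-like or relative error measure is what you actually want to optimize.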

Summary

What did we learn?
Conclusion

Tags Math Predictive Modelling Statistics

Level: Domain Expertise: some, Python Skill Level: some

Florian Wilhelm

Affiliation: inovex GmbH

Data Scientist and Python developer with a strong mathematical background. Always looking to apply mathematics to real-world problems and enthusiastic about everything math.

As Head of Data Science at inovex GmbH, I enjoy working on innovative Data Science & Data Engineering projects with experts every day. In my spare time, I like to contribute to several OSS projects in the PyData stack and started the PyScaffold project to foster and establish best practices and clean coding within the Python ecosystem.

Visit the speaker at: GitHub • Homepage