A data scientist's guide to code reviews
Alexandra Wörner
The standard code review process known from traditional software engineering also applies to data tasks that largely follow traditional software engineering practices. Examples of such tasks are data extraction or transformation pipelines and machine learning (ML) services that run in production(-like) systems. Yet a significant amount of data science work is experimental, e.g. analysing data, preparing data for use with ML algorithms, or training and evaluating ML models. From experience, code reviews are often skipped during these experimental tasks, although they remain highly important for detecting issues and errors early. One reason for skipping them is that the focus of this work is not to produce production-grade code but to try out and verify a concept that can be put into production later on, if successful. For code reviews to remain effective, they need to be adjusted to the changed requirements of data science work. Key to these adjusted reviews is a reduced focus on code quality and a greater focus on the technical correctness and logic of the concept. Code reviews thus evolve into a form of peer review, as known from the paper review process.
The talk gives an introduction to traditional code reviews as well as the adapted form of code reviews for data scientists. After showing why standard code reviews are not always applicable, it describes and explains the changes to the process. Listeners will also hear practical recommendations on what feedback to give to make a code review effective. Using two of the most common platforms for version-controlled software development, GitLab and GitHub, the talk demonstrates supporting features that make the strenuous process more pleasant and efficient. The aim of this talk is to give data scientists a reason as well as some guidance to do code reviews.
Alexandra Wörner
Affiliation: scieneers
Alexandra Wörner is a data scientist at scieneers GmbH, where she supports project teams in industry and the non-profit sector in implementing their visions around data and machine learning. Past and current projects cover a wide range of topics, including recommender systems, time series modeling, customer analytics and engagement, and topic modeling. With her strong computer science background, Alexandra combines a natural interest in data, algorithms, and tools with a curiosity about how software engineering practices can be applied to data science tasks to improve and ease the path from experimentation to production.