How to build a Python-based Research Cloud Platform from scratch
Andre Fröhlich
Our company is a quantitative asset manager with a focus on SRI (Socially Responsible Investing) and sustainability. We have been doing data science for over 20 years. We used to rely on the proprietary on-premises analysis tools of the day, but Python is now state of the art, so we decided to switch and, in addition, to move to the cloud to ensure we can properly address the challenges of the future.
In general, we are a data-driven company that ingests large amounts of data from various sources, e.g. financial market data, news data and SRI data (e.g. on carbon footprints, water usage and human rights), and processes it to predict future developments.
In this talk we will sketch out how we managed the migration, building a research platform containing the following elements:
- A compute cluster for big data calculations and modern machine learning algorithms (→ Dask)
- A development/notebook environment (→ JupyterHub)
- The right tool to execute data pipelines (→ Airflow)
- A repository to store our internally developed packages (→ Nexus)
Of special concern were topics like selecting suitable infrastructure (→ Azure Kubernetes Service, Azure Data Lakes) and implementing regulatory requirements.
The talk will give a high-level overview of the various steps taken during the project and of the architecture of the platform, but will omit fine-grained implementation details. It should be of interest to architects and decision makers as well as developers.
Andre Fröhlich
Affiliation: Quoniam Asset Management GmbH
Andre has been working as a consultant and in the financial industry for 18 years as a developer, business analyst, and project and application manager. He started off as a Java guy, but 4 years ago he fell in love with Python and Data Science/Engineering and never looked back.
His current role is Head of Research Technology for Quoniam Asset Management GmbH, a subsidiary of Union Investment, where he is responsible for building and improving a modern cloud-based platform for Data Science.