PPML: Machine Learning on Data you cannot see
Valerio Maggio
Privacy is to date one of the major impediments to Machine Learning (ML) when applied to sensitive datasets. A popular example is ML applied to the medical domain, but this generally extends to any scenario in which sensitive data cannot be shared, or simply cannot be used at all. Moreover, data anonymisation methods alone are not enough to guarantee that privacy will be completely preserved: it is in fact possible to exploit the memorisation effect of Deep Learning (DL) models to extract sensitive information about individual samples, and about the original dataset used for training. Privacy-preserving machine learning (PPML) methods promise to overcome these issues, allowing Machine Learning models to be trained on "data that cannot be seen".
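To make the memorisation leak concrete, the following is a purely illustrative sketch (not material from the workshop) of the simplest membership-inference test in PyTorch: a trained classifier tends to assign a lower loss to samples it memorised during training, so thresholding the per-sample loss already reveals membership information. The toy threshold value is an arbitrary assumption.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def per_sample_loss(model, samples, labels):
    """Per-sample cross-entropy: memorised training samples tend to score lower."""
    model.eval()
    return F.cross_entropy(model(samples), labels, reduction="none")

def naive_membership_inference(model, samples, labels, threshold=0.5):
    """Flag a sample as a likely training-set member when its loss falls
    below an (arbitrarily chosen) threshold -- a minimal leakage test."""
    return per_sample_loss(model, samples, labels) < threshold
```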
The workshop will be organised in two parts: (1) in the first part, we will work on attacks against Deep Learning models, leveraging their vulnerabilities to extract insights about the original (sensitive) data. We will then explore potential counter-measures to mitigate these issues.
Examples will include image data as well as textual data, where attacks and counter-measures highlight different nuances and corner cases.
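As a flavour of the attacks covered in Part 1, the sketch below implements the classic Fast Gradient Sign Method (FGSM) in PyTorch. It assumes a classification model trained with cross-entropy and inputs scaled to [0, 1]; the epsilon value is an illustrative choice, not necessarily the one used in the workshop.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, images, labels, epsilon=0.03):
    """Craft adversarial examples by taking one signed-gradient step
    in the direction that increases the classification loss."""
    images = images.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(images), labels)
    loss.backward()
    adversarial = images + epsilon * images.grad.sign()
    return adversarial.clamp(0, 1).detach()  # stay in the valid pixel range
```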
(2) In the second part of the workshop, we will delve into PPML methods, focusing on mechanisms to train DL networks on encrypted data, as well as on specialised distributed (federated) training strategies for multiple sensitive datasets.
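To illustrate what computing on encrypted data looks like, here is a minimal sketch using the TenSEAL library and the CKKS scheme; this is an assumption about tooling (the workshop may rely on different libraries), and the encryption parameters are standard tutorial defaults rather than recommendations.

```python
import tenseal as ts

# CKKS context; encryption parameters are illustrative tutorial defaults.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

# Encrypt two vectors, compute on the ciphertexts, then decrypt the results.
enc_x = ts.ckks_vector(context, [1.0, 2.0, 3.0])
enc_y = ts.ckks_vector(context, [4.0, 5.0, 6.0])

enc_sum = enc_x + enc_y     # element-wise addition on encrypted data
enc_dot = enc_x.dot(enc_y)  # encrypted dot product

print(enc_sum.decrypt())    # ~ [5.0, 7.0, 9.0]
print(enc_dot.decrypt())    # ~ [32.0]
```

The same principle underpins training on encrypted data: the linear-algebra operations run directly on ciphertexts, and only the data owner can decrypt the results.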
Tentative Outline
Part 1: Strengthening Deep Neural Networks (40 mins)
- Vulnerabilities and Adversarial Attacks
Break (5 mins)
Part 2: Primer on Privacy-Preserving Machine Learning (40 mins)
- DL training on (Homomorphically) Encrypted Data
- Federated Learning and Intro to Remote Data Science (see the sketch after this outline)
- Closing Remarks (5 mins)
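The federated strategy mentioned above can be summarised, in its simplest form, by Federated Averaging (FedAvg): each data holder trains a copy of the model locally on its own sensitive dataset, and only the model weights, never the raw data, are sent back and averaged. Below is a minimal single-process simulation in plain PyTorch; the model, optimiser settings and data loaders are placeholders, and floating-point parameters are assumed.

```python
import copy
import torch

def local_update(global_model, data_loader, epochs=1, lr=0.01):
    """Train a private copy of the global model on one client's local data."""
    model = copy.deepcopy(global_model)
    optimiser = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for x, y in data_loader:
            optimiser.zero_grad()
            loss_fn(model(x), y).backward()
            optimiser.step()
    return model.state_dict()

def federated_averaging(global_model, client_loaders, rounds=5):
    """FedAvg: each round, average the clients' locally trained weights.
    Raw data never leaves the clients; only parameters are exchanged."""
    for _ in range(rounds):
        states = [local_update(global_model, loader) for loader in client_loaders]
        avg_state = {
            key: torch.stack([s[key].float() for s in states]).mean(dim=0)
            for key in states[0]  # assumes floating-point parameters/buffers
        }
        global_model.load_state_dict(avg_state)
    return global_model
```

In practice a dedicated framework handles the client/server communication; the loop above only simulates the aggregation logic in a single process.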
Prerequisites and Requirements
This workshop will assume familiarity with the PyTorch deep learning framework and with the basics of Machine/Deep Learning. No prior specialised knowledge of anonymisation or security is required. Lecture notes will be delivered as interactive Jupyter notebooks, so the audience should be familiar with the Jupyter environment. Instructions on how to set up the environment will be shared with delegates prior to the workshop.
Valerio Maggio
Affiliation: University of Bristol
Valerio Maggio is a Researcher and Data Scientist, currently holding an appointment as Senior Research Associate at the University of Bristol. He obtained a PhD in Computer Science from the University of Naples "Federico II" with a thesis on Machine Learning for Software Maintainability, and is well versed in open research software and best software development practices. His research interests span from the reproducibility of Machine/Deep Learning pipelines, with applications in health, to privacy-enhancing Data Science methods. Valerio is also an active member of the Python community and an open-source contributor. He is one of the lead organisers of PyCon/PyData Italia and of EuroSciPy (2015-2019). In 2019 Valerio was awarded the honorary position of Microsoft Azure Cloud Research Software Engineer for his work on scalable Machine Learning pipelines on Microsoft Azure.