Trojan Source Malware - Can we trust open-source anymore?

Cheuk Ting Ho

Monday 15:40 in B07-B08

Type/Track Talk pycon-python-language

Background:

Researchers at the University of Cambridge published a paper describing an attack named Trojan Source. It exploits the fact that compilers and interpreters, such as CPython, accept Unicode in source code, including invisible text-direction control characters and look-alike characters. This has raised concerns in the open-source community about malicious contributions that look entirely legitimate to the human eye but contain invisible attacks. As members of the Python community, we should all be aware of this and understand how to prevent such attacks.

About this talk:

In this talk, Cheuk will break down the findings of this paper to a level that everyone can understand. She will start with a playful example of how you can trip someone up using Unicode. She will then explain what Unicode is and why it causes trouble. Afterwards, she will walk through the Python examples in the paper and explain why they can be dangerous. Lastly, she will open up a discussion on how we should defend ourselves against these attacks and what we can do as a community.

Outline (30 mins talk):

5 mins - Introduction

In this section, Cheuk will ask the audience to debug a code snippet that looks absolutely fine but does not behave the way it reads. She will explain that Trojan Source relies on the same idea.
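The snippet used in the talk is not reproduced here. As a rough illustration of the kind of puzzle this could be, the sketch below assumes a look-alike character trick: two variable names that render the same in most fonts but are different identifiers to Python. The names `password` and its Cyrillic twin are hypothetical, and the look-alike letter is written as an escape so it stays visible in this text.

```python
# Hypothetical puzzle: two names that look identical in most fonts.
# The second one uses CYRILLIC SMALL LETTER A (U+0430) instead of Latin "a".
source = (
    "password = 'secret'\n"
    "p\u0430ssword = 'oops'\n"   # looks like a reassignment, but is a new name
)

namespace = {}
exec(source, namespace)

# Collect the user-defined names (exec also adds "__builtins__").
names = [name for name in namespace if not name.startswith("__")]

print(len(names))             # 2
print(namespace["password"])  # secret
```

A reader would expect the second line to overwrite the first, yet the original value survives because Python sees two distinct identifiers.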

10 mins - What is Unicode

In this section, Cheuk will give an introduction to Unicode: what it is, how a computer represents it, and why we need it. She will also explain how the very flexibility that makes Unicode useful leaves us vulnerable to the Trojan Source attack.
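To make "what Unicode is to a computer" concrete, here is a minimal sketch using only the standard library. It shows that each character has a numbered code point and an official name, and is stored as one or more bytes once encoded (UTF-8 is assumed here). The helper `describe` is a hypothetical name chosen for this illustration.

```python
import unicodedata


def describe(ch):
    """Return (code point, official Unicode name, UTF-8 bytes) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8"))


for ch in "a\u00e9\u4e2d":  # Latin a, e with acute accent, a CJK ideograph
    print(describe(ch))
# ('U+0061', 'LATIN SMALL LETTER A', b'a')
# ('U+00E9', 'LATIN SMALL LETTER E WITH ACUTE', b'\xc3\xa9')
# ('U+4E2D', 'CJK UNIFIED IDEOGRAPH-4E2D', b'\xe4\xb8\xad')
```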

10 mins - How Trojan Source works in Python

In this section, Cheuk will show a few examples of Trojan Source attacks embedded in legitimate-looking Python code. She will point out where the attack hides in the source code and in which cases it can be dangerous.
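The sketch below is loosely modeled on the early-return pattern described in the Trojan Source paper; it is not necessarily one of the examples shown in the talk. The function name, the `bank` dictionary and the amounts are assumptions made for illustration. The source is built as a string with the control character written as an escape, so nothing is hidden here. In an editor that applies bidirectional text rendering, the tail of the docstring line may be displayed so that `return` appears to sit inside the docstring.

```python
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE, an invisible text-direction control

# Logical (stored) order: the docstring closes BEFORE ";return".
source = (
    "def subtract_funds(bank, account, amount):\n"
    f"    ''' Subtract funds from bank account then {RLI}''' ;return\n"
    "    bank[account] -= amount\n"
    "    return\n"
)

namespace = {}
exec(source, namespace)

bank = {"alice": 100}
namespace["subtract_funds"](bank, "alice", 30)

# The function returned right after its docstring, so nothing was subtracted.
print(bank["alice"])  # 100
```

The interpreter follows the stored order of the characters, while a human reviewer sees the displayed order, and the attack lives in the gap between the two.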

5 mins - How to protect ourselves

In this section, Cheuk will open the discussion and make a few suggestions for how we can protect ourselves as a community. This will lead into the Q&A, where the audience can weigh in with their own thoughts.
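One possible defence, offered here as an assumption rather than as a recommendation from the talk, is to scan contributions for bidirectional control characters before merging them. The sketch below is a minimal example of such a check; `find_bidi_controls` and `BIDI_CONTROLS` are hypothetical names, and the character set covers only the embedding, override and isolate controls.

```python
import unicodedata

# Explicit directional embedding, override and isolate controls.
BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",
    "\u2066", "\u2067", "\u2068", "\u2069",
}


def find_bidi_controls(text):
    """Return (line, column, character name) for each bidi control in text."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in BIDI_CONTROLS:
                findings.append((line_no, col, unicodedata.name(ch)))
    return findings


sample = "x = 1\ny = '\u202e'\n"
print(find_bidi_controls(sample))  # [(2, 6, 'RIGHT-TO-LEFT OVERRIDE')]
```

A check like this could run in a pre-commit hook or in continuous integration. It does not catch look-alike characters, which would need a separate check.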

Target audience

Everyone from the merely curious to maintainers of open-source libraries. This is knowledge we should all have. Cheuk will explain it in a way that requires no prior knowledge.

What will the audience learn

About Trojan Source attacks and how they work. They may also learn how interpreters, especially Python interpreters, work with Unicode. In addition, they may gain increased awareness of security in the open-source world.

Tags Community Governance Python fundamentals Security Transparency / Interpretability

Level: Domain Expertise: some, Python Skill Level: some

Cheuk Ting Ho

Affiliation: TerminusDB

After a career in data science, Cheuk now brings her knowledge of data and her passion for the tech community to TerminusDB as the developer relations lead. Cheuk regularly contributes to the open-source community by giving free tutorials on Twitch and organizing sprints to encourage contributions from a diverse range of people.

Visit the speaker at: GitHub • Homepage