Computer Vision applications are finding their way into more and more areas and onto devices with limited resources. Building a Computer Vision application for a specific problem poses several challenges: you might know that a solution exists in principle, but you still need to map out the concrete steps in between, and each step can offer several different options. Hence, you need to figure out what you need to know, make a lot of decisions along the way, and avoid getting lost in the quicksand of reading documentation and trying out options. After all, you would like to build your application in finite time, right?
This talk showcases several challenges that I came across while creating a real-time application that records video of a person interacting with documents printed on paper (more specifically, highlighting text in a document), then detects the highlighted text and extracts it into a human-readable format for use in downstream applications, all while giving the person as much flexibility as possible regarding their work environment.
Along the way, I will hand you several strategies for exploring the problem-solution space, so that you can answer questions for your own (Computer Vision) projects, such as: How do you choose a solution to a given problem? How do you figure out potential options for a solution, and how do you decide between them?
And to conclude with a spoiler: The first strategy addresses how you can leverage fast feedback loops.
Affiliation: HITS (Heidelberg Institute for Theoretical Studies)
I'm a physicist currently working in Computer Vision, with some overlap with Natural Language Processing. While I have a broad range of interests in science and technology, I am particularly interested in Software Engineering and Machine Learning, and passionate about producing high-quality results.