Poster presented at VizWiz Challenge (CVPR 2022)

7 October 2022

The "VizWiz Grand Challenge" is a workshop at CVPR that aims to support visually impaired people in their everyday lives. The challenge comprises three tasks: Visual Question Answering (including answerability prediction), Visual Grounding, and few-shot object recognition. The VIS team worked on the Visual Question Answering (VQA) task.

Fabian Deuser, Konrad Habel, Philipp J. Rösch and Norbert Oswald took an approach that is both simple and elegant. Building on CLIP features (Radford et al., 2021), they trained an MLP classifier whose predictions are supported by an Answer Type Gate. The model is easy to train because only a small number of parameters have to be updated. In addition, the single models (without ensembling) already achieve very good results, which is an advantage over many competing models.
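The idea of a small classifier on top of precomputed CLIP features can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the layer sizes, the names `forward`, `init_linear` and `params`, the use of random vectors in place of real CLIP embeddings, and in particular the way the gate is wired (a sigmoid mask over the answer vocabulary computed from the answer-type logits). The sketch only shows why few parameters need updating when the CLIP encoder is left untouched.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are hypothetical and chosen only for illustration.
FEAT_DIM = 512      # assumed size of one CLIP embedding
HIDDEN = 256        # hidden width of the small classifier
NUM_ANSWERS = 1000  # size of the answer vocabulary
NUM_TYPES = 4       # number of answer types


def init_linear(n_in, n_out):
    """Return (weight, bias) for one linear layer."""
    return rng.normal(0.0, 0.02, size=(n_in, n_out)), np.zeros(n_out)


def linear(x, layer):
    weight, bias = layer
    return x @ weight + bias


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Only these layers would be trained; the CLIP encoder is assumed frozen.
params = {
    "hidden": init_linear(2 * FEAT_DIM, HIDDEN),
    "answer": init_linear(HIDDEN, NUM_ANSWERS),
    "type": init_linear(HIDDEN, NUM_TYPES),
    "gate": init_linear(NUM_TYPES, NUM_ANSWERS),
}


def forward(image_feat, text_feat, params):
    """Classify answers from image and question features, with a gate."""
    x = np.concatenate([image_feat, text_feat], axis=-1)
    h = np.maximum(linear(x, params["hidden"]), 0.0)  # ReLU
    answer_logits = linear(h, params["answer"])
    type_logits = linear(h, params["type"])
    # Assumed gate: the predicted answer type rescales every answer logit.
    gate = sigmoid(linear(type_logits, params["gate"]))
    return answer_logits * gate, type_logits


# Random stand-ins for a batch of 8 precomputed CLIP embeddings.
image_feat = rng.normal(size=(8, FEAT_DIM))
text_feat = rng.normal(size=(8, FEAT_DIM))

gated_logits, type_logits = forward(image_feat, text_feat, params)
trainable = sum(w.size + b.size for w, b in params.values())
print(gated_logits.shape, type_logits.shape, trainable)
```

In this sketch the trainable part consists of four small linear layers, which is tiny compared with the CLIP encoder that produces the features.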

Less Is More: Linear Layers on CLIP Features as Powerful VizWiz Model
Fabian Deuser, Konrad Habel, Philipp J. Rösch, Norbert Oswald

[arXiv], [VizWiz]

