Robust Vision Challenge 2022

The Robust Vision Challenge 2020 was a virtual full-day event held in conjunction with ECCV 2020 in Glasgow. Videos of the full workshop are available on YouTube:

Introduction to Robust Vision Challenge 2020

First Live Session: 12h-14h UTC+1

YouTube link for session 1

12h00-12h15: Introduction / Announcement of RVC Winners
12h15-12h30: CFNet_RVC (Stereo)
12h30-12h45: NLCA_NET_v2_RVC (Stereo)
12h45-13h00: PRAFlow_RVC (Flow)
13h00-13h15: RMDP_RVC (Depth)
13h15-13h30: wisedet_RVC (Object Det.)
13h30-13h45: EffPS_b1bs4_RVC (Panoptic)
13h45-14h00: Closing

Second Live Session: 22h-24h UTC+1

YouTube link for session 2

22h00-22h15: Introduction / Announcement of RVC Winners
22h15-22h30: RAFT-TF_RVC (Flow)
22h30-22h45: UniDet_RVC (Object Det.)
22h45-23h00: UniDet_RVC (Instance)
23h00-23h15: SN_RN152pyrx8_RVC (Semantic)
23h15-23h30: MSeg1080_RVC (Semantic)
23h30-24h00: Closing

Our 2020 Keynote Speakers:

Ross Girshick
Facebook

Keynote: Robustness Across the Data Abundance Spectrum

Ross Girshick is a research scientist at Facebook AI Research (FAIR), working on computer vision and machine learning. He received a PhD in computer science from the University of Chicago under the supervision of Pedro Felzenszwalb in 2012. Prior to joining FAIR, Ross was a researcher at Microsoft Research, Redmond and a postdoc at the University of California, Berkeley, where he was advised by Jitendra Malik and Trevor Darrell. His interests include instance-level object understanding and visual reasoning challenges that combine natural language processing with computer vision. He received the 2017 PAMI Young Researcher Award and is well-known for developing the R-CNN approach to object detection. In 2017, Ross also received the Marr Prize at ICCV for Mask R-CNN.

Quoc Le
Google

Keynote: Noisy Student Training for Robust Vision

Quoc Le is a Principal Scientist at Google Brain, where he works on large-scale brain simulation using unsupervised feature learning and deep learning. His work focuses on object recognition, speech recognition and language understanding. Quoc obtained his PhD at Stanford, undergraduate degree with First Class Honours and Distinguished Scholar at the Australian National University, and was a researcher at National ICT Australia, Microsoft Research and Max Planck Institute of Biological Cybernetics. Quoc won a best paper award at ECML 2007.

Aleksander Mądry
MIT

Keynote: What Do Our Models Learn?

Aleksander Mądry is Professor of Computer Science in the MIT EECS Department. He is a principal investigator in the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), the Director of the MIT Center for Deployable Machine Learning, and the Faculty Lead of the CSAIL-MSR Trustworthy and Robust AI Collaboration. Aleksander received his PhD from MIT in 2011. Prior to joining MIT, he spent time at Microsoft Research New England and on the faculty of EPFL. Aleksander's research interests span algorithms, continuous optimization, science of deep learning and understanding machine learning from a robustness perspective. His work has been recognized with a number of awards, including an NSF CAREER Award, an Alfred P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award Honorable Mention, and the 2018 Presburger Award.

Epilogue of Robust Vision Challenge 2020

Challenges

RVC 2020 featured seven challenges: stereo, optical flow, single image depth prediction, object detection, semantic segmentation, instance segmentation, and panoptic segmentation. Participants are free to submit to a single challenge or to multiple challenges. For each challenge, the results of a single model must be submitted to all benchmarks (indicated with an x below).

[Table: required benchmarks per challenge (13 entries marked ✘; row and column labels not available)]

Winners and prizes for the seven challenges in 2020 were:

1st Place: $1200

2nd Place: $600

Presentation at our ECCV 2020 Workshop

RVC 2020 Stereo Leaderboard

CFNet_RVC

Submitted by Anonymous

NLCA_NET_v2_RVC

NLCA-Net: a non-local context attention network for stereo matching - Submitted by Zhibo Rao (Northwestern Polytechnical University)

HSM-Net_RVC

Hierarchical deep stereo matching on high-resolution images, CVPR 2019. [Project page] - Submitted by Gengshan Yang (CMU)

CVANet_RVC

Submitted by Haoyu Ren (Samsung Semiconductor Inc)

AANet_RVC

AANet: Adaptive Aggregation Network for Efficient Stereo Matching [Project page] - Submitted by Haofei Xu (University of Science and Technology of China)

GANetREF_RVC

Baseline - Submitted by Nicolas Jourdan (RVC Team)

SGM_RVC

Baseline - Submitted by Heiko Hirschmueller (Roboception GmbH)

ELAS_RVC

Baseline - Submitted by Thomas Schöps (RVC Team)

STTRV1_RVC

✓

[incomplete submission] Submitted by Anonymous

RVC 2020 Flow Leaderboard

RAFT-TF_RVC

Submitted by Deqing Sun (Google)

PRAFlow_RVC

Submitted by Zhexiong Wan (Northwestern Polytechnical University, Xi'an, China)

C-RAFT_RVC

Submitted by Henrique Morimitsu (Tsinghua University)

VCN_RVC

Volumetric Correspondence Networks for Optical Flow, NeurIPS 2019. [Project page] - Submitted by Gengshan Yang (CMU)

IRR-PWC_RVC

Iterative Residual Refinement for Joint Optical Flow and Occlusion Estimation [Project page] - Submitted by Junhwa Hur (TU Darmstadt)

LSM_FLOW_RVC

LSM: Learning Subspace Minimization for Low-Level Vision [Project page] - Submitted by Chengzhou Tang (Simon Fraser University)

PWC-Net_RVC

PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume, CVPR 2018. [Project page] - Submitted by Deqing Sun (Google)

TVL1_RVC

Baseline - Submitted by Toby Weed (Middlebury College)

H+S_RVC

Baseline - Submitted by Toby Weed (Middlebury College)

RVC 2020 Depth Leaderboard

BTSREF_RVC

Baseline - Submitted by Hendrik Schilling (rabbitai)

RMDP_RVC

Submitted by Ke Xian (Huazhong University of Science and Technology)

packnSFMHR_RVC

Baseline - Submitted by Oliver Zendel (AIT)

RVC 2020 Object Detection Leaderboard

UniDet_RVC

Submitted by Anonymous

wisedet_RVC

Submitted by JIAQI FAN (Wisers AI Lab)

FRCNN_R50_GN_RVC

Baseline - Submitted by Claudio Michaelis (RVC Team)

RVC 2020 Semantic Segmentation Leaderboard

SN_RN152pyrx8_RVC

Multi-domain semantic segmentation with pyramidal fusion [Project page] - Submitted by Marin Oršić (UniZg-FER / VSITE)

MSeg1080_RVC

Baseline - Submitted by John Lambert (Georgia Tech)

seamseg_rvcsubset

✓

[incomplete submission] Baseline - Submitted by Oliver Zendel (RVC Team)

EffPS_b1bs4_RVC

✓

[incomplete submission] Submitted by Rohit Mohan (University of Freiburg)

RVC 2020 Instance Segmentation Leaderboard

UniDet_RVC

Submitted by Anonymous

seamseg_rvcsubset

✓

[incomplete submission] Baseline - Submitted by Oliver Zendel (RVC Team)

EffPS_b1bs4_RVC

✓

[incomplete submission] Submitted by Rohit Mohan (University of Freiburg)

RVC 2020 Panoptic Segmentation Leaderboard

EffPS_b1bs4_RVC

Submitted by Rohit Mohan (University of Freiburg)

seamseg_rvcsubset

Baseline - Submitted by Oliver Zendel (RVC Team)

The 2020 Organizing Team was:

Oliver Zendel
AIT Vienna

Hassan Abu Alhaija
University Heidelberg

Rodrigo Benenson
Google Research

Marius Cordts
Daimler

Angela Dai
Stanford University

Xavier Puig Fernandez
MIT CSAIL

Andreas Geiger
MPI Tübingen/ETH Zürich

Niklas Hanselmann
MPI Tübingen/Daimler

Nicolas Jourdan
TU Darmstadt

Vladlen Koltun
Intel

Peter Kontschieder
Mapillary

Alina Kuznetsova
Google Research

Yubin Kuang
Mapillary

Tsung-Yi Lin
Google Brain

Claudio Michaelis
MPI Tübingen

Gerhard Neuhold
Mapillary

Matthias Nießner
TU Munich

Marc Pollefeys
ETH Zürich/Microsoft

Rene Ranftl
Intel

Stephan Richter
Intel

Carsten Rother
University Heidelberg

Torsten Sattler
Chalmers

Daniel Scharstein
Middlebury College

Hendrik Schilling
rabbitai/University Heidelberg

Nick Schneider
KIT/Daimler

Jonas Uhrig
Univ. Freiburg/Daimler

Jonas Wulff
MPI Tübingen

Bolei Zhou
CUHK

FAQ

When will we know the final result/winners?

The final winners are announced at the workshop live sessions on 28th August (12h-14h and 22h-24h UTC+1). The leaderboard might still change after the submission deadline due to long evaluation times, stuck evaluations, or legitimate claims for a reevaluation (e.g. if the benchmark was down or had a bug). Finally, we might have to remove entries for violation of the contest rules.

What format/content should be in the report?

Please send a PDF version of your report (1 or 2 pages, no special format/layout required) to rvc2020eccvw@gmail.com. Note that the report will be publicly available on this website. It should include:

Data/Datasets used for training (supervised, semi-, and unsupervised)
Short summary of data augmentations used during training
Benchmark-specific steps you took for individual datasets during training and submission. If there are label mappings, a full list of intermediary labels and mappings to/from this space should be included or a public source with this information must be linked
A short paragraph on your methodology/differences vs. state of the art
A short paragraph about the biggest challenge met specifically when dealing with multiple leaderboards and data policies.
If there is a paper connected to the submission: a BibTeX entry
Include URLs if code is publicly available

What intrinsics/scale is correct for the mono depth prediction task?

The metrics in this task are scale-invariant! In general, we recommend you use KITTI's intrinsics.
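
To illustrate what scale invariance means in practice, here is a minimal sketch of one common scale-invariant measure, the scale-invariant log error. It is an assumption that this resembles the metric used by any particular benchmark; the function name `silog` and the sample depths are hypothetical. Multiplying every predicted depth by the same constant leaves the score unchanged.

```python
import math

def silog(pred, gt):
    """Scale-invariant log error over paired, strictly positive depths.

    Computes sqrt(mean(d^2) - mean(d)^2) with d = log(pred) - log(gt).
    A global scale factor on pred shifts every d by the same constant,
    which cancels out in this variance-like expression.
    """
    d = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(d)
    mean_sq = sum(x * x for x in d) / n
    sq_mean = (sum(d) / n) ** 2
    # Clamp at zero to guard against tiny negative values from rounding.
    return math.sqrt(max(mean_sq - sq_mean, 0.0))

# Hypothetical depths in arbitrary units.
ground_truth = [1.0, 2.0, 3.0]
prediction = [1.0, 2.0, 4.0]
rescaled = [2.0 * p for p in prediction]

error = silog(prediction, ground_truth)
error_rescaled = silog(rescaled, ground_truth)
```

Both calls yield the same error up to floating-point rounding, which is why the absolute scale of the predicted depth does not matter for such a metric.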

What happened to the Obj365 dataset/benchmark?

Megvii had to pull support for RVC due to internal policy changes. Please see objects365.org for more details. The object detection challenge will be held using the remaining benchmarks: COCO, MVD, and OID. Due to this change, we have extended the submission deadline for all seven tasks to August 14th.

Rules for RVC2020

The aim of RVC is to push real-world usability and reduce dataset bias of solutions for the defined computer vision tasks: stereo, optical flow, monocular depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation.

Participants shall create solutions which are agnostic to the input dataset. Submissions which deliberately include a dataset recognition part and dataset-specific sub-solutions are prohibited.

In practice, some tasks may require meta-information for successful model training (e.g. negative labels per image for object detection). We allow the use of such meta-information during training as long as the resulting solution is dataset-agnostic. The detection of dataset sources during prediction of the test data is not allowed. A valid RVC submission must create a unified result from the union of the benchmarking frames without identifying the individual dataset sources. Our dev kit helps to apply basic preprocessing to normalize/unify the datasets.

The unified result may be post-processed for individual benchmark submissions. Our dev kit also provides support for this. The unified results shall be kept archived until the challenge is concluded and should be valid on their own (i.e. a proper prediction for the task at hand in a compatible data format; logits per class per pixel are also allowed). The potential winners of prize money (first/second place per task) are required to allow the organizers to inspect the training and prediction source code, process, and unified results to verify compliance with the RVC rules. This inspection is done in confidence and no details about your solution are publicized or shared with the other workshop organizers or participants. We encourage all participants to eventually open source their solutions, but this is not mandatory.

Organizers of the challenge cannot receive prize money. Should such entries be among the top two spots, the next participants in line win the respective prize money.

For tasks that require predicting semantic labels, such as object detection, semantic segmentation, instance segmentation, and panoptic segmentation:

The model should not have manually designed dataset-specific components, such as dataset-specific heads. The model must predict in a dataset-agnostic unified label space. We provide such a space for each task in the dev kits. Participants can use their own unified label space, but its cardinality (number of classes/logits) must not be higher than these limits (defined per task):

Object Detection: 700
Semantic Segmentation: 300
Instance Segmentation: 400
Panoptic Segmentation: 200
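
A participant using a custom unified label space could check it against the limits above with a few lines of code. This is only a sketch: the task keys and the function name `check_label_space` are hypothetical and not part of the dev kit.

```python
# Cardinality limits per task, as listed in the rules above.
MAX_CLASSES = {
    "object_detection": 700,
    "semantic_segmentation": 300,
    "instance_segmentation": 400,
    "panoptic_segmentation": 200,
}

def check_label_space(task, labels):
    """Return True if a custom unified label space respects the task limit."""
    unique = set(labels)
    if len(unique) != len(labels):
        raise ValueError("label space contains duplicate class names")
    return len(unique) <= MAX_CLASSES[task]

# Hypothetical label space with exactly 200 classes.
example_labels = ["class_%d" % i for i in range(200)]
within_limit = check_label_space("panoptic_segmentation", example_labels)
```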

The motivation for individual labels in a unified label space should be semantics-driven rather than dataset-driven.

These limits are based on the RVC unified label space per task with some added slack and are intended to prevent cheating. The participant must publicly disclose their label space and upload a specification of it along with their submission to RVC. The participant is allowed to use simple post-processing scripts that project from their unified label space to a dataset-specific space for their submission to each specific leaderboard. However, all such scripts must be made public, open-source, and submitted for inspection along with their submission to RVC. The scripts must operate on each class individually, can only map from a given class (or logits) in the unified label space to a class in a dataset-specific label space, and cannot access input images or spatial locations of predictions (e.g., boxes, masks, or image coordinates). Operations such as non-maximum suppression are not allowed during dataset-specific post-processing (NMS is allowed during the dataset-agnostic generation of the unified prediction).
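
A post-processing script in the spirit of the constraints above could look like the following sketch. The class names, the mapping `UNIFIED_TO_BENCHMARK`, and the function `project_classes` are hypothetical examples, not taken from the dev kit. The function only ever sees class names and scores; boxes, masks, and images are never passed in.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical unified-to-benchmark class mapping. None means the unified
# class has no counterpart in the target benchmark and is dropped.
UNIFIED_TO_BENCHMARK: Dict[str, Optional[str]] = {
    "car": "vehicle",
    "truck": "vehicle",
    "person": "pedestrian",
    "kite": None,
}

def project_classes(
    predictions: List[Tuple[str, float]],
    mapping: Dict[str, Optional[str]],
) -> List[Tuple[int, str, float]]:
    """Map (unified_class, score) pairs to (index, benchmark_class, score).

    Each prediction is handled on its own, using only its class name.
    The returned index lets the caller reattach the untouched box or mask.
    """
    projected = []
    for index, (unified_class, score) in enumerate(predictions):
        target = mapping.get(unified_class)
        if target is not None:
            projected.append((index, target, score))
    return projected

# Hypothetical unified detections (class, score); spatial data stays elsewhere.
unified = [("car", 0.9), ("kite", 0.8), ("person", 0.7)]
benchmark_ready = project_classes(unified, UNIFIED_TO_BENCHMARK)
```

Because the decision to keep, rename, or drop a prediction depends only on its class, the script contains no NMS and no spatial reasoning.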

Please see the RVC dev kit for more details on how to participate in each task and for help with training unification: https://github.com/ozendelait/rvc_devkit/tree/release

In general, approaches should be dataset agnostic and work well on unconstrained new data. Good: "Solve the task"; Bad: "Solve the dataset". In detail, some leeway has to be given to allow smooth training and the creation of valid submissions. Here is a summary of allowed and prohibited approaches:

Invalid Approaches:

Benchmarking using individual versions/parameters/models per dataset to directly generate individual predictions
Training individual separate solutions per dataset
Designing a solution with the explicit number of datasets in mind (e.g. having the same number of encoder/decoders in parallel as the number of datasets) to guide the network towards the creation of internally separate solutions per dataset. Another invalid example is the design of parallel input layers based on the number of datasets.
Using dataset-specific pre-processing during benchmarking
Using the input frame or side channel information to post-process the unified label data into dataset-specific submissions during benchmarking. The post-processing should only work with the intermediate "unified" result, a result which is valid and usable on its own, and transform it into a valid submission.

Valid Approaches:

Choosing individual sampling strategy per dataset during training
Using your own unified label space
Using dataset-specific pre-/post-processing steps to convert to a unified label space during training
Using dataset-specific post-processing steps during benchmarking
Creating/keeping logits as the unified result (while observing cardinality limits) and combining the logits before argmax during the dataset-specific post-processing
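
The last valid approach, combining logits before the argmax, can be sketched as follows. How the logits are combined is not specified in the rules; this example assumes softmax logits and merges them with log-sum-exp, which corresponds to summing the class probabilities. The function names and the mapping are hypothetical.

```python
import math

def combine_logits(logits, unified_to_dataset, num_dataset_classes):
    """Merge unified-class logits into dataset-class logits via log-sum-exp.

    logits: one finite value per unified class (e.g. for a single pixel).
    unified_to_dataset: dict from unified class index to dataset class index;
    unified classes missing from the dict are ignored.
    """
    groups = [[] for _ in range(num_dataset_classes)]
    for unified_index, value in enumerate(logits):
        dataset_index = unified_to_dataset.get(unified_index)
        if dataset_index is not None:
            groups[dataset_index].append(value)
    combined = []
    for group in groups:
        if not group:
            combined.append(float("-inf"))  # no unified class maps here
        else:
            peak = max(group)  # subtract the maximum for numerical stability
            combined.append(peak + math.log(sum(math.exp(v - peak) for v in group)))
    return combined

def predict_class(logits, unified_to_dataset, num_dataset_classes):
    """Argmax over the combined dataset-class logits."""
    combined = combine_logits(logits, unified_to_dataset, num_dataset_classes)
    return max(range(len(combined)), key=combined.__getitem__)

# Two unified classes (0 and 1) map to dataset class 0; class 2 maps to 1.
mapping = {0: 0, 1: 0, 2: 1}
pixel_logits = [1.0, 1.0, 1.5]
dataset_class = predict_class(pixel_logits, mapping, 2)
```

In this example an argmax taken directly in the unified space would pick unified class 2, whereas combining first lets the two unified classes that share a dataset class pool their evidence.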
