The Robust Vision Challenge 2020 was a virtual full-day event held in conjunction with ECCV 2020 in Glasgow. Videos of the full workshop are available on YouTube:
Introduction to Robust Vision Challenge 2020
First Live Session: 12h-14h UTC+1
YouTube link for session 1
- 12h00-12h15: Introduction / Announcement of RVC Winners
- 12h15-12h30: CFNet_RVC (Stereo)
- 12h30-12h45: NLCA_NET_v2_RVC (Stereo)
- 12h45-13h00: PRAFlow_RVC (Flow)
- 13h00-13h15: RMDP_RVC (Depth)
- 13h15-13h30: wisedet_RVC (Object Det.)
- 13h30-13h45: EffPS_b1bs4_RVC (Panoptic)
- 13h45-14h00: Closing
Second Live Session: 22h-24h UTC+1
YouTube link for session 2
- 22h00-22h15: Introduction / Announcement of RVC Winners
- 22h15-22h30: RAFT-TF_RVC (Flow)
- 22h30-22h45: UniDet_RVC (Object Det.)
- 22h45-23h00: UniDet_RVC (Instance)
- 23h00-23h15: SN_RN152pyrx8_RVC (Semantic)
- 23h15-23h30: MSeg1080_RVC (Semantic)
- 23h30-24h00: Closing
Our 2020 Keynote Speakers:
Keynote: Robustness Across the Data Abundance Spectrum
Ross Girshick is a research scientist at Facebook AI Research (FAIR),
working on computer vision and machine learning. He received a PhD in
computer science from the University of Chicago under the supervision
of Pedro Felzenszwalb in 2012. Prior to joining FAIR, Ross was a
researcher at Microsoft Research, Redmond and a postdoc at the
University of California, Berkeley, where he was advised by Jitendra
Malik and Trevor Darrell. His interests include instance-level object
understanding and visual reasoning challenges that combine natural
language processing with computer vision. He received the 2017 PAMI
Young Researcher Award and is well-known for developing the R-CNN
approach to object detection. In 2017, Ross also received the Marr
Prize at ICCV for Mask R-CNN.
Keynote: Noisy Student Training for Robust Vision
Quoc Le is a Principal Scientist at Google Brain, where he
works on large scale brain simulation using unsupervised feature
learning and deep learning. His work focuses on object recognition,
speech recognition and language understanding. Quoc obtained his PhD
at Stanford and his undergraduate degree, with First Class Honours and
as a Distinguished Scholar, at the Australian National University. He was a
researcher at National ICT Australia, Microsoft Research and the Max
Planck Institute for Biological Cybernetics. Quoc won a best paper award
at ECML 2007.
Keynote: What Do Our Models Learn?
Aleksander Mądry is Professor of Computer Science in the MIT EECS
Department. He is a principal investigator in the MIT Computer Science
and Artificial Intelligence Laboratory (CSAIL), the Director of the
MIT Center for Deployable Machine Learning, and the Faculty Lead of
the CSAIL-MSR Trustworthy and Robust AI Collaboration. Aleksander
received his PhD from MIT in 2011. Prior to joining MIT, he spent time
at Microsoft Research New England and on the faculty of
EPFL. Aleksander's research interests span algorithms, continuous
optimization, science of deep learning and understanding machine
learning from a robustness perspective. His work has been recognized
with a number of awards, including an NSF CAREER Award, an Alfred
P. Sloan Research Fellowship, an ACM Doctoral Dissertation Award
Honorable Mention, and the 2018 Presburger Award.
Epilogue of Robust Vision Challenge 2020
When will we know the final result/winners?
The final winners are announced at the workshop live sessions on 28th August (12-14; 22-24h UTC+1).
The leaderboard might still change after the submission deadline due to long evaluation times, stuck evaluations,
or legitimate claims for a reevaluation (e.g. if the benchmark was down/had a bug).
Finally, we might have to remove entries for violation of the contest rules.
What format/content should be in the report?
Please send a PDF version of your report (1 or 2 pages, no special format/layout required) to email@example.com. Note that the report will be publicly available on this website. It should include:
- Data/Datasets used for training (supervised, semi-, and unsupervised)
- Short summary of data augmentations used during training
- Benchmark-specific steps you took for individual datasets during training and submission. If there are label mappings, a full list of intermediary labels and mappings to/from this space should be included or a public source with this information must be linked
- A short paragraph on your methodology/differences vs. state of the art
- A short paragraph about the biggest challenge met specifically when dealing with multiple leaderboards and data policies.
- If there is a paper connected to the submission: a BibTeX entry
- Include URLs if code is publicly available
What intrinsics/scale is correct for the mono depth prediction task?
The metrics in this task are scale-invariant! In general, we recommend you use KITTI's intrinsics.
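To illustrate what scale invariance means in practice, here is a minimal sketch of one common scale-invariant measure, the scale-invariant log error (SILog). It is an illustration only: the function name `silog` and the sample depth values are assumptions made for this example, and the exact metrics used by each benchmark should be checked on the benchmark's own page.

```python
import math


def silog(pred, target):
    """Scale-invariant log error over paired, strictly positive depth values.

    Computed as the variance of the per-pixel log differences, so a global
    scale factor on the prediction (a constant offset in log space) cancels.
    """
    diffs = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(diffs)
    mean_of_squares = sum(d * d for d in diffs) / n
    square_of_mean = (sum(diffs) / n) ** 2
    return mean_of_squares - square_of_mean


# Hypothetical depth values in arbitrary units.
target = [2.0, 4.0, 8.0]
pred = [1.0, 2.5, 3.5]
scaled_pred = [10.0 * p for p in pred]

# Both calls return the same error up to floating-point rounding.
error = silog(pred, target)
error_scaled = silog(scaled_pred, target)
```

Because the error is unchanged when every predicted depth is multiplied by the same constant, the absolute scale implied by the chosen intrinsics does not affect a metric of this kind.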
What happened to the Obj365 dataset/benchmark?
Megvii had to pull support for RVC due to internal policy changes. Please see objects365.org for more details. The object detection challenge will be held using the remaining benchmarks: COCO, MVD, and OID.
Due to this change, we have extended the submission deadline for all seven tasks to August 14th.
The aim of RVC is to push real-world usability and reduce dataset bias of solutions for the defined computer vision tasks:
stereo, optical flow, monocular depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation.
Participants shall create solutions which are agnostic to the input dataset.
Submissions which deliberately include a dataset recognition part and dataset-specific sub-solutions are prohibited.
In practice, some tasks may require meta-information for successful model training (e.g. negative labels per image for object detection).
We allow the use of such meta-information during training as long as the resulting solution is dataset-agnostic.
The detection of dataset sources during prediction of the test data is not allowed.
A valid RVC submission must create a unified result from the union of the benchmarking frames without identifying the individual dataset sources.
Our dev kit helps to apply basic preprocessing to normalize/unify the datasets.
The unified result may be post-processed for individual benchmark submissions.
Our dev kit also provides support for this.
The unified results shall be kept archived until the challenge is concluded and should be valid on their own (i.e. a proper prediction for the task at hand in a compatible data format; logits per class per pixel are also allowed).
The potential winners of prize money (first/second place per task) are required to allow the organizers to inspect the training and prediction source code, process, and unified results to verify compliance with the RVC rules.
This inspection is done in confidence, and no details about your solution are publicized or shared with the other workshop organizers or participants.
We encourage all participants to eventually open source their solutions, but this is not mandatory.
Organizers of the challenge cannot receive prize money. Should such entries be among the top two spots, the next participants in line win the respective prize money.
For tasks that require predicting semantic labels, such as object detection, semantic segmentation, instance segmentation, and panoptic segmentation:
The model should not have manually designed dataset-specific components, such as dataset-specific heads.
The model must predict in a dataset-agnostic unified label space.
We provide such a space for each task in the dev kits.
Participants can use their own unified label space, but its cardinality (number of classes/logits)
must not be higher than these limits (defined per task):
- Object Detection: 700
- Semantic Segmentation: 300
- Instance Segmentation: 400
- Panoptic Segmentation: 200
Motivation for individual labels in a unified label space should be semantic-driven rather than dataset-driven.
These limits are based on the RVC unified label space per task with some added slack and are meant to prevent cheating.
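A custom label space can be checked against these limits before submission. The following is a minimal sketch, not part of the RVC dev kit; the task keys, the function name `check_label_space`, and the generated class names are assumptions made for this example.

```python
# Per-task limits copied from the list above; the dictionary keys are
# hypothetical identifiers chosen for this sketch.
RVC_MAX_CLASSES = {
    "object_detection": 700,
    "semantic_segmentation": 300,
    "instance_segmentation": 400,
    "panoptic_segmentation": 200,
}


def check_label_space(task, class_names):
    """Return True if the label space stays within the limit for the task."""
    if len(set(class_names)) != len(class_names):
        raise ValueError("label space contains duplicate class names")
    return len(class_names) <= RVC_MAX_CLASSES[task]


# Hypothetical label spaces with 200 and 201 classes.
within_limit = check_label_space(
    "panoptic_segmentation", [f"class_{i}" for i in range(200)]
)
over_limit = check_label_space(
    "panoptic_segmentation", [f"class_{i}" for i in range(201)]
)
```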
The participant must publicly disclose their label space with their submission.
The participant must upload a specification of their label space along with their submission to RVC.
The participant is allowed to use simple post-processing scripts that project from their unified label space to
a dataset-specific space for their submission to each specific leaderboard.
But all such scripts must be made public, open-source, and submitted for inspection along with their submission to RVC.
The scripts must operate on each class individually and cannot access input images or spatial locations of predictions
(e.g., boxes, masks, or image coordinates).
Operations such as non-maximum suppression are not allowed during dataset-specific post-processing
(NMS is allowed during the dataset-agnostic generation of the unified prediction).
The post-processing must operate on each class separately and can only map from a given class (or logits)
in the unified label space to a class in a dataset-specific label space.
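A post-processing script that respects these constraints could look like the following sketch. It is one possible reading of the rules, not the dev kit's implementation; the function name `project_classes` and all class names in the mapping are hypothetical. The function only ever sees class names, never images, boxes, masks, or coordinates.

```python
def project_classes(unified_classes, mapping):
    """Map each unified class name to a dataset-specific class name.

    Each class is handled on its own through a fixed lookup table. Classes
    without a counterpart in the target dataset become None, so the caller
    can drop those predictions by index without this function ever touching
    spatial information.
    """
    return [mapping.get(name) for name in unified_classes]


# Hypothetical mapping from a unified label space to one benchmark's labels.
unified_to_benchmark = {
    "traffic_light": "traffic light",
    "motorbike": "motorcycle",
    "person": "person",
}

# Class names of three predictions; geometry is stored elsewhere.
predicted_classes = ["motorbike", "unicycle", "person"]
projected = project_classes(predicted_classes, unified_to_benchmark)
# projected is ["motorcycle", None, "person"]
```

Keeping the output aligned with the input by index lets the submission writer rejoin classes with their boxes or masks while the mapping step itself stays free of spatial data.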
Please see the RVC dev kit for more details on how to participate in each task and for help with unifying datasets for training: https://github.com/ozendelait/rvc_devkit/tree/release
In general, approaches should be dataset agnostic and work well on unconstrained new data.
Good: "Solve the task"; Bad: "Solve the dataset".
In detail, some leeway has to be given to allow smooth training and the creation of valid submissions.
Here is a summary of prohibited and allowed approaches:
Prohibited:
- Benchmarking using individual versions/parameters/models per dataset to directly generate individual predictions
- Training individual separate solutions per dataset
- Designing a solution with the explicit number of datasets in mind
(e.g. having the same number of encoder/decoders in parallel as the number of datasets)
to guide the network towards the creation of internally separate solutions per dataset.
Another invalid example is the design of parallel input layers based on the number of datasets.
- Using dataset-specific pre-processing during benchmarking
- Using the input frame or side channel information to post-process the unified label data into
dataset-specific submissions during benchmarking. The post-processing should only work with the intermediate "unified" result,
a result which is valid and usable on its own, and transform it into a valid submission.
Allowed:
- Choosing an individual sampling strategy per dataset during training
- Using your own unified label space
- Using dataset-specific pre- / post- processing steps to convert to a unified label space during training
- Using dataset-specific post processing steps during benchmarking
- Creating/keeping logits as the unified result (while observing cardinality limits) and
combining the logits before argmax during the dataset-specific post-processing
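The last item can be illustrated with a small sketch. The rules do not prescribe how logits are combined; this example assumes log-sum-exp over the unified classes that map to the same dataset-specific class, which corresponds to summing their softmax probabilities. The function names, class names, and values are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def project_logits(unified_logits, groups):
    """Combine unified logits per dataset class, then take the argmax.

    groups[k] lists the unified class indices that map to dataset class k.
    Returns the index of the winning dataset-specific class.
    """
    combined = [logsumexp([unified_logits[i] for i in group]) for group in groups]
    return max(range(len(combined)), key=combined.__getitem__)


# Hypothetical unified classes: 0 = road, 1 = car, 2 = truck.
# Hypothetical dataset classes: 0 = road, 1 = vehicle (car and truck merged).
groups = [[0], [1, 2]]
pixel_logits = [2.0, 1.5, 1.5]

winner = project_logits(pixel_logits, groups)
# winner is 1 (vehicle): the merged car and truck evidence outweighs road,
# although road has the single highest unified logit.
```

Taking the argmax in the unified space first would pick road for this pixel, which is why combining before the argmax can change the result when several unified classes collapse into one dataset class.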