Robust Vision Challenge


  • Choose one of the seven computer vision tasks; each has its own leaderboard and winners
  • Train a single solution to work for all participating benchmarks for this CV task
  • Choose a short name (4 to 16 characters) for your method and append the postfix _RVC
  • Register your method to the RVC challenge using the form below
  • Submit to each individual benchmark using their regular submission systems. Use your *_RVC method name also as your "team name" where applicable.
  • The RVC crawler will automatically join the individual rankings and show them on the leaderboard
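
The naming rule in the steps above can be checked with a few lines of Python. This is only a sketch: the function name `make_rvc_name` is hypothetical, and it assumes the 4-to-16-character limit applies to the short name before the postfix is appended, which the rules do not state explicitly.

```python
RVC_POSTFIX = "_RVC"


def make_rvc_name(short_name: str) -> str:
    """Return the method name with the _RVC postfix appended.

    Assumption: the 4-to-16-character limit applies to the short name
    before the postfix is added.
    """
    if not 4 <= len(short_name) <= 16:
        raise ValueError("short name must be 4 to 16 characters long")
    if short_name.endswith(RVC_POSTFIX):
        raise ValueError("pass the short name without the _RVC postfix")
    return short_name + RVC_POSTFIX


print(make_rvc_name("MyStereoNet"))  # MyStereoNet_RVC
```

The resulting string is the one to reuse as the method name, and as the "team name" where a benchmark asks for one.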

Register Method to RVC

Please fill out the form below to register to the RVC challenge. We will send you a link where you can edit your information.



When will we know the final result/winners?

The final winners are announced at the workshop live sessions during ECCV 2022. The leaderboard might still change after the submission deadline due to long evaluation times, stuck evaluations, or legitimate claims for a reevaluation (e.g. if the benchmark was down/had a bug). Finally, we might have to remove entries for violation of the contest rules.

What intrinsics/scale is correct for the mono depth prediction task?

The metrics in this task are scale-invariant! In general, we recommend you use KITTI's intrinsics.
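
To illustrate what scale invariance means here, the sketch below uses the scale-invariant log error, one widely used measure of this kind. The answer above does not name the exact metrics, so treat the choice of formula and the sample depth values as assumptions for illustration only.

```python
import math


def silog(pred, target):
    """Scale-invariant log error over paired positive depth values."""
    d = [math.log(p) - math.log(t) for p, t in zip(pred, target)]
    n = len(d)
    # Mean squared log difference minus the squared mean log difference.
    return sum(x * x for x in d) / n - (sum(d) / n) ** 2


target = [2.0, 4.0, 8.0]   # made-up ground-truth depths
pred = [2.2, 3.9, 8.5]     # made-up predictions
scaled = [3.0 * p for p in pred]

# Multiplying every prediction by a constant shifts each log difference
# by the same amount, and the second term cancels that shift.
print(abs(silog(pred, target) - silog(scaled, target)) < 1e-9)
```

Under a metric of this form, a globally rescaled prediction scores the same as the original one.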
Please post open questions to

Rules and Submission Guidelines

The aim of RVC is to push real-world usability and reduce dataset bias of solutions for the defined computer vision tasks: stereo, optical flow, monocular depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation.

Participants shall create solutions which are agnostic to the input dataset. Submissions which deliberately include a dataset recognition part and dataset-specific sub-solutions are prohibited.

In practice, some tasks may require meta-information for successful model training (e.g. negative labels per image for object detection). We allow the use of such meta-information during training as long as the resulting solution is dataset-agnostic. The detection of dataset sources during prediction of the test data is not allowed. A valid RVC submission must create a unified result from the union of the benchmarking frames without identifying the individual dataset sources. Our dev kit helps to apply basic preprocessing to normalize/unify the datasets.

The unified result may be post-processed for individual benchmark submissions. Our dev kit also provides support for this. The unified results shall be kept archived until the challenge is concluded and should be valid on their own (i.e. a proper prediction for the task at hand in a compatible data format; logits per class per pixel are also allowed). The potential winners of prize money (first/second place per task) are required to allow the organizers to inspect the training and prediction source code, process, and unified results to verify compliance with the RVC rules. This inspection is done in confidence and no details about your solution are publicized or shared with the other workshop organizers or participants. We encourage all participants to eventually open source their solutions, but this is not mandatory.

Organizers of the challenge cannot receive prize money. Should such entries be among the top two spots, the next participants in line win the respective prize money.

For tasks that require predicting semantic labels, such as object detection, semantic segmentation, instance segmentation, and panoptic segmentation:

The model should not have manually designed dataset-specific components, such as dataset-specific heads. The model must predict in a dataset-agnostic unified label space. We provide such a space for each task in the dev kits. Participants can use their own unified label space, but its cardinality (number of classes/logits) must not be higher than these limits (defined per task):

  • Object Detection: 550
  • Semantic Segmentation: 300
The motivation for individual labels in a unified label space should be semantics-driven rather than dataset-driven.

These limits are based on the RVC unified label space per task with some added slack and shall prevent cheating. The participant must publicly disclose their label space with their submission. The participant must upload a specification of their label space along with their submission to RVC. The participant is allowed to use simple post-processing scripts that project from their unified label space to a dataset-specific space for their submission to each specific leaderboard. But all such scripts must be made public, open-source, and submitted for inspection along with their submission to RVC. The scripts must operate on each class individually and cannot access input images or spatial locations of predictions (e.g., boxes, masks, or image coordinates). Operations such as non-maximum suppression are not allowed during dataset-specific post-processing (NMS is allowed during the dataset-agnostic generation of the unified prediction). The post-processing must operate on each class separately and can only map from a given class (or logits) in the unified label space to a class in a dataset-specific label space.
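
A post-processing script that stays within these constraints could look roughly like the sketch below. The label names, the dataset names `dataset_a` and `dataset_b`, the task keys, and both function names are hypothetical; only the two cardinality limits come from the rules above. The projection receives class names alone, so it has no access to images, boxes, or masks.

```python
# Limits quoted in the rules above (task keys are illustrative).
CARDINALITY_LIMITS = {"object_detection": 550, "semantic_segmentation": 300}

# Hypothetical unified label space and per-dataset projection tables.
UNIFIED_LABELS = ["car", "truck", "person", "rider", "dog"]
TO_DATASET = {
    "dataset_a": {"car": "vehicle", "truck": "vehicle",
                  "person": "human", "rider": "human"},
    "dataset_b": {"car": "car", "person": "person", "dog": "animal"},
}


def check_cardinality(task, labels):
    """Raise if the label space has duplicates or exceeds the task limit."""
    if len(set(labels)) != len(labels):
        raise ValueError("label space contains duplicate entries")
    limit = CARDINALITY_LIMITS[task]
    if len(labels) > limit:
        raise ValueError(f"{len(labels)} labels exceed the limit of {limit}")


def project_labels(unified_classes, dataset):
    """Map each unified class to a dataset class, one class at a time.

    Classes without a counterpart in the target dataset map to None.
    """
    table = TO_DATASET[dataset]
    return [table.get(name) for name in unified_classes]


check_cardinality("object_detection", UNIFIED_LABELS)
print(project_labels(["car", "dog", "rider"], "dataset_a"))
# ['vehicle', None, 'human']
```

Because the mapping is a plain per-class lookup, the same table can double as the label space specification that has to be published with the submission.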

Please see the RVC dev kit for more details on how to participate in each task and for help with training unification:

In general, approaches should be dataset agnostic and work well on unconstrained new data. Good: "Solve the task"; Bad: "Solve the dataset". In detail, some leeway has to be given to allow smooth training and the creation of valid submissions. Here is a summary of allowed and prohibited approaches:

Invalid Approaches:

  • Benchmarking using individual versions/parameters/models per dataset to directly generate individual predictions
  • Training individual separate solutions per dataset
  • Designing a solution with the explicit number of datasets in mind (e.g. having the same number of encoders/decoders in parallel as the number of datasets) to guide the network towards the creation of internally separate solutions per dataset. Another invalid example is the design of parallel input layers based on the number of datasets.
  • Using dataset-specific pre-processing during benchmarking
  • Using the input frame or side channel information to post-process the unified label data into dataset-specific submissions during benchmarking. The post-processing should only work with the intermediate "unified" result, a result which is valid and usable on its own, and transform it into a valid submission.

Valid Approaches:


  • Choosing individual sampling strategy per dataset during training
  • Using your own unified label space
  • Using dataset-specific pre-/post-processing steps to convert to a unified label space during training
  • Using dataset-specific post-processing steps during benchmarking
  • Creating/keeping logits as the unified result (while observing cardinality limits) and combining the logits before argmax during the dataset-specific post-processing
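
The last item, combining logits before the argmax, can be sketched as follows. The rules do not prescribe how logits are combined; log-sum-exp (equivalent to summing the softmax probabilities of the merged classes) is an assumption here, and the class layout and function names are made up for illustration. Each call sees one logit vector only, with no image or coordinate input.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def project_logits(unified_logits, groups):
    """Return the winning dataset class for one logit vector.

    unified_logits: one float per unified class.
    groups: for each dataset class, the unified class indices merged into it.
    """
    combined = [logsumexp([unified_logits[i] for i in idx]) for idx in groups]
    return max(range(len(combined)), key=combined.__getitem__)


# Unified classes: 0 = car, 1 = truck, 2 = person
# Dataset classes: 0 = vehicle (car + truck), 1 = person
groups = [[0, 1], [2]]
logits = [1.0, 1.0, 1.5]

print(project_logits(logits, groups))  # 0
```

In this example, taking the argmax in the unified space first would pick "person", whereas merging "car" and "truck" before the argmax picks "vehicle". That difference is the reason for keeping logits as the unified result.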

You can post open questions to
