When will we know the final result/winners?
The final winners will be announced at the workshop live sessions on 28th August (12:00-14:00 and 22:00-24:00, UTC+1). The leaderboard might still change after the submission deadline due to long evaluation times, stuck evaluations, or legitimate claims for a reevaluation (e.g. if the benchmark was down or had a bug). Finally, we might have to remove entries that violate the contest rules.
What format/content should be in the report?
Please send a PDF version of your report (1 or 2 pages, no special format/layout required) to email@example.com. Note that the report will be publicly available on this website. It should include:
- Data/Datasets used for training (supervised, semi-, and unsupervised)
- Short summary of data augmentations used during training
- Benchmark-specific steps you took for individual datasets during training and submission. If label mappings are used, include a full list of intermediary labels and the mappings to/from this space, or link a public source with this information.
- A short paragraph on your methodology/differences vs. state of the art
- A short paragraph about the biggest challenge met specifically when dealing with multiple leaderboards and data policies.
- If there is a paper connected to the submission: a BibTeX entry
- Include URLs if code is publicly available
What intrinsics/scale is correct for the mono depth prediction task?
The metrics in this task are scale-invariant. In general, we recommend using KITTI's intrinsics.
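To illustrate what scale invariance means in practice, here is a minimal sketch using the scale-invariant log error (SILog), one common scale-invariant depth metric. It is shown only as a representative example; the exact metrics are defined by each benchmark, and the depth values below are made up for illustration.

```python
import math

def silog(pred, gt):
    """Scale-invariant log error: variance of the log-depth differences."""
    diffs = [math.log(p) - math.log(g) for p, g in zip(pred, gt)]
    n = len(diffs)
    mean_sq = sum(d * d for d in diffs) / n
    mean = sum(diffs) / n
    return mean_sq - mean * mean

# Hypothetical depth values in meters (illustration only).
gt = [2.0, 5.0, 10.0, 40.0]
pred = [2.2, 4.6, 11.0, 35.0]

# Multiplying every prediction by a global factor shifts all log differences
# by the same constant, so their variance (the metric) stays the same.
scaled_pred = [3.7 * p for p in pred]

print(silog(pred, gt))
print(silog(scaled_pred, gt))
```

Both printed values agree up to floating-point rounding, which is why the absolute scale of the prediction does not affect such a metric.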
What happened to the Obj365 dataset/benchmark?
Megvii had to withdraw its support for RVC due to internal policy changes. Please see objects365.org for more details. The object detection challenge will be held using the remaining benchmarks: COCO, MVD, and OID. Due to this change, we have extended the submission deadline for all seven tasks to August 14th.
The aim of RVC is to push real-world usability and reduce dataset bias of solutions for the defined computer vision tasks: stereo, optical flow, monocular depth estimation, object detection, semantic segmentation, instance segmentation, and panoptic segmentation.
Participants shall create solutions which are agnostic to the input dataset. Submissions which deliberately include a dataset recognition part and dataset-specific sub-solutions are prohibited.
In practice, some tasks may require meta-information for successful model training (e.g. negative labels per image for object detection). We allow the use of such meta-information during training as long as the resulting solution is dataset-agnostic. The detection of dataset sources during prediction of the test data is not allowed. A valid RVC submission must create a unified result from the union of the benchmarking frames without identifying the individual dataset sources. Our dev kit helps to apply basic preprocessing to normalize/unify the datasets.
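As a rough sketch of the "unified result from the union of the benchmarking frames" idea, the loop below runs one model over all frames and hands it only the image, never the dataset name. The function name, the `frames_by_dataset` layout, and the frame ID format are assumptions made for this illustration and are not part of the dev kit.

```python
def predict_unified(frames_by_dataset, model):
    """Run one dataset-agnostic model over the union of all benchmark frames.

    frames_by_dataset: dict mapping a dataset name to a list of images.
    model: callable that receives only the image.
    """
    unified = {}
    for dataset_name in sorted(frames_by_dataset):
        for index, image in enumerate(frames_by_dataset[dataset_name]):
            # The frame ID is bookkeeping for later submission steps only;
            # the model itself never receives the dataset name.
            frame_id = f"{dataset_name}/{index}"
            unified[frame_id] = model(image)
    return unified

# Toy usage: "images" are lists of numbers and the "model" sums them.
frames = {"kitti": [[1, 2], [3]], "coco": [[4, 5, 6]]}
result = predict_unified(frames, model=lambda image: sum(image))
print(result)
```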
The unified result may be post-processed for individual benchmark submissions. Our dev kit also provides support for this. The unified results shall be kept archived until the challenge is concluded and should be valid on their own (i.e. a proper prediction for the task at hand in a compatible data format; logits per class per pixel are also allowed). The potential winners of prize money (first/second place per task) are required to allow the organizers to inspect the training and prediction source code, process, and unified results to verify compliance with the RVC rules. This inspection is done in confidence, and no details about your solution are publicized or shared with the other workshop organizers or participants. We encourage all participants to eventually open-source their solutions, but this is not mandatory.
Organizers of the challenge cannot receive prize money. Should such entries be among the top two spots, the next participants in line win the respective prize money.
For tasks that require predicting semantic labels, such as object detection, semantic segmentation, instance segmentation, and panoptic segmentation:
The model should not have manually designed dataset-specific components, such as dataset-specific heads. The model must predict in a dataset-agnostic unified label space. We provide such a space for each task in the dev kits. Participants can use their own unified label space, but its cardinality (number of classes/logits) must not be higher than these limits (defined per task):
- Object Detection: 700
- Semantic Segmentation: 300
- Instance Segmentation: 400
- Panoptic Segmentation: 200
These limits are based on the RVC unified label space per task with some added slack and are intended to prevent cheating. The participant must publicly disclose their label space with their submission. The participant must upload a specification of their label space along with their submission to RVC. The participant is allowed to use simple post-processing scripts that project from their unified label space to a dataset-specific space for their submission to each specific leaderboard. However, all such scripts must be made public, open-source, and submitted for inspection along with their submission to RVC. The scripts must operate on each class individually and cannot access input images or spatial locations of predictions (e.g., boxes, masks, or image coordinates). Operations such as non-maximum suppression are not allowed during dataset-specific post-processing (NMS is allowed during the dataset-agnostic generation of the unified prediction). The post-processing must operate on each class separately and can only map from a given class (or logits) in the unified label space to a class in a dataset-specific label space.
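The two requirements above (staying within the cardinality limit and projecting class by class) can be sketched as follows. This is an illustration under stated assumptions, not the dev kit's implementation: the task keys, function names, class names, and the choice of `None` for unmapped classes are all hypothetical.

```python
# Cardinality limits per task, taken from the list above.
MAX_CLASSES = {
    "object_detection": 700,
    "semantic_segmentation": 300,
    "instance_segmentation": 400,
    "panoptic_segmentation": 200,
}

def check_label_space(task, unified_classes):
    """Return the label space size, or raise if it exceeds the task limit."""
    size = len(set(unified_classes))
    if size > MAX_CLASSES[task]:
        raise ValueError(f"{task}: {size} classes exceeds limit {MAX_CLASSES[task]}")
    return size

def project_labels(unified_labels, mapping, unmapped=None):
    """Map each predicted unified class to a dataset-specific class.

    Only the class label is looked at: no image, box, mask, or coordinate
    is passed in, so the projection cannot depend on spatial information.
    """
    return [mapping.get(label, unmapped) for label in unified_labels]

# Hypothetical unified-to-dataset mapping (illustration only).
mapping = {"car": "vehicle", "truck": "vehicle", "rider": "person"}
print(check_label_space("panoptic_segmentation", ["car", "truck", "rider"]))
print(project_labels(["car", "rider", "tree"], mapping))
```

Keeping the mapping as plain data makes it easy to publish alongside the label space specification.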
Please see the RVC dev kit for more details on how to participate in each task and for help with unifying the training data: https://github.com/ozendelait/rvc_devkit/tree/release
In general, approaches should be dataset-agnostic and work well on unconstrained new data. Good: "Solve the task"; Bad: "Solve the dataset". In practice, some leeway is given to allow smooth training and the creation of valid submissions. Here is a summary of allowed and prohibited approaches:
Prohibited:
- Benchmarking using individual versions/parameters/models per dataset to directly generate individual predictions
- Training individual separate solutions per dataset
- Designing a solution with the explicit number of datasets in mind (e.g. having the same number of encoders/decoders in parallel as the number of datasets) to guide the network towards the creation of internally separate solutions per dataset. Another invalid example is the design of parallel input layers based on the number of datasets.
- Using dataset-specific pre-processing during benchmarking
- Using the input frame or side-channel information to post-process the unified label data into dataset-specific submissions during benchmarking. The post-processing should only work with the intermediate "unified" result, a result which is valid and usable on its own, and transform it into a valid submission.
Allowed:
- Choosing an individual sampling strategy per dataset during training
- Using your own unified label space
- Using dataset-specific pre-/post-processing steps to convert to a unified label space during training
- Using dataset-specific post-processing steps during benchmarking
- Creating/keeping logits as the unified result (while observing cardinality limits) and combining the logits before argmax during the dataset-specific post-processing
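The last item, combining logits before the argmax, can be sketched as below. The combination rule used here (log-sum-exp, which corresponds to summing the class probabilities) is an assumption chosen for illustration; the rules do not prescribe a particular rule. The function names, class names, and logit values are hypothetical.

```python
import math

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))

def project_logits(unified_logits, groups):
    """Pick the dataset-specific class for one pixel or detection.

    unified_logits: dict mapping a unified class to its logit.
    groups: dict mapping a dataset class to the unified classes merged into it.
    """
    combined = {
        dataset_class: logsumexp([unified_logits[m] for m in members])
        for dataset_class, members in groups.items()
    }
    return max(combined, key=combined.get)

# Hypothetical example: "car" and "truck" both map to the dataset class "vehicle".
logits = {"road": 2.0, "car": 1.8, "truck": 1.7}
groups = {"road": ["road"], "vehicle": ["car", "truck"]}
print(project_logits(logits, groups))  # vehicle
```

Taking the argmax in the unified space first would pick "road" here and discard the combined evidence for "vehicle", which is why keeping logits as the unified result can matter.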