Open-World Vision

The 6th Workshop on Open-World Vision. See you there soon!

Location: TBA, Denver, CO

Time: 8:45am-12:15pm Local Time, June XX, 2026

In conjunction with CVPR 2026, Denver, CO, USA


Overview

Open-World Vision (OWV) emphasizes realistic opportunities and challenges in developing and deploying computer vision systems in the dynamic, vast, and unpredictable real open world, which offers abundant data that can benefit training yet poses challenges for testing. It contrasts with the traditional "closed-world" paradigm of visual learning and inference, which assumes fixed, known data distributions and categorical labels. Models developed under such closed-world assumptions tend to be brittle when encountering ever-changing and novel scenarios in the real open world. Modern visual learning has shifted toward an open-world paradigm, e.g., pretraining foundation models on massive data sourced from the open world (such as web-crawled data). While these models show unprecedented performance and strong adaptability to downstream tasks, they inherit biases from their open-world pretraining data and can still fail in truly novel or underrepresented scenarios during deployment. This workshop aims not only to uncover current limitations, potential risks, emerging opportunities, and unresolved challenges of open-world vision, but also to solicit solutions that advance the field toward more robust, fair, and adaptable visual systems.

You might be interested in our previous workshops at CVPR'25, CVPR'24, CVPR'23, CVPR'22, CVPR'21.


Topics

Topics of interest include, but are not limited to:

  • open-world research: robotics, visual generation, etc.
  • open-world pretraining: foundational vision models, vision-language models, multimodal language models, etc.
  • open-world data: long-tailed distribution, open-set, unknowns, streaming data, biased data, unlabeled data, anomaly, multi-modality, etc.
  • open-world concepts: open-vocabulary, ontology/taxonomy of object classes, evolving class ontology, etc.
  • open-world adaptation: X-shot learning, Y-supervised learning, lifelong/continual learning, domain adaptation/generalization, open-world learning, etc.
  • open-world social impact: safety, fairness, inter-disciplinary research, etc.

Examples

Consider the following motivating examples.

  • Open-world data follows long-tail distributions, whereas real-world tasks often emphasize rare observations. This mismatch between data distribution and task demands necessitates careful development of perception models. Visual perception algorithms can struggle to recognize rare examples, leading to serious consequences. For example, a visual recognition model can misclassify underrepresented minorities and make unethical predictions (ref. case1, case2). Even foundation models suffer from the long-tailed distribution of their pretraining data (ref. link). A minimal sketch of one common countermeasure, loss reweighting, appears after this list.
  • The open world presents unknown examples, largely due to the long-tail nature of the data distribution. Visual perception models operating in the open world are invariably confronted with unfamiliar instances, and failure to detect these unknowns can lead to serious consequences. For example, a Tesla Model 3 failed to recognize an overturned truck as an unknown object, resulting in a crash (ref. case). A minimal sketch of unknown rejection via confidence thresholding also appears after this list.
  • The open world requires aligning ontologies. To define a domain of interest, such as autonomous driving or the fashion industry, we typically establish an ontology (e.g., class labels or fashion styles). However, in the real open world, these ontologies constantly evolve (ref. link), leading to a mismatch between the concepts defined for a task and those understood by pretrained (foundation) models (ref. link). For example, in the fashion industry, the fashion vocabulary must expand continuously as trends and styles change with each season. This raises a key question: how can we effectively train a model to align a constantly shifting ontology with the one understood by an "outdated" foundation model? Moreover, the world of interest changes over time: driving scenes vary across cities and weather conditions, and search queries drift in meaning ("apple" meant something different 20 years ago). In short, both data distributions and semantics continually evolve, so how do we address distribution shifts and concept drift?
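
To make the long-tail issue above concrete, here is a minimal, self-contained sketch (plain NumPy, hypothetical class counts) of one widely used countermeasure: reweighting the classification loss by inverse class frequency so that rare classes are not drowned out by head classes. This is only an illustration under stated assumptions, not a method prescribed by the workshop.

    import numpy as np

    def inverse_frequency_weights(class_counts: np.ndarray) -> np.ndarray:
        """Per-class loss weights, normalized to average 1.0 across classes."""
        freqs = class_counts / class_counts.sum()   # empirical class frequencies
        weights = 1.0 / freqs                       # rare classes get large weights
        return weights * (len(class_counts) / weights.sum())

    if __name__ == "__main__":
        # Hypothetical long-tailed counts: one head class, several tail classes.
        counts = np.array([10000.0, 2000.0, 300.0, 50.0])
        print(inverse_frequency_weights(counts))
        # The tail class (50 images) receives a weight ~200x larger than the
        # head class, so its rare examples still shape the training loss.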

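Similarly, the sketch below illustrates a classic baseline for handling unknowns: a closed-set classifier abstains (predicts "unknown") whenever its maximum softmax probability falls below a threshold. The class names and threshold here are hypothetical; real systems tune the threshold on held-out data and often use stronger open-set detectors.

    import numpy as np

    KNOWN_CLASSES = ["car", "pedestrian", "cyclist", "traffic_sign"]  # hypothetical ontology
    THRESHOLD = 0.7  # in practice, tuned on held-out validation data

    def softmax(logits: np.ndarray) -> np.ndarray:
        z = logits - logits.max()       # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    def open_set_predict(logits: np.ndarray, threshold: float = THRESHOLD) -> str:
        """Return a known class name, or 'unknown' if confidence is too low."""
        probs = softmax(logits)
        top = int(np.argmax(probs))
        if probs[top] < threshold:
            return "unknown"            # e.g., an overturned truck unseen in training
        return KNOWN_CLASSES[top]

    if __name__ == "__main__":
        print(open_set_predict(np.array([6.0, 1.0, 0.5, 0.2])))   # confident -> "car"
        print(open_set_predict(np.array([1.1, 1.0, 0.9, 0.8])))   # ambiguous -> "unknown"
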
Speakers


Shu Kong
UMacau

Marc Pollefeys
ETH Zurich, Microsoft

Jiajun Wu
Stanford

Carl Vondrick
Columbia University

Shuran Song
Stanford



Organizers

Please contact Shu Kong with any questions: aimerykong [at] gmail [dot] com


Shu Kong
UMacau

Abhinav Shrivastava
University of Maryland

Andrew Owens
Cornell Tech

Yunhan Zhao
Google DeepMind / UCI

Tian Liu
Texas A&M


Advisory Board

Deva Ramanan
Carnegie Mellon University

Terrance Boult
University of Colorado Colorado Springs

Walter J. Scheirer
University of Notre Dame



Challenge Organizers


Shu Kong
UMacau




Important Dates and Details


Competition

We organize four challenge competitions this year.


Program Schedule

This section is under construction.

CDT           | Event           | Presenter / Title
09:00 - 09:20 | Opening remarks | Shu Kong (University of Macau): Visual Perception via Learning in an Open World
09:20 - 10:00 | Invited talk #1 | Kristen Grauman (UT Austin): Human activity in the open world
10:00 - 10:40 | Invited talk #2 | Gunshi Gupta and Yarin Gal (University of Oxford): tba
10:40 - 11:20 | Invited talk #3 | Grant Van Horn (UMass-Amherst): Merlin Sound ID
11:20 - 12:00 | Invited talk #4 | Abhinav Gupta (CMU): Scaling Robotics via Open World Visual Learning
12:00 - 13:00 | Lunch           |
13:00 - 13:40 | Invited talk #5 | Yuxiong Wang (UIUC): Putting Context First in Open-World Perception
13:40 - 14:20 | Invited talk #6 | Deepak Pathak (CMU): Learning to Reason via RL in the Open World
14:20 - 15:00 | Invited talk #7 | Liangyan Gui (UIUC): Animating Human-Object Interactions in the Wild
15:00 - 15:05 | Coffee break    |
15:05 - 15:45 | Challenge 1     | InsDet: Object Instance Detection Challenge
15:45 - 16:25 | Challenge 2     | Foundational FSOD: Foundational Few-Shot Object Detection Challenge v2
16:25 - 16:30 | Closing remarks |