Open-World Vision
The 6th Workshop on Open-World Vision. See you there soon!
Location: TBA, Denver, CO
Time: 8:45am-12:15pm Local Time, June XX, 2026
In conjunction with CVPR 2026, Denver, CO, USA
Overview
Open-World Vision (OWV) emphasizes realistic opportunities and challenges in developing and deploying computer vision systems in the dynamic, vast, and unpredictable real open world, which offers abundant data that can benefit training and challenge testing. It contrasts the traditional "closed-world" paradigm of visual learning and inference, which assumes fixed, known data distributions and categorical labels. Models developed under such closed-world assumptions tend to be brittle when encountering ever-changing and novel scenarios in the real open world. Modern visual learning has shifted towards an open-world paradigm, such as pretraining foundation models on massive data sourced from the open world (e.g., web-sourced data). While these models show unprecedented performance and strong adaptability to downstream tasks, they inherit biases from their open-world pretraining data and can still fail in truly novel or underrepresented scenarios during deployment. This workshop aims not only to uncover current limitations, potential risks, emerging opportunities, and unresolved challenges of open-world vision, but also to solicit solutions that advance the field toward more robust, fair, and adaptable visual systems.
You might be interested in our previous workshops at
CVPR'25,
CVPR'24,
CVPR'23,
CVPR'22,
CVPR'21.
Topics
Topics of interest include, but are not limited to:
- open-world research: robotics, visual generation, etc.
- open-world pretraining: foundational vision models, vision-language models, multimodal language models, etc.
- open-world data: long-tailed distribution, open-set, unknowns, streaming data, biased data, unlabeled data, anomaly, multi-modality, etc.
- open-world concepts: open-vocabulary, ontology/taxonomy of object classes, evolving class ontology, etc.
- open-world adaptation: X-shot learning, Y-supervised learning, lifelong/continual learning, domain adaptation/generalization, open-world learning, etc.
- open-world social impact: safety, fairness, inter-disciplinary research, etc.
Examples
Consider the following motivating examples.
- Open-world data follows a long-tailed distribution, yet real-world tasks often emphasize the rare observations. This mismatch between the data distribution and task demands necessitates careful development of perception models. Visual perception algorithms can struggle to recognize rare examples, leading to serious consequences. For example, a visual recognition model can misclassify underrepresented minorities and make unethical predictions (ref. case1, case2). Even foundation models suffer from the long-tailed distribution of their pretraining data (ref. link).
- The open world presents unknown examples, largely due to the long-tailed nature of the data distribution. Visual perception models operating in the open world are invariably confronted with unfamiliar instances, and failure to detect these unknowns can lead to serious consequences. For example, a Tesla Model 3 failed to recognize an overturned truck as an unknown object, resulting in a crash (ref. case).
- The open world requires aligning ontologies. To define a domain of interest, such as autonomous driving or the fashion industry, we typically establish an ontology (e.g., class labels or fashion styles). However, in the real open world, these ontologies constantly evolve (ref. link), leading to a mismatch between the concepts defined for a task and those understood by pretrained (foundation) models (ref. link). For example, the fashion vocabulary must expand continuously as trends and styles change with each season. This raises a key question: how can we effectively align a model with constantly shifting ontologies, or with those understood by an "outdated" foundation model? Moreover, the world of interest itself changes over time, e.g., driving scenes (across cities and weather conditions) and search queries ("apple" meant something different 20 years ago). In other words, both the data distribution and the semantics continually evolve. How can we address such distribution shifts and concept drifts?
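As a toy illustration of the long-tailed data issue above, the following sketch samples a synthetic "training set" from a Zipf-like class distribution. The class names and the exact frequency law are illustrative assumptions, not drawn from any real dataset; the point is simply that head classes dominate while safety-critical tail classes appear only a handful of times.

```python
import random
from collections import Counter

random.seed(0)

# Hypothetical object classes for a driving scene; names are illustrative only.
classes = ["car", "pedestrian", "cyclist", "stroller", "overturned_truck"]

# Assume a Zipf-like long-tailed class frequency: weight ~ 1 / rank^2.
weights = [1.0 / (rank ** 2) for rank in range(1, len(classes) + 1)]

# Draw a synthetic "training set" of 10,000 labels from this skewed distribution.
labels = random.choices(classes, weights=weights, k=10_000)
counts = Counter(labels)

for name in classes:
    print(f"{name:>18}: {counts[name]:5d}")

# Head classes dominate; tail classes (the ones safety often hinges on)
# are seen so rarely that a naively trained model barely learns them.
```

A model trained with a plain per-sample loss on such data is dominated by the head classes, which is one motivation for the long-tailed recognition and open-set detection topics listed above.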