Visual Perception via Learning in an Open World

The 5th workshop on Open World Vision

Location: Music City Center, Nashville, TN

Time: 9:00am - 4:30pm local time, June 11 or 12, 2025

in conjunction with CVPR 2025, Nashville, TN, USA

This page is still being finalized; stay tuned!


Overview

Visual perception is crucial for a wide range of applications. Traditionally, visual perception models were developed under a closed-world paradigm, in which data distributions and categorical labels were assumed to be fixed and known in advance. However, such closed-world models often prove brittle when deployed in the real open world, which is dynamic, vast, and unpredictable. Modern approaches have therefore shifted toward open-world models, e.g., foundation models pretrained on large datasets sourced from the open world (such as data collected from the Internet) and then adapted to specific downstream tasks. While contemporary model training follows this principle of "open-world learning," important limitations, risks, opportunities, and challenges remain. We invite researchers to participate in the Visual Perception via Learning in an Open World (VPLOW) workshop, where we will explore the key topics outlined below.

You might be interested in our previous workshops at CVPR'24, CVPR'23, CVPR'22, CVPR'21, etc.


Topics

Topics of interest include, but are not limited to:

  • data: long-tailed distribution, open-set, unknowns, streaming data, biased data, unlabeled data, anomaly, multi-modality, etc.
  • concepts: open-vocabulary, ontology/taxonomy of object classes, evolving class ontology, etc.
  • learning: X-shot learning, Y-supervised learning, lifelong/continual learning, domain adaptation/generalization, open-world learning, multimodal pretraining, prompt learning, foundation model tuning, etc.
  • social impact: safety, fairness, real-world applications, inter-disciplinary research, etc.
  • misc: datasets, benchmarks, interpretability, robustness, generalization, etc.

Examples

Let's consider the following motivating examples.

  • Open-world data follows long-tailed distributions, yet real-world tasks often emphasize the rare observations. This mismatch between data distribution and task demands necessitates careful development of perception models. Visual perception algorithms can struggle to recognize rare examples, leading to serious consequences. For example, a visual recognition model can misclassify underrepresented minorities and make unethical predictions (ref. case1, case2). Even foundation models suffer from the long-tailed distribution of their pretraining data (ref. link). A common mitigation, class-balanced loss reweighting, is sketched after this list.
  • The open world presents unknown examples, largely due to the long-tailed nature of the data distribution. Visual perception models operating in the open world are invariably confronted with unfamiliar instances, and failure to detect these unknowns can lead to serious consequences. For example, a Tesla Model 3 failed to recognize an overturned truck as an unknown object, resulting in a crash (ref. case). A simple confidence-based rejection baseline is sketched after this list.
  • The open world requires aligning ontologies. To define a domain of interest, such as autonomous driving or the fashion industry, we typically establish an ontology (e.g., class labels or fashion styles). However, in the real open world, these ontologies constantly evolve (ref. link), leading to a mismatch between the concepts defined for a task and those understood by pretrained (foundation) models (ref. link). For example, in the fashion industry, the vocabulary must expand continuously as trends and styles change with each season. This raises a key question: how can we effectively align a constantly shifting ontology with the one understood by an "outdated" foundation model? Moreover, the world of interest itself changes over time: driving scenes vary across cities and weather conditions, and search queries drift in meaning ("apple" meant something different 20 years ago). In other words, both data distributions and semantics continually evolve. How can we address such distribution shifts and concept drift? An open-vocabulary sketch after this list illustrates how text-based class names let the vocabulary grow.
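
As a minimal sketch of the long-tail issue raised in the first example, the Python/PyTorch snippet below weights the cross-entropy loss by inverse class frequency so rare classes are not drowned out by head classes. The function name and the example class counts are illustrative assumptions, not part of any specific benchmark or method from this workshop.

    import torch
    import torch.nn.functional as F

    def balanced_ce_loss(logits, labels, class_counts):
        # Inverse-frequency weights: rare classes receive larger weights.
        weights = 1.0 / class_counts.float()
        # Normalize so the weights average to 1 across classes.
        weights = weights * len(class_counts) / weights.sum()
        return F.cross_entropy(logits, labels, weight=weights)

    # Hypothetical long-tailed setup: 3 classes with counts [1000, 100, 10].
    logits = torch.randn(8, 3)
    labels = torch.randint(0, 3, (8,))
    loss = balanced_ce_loss(logits, labels, torch.tensor([1000, 100, 10]))

Many refinements exist (e.g., effective-number reweighting or decoupled training), but the mismatch described above is already visible here: without the weights, the tail class contributes almost nothing to the loss.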

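For the unknown-detection problem in the second example, a classic (if imperfect) baseline is to reject inputs whose maximum softmax probability falls below a threshold, as in Hendrycks & Gimpel's MSP baseline. This sketch assumes logits from any trained classifier; the threshold value is illustrative.

    import torch
    import torch.nn.functional as F

    def predict_with_rejection(logits, threshold=0.5):
        # Confidence = maximum softmax probability (MSP).
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        # Low-confidence inputs are flagged as unknown with label -1.
        pred[conf < threshold] = -1
        return pred

    logits = torch.tensor([[4.0, 0.1, 0.2],   # confident -> class 0
                           [1.0, 1.1, 0.9]])  # ambiguous -> rejected
    print(predict_with_rejection(logits))      # tensor([ 0, -1])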

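Finally, a minimal sketch of how open-vocabulary models sidestep a frozen ontology, per the third example: class names are text, so new concepts can be scored by embedding their names, with no retraining of the image model. The encode_text argument stands in for a CLIP-style text tower; the vocabulary and the random stand-in encoder are purely hypothetical.

    import torch
    import torch.nn.functional as F

    def classify(image_feat, class_names, encode_text):
        # Embed every class name and compare by cosine similarity.
        text_feats = F.normalize(encode_text(class_names), dim=-1)
        image_feat = F.normalize(image_feat, dim=-1)
        sims = image_feat @ text_feats.T
        return class_names[sims.argmax().item()]

    vocabulary = ["denim jacket", "trench coat"]
    vocabulary.append("balletcore dress")  # vocabulary grows; no retraining

    # Random stand-in for a real text encoder, just to make the sketch run.
    dummy_encode = lambda names: torch.randn(len(names), 512)
    print(classify(torch.randn(512), vocabulary, dummy_encode))

Such models still struggle when the pretrained text embedding is "outdated" relative to new concepts, which is exactly the alignment question raised above.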

Organizers

Please contact Shu Kong with any questions: aimerykong [at] gmail [dot] com


Shu Kong
University of Macau

Yu-Xiong Wang
University of Illinois at Urbana-Champaign

Andrew Owens
University of Michigan

Abhinav Shrivastava
University of Maryland


Advisory Board

Deva Ramanan
Carnegie Mellon University

Terrance Boult
University of Colorado Colorado Springs

Walter J. Scheirer
University of Notre Dame



Challenge Organizers


Shu Kong
University of Macau

Qianqian Shen
Zhejiang University

Yunhan Zhao
Google / UCI


Coordinators


Tian Liu
Texas A&M

Yunhan Zhao
Google / UCI




Important Dates and Details


Competition

We are organizing four challenge competitions this year.


Program Schedule

CDT           | Event           | Title / Presenter                                                                    | Links
08:30 - 08:50 | Opening remarks | Shu Kong (University of Macau): Visual Perception via Learning in an Open World     |
08:50 - 09:30 | Invited talk #1 | TBA                                                                                  | TBA
09:30 - 10:10 | Invited talk #2 | TBA                                                                                  | TBA
10:10 - 10:15 | Coffee break    |                                                                                      |
10:15 - 10:55 | Invited talk #3 | TBA                                                                                  | TBA
10:55 - 11:35 | Challenge-1     | Challenge 1: InsDet (Object Instance Detection Challenge)                            |
11:35 - 13:30 | Lunch           |                                                                                      |
13:30 - 14:10 | Invited talk #4 | TBA                                                                                  | TBA
14:10 - 14:50 | Invited talk #5 | TBA                                                                                  | TBA
14:50 - 15:30 | Invited talk #6 | TBA                                                                                  | TBA
15:30 - 15:35 | Coffee break    |                                                                                      |
15:35 - 16:15 | Challenge-2     | Challenge 2: Foundational FSOD (Foundational Few-Shot Object Detection Challenge v2) |
16:15 - 16:55 | Invited talk #7 | TBA                                                                                  | TBA
16:55 - 17:35 | Invited talk #8 | TBA                                                                                  | TBA
17:35 - 17:40 | Closing remarks |                                                                                      |