Visual Perception via Learning in an Open World

The 5th workshop on Open World Vision

Location: Music City Center, Nashville, TN

Time: 9:00am - 4:30pm Local Time, June 11/12, 2025

in conjunction with CVPR 2025, Nashville, TN, USA


Overview

Visual perception is crucial for a wide range of applications. Traditionally, visual perception models were developed under a closed-world paradigm, where data distributions and categorical labels are assumed to be fixed and known in advance. However, these closed-world models often prove brittle when deployed in the real open world, which is dynamic, vast, and unpredictable. Modern approaches to visual perception have shifted towards open-world models, such as foundation models pretrained on large datasets sourced from the open world (e.g., data collected from the Internet); these foundation models are then adapted to solve specific downstream tasks. While contemporary model training follows the principle of "open-world learning," our workshop seeks to address its existing limitations, potential risks, new opportunities, and challenges. We invite researchers to participate in the Visual Perception via Learning in an Open World (VPLOW) workshop, where we will explore the key topics outlined below.

You might also be interested in our previous workshops at CVPR'24, CVPR'23, CVPR'22, and CVPR'21.


Topics

Topics of interest include, but are not limited to:

  • data: long-tailed distribution, open-set, unknowns, streaming data, biased data, unlabeled data, anomaly, multi-modality, etc.
  • concepts: open-vocabulary, ontology/taxonomy of object classes, evolving class ontology, etc.
  • learning: X-shot learning (e.g., zero-/few-shot), Y-supervised learning (e.g., self-/semi-/weakly-supervised), lifelong/continual learning, domain adaptation/generalization, open-world learning, multimodal pretraining, prompt learning, foundation model tuning, etc.
  • social impact: safety, fairness, real-world applications, inter-disciplinary research, etc.
  • misc: datasets, benchmarks, interpretability, robustness, generalization, etc.

Examples

Let's consider the following motivating examples.

  • Open-world data follows long-tailed distributions, yet real-world tasks often emphasize the rare observations. This mismatch between the data distribution and task demands necessitates careful development of perception models. Visual perception algorithms can struggle to recognize rare examples, with serious consequences: for instance, a visual recognition model can misclassify underrepresented minorities and make unethical predictions (ref. case1, case2). Even foundation models suffer from the long-tailed distribution of their pretraining data (ref. link). One common mitigation, class-balanced loss reweighting, is sketched in the first code example after this list.
  • The open world presents unknown examples, largely due to the long-tailed nature of the data distribution. Visual perception models operating in the open world are invariably confronted with unfamiliar instances, and failure to detect these unknowns can lead to serious consequences. For example, a Tesla Model 3 failed to recognize an overturned truck as an unknown object, resulting in a crash (ref. case). A minimal unknown-rejection baseline is sketched in the second code example after this list.
  • The open world requires aligning ontologies. To define a domain of interest, such as autonomous driving or the fashion industry, we typically establish an ontology (e.g., class labels or fashion styles). However, in the real open world, these ontologies constantly evolve (ref. link), leading to a mismatch between the concepts defined for a task and those understood by pretrained (foundation) models (ref. link). For example, the fashion vocabulary must expand continuously as trends and styles change with each season. This raises a key question: how can we effectively align a constantly shifting ontology with that understood by an "outdated" foundation model? Moreover, the world of interest changes over time, e.g., driving scenes vary across cities and weather conditions, and search terms drift in meaning ("apple" meant something different 20 years ago). In other words, both the data distribution and the semantics continually evolve; how should we address such distribution shifts and concept drift? The third code example after this list sketches open-vocabulary classification, one way to accommodate an evolving vocabulary.
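
As a concrete illustration of the first example, here is a minimal sketch of class-balanced loss reweighting based on the "effective number of samples" (Cui et al., CVPR 2019). The class counts, logits, and labels below are toy stand-ins, not drawn from any real benchmark.

```python
import torch
import torch.nn.functional as F

def class_balanced_weights(counts, beta=0.9999):
    """Per-class weights from the 'effective number of samples'
    (Cui et al., CVPR 2019): w_c is proportional to
    (1 - beta) / (1 - beta^n_c), normalized so the weights
    sum to the number of classes."""
    counts = torch.as_tensor(counts, dtype=torch.float)
    effective_num = 1.0 - torch.pow(beta, counts)
    weights = (1.0 - beta) / effective_num
    return weights / weights.sum() * len(counts)

# Toy long-tailed counts for 3 classes: head, torso, tail.
counts = [10_000, 500, 10]
weights = class_balanced_weights(counts)

logits = torch.randn(8, 3)          # stand-in model outputs
labels = torch.randint(0, 3, (8,))  # stand-in ground-truth labels
loss = F.cross_entropy(logits, labels, weight=weights)
print(weights)        # the tail class gets by far the largest weight
print(loss.item())
```

Because tail classes receive much larger weights, errors on rare classes contribute more to the training loss, partially counteracting the head-class bias.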

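For the second example, here is a minimal sketch of the maximum-softmax-probability (MSP) baseline for flagging unknowns (Hendrycks & Gimpel, ICLR 2017). The logits and the 0.5 threshold are illustrative; in practice the threshold is tuned on validation data.

```python
import torch
import torch.nn.functional as F

def flag_unknowns(logits, threshold=0.5):
    """MSP baseline: inputs whose top softmax score falls below a
    threshold are rejected as 'unknown' (label -1) instead of being
    forced into one of the known classes."""
    probs = F.softmax(logits, dim=-1)
    conf, pred = probs.max(dim=-1)
    pred = torch.where(conf >= threshold, pred, torch.full_like(pred, -1))
    return pred, conf

logits = torch.tensor([[4.0, 0.1, 0.2],    # confident -> known class 0
                       [0.6, 0.5, 0.4]])   # ambiguous -> flagged unknown
pred, conf = flag_unknowns(logits)
print(pred, conf)   # tensor([ 0, -1]), confidences ~0.96 and ~0.37
```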

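For the third example, the sketch below shows the open-vocabulary classification pattern popularized by CLIP-style models, where the class vocabulary is supplied as text at inference time. The encode_image and encode_text functions here are hypothetical placeholders returning random features, standing in for a real pretrained vision-language encoder to keep the sketch self-contained.

```python
import torch
import torch.nn.functional as F

# Hypothetical placeholders for a pretrained vision-language model
# (e.g., a CLIP-style image/text encoder); random unit vectors keep
# this runnable, whereas a real system would load actual encoders.
def encode_image(images):
    return F.normalize(torch.randn(len(images), 512), dim=-1)

def encode_text(class_names):
    return F.normalize(torch.randn(len(class_names), 512), dim=-1)

def zero_shot_classify(images, class_names):
    """Classify against whatever vocabulary is supplied at inference
    time: an expanded ontology needs no retraining, only re-encoding
    of the new class names."""
    img = encode_image(images)            # (N, 512) unit vectors
    txt = encode_text(class_names)        # (C, 512) unit vectors
    return (img @ txt.T).argmax(dim=-1)   # cosine similarity

vocab_2024 = ["trench coat", "denim jacket"]
vocab_2025 = vocab_2024 + ["cargo skirt"]   # the vocabulary grew this season
preds = zero_shot_classify(["img1", "img2"], vocab_2025)
print([vocab_2025[i] for i in preds])
```

This pattern addresses the evolving-ontology problem only partially: the encoder itself may still be "outdated" with respect to newly coined concepts, which is precisely the alignment question raised above.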

Organizers

Please contact Shu Kong with any questions: aimerykong [at] gmail [dot] com


Shu Kong
UMacau

Yu-Xiong Wang
University of Illinois at Urbana-Champaign

Andrew Owens
University of Michigan

Abhinav Shrivastava
University of Maryland


Advisory Board

Deva Ramanan
Carnegie Mellon University

Terrance Boult
University of Colorado Colorado Springs

Walter J. Scheirer
University of Notre Dame



Challenge Organizers


Shu Kong
UMacau

Qianqian Shen
Zhejiang University


Coordinators


Tian Liu
Texas A&M

Yunhan Zhao
UC Irvine




Important Dates and Details


Competition

We organize four challenge competitions this year: InsDet (object instance detection), Foundational FSOD (foundational few-shot object detection), OV-PARTS (open-vocabulary part segmentation), and V3Det (vast-vocabulary visual detection). See the program schedule below for their sessions.


Program Schedule

Time (local)  | Event                           | Title / Presenter
08:30 - 08:50 | Opening remarks                 | Visual Perception via Learning in an Open World; Shu Kong (Texas A&M, University of Macau)
08:50 - 09:30 | Invited talk #1                 | Open Issues in Open World Learning; Walter Scheirer (University of Notre Dame)
09:30 - 10:10 | Invited talk #2                 | Open World Learning in the Era of MultiModal Foundation Models; Deva Ramanan (CMU)
10:10 - 10:15 | Coffee break                    |
10:15 - 10:55 | Invited talk #3                 | Learning Multimodal Models of the Physical World; Andrew Owens (UMich)
10:55 - 11:35 | Challenge #1: InsDet            | Object Instance Detection Challenge
11:35 - 13:30 | Lunch                           |
13:30 - 14:10 | Invited talk #4                 | Spatial Perception and Control in the Wild; Xiaolong Wang (UCSD)
14:10 - 14:50 | Invited talk #5                 | Building Open-World Multimodal AI Assistant; Ziwei Liu (NTU)
14:50 - 15:30 | Invited talk #6                 | All-in-One: Bridging Generative and Discriminative Learning in the Open World; Yu-Xiong Wang (UIUC)
15:30 - 15:35 | Coffee break                    |
15:35 - 16:15 | Challenge #2: Foundational FSOD | Foundational Few-Shot Object Detection Challenge
16:15 - 16:55 | Challenge #3: OV-PARTS          | Challenge of Open-Vocabulary Part Segmentation
16:55 - 17:35 | Challenge #4: V3Det             | Challenge of Vast Vocabulary Visual Detection
17:35 - 17:40 | Closing remarks                 |