Visual Perception via Learning in an Open World

The 5th workshop on Open World Vision

Location: Music City Center, Nashville, TN

Time: 9:00am - 4:30pm local time, June 11 or 12, 2025

in conjunction with CVPR 2025, Nashville, TN, USA

This page is still being finalized; stay tuned!


Overview

Visual perception is crucial for a wide range of applications. Traditionally, visual perception models were developed under a closed-world paradigm, in which data distributions and categorical labels were assumed to be fixed and known in advance. However, such closed-world models often prove brittle when deployed in the real open world, which is dynamic, vast, and unpredictable. Modern approaches have therefore shifted toward open-world models, e.g., foundation models pretrained on large datasets sourced from the open world (such as data collected from the Internet) and then adapted to specific downstream tasks. While contemporary model training follows this principle of "open-world learning," important limitations, risks, opportunities, and challenges remain. We invite researchers to participate in the Visual Perception via Learning in an Open World (VPLOW) workshop, where we will explore the key topics outlined below.

You might be interested in our previous workshops at CVPR'24, CVPR'23, CVPR'22, CVPR'21, etc.


Topics

Topics of interest include, but are not limited to:

  • data: long-tailed distribution, open-set, unknowns, streaming data, biased data, unlabeled data, anomaly, multi-modality, etc.
  • concepts: open-vocabulary, ontology/taxonomy of object classes, evolving class ontology, etc.
  • learning: X-shot learning, Y-supervised learning, lifelong/continual learning, domain adaptation/generalization, open-world learning, multimodal pretraining, prompt learning, foundation model tuning, etc.
  • social impact: safety, fairness, real-world applications, inter-disciplinary research, etc.
  • misc: datasets, benchmarks, interpretability, robustness, generalization, etc.

Examples

Let's consider the following motivating examples.

  • Open-world data follows long-tailed distributions, yet real-world tasks often emphasize the rare observations. This mismatch between data distribution and task demands necessitates careful development of perception models. Visual perception algorithms can struggle to recognize rare examples, leading to serious consequences. For example, a visual recognition model can misclassify underrepresented minorities and make unethical predictions (ref. case1, case2). Even foundation models suffer from the long-tailed distribution of their pretraining data (ref. link). A common mitigation, class-balanced loss reweighting, is sketched after this list.
  • The open world presents unknown examples, largely due to the long-tailed nature of the data distribution. Visual perception models operating in the open world are invariably confronted with unfamiliar instances, and failure to detect these unknowns can lead to serious consequences. For example, a Tesla Model 3 failed to recognize an overturned truck as an unknown object, resulting in a crash (ref. case). A simple confidence-based rejection baseline is sketched after this list.
  • The open world requires aligning ontologies. To define a domain of interest, such as autonomous driving or the fashion industry, we typically establish an ontology (e.g., class labels or fashion styles). However, in the real open world, these ontologies constantly evolve (ref. link), leading to a mismatch between the concepts defined for a task and those understood by pretrained (foundation) models (ref. link). For example, in the fashion industry, the vocabulary must expand continuously as trends and styles change with each season. This raises a key question: how can we effectively align a constantly shifting ontology with the one understood by an "outdated" foundation model? Moreover, the world of interest itself changes over time: driving scenes vary across cities and weather conditions, and search queries drift in meaning ("apple" meant something different 20 years ago). In other words, both data distributions and semantics continually evolve. How can we address such distribution shifts and concept drift? An open-vocabulary sketch after this list illustrates how text-based class names let the vocabulary grow.
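
As a minimal sketch of the long-tail issue raised in the first example, the Python/PyTorch snippet below weights the cross-entropy loss by inverse class frequency so rare classes are not drowned out by head classes. The function name and the example class counts are illustrative assumptions, not part of any specific benchmark or method from this workshop.

    import torch
    import torch.nn.functional as F

    def balanced_ce_loss(logits, labels, class_counts):
        # Inverse-frequency weights: rare classes receive larger weights.
        weights = 1.0 / class_counts.float()
        # Normalize so the weights average to 1 across classes.
        weights = weights * len(class_counts) / weights.sum()
        return F.cross_entropy(logits, labels, weight=weights)

    # Hypothetical long-tailed setup: 3 classes with counts [1000, 100, 10].
    logits = torch.randn(8, 3)
    labels = torch.randint(0, 3, (8,))
    loss = balanced_ce_loss(logits, labels, torch.tensor([1000, 100, 10]))

Many refinements exist (e.g., effective-number reweighting or decoupled training), but the mismatch described above is already visible here: without the weights, the tail class contributes almost nothing to the loss.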

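For the unknown-detection problem in the second example, a classic (if imperfect) baseline is to reject inputs whose maximum softmax probability falls below a threshold, as in Hendrycks & Gimpel's MSP baseline. This sketch assumes logits from any trained classifier; the threshold value is illustrative.

    import torch
    import torch.nn.functional as F

    def predict_with_rejection(logits, threshold=0.5):
        # Confidence = maximum softmax probability (MSP).
        probs = F.softmax(logits, dim=-1)
        conf, pred = probs.max(dim=-1)
        # Low-confidence inputs are flagged as unknown with label -1.
        pred[conf < threshold] = -1
        return pred

    logits = torch.tensor([[4.0, 0.1, 0.2],   # confident -> class 0
                           [1.0, 1.1, 0.9]])  # ambiguous -> rejected
    print(predict_with_rejection(logits))      # tensor([ 0, -1])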

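Finally, a minimal sketch of how open-vocabulary models sidestep a frozen ontology, per the third example: class names are text, so new concepts can be scored by embedding their names, with no retraining of the image model. The encode_text argument stands in for a CLIP-style text tower; the vocabulary and the random stand-in encoder are purely hypothetical.

    import torch
    import torch.nn.functional as F

    def classify(image_feat, class_names, encode_text):
        # Embed every class name and compare by cosine similarity.
        text_feats = F.normalize(encode_text(class_names), dim=-1)
        image_feat = F.normalize(image_feat, dim=-1)
        sims = image_feat @ text_feats.T
        return class_names[sims.argmax().item()]

    vocabulary = ["denim jacket", "trench coat"]
    vocabulary.append("balletcore dress")  # vocabulary grows; no retraining

    # Random stand-in for a real text encoder, just to make the sketch run.
    dummy_encode = lambda names: torch.randn(len(names), 512)
    print(classify(torch.randn(512), vocabulary, dummy_encode))

Such models still struggle when the pretrained text embedding is "outdated" relative to new concepts, which is exactly the alignment question raised above.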

Organizers

Please contact Shu Kong with any questions: aimerykong [at] gmail [dot] com


Shu Kong
University of Macau

Yu-Xiong Wang
University of Illinois at Urbana-Champaign

Andrew Owens
University of Michigan

Abhinav Shrivastava
University of Maryland


Advisory Board

Deva Ramanan
Carnegie Mellon University

Terrance Boult
University of Colorado Colorado Springs

Walter J. Scheirer
University of Notre Dame



Challenge Organizers


Shu Kong
University of Macau

Qianqian Shen
Zhejiang University

Yunhan Zhao
Google / UCI


Coordinators


Tian Liu
Texas A&M

Yunhan Zhao
Google / UCI




Important Dates and Details


Competition

We are organizing four challenge competitions this year.


Program Schedule

CDT           | Event           | Title / Presenter                                                                    | Links
08:30 - 08:50 | Opening remarks | Shu Kong (University of Macau): Visual Perception via Learning in an Open World     |
08:50 - 09:30 | Invited talk #1 | TBA                                                                                  | TBA
09:30 - 10:10 | Invited talk #2 | TBA                                                                                  | TBA
10:10 - 10:15 | Coffee break    |                                                                                      |
10:15 - 10:55 | Invited talk #3 | TBA                                                                                  | TBA
10:55 - 11:35 | Challenge-1     | Challenge 1: InsDet (Object Instance Detection Challenge)                            |
11:35 - 13:30 | Lunch           |                                                                                      |
13:30 - 14:10 | Invited talk #4 | TBA                                                                                  | TBA
14:10 - 14:50 | Invited talk #5 | TBA                                                                                  | TBA
14:50 - 15:30 | Invited talk #6 | TBA                                                                                  | TBA
15:30 - 15:35 | Coffee break    |                                                                                      |
15:35 - 16:15 | Challenge-2     | Challenge 2: Foundational FSOD (Foundational Few-Shot Object Detection Challenge v2) |
16:15 - 16:55 | Invited talk #7 | TBA                                                                                  | TBA
16:55 - 17:35 | Invited talk #8 | TBA                                                                                  | TBA
17:35 - 17:40 | Closing remarks |                                                                                      |