Visual Perception via Learning in an Open World

The 4th workshop on Open World Vision

Location: Summit 328, Seattle Convention Center

Time: 8:30am - 5:30pm Local Time (PDT), Tuesday, June 18, 2024

in conjunction with CVPR 2024, Seattle, US

Per CVPR policy, the recording will be released after the CVPR'24 conference.


Visual perception is indispensable for numerous applications, spanning transportation, healthcare, security, commerce, entertainment, and interdisciplinary research. Visual perception algorithms developed in a closed-world setup often generalize poorly to the real open world, which contains never-before-seen, dynamic, vast, and unpredictable situations. This calls for visual perception algorithms developed for the open world, addressing its complexities such as recognizing unknown objects, debiasing imbalanced data distributions, leveraging multimodal signals, and learning efficiently from few examples. Moreover, today's most powerful visual perception models are pretrained in an open world, e.g., on web-scale data consisting of images, languages, and more. We are in the best era to study Visual Perception via Learning in an Open World (VPLOW). We therefore invite you to our VPLOW workshop, where invited speakers and challenge competitions will cover a variety of VPLOW topics. We hope our workshop stimulates fruitful discussions.

You might be interested in our previous workshops at CVPR'23, CVPR'22, CVPR'21, etc.


Topics of interest include, but are not limited to:

  • data: long-tailed distribution, open-set, unknowns, streaming data, biased data, unlabeled data, anomaly, multi-modality, etc.
  • concepts: open-vocabulary, ontology/taxonomy of object classes, evolving class ontology, etc.
  • learning: X-shot learning, Y-supervised learning, lifelong/continual learning, domain adaptation/generalization, open-world learning, multimodal pretraining, prompt learning, foundation model tuning, etc.
  • social impact: safety, fairness, real-world applications, inter-disciplinary research, etc.
  • misc: datasets, benchmarks, interpretability, robustness, generalization, etc.


Let's consider the following motivational examples.

  • Open-world data follows a long-tailed distribution. Real-world tasks often emphasize rarely-seen data, yet a model trained on long-tailed data can perform poorly on rare or underrepresented data. For example, a visual recognition model can misclassify underrepresented minorities and make unethical predictions (ref. case1, case2).
  • The open world contains unknown examples. Largely due to the long-tailed nature of data distributions, visual perception models are invariably confronted with unknown examples in the open world. Failing to detect the unknowns can cause serious issues. For example, a Tesla Model 3 failed to identify an unknown overturned truck and crashed into it (ref. case).
  • The open world requires learning with evolving data and labels. The world of interest changes over time, e.g., driving scenes (in different cities and under different weather) and search queries ("apple" meant different things 20 years ago). In other words, data distributions and semantics are continually shifting and evolving. How can we address distribution shifts and concept drifts?


Speakers

Shu Kong
UMacau, Texas A&M

Deva Ramanan
Carnegie Mellon University

Walter J. Scheirer
University of Notre Dame

Ziwei Liu
Nanyang Technological University

Xiaolong Wang
UC San Diego

Andrew Owens
University of Michigan

Yu-Xiong Wang
University of Illinois at Urbana-Champaign


Please contact Shu Kong with any questions: aimerykong [at] gmail [dot] com

Organizers

Shu Kong
UMacau, Texas A&M

Yanan Li
Zhejiang Lab

Yu-Xiong Wang
University of Illinois at Urbana-Champaign

Andrew Owens
University of Michigan

Deepak Pathak
Carnegie Mellon University

Carl Vondrick
Columbia University

Abhinav Shrivastava
University of Maryland

Advisory Board

Deva Ramanan
Carnegie Mellon University

Terrance Boult
University of Colorado Colorado Springs

Walter J. Scheirer
University of Notre Dame

Challenge Organizers

Shu Kong
UMacau, Texas A&M

Yanan Li
Zhejiang Lab

Qianqian Shen
Zhejiang University

Yunhan Zhao
UC Irvine

Xiaoyu Yue
University of Sydney

Wenwei Zhang
Shanghai AI Lab

Jiangmiao Pang
Shanghai AI Lab

Pan Zhang
Shanghai AI Lab

Xiaoyi Dong
Shanghai AI Lab

Yuhang Zang
Shanghai AI Lab

Jiaqi Wang
Shanghai AI Lab


Tian Liu
Texas A&M

Yunhan Zhao
UC Irvine

Important Dates and Details

Program Schedule

All times in PDT (Seattle local time)

08:30 - 08:50  Opening remarks: Shu Kong (UMacau, Texas A&M), "Visual Perception via Learning in an Open World"
08:50 - 09:30  Invited talk #1: Walter Scheirer (University of Notre Dame), "Open Issues in Open World Learning"
09:30 - 10:10  Invited talk #2: Deva Ramanan (Carnegie Mellon University), "Open World Learning in the Era of MultiModal Foundation Models"
10:10 - 10:15  Coffee break
10:15 - 10:55  Invited talk #3: Andrew Owens (University of Michigan), "Learning Multimodal Models of the Physical World"
10:55 - 11:35  Challenge 1: InsDet, Object Instance Detection Challenge
11:35 - 13:30  Lunch break
13:30 - 14:10  Invited talk #4: Xiaolong Wang (UC San Diego), "Spatial Perception and Control in the Wild"
14:10 - 14:50  Invited talk #5: Ziwei Liu (Nanyang Technological University), "Building Open-World Multimodal AI Assistant"
14:50 - 15:30  Invited talk #6: Yu-Xiong Wang (UIUC), "All-in-One: Bridging Generative and Discriminative Learning in the Open World"
15:30 - 15:35  Coffee break
15:35 - 16:15  Challenge 2: Foundational FSOD, Foundational Few-Shot Object Detection Challenge
16:15 - 16:55  Challenge 3: OV-PARTS, Challenge of Open-Vocabulary Part Segmentation
16:55 - 17:35  Challenge 4: V3Det, Challenge of Vast Vocabulary Visual Detection
17:35 - 17:40  Closing remarks