Supervised Knowledge May Hurt Novel Class Discovery Performance
Ziyun Li 1, Jona Otholt 1, Ben Dai 2, Di Hu 3,
Christoph Meinel 1, Haojin Yang 1
1 Hasso Plattner Institute, 2 Chinese University of Hong Kong, 3 Renmin University of China



NCD Background
Novel class discovery (NCD) is a machine learning task that aims to discover novel classes in an unlabeled set by leveraging supervised knowledge from a labeled set of disjoint, known classes.
How can we borrow supervised knowledge and break the category constraint?
NCD: Existing Methods

Vaze et al (CVPR 2022) Generalized Category Discovery
Fini et al (ICCV 2021) A Unified Objective for Novel Class Discovery

What makes the implementation of NCD possible?

Supervised info \( \mathbf{X} | Y \)
Unsupervised info \( \mathbf{X} \)
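NCD methods typically combine both sources of information in a single training objective. Below is a minimal, generic sketch of such an objective (a hypothetical skeleton for illustration, not Fini et al.'s UNO or any other specific published method): cross-entropy on the labeled classes plus a pseudo-label term on the unlabeled data.

```python
# Generic NCD training skeleton (a hypothetical sketch, not a specific
# published method): one shared encoder, a classification head for the
# known (labeled) classes, and a clustering head for the novel classes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class NCDModel(nn.Module):
    def __init__(self, in_dim=128, hid=64, n_known=5, n_novel=5):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hid), nn.ReLU())
        self.head_known = nn.Linear(hid, n_known)  # supervised head
        self.head_novel = nn.Linear(hid, n_novel)  # novel-class head

    def forward(self, x):
        z = self.encoder(x)
        return self.head_known(z), self.head_novel(z)

def ncd_loss(model, x_l, y_l, x_u, temp=0.5):
    logits_l, _ = model(x_l)          # supervised info: X | Y
    _, logits_u = model(x_u)          # unsupervised info: X
    sup = F.cross_entropy(logits_l, y_l)
    with torch.no_grad():
        # Sharpened predictions as soft pseudo-targets; real methods add
        # constraints (e.g. balanced cluster assignments) to avoid collapse.
        targets = F.softmax(logits_u / temp, dim=1)
    unsup = -(targets * F.log_softmax(logits_u, dim=1)).sum(dim=1).mean()
    return sup + unsup

# Toy usage with random tensors standing in for image features.
model = NCDModel()
x_l, y_l = torch.randn(32, 128), torch.randint(0, 5, (32,))
x_u = torch.randn(32, 128)
ncd_loss(model, x_l, y_l, x_u).backward()
```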
NCD: Outline
DL (deep learning perspective):
- More data is better...
- Design a DL architecture.
STAT (statistical perspective):
- Under this kind of assumption, you should ...
Motivating question:
Is supervised knowledge always helpful?

Step 1: Metric
Step 2: Benchmark
Step 3: Supervised info may hurt
NCD: Metric


Suppose we learn a mapping \(\mathbf{p}\) from the training samples.
How can we measure the effectiveness of \(\mathbf{p}\)?
Recall: maximum mean discrepancy (MMD)
Muandet et al. (2017) Kernel Mean Embedding of Distributions: A Review and Beyond
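For reference, a minimal NumPy sketch of the (biased) empirical MMD estimator with an RBF kernel; the toy data and the bandwidth are illustrative only.

```python
# Minimal NumPy sketch of the (biased) empirical MMD^2 estimator with an
# RBF kernel; the toy data and the bandwidth gamma are illustrative only.
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Kernel matrix k(x, y) = exp(-gamma * ||x - y||^2) for rows of X and Y."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-gamma * sq_dists)

def mmd2_rbf(X, Y, gamma=1.0):
    """Biased empirical estimate of MMD^2(P, Q) from samples X ~ P and Y ~ Q."""
    return (
        rbf_kernel(X, X, gamma).mean()
        + rbf_kernel(Y, Y, gamma).mean()
        - 2.0 * rbf_kernel(X, Y, gamma).mean()
    )

# Toy usage: two Gaussians with shifted means give a clearly positive MMD^2.
rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 16))
Y = rng.normal(0.5, 1.0, size=(200, 16))
print(mmd2_rbf(X, Y, gamma=0.1))
```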
Fini et al (ICCV 2021) A Unified Objective for Novel Class Discovery

Yet, in practice, \(Y_u\) is unknown, so the metric has to be estimated from pseudo-labels, giving the pseudo transfer flow.

NCD: Benchmark

Conclusion: semantic similarity and NCD accuracy are consistent across splits, which supports the proposed benchmark.
Conclusion: semantic similarity, NCD accuracy, and (pseudo) transfer flow are mutually consistent, which supports the proposed metric.
NCD: Supervised Info May Hurt

Conclusion: supervised information with low semantic relevance to the unlabeled set may hurt NCD performance.

Conclusion: pseudo transfer flow can serve as a practical reference for deciding what kind of labeled data to use in NCD.
Application: Data Selection
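A hedged sketch of the data-selection idea: score each candidate labeled subset by its similarity to the unlabeled set and keep the most related one. Here the MMD estimate from the sketch above (`mmd2_rbf`) is only a stand-in score, not the paper's pseudo transfer flow, and the feature matrices are hypothetical.

```python
# Hedged sketch of the data-selection idea: score each candidate labeled
# subset by its similarity to the unlabeled set and keep the most related one.
# mmd2_rbf from the MMD sketch above is only a stand-in score; the paper's
# actual criterion is its pseudo transfer flow. Features are hypothetical.
import numpy as np

def select_labeled_subset(candidates, feats_unlabeled, gamma=0.1):
    """candidates: dict mapping subset name -> feature matrix of that subset."""
    # Lower MMD^2 to the unlabeled features = more similar distribution.
    scores = {name: mmd2_rbf(feats, feats_unlabeled, gamma)
              for name, feats in candidates.items()}
    return min(scores, key=scores.get), scores

# Toy usage with random features standing in for backbone embeddings.
rng = np.random.default_rng(1)
feats_u = rng.normal(0.0, 1.0, size=(300, 16))
candidates = {
    "related": rng.normal(0.1, 1.0, size=(300, 16)),
    "unrelated": rng.normal(2.0, 1.0, size=(300, 16)),
}
best, scores = select_labeled_subset(candidates, feats_u)
print(best, scores)  # "related" should win
```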
Application: Data Combining

Contributions
- We find that using supervised knowledge from the labeled set may lead to suboptimal performance on NCD datasets with low semantic similarity. Based on this finding, we propose two practical methods and achieve ∼3% and ∼5% improvements on CIFAR-100 and ImageNet, respectively, compared to SOTA.
- We introduce a theoretically reliable metric, (pseudo) transfer flow, to measure the semantic similarity between labeled and unlabeled sets. A mutual validation between the proposed metric and the benchmark suggests that the metric strongly agrees with NCD performance.
- We establish a comprehensive benchmark with varying degrees of difficulty based on ImageNet, leveraging its hierarchical semantic similarity.
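As a rough illustration of this last point, one way to grade split difficulty is via WordNet, the hierarchy behind ImageNet's labels (a hypothetical sketch, not the paper's exact protocol; the class names and splits below are made up).

```python
# Rough illustration (not the paper's exact protocol) of grading split
# difficulty via WordNet, the hierarchy behind ImageNet's labels.
# Requires: pip install nltk; then nltk.download('wordnet') once.
from nltk.corpus import wordnet as wn

def class_similarity(name_a, name_b):
    """Wu-Palmer similarity between the first noun synsets of two class names."""
    syn_a = wn.synsets(name_a, pos=wn.NOUN)[0]
    syn_b = wn.synsets(name_b, pos=wn.NOUN)[0]
    return syn_a.wup_similarity(syn_b)

labeled_classes = ["dog", "cat", "horse"]            # hypothetical labeled split
candidate_unlabeled = {
    "high similarity": ["wolf", "fox", "tiger"],      # close in the hierarchy
    "low similarity": ["airplane", "guitar", "bridge"],
}

# Average labeled-to-unlabeled similarity gives a rough difficulty grade.
for name, classes in candidate_unlabeled.items():
    scores = [class_similarity(l, u) for l in labeled_classes for u in classes]
    print(name, sum(scores) / len(scores))
```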
Thank you!

