Supervised Knowledge May Hurt Novel Class Discovery Performance

Ziyun Li 1, Jona Otholt 1, Ben Dai 2, Di Hu 3, Christoph Meinel 1, Haojin Yang 1

1 Hasso Plattner Institute, 2 Chinese University of Hong Kong, 3 Renmin University of China

NCD Background

Novel class discovery (NCD) is a machine learning task that aims to discover new classes in unlabeled data, where those classes were not present in the labeled training set.
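To make the setting concrete, here is a minimal, hypothetical sketch of the NCD data split: the labeled and unlabeled sets contain disjoint class sets, and the model never sees labels for the novel classes. All names and the toy data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

num_classes = 10
known_classes = set(range(5))       # classes labeled at training time
novel_classes = set(range(5, 10))   # classes to be discovered

y = rng.integers(0, num_classes, size=1000)  # toy labels
x = rng.normal(size=(1000, 32))              # toy features

labeled_mask = np.isin(y, list(known_classes))
x_l, y_l = x[labeled_mask], y[labeled_mask]  # supervised info (X | Y)
x_u = x[~labeled_mask]                       # unsupervised info (X only)

# the two class sets never overlap in NCD
assert known_classes.isdisjoint(novel_classes)
```

The key structural point is the last assertion: unlike semi-supervised learning, the unlabeled classes are disjoint from the labeled ones.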


How can we borrow supervised knowledge and break the category constraint?


NCD: Existing Methods

Vaze et al. (CVPR 2022) Generalized Category Discovery

NCD: Existing Methods

Fini et al. (ICCV 2021) A Unified Objective for Novel Class Discovery


NCD: Existing Methods

What makes the implementation of NCD possible?

Supervised info \( \mathbf{X} | Y \)

Unsupervised info \( \mathbf{X} \)


NCD: Existing Methods

DL:

  • More data is better...
  • Design a DL architecture

STAT:

  • Under this kind of assumption, you should ...

NCD: Outline


Motivated question:
Is supervised knowledge always helpful?


Step 1

Step 2

Step 3

NCD: Metric


Suppose we learn a mapping \(\mathbf{p}\) from the training samples.
How can we measure the effectiveness of \(\mathbf{p}\)?


Recall: MMD

Muandet et al (2020) Kernel Mean Embedding of Distributions: A Review and Beyond
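As a refresher on MMD in the kernel-mean-embedding view: the squared MMD between distributions \(P\) and \(Q\) is the RKHS distance between their mean embeddings. A minimal numpy sketch of the (biased) estimate with a Gaussian RBF kernel, using toy data rather than anything from the paper:

```python
import numpy as np

def rbf_kernel(a, b, gamma=1.0):
    # k(a_i, b_j) = exp(-gamma * ||a_i - b_j||^2)
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * sq)

def mmd2(x, y, gamma=1.0):
    # biased (V-statistic) estimate: mean K_xx + mean K_yy - 2 mean K_xy
    return (rbf_kernel(x, x, gamma).mean()
            + rbf_kernel(y, y, gamma).mean()
            - 2 * rbf_kernel(x, y, gamma).mean())

rng = np.random.default_rng(0)
# two samples from the same distribution vs. two from shifted distributions
same = mmd2(rng.normal(size=(200, 2)), rng.normal(size=(200, 2)))
diff = mmd2(rng.normal(size=(200, 2)), rng.normal(3.0, 1.0, size=(200, 2)))
```

Samples from different distributions yield a larger MMD estimate than samples from the same distribution, which is what makes MMD usable as a distributional similarity measure between the labeled and unlabeled sets.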



Yet, in practice, \(Y_u\) is unknown...
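Since the true novel labels \(Y_u\) are unavailable, one workaround is to substitute clustering pseudo-labels for \(Y_u\), yielding a computable "pseudo" variant of any label-dependent quantity. A hedged sketch of that idea; KMeans here is an illustrative stand-in for whatever clustering head the NCD model provides, not the paper's actual procedure:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# toy unlabeled data: two well-separated clusters standing in for novel classes
x_u = np.concatenate([rng.normal(-3, 1, size=(100, 2)),
                      rng.normal(+3, 1, size=(100, 2))])

# pseudo-labels replace the unknown Y_u
pseudo_y = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x_u)

# pseudo class-conditional statistics, usable wherever Y_u was needed
centers = np.stack([x_u[pseudo_y == c].mean(axis=0) for c in range(2)])
```

The quality of such a pseudo metric naturally depends on how well the clustering recovers the true novel classes.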

NCD: Benchmark



Conclusion: Semantic Similarity is consistent with Accuracy, which validates the proposed benchmark.


Conclusion: Semantic Similarity, Accuracy, and (pseudo) transfer flow are mutually consistent, which validates the proposed metric.
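The Accuracy above is, in NCD evaluation, typically clustering accuracy: cluster indices are arbitrary, so accuracy is computed under the best one-to-one matching between predicted clusters and ground-truth classes (the Hungarian algorithm). A minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    # build the contingency matrix between predicted clusters and true classes
    n = max(y_true.max(), y_pred.max()) + 1
    cost = np.zeros((n, n), dtype=int)
    for t, p in zip(y_true, y_pred):
        cost[p, t] += 1
    # find the cluster-to-class matching that maximizes matched samples
    row, col = linear_sum_assignment(cost, maximize=True)
    return cost[row, col].sum() / len(y_true)

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([2, 2, 0, 0, 1, 1])  # a pure relabelling of y_true
print(clustering_accuracy(y_true, y_pred))  # → 1.0
```

A pure relabelling scores 1.0 because the metric is invariant to how clusters are indexed.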

NCD: Supervised Info May Hurt



Conclusion: supervised information with low semantic relevance may hurt NCD performance.


Conclusion: pseudo transfer flow can serve as a practical reference for choosing which labeled data to use in NCD.

Application: Data Selection
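The data-selection idea can be sketched as: rank candidate labeled classes by their feature-space distance to the unlabeled set and keep only the most semantically similar ones for NCD training. Everything below is an illustrative stand-in; the squared distance between class means is a hypothetical proxy for the paper's (pseudo) transfer flow, and the class names and features are invented:

```python
import numpy as np

def select_classes(class_features, x_unlabeled, k):
    # score each labeled class by how close its mean feature is to the
    # unlabeled set's mean feature (smaller distance = more similar)
    center_u = x_unlabeled.mean(axis=0)
    dist = {c: float(np.sum((f.mean(axis=0) - center_u) ** 2))
            for c, f in class_features.items()}
    return sorted(dist, key=dist.get)[:k]  # the k most similar classes

rng = np.random.default_rng(0)
x_u = rng.normal(0.0, 1.0, size=(100, 8))
class_features = {  # hypothetical labeled classes with toy features
    "near_a": rng.normal(0.0, 1.0, size=(50, 8)),
    "near_b": rng.normal(0.2, 1.0, size=(50, 8)),
    "far": rng.normal(5.0, 1.0, size=(50, 8)),
}
selected = select_classes(class_features, x_u, k=2)
```

Under the paper's finding, dropping semantically distant labeled classes like `"far"` is exactly the kind of selection that can improve NCD performance.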


Application: Data Combining

Contribution

  • We find that using supervised knowledge from the labeled set may lead to suboptimal performance on NCD datasets with low semantic similarity. Based on this finding, we propose two practical methods that achieve ∼3% and ∼5% improvements on CIFAR-100 and ImageNet, respectively, compared to SOTA.
  • We introduce a theoretically reliable metric to measure the semantic similarity between the labeled and unlabeled sets. Mutual validation between the proposed metric and the benchmark suggests that the metric strongly agrees with NCD performance.
  • We establish a comprehensive benchmark with varying degrees of difficulty based on ImageNet, leveraging its hierarchical semantic similarity.

Thank you!


By statmlben