[Paper Review #19] Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models, NeurIPS 2024


1. Background

1) Test-time prompt tuning has to compute gradients through the text encoder, which is costly in both computation and memory

2) Limitations of existing VLM adaptation methods

- They do not accumulate information from past samples

- Prior work adapts using only one modality, either text or vision

2. Method

(1) Prototype Evolving

* Text prototype evolving

1) Generate N class descriptions, feed them to the text encoder, and average the resulting text embeddings to form the prototype

2) Update the prototype using a cumulative average

- This keeps folding information from past samples into the prototype

t : online updated prototype

t* : optimized textual prototypes
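The two text-prototype steps above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the counter `k` (the number of optimized prototypes folded in so far), and the L2 re-normalization after each average are assumptions made for the example.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def build_text_prototype(description_embeddings):
    """Average the text embeddings of the class descriptions, then re-normalize."""
    n = len(description_embeddings)
    mean = [sum(column) / n for column in zip(*description_embeddings)]
    return l2_normalize(mean)


def evolve_text_prototype(t, t_star, k):
    """Cumulative average: fold the k-th optimized prototype t* into the online prototype t.

    Assumption: k starts at 1, so the first update simply adopts t*.
    """
    merged = [((k - 1) * a + b) / k for a, b in zip(t, t_star)]
    return l2_normalize(merged)
```

Because the weight on each new t* shrinks as 1/k, a single noisy test sample moves the online prototype less and less as more samples are seen.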

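* Visual prototype evolving

1) Create a cache that stores n image features per class

- Ground-truth labels are not available at test time, so the model's prediction for each image feature is used as a pseudo-label and stored as a pair with the feature

2) If a class's cache is full, a new low-entropy sample is inserted and the highest-entropy entry is evicted from the cache

- This keeps information from past samples continuously updated

3) The visual prototype is the mean of the cached image features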
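A rough sketch of the cache described above, under stated assumptions: the class name `VisualCache`, its method names, and the use of a sorted list per class are all hypothetical choices for illustration, and the pseudo-label is taken as the argmax of the predicted probabilities.

```python
import math


def entropy(probs):
    """Shannon entropy of a probability vector (lower means more confident)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class VisualCache:
    """Per-class store of (entropy, image feature) pairs, keyed by pseudo-label."""

    def __init__(self, num_classes, capacity):
        self.capacity = capacity  # n features kept per class
        self.slots = [[] for _ in range(num_classes)]

    def add(self, feature, probs):
        """Insert a feature under its pseudo-label, evicting the highest-entropy overflow."""
        pseudo_label = max(range(len(probs)), key=probs.__getitem__)
        slot = self.slots[pseudo_label]
        slot.append((entropy(probs), feature))
        slot.sort(key=lambda item: item[0])  # most confident first
        del slot[self.capacity:]  # drop whatever exceeds the capacity
        return pseudo_label

    def prototype(self, cls):
        """Mean of the cached features for one class, or None if the slot is empty."""
        features = [feature for _, feature in self.slots[cls]]
        if not features:
            return None
        return [sum(column) / len(features) for column in zip(*features)]
```

Note that with this eviction rule a new sample whose entropy is higher than everything already cached is itself dropped, so a full slot only ever becomes more confident.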

(2) Prototype-Based Inference

- The logits computed from the text prototypes and from the visual prototypes are summed to produce the final logits

- The visual logits are passed through an affinity function before being added

v : visual prototype

t : text prototype
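The inference rule above can be illustrated with a small sketch. The review does not spell out the affinity function, so the exponential form `alpha * exp(-beta * (1 - x))` used here is an assumption (a form commonly seen in cache-based CLIP adaptation), as are the hyperparameters `alpha`, `beta`, and the text-logit `scale`. Features and prototypes are assumed to be L2-normalized so that dot products are cosine similarities.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def affinity(similarity, alpha=1.0, beta=5.0):
    """Assumed affinity function: map a cosine similarity to a positive score."""
    return alpha * math.exp(-beta * (1.0 - similarity))


def dual_prototype_logits(feature, text_protos, visual_protos, scale=100.0):
    """Per-class logit = scaled text similarity + affinity of the visual similarity.

    A class whose visual prototype is still None (empty cache) gets no visual term.
    """
    logits = []
    for t, v in zip(text_protos, visual_protos):
        text_logit = scale * dot(feature, t)
        visual_logit = affinity(dot(feature, v)) if v is not None else 0.0
        logits.append(text_logit + visual_logit)
    return logits
```

Skipping the visual term for empty classes means the prediction falls back to the text prototypes alone until the cache has collected at least one feature for that class.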

(3) Prototype Residual Learning

1) Residual parameter

- A residual parameter is added to each prototype, and the sum is L2-normalized

- The residual parameters are initialized to zero, which lets the prototypes be learned stably

2) Entropy minimization

- Minimize the entropy of the final logits

3) InfoNCE loss

- With the entropy loss alone, the model would be adapted toward overconfident predictions, so the InfoNCE loss is added alongside it
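The three pieces of residual learning can be sketched together. This only computes the quantities; it does not perform the gradient step. Two things here are assumptions not stated in the notes above: the InfoNCE term is taken to pair each class's text prototype with the same class's visual prototype, and only the text-to-visual direction is shown (a symmetric version would add the transposed term). The `temperature` value is likewise hypothetical.

```python
import math


def apply_residual(prototype, residual):
    """Add a learnable residual to a prototype and L2-normalize the sum."""
    shifted = [p + r for p, r in zip(prototype, residual)]
    norm = math.sqrt(sum(x * x for x in shifted))
    return [x / norm for x in shifted]


def log_softmax(logits):
    """Numerically stable log-softmax."""
    m = max(logits)
    log_total = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_total for x in logits]


def entropy_loss(logits):
    """Entropy of the softmax distribution over the final logits."""
    log_p = log_softmax(logits)
    return -sum(math.exp(lp) * lp for lp in log_p)


def info_nce(text_protos, visual_protos, temperature=0.07):
    """Assumed alignment term: text prototype c should match visual prototype c."""
    loss = 0.0
    for c, t in enumerate(text_protos):
        sims = [sum(a * b for a, b in zip(t, v)) / temperature for v in visual_protos]
        loss -= log_softmax(sims)[c]
    return loss / len(text_protos)


# Zero-initialized residual: the first forward pass uses the prototype unchanged.
start = apply_residual([0.6, 0.8], [0.0, 0.0])
```

The zero initialization means adaptation starts exactly from the evolved prototypes, and the contrastive term gives the optimizer an objective other than simply sharpening the prediction.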

3. Experimental Results

(1) Performance comparison

(2) Efficiency comparison

4. Summary

- Existing VLM adaptation methods do not accumulate information from past samples and adapt using only one modality, either text or vision.

- To address this, the method uses text and visual prototypes together and combines them with residual parameters to adapt to each sample

- The text prototypes are updated with a cumulative average and the visual prototypes by updating the cached image features, so adaptation information from past samples accumulates
