Name: Xiao Yao (PhD, 1.5 years)
Education: 2013-2017 Tianjin University
2019-2022 Shanghai Jiao Tong University
2023-Now SUTD
Supervisors: Lu Wei, Li Xiaoli
Work Experience: Ximalaya, Meituan
Current interests: LLM reasoning, alignment, MoE
Date: 0703
Xiao Yao, Xu Lu, Li JiaXi, Lu Wei and Li XiaoLi
EMNLP 2023
✅ Less storage
✅ Less GPU memory
✅ Less computational overhead (faster training)
[Figure: a frozen PLM (BERT/T5/GPT) with a trainable soft prompt of size e × c prepended to the input of length n; only the soft prompt is trainable.]
The motivation for this design is that the soft prompt can flexibly adjust its rank.
The observed results are consistent with this.
In practice, c is 100, and e is 512, 768, and 1024 for T5-Small, T5-Base, and T5-Large respectively. We set b to 10.
e·c ≫ e·b + b·c
Based on the observation above, we directly decompose the soft prompt into two low-rank trainable matrices.
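The parameter saving can be checked with a little arithmetic. This is a minimal sketch (function name is illustrative, not from the paper) that counts trainable parameters for the full e × c soft prompt versus the two low-rank factors, an e × b matrix times a b × c matrix:

```python
def prompt_params(e, c, b=None):
    """Trainable parameters of a soft prompt with embedding dim e and
    length c; if rank b is given, count the two low-rank factors
    (e x b) and (b x c) instead of the full e x c matrix."""
    if b is None:
        return e * c           # full soft prompt
    return e * b + b * c       # low-rank decomposition

# T5-Base setting from the slides: e = 768, c = 100, b = 10
full = prompt_params(768, 100)           # 76,800 parameters
low_rank = prompt_params(768, 100, b=10) # 8,680 parameters
print(full, low_rank)
```

With the slide's numbers the decomposition trains roughly 9x fewer prompt parameters, which is where the storage and memory savings come from.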
Models: T5-Small, T5-Base, T5-Large
Datasets: SuperGLUE
8-shot, 16-shot and 32-shot
WiC, CB, RTE, and COPA
Our method is consistently better.
If we increase b to a large value, performance becomes unstable.
Our method works even when the soft prompt is extremely short.
UCL
ICLR 2024
Xiao Yao, Xu Lu, Li JiaXi, Lu Wei and Li XiaoLi
EMNLP 2024 (under review)
Recently, researchers have explored self-improvement approaches for LLMs. These works can be classified into two categories:
Can an LLM self-improve without parameter updates or external feedback?
Similar questions have similar solutions. Although a generated solution may contain mistakes, the model can reason over the solutions to similar questions to help solve the target question.
Random: choose n generated samples at random
Long: choose the n longest generated samples
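The two selection baselines above can be sketched in a few lines; this is an illustrative sketch, assuming `samples` is a list of generated solution strings (the names are not from the paper):

```python
import random

def select_random(samples, n, seed=0):
    """Random baseline: choose n generated samples uniformly at random."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return rng.sample(samples, n)

def select_long(samples, n):
    """Long baseline: choose the n longest generated samples."""
    return sorted(samples, key=len, reverse=True)[:n]

samples = ["short", "a medium answer", "a much longer generated answer"]
print(select_long(samples, 2))
```

Selecting by length is a common heuristic for reasoning tasks, since longer generations tend to contain more intermediate steps.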
If we feed the generated samples to other models (especially weaker ones), we observe significant empirical gains.
The more powerful the question generator is, the better the performance.
The more accurate the generated solution is, the better the performance.
The performance is relatively stable.