A Simple but Tough-to-Beat Baseline for Sentence Embeddings

Sanjeev Arora, Yingyu Liang, Tengyu Ma

v_{s}\leftarrow \frac{1}{\left | s \right |}\sum_{w\in s} \frac{a}{a + p(w)} v_{w}

p(w): probability of word w in the corpus

v_w: Word2Vec embedding of word w

v_s: sentence embedding of sentence s
v_{s}\leftarrow \frac{1}{\left | s \right |}\sum_{w\in s} \frac{a}{a + \color{red}{p(w)}} v_{w}

Rare words get higher weights
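The weighted average above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' code: the function name `sif_weighted_average`, the toy vectors, the toy probabilities, and the default `a=1e-3` are all assumptions made for the example, and every word is assumed to have both a vector and a probability.

```python
import numpy as np

def sif_weighted_average(sentence, word_vectors, word_prob, a=1e-3):
    """Compute v_s = (1/|s|) * sum_{w in s} a / (a + p(w)) * v_w.

    sentence:     list of tokens (hypothetical input format)
    word_vectors: dict token -> 1-D NumPy array (v_w)
    word_prob:    dict token -> unigram probability p(w)
    a:            smoothing parameter (value assumed for illustration)
    """
    # One weight per token: frequent words get small weights, rare words large ones.
    weights = np.array([a / (a + word_prob[w]) for w in sentence])
    # Stack the word vectors into a (|s|, dim) matrix.
    vectors = np.stack([word_vectors[w] for w in sentence])
    # Weighted sum of the rows, divided by the sentence length |s|.
    return weights @ vectors / len(sentence)

# Toy data, invented for this example.
word_vectors = {"the": np.array([1.0, 0.0]), "cat": np.array([0.0, 1.0])}
word_prob = {"the": 0.05, "cat": 0.0001}

v_s = sif_weighted_average(["the", "cat"], word_vectors, word_prob)
print(v_s)
```

In this toy sentence the rare word "cat" dominates the result, because its weight is close to 1 while the weight of "the" is close to 0.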

v_s\leftarrow v_s - uu^Tv_s

u: first singular vector of the matrix of sentence embeddings v_s (the first principal component, via PCA / SVD)

v_w: Word2Vec embedding of word w

v_s: sentence embedding of sentence s
v_s\leftarrow v_s - \color{red}{uu^Tv_s}

Remove the common component
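The projection step can be sketched in NumPy as below. This is an illustrative sketch under stated assumptions: the function name `remove_common_component` and the toy matrix are invented for the example, sentence embeddings are stored one per row, and no mean-centering is applied before the SVD.

```python
import numpy as np

def remove_common_component(sentence_embeddings):
    """Apply v_s <- v_s - u u^T v_s to every sentence embedding.

    sentence_embeddings: (n_sentences, dim) array with one v_s per row
    (row layout is an assumption of this sketch).
    Returns the cleaned embeddings and the direction u that was removed.
    """
    X = np.asarray(sentence_embeddings, dtype=float)
    # With sentences as rows, the first right singular vector lies in
    # embedding space; it plays the role of u in the formula above.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    u = vt[0]
    # (X @ u) holds u^T v_s for each row; the outer product rebuilds u u^T v_s.
    cleaned = X - np.outer(X @ u, u)
    return cleaned, u

# Toy embeddings, invented for this example.
X = np.array([[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]])
cleaned, u = remove_common_component(X)
print(cleaned @ u)  # projections onto u after removal, numerically zero
```

After the update, every cleaned embedding is orthogonal to u, so the direction shared by all sentences no longer contributes to similarity scores.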

\frac{a}{a + \color{red}{p(w)}}
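As a rough numeric illustration of how this weight behaves, assume a = 10^{-3} and two hypothetical word probabilities (both values are chosen for the example, not taken from the slides): a very frequent word with p(w) = 10^{-1} and a rare word with p(w) = 10^{-5}.

```latex
\frac{a}{a + p(w)} = \frac{10^{-3}}{10^{-3} + 10^{-1}} \approx 0.0099
\qquad \text{(frequent word)}

\frac{a}{a + p(w)} = \frac{10^{-3}}{10^{-3} + 10^{-5}} \approx 0.990
\qquad \text{(rare word)}
```

Under these assumed values the rare word is weighted about 100 times more heavily than the frequent one.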


PSL: PARAGRAM-SL999 vectors (Wieting et al., 2015)

PP: PHRASE-EMBEDDING (Wieting et al., 2016)
