ads recommendation in a collapsed and entangled world
Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, Jie Jiang
tencent ads @kdd '24
yuan meng
november 22, 2024
ml journal club @doordash
overview: deep learning recommender systems

- feature extraction/encoding
  - sequence features: lookup + pooling
  - numeric features: scale + normalize
  - embedding features: lookup
- feature interactions: FM, DCN-v2, attention... 👉 ensemble: DHEN
- representation transformation: experts control which features to use, in what ways, in which tasks
meta mrs @icml '24
improve a DLRM: better feature interactions, better ways to transform features, longer user sequences...
bag of tricks
- feature extraction/encoding: sequence + ordinal features
- feature interactions: embedding dimensional collapse
- feature transformations: interest entanglement
 
feature encoding
sequence features (read more)

temporal interest module (tim)
- motivation: items interacted with more recently & semantically closer to the target should be weighted more
- formula: target attention (e.g., DIN, DIEN), but w/ temporal + semantic info (sketch below)
- pipeline: temporally encoded embedding (temporal encoding = time delta since action) 👉 target-aware attention 👉 target-aware representation (feature interaction)

not all actions are equal
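a minimal sketch of TIM-style target attention with temporal encoding, assuming bucketized time deltas and hypothetical names/shapes (the paper's exact formulation may differ):

```python
import torch
import torch.nn as nn

class TemporalInterestModule(nn.Module):
    """Target attention over a behavior sequence, with temporal encoding."""

    def __init__(self, emb_dim: int, num_time_buckets: int):
        super().__init__()
        # temporal encoding: a learnable embedding per bucketized time delta
        self.time_emb = nn.Embedding(num_time_buckets, emb_dim)
        self.query_proj = nn.Linear(emb_dim, emb_dim)

    def forward(self, target_emb, seq_embs, time_deltas):
        # target_emb: (B, D); seq_embs: (B, T, D); time_deltas: (B, T) bucket ids
        # temporally encoded embedding: add the time-delta emb to each action
        keys = seq_embs + self.time_emb(time_deltas)            # (B, T, D)
        # target-aware attention: weights reflect semantic + temporal closeness
        query = self.query_proj(target_emb).unsqueeze(1)        # (B, 1, D)
        scores = (query * keys).sum(-1) / keys.size(-1) ** 0.5  # (B, T)
        weights = torch.softmax(scores, dim=-1)
        # target-aware representation: element-wise interaction, then pool
        interacted = keys * target_emb.unsqueeze(1)             # (B, T, D)
        return (weights.unsqueeze(-1) * interacted).sum(dim=1)  # (B, D)
```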

multiple numeral systems encoding (mnse)
- motivation: preserve ordinality 👉 age 51 is closer to 50 than to 60
- encoding: e.g., "51" mapped into multiple numeral systems (written position_digit, most significant first)
  - binary: 6_1, 5_1, 4_0, 3_0, 2_1, 1_1
  - ternary: 6_0, 5_0, 4_1, 3_2, 2_2, 1_0
- embedding: project each bit into a learnable embedding 👉 in each system, sum-pool bit embs 👉 for each number, concat pooled embs from each system for ranking (see sketch below)
- "learning": e.g., 1_1 and 1_0 are closer than, say, 5_0 and 5_1
 
feature encoding
numeric features (ordinal)
concat before passing to ranking
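a minimal sketch of MNSE, assuming 6-digit zero-padded representations and a learnable embedding per (position, digit) pair; names and shapes are hypothetical:

```python
import torch
import torch.nn as nn

def to_digits(x: int, base: int, width: int = 6) -> list[int]:
    """Digits of x in the given base, most significant first, zero-padded."""
    digits = []
    for _ in range(width):
        digits.append(x % base)
        x //= base
    return digits[::-1]

class MNSEmbedding(nn.Module):
    def __init__(self, bases=(2, 3), width: int = 6, emb_dim: int = 8):
        super().__init__()
        self.bases, self.width = bases, width
        # one table per numeral system: an embedding for every (position, digit)
        self.tables = nn.ModuleList(
            nn.Embedding(width * base, emb_dim) for base in bases
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B,) integer feature, e.g., age
        pooled = []
        for base, table in zip(self.bases, self.tables):
            digits = torch.tensor([to_digits(int(v), base, self.width) for v in x])
            positions = torch.arange(self.width)     # (width,)
            ids = positions * base + digits          # unique (position, digit) id
            pooled.append(table(ids).sum(dim=1))     # sum-pool within a system
        return torch.cat(pooled, dim=-1)             # concat across systems

# e.g., MNSEmbedding()(torch.tensor([51, 50, 60])) has shape (3, 16)
```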
dimensional collapse
phenomenon & root cause (read more)
- discovery: increasing emb size (64 👉 192) doesn't always improve model performance (and can sometimes hurt...)
- phenomenon: embs only span a small subspace of the available dimensions!!
  - super wasteful: 99.9% of tencent ads model params are dedicated to emb features
- root cause: explicit feature interaction
  - feature 1: gender 👉 low cardinality
  - feature 2: item taxonomy 👉 high cardinality
  - after interaction: only spans min(dim_feature1, dim_feature2)
 
 


diagnose: svd 👉 watch out for vanishing singular values
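a minimal sketch of the SVD diagnosis, using a random matrix as a stand-in for a learned embedding table (the rank threshold and shapes are assumptions):

```python
import torch

emb_table = torch.randn(10_000, 192)  # (vocab, emb_dim) stand-in for learned weights
singular_values = torch.linalg.svdvals(emb_table)
spectrum = singular_values / singular_values.max()  # normalize for comparison
# a long vanishing tail 👉 embs span a low-dim subspace (dimensional collapse)
effective_rank = int((spectrum > 1e-2).sum())
print(f"effective rank ≈ {effective_rank} of {emb_table.shape[1]} dims")
```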
dimensional collapse
solution: multi-embedding paradigm
- multi-embedding: each feature has multiple embedding tables (sketch below)
- heterogeneous mixture-of-experts: use different embeddings in different interaction modules (i.e., GwPFM, FlatDNN, DCN V2)
 


different emb tables collapse differently, preserving more information than a single table...
biggest gmv gain for tencent: 3.9% rel lift
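a minimal sketch of the multi-embedding paradigm, with one embedding set per interaction module (the module wiring and names are assumptions):

```python
import torch
import torch.nn as nn

class MultiEmbedding(nn.Module):
    def __init__(self, vocab_sizes: list[int], emb_dim: int, num_tables: int):
        super().__init__()
        # num_tables separate sets of per-feature embedding tables
        self.table_sets = nn.ModuleList(
            nn.ModuleList(nn.Embedding(v, emb_dim) for v in vocab_sizes)
            for _ in range(num_tables)
        )

    def forward(self, feature_ids: torch.Tensor) -> list[torch.Tensor]:
        # feature_ids: (B, num_features) int ids, one column per feature
        # table set k feeds only interaction module k, so each set can
        # collapse differently and jointly preserve more information
        return [
            torch.stack(
                [tab(feature_ids[:, i]) for i, tab in enumerate(tables)], dim=1
            )  # (B, num_features, D), one per interaction module
            for tables in self.table_sets
        ]
```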
interest entanglement
beyond mmoe and ple

- observation: users have different interests in different scenarios and tasks
  - ads surfaces: moments (social feed), official accounts (subscription), tencent video (long video), channels (short video), tencent news...
  - tasks: even "conversion", which sounds like a single task, is really many 👉 each conversion type is its own task
- problem: the same user-item pair may be close in one scenario/task and far apart in another... 👉 may result in negative transfer
- solution: asymmetric multi-embedding (ame), sketched below
  - each task group (e.g., each conversion type) has a fixed number of emb tables of diff sizes
  - small tasks are routed to smaller embs via gating
 

read more
- sam flender @meta
- coding monkey @linkedin
- yuan's blogpost on sequential user modeling
- new from tencent ads: scaling laws for online advertisement retrieval