ads recommendation in a collapsed and entangled world

Junwei Pan, Wei Xue, Ximei Wang, Haibin Yu, Xun Liu, Shijie Quan, Xueming Qiu, Dapeng Liu, Lei Xiao, Jie Jiang

tencent ads @kdd '24

yuan meng

 november 22, 2024

ml journal club @doordash

overview: deep learning recommender systems

  • feature extraction/encoding
    • sequence features: lookup + pooling
    • numeric features: scale + normalize
    • embedding features: lookup
  • feature interactions: FM, DCN-v2, attention... 👉 ensemble: DHEN
  • representation transformation: experts control which features to use in what ways in which tasks

meta mrs @icml '24

improve a DLRM: better feature interactions, better ways to transform features, longer user sequences...
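to make the three stages concrete, a minimal skeleton (module choices are mine, purely illustrative, not the paper's architecture):

```python
import torch
import torch.nn as nn

class TinyDLRM(nn.Module):
    """encode 👉 interact 👉 transform per task (illustrative skeleton)."""

    def __init__(self, vocab: int, dim: int = 16, num_tasks: int = 2):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)        # encoding: id lookup
        self.num_proj = nn.Linear(1, dim)          # encoding: normalized numeric feature
        self.interact = nn.Linear(3 * dim, dim)    # stand-in for FM / DCN-v2 / attention
        self.heads = nn.ModuleList(nn.Linear(dim, 1) for _ in range(num_tasks))

    def forward(self, seq_ids, item_id, price):
        seq = self.emb(seq_ids).mean(dim=1)        # sequence feature: lookup + mean pooling
        item = self.emb(item_id)                   # embedding feature: lookup
        num = self.num_proj(price.unsqueeze(-1))   # numeric feature: scale + project
        h = torch.relu(self.interact(torch.cat([seq, item, num], dim=-1)))
        return [head(h) for head in self.heads]    # one logit per task
```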

bag of tricks

  • feature extraction/encoding: sequence + ordinal features
  • feature interactions: embedding dimensional collapse
  • feature transformations: interest entanglement

feature encoding

sequence features (read more)

 temporal interest module (tim)

  • motivation: items interacted with more recently & semantically closer to the target should be weighted more
  • formula: target attention (e.g., din, dien), but w/ temporal + semantic info

 

 

\boldsymbol{u_{\text{TIM}}} = \sum_{X_t \in S} \alpha(\boldsymbol{\tilde{e}_t}, \boldsymbol{\tilde{v}_a}) \cdot (\boldsymbol{\tilde{e}_t} \odot \boldsymbol{\tilde{v}_a})
\boldsymbol{\tilde{e}_t} = \boldsymbol{e_t} \oplus \boldsymbol{p_f}(X_t)

annotations from the figure:

  • \boldsymbol{\tilde{e}_t}: temporally encoded embedding; the temporal encoding \boldsymbol{p_f}(X_t) embeds the time delta since the action
  • \alpha(\boldsymbol{\tilde{e}_t}, \boldsymbol{\tilde{v}_a}): target-aware attention
  • \boldsymbol{\tilde{e}_t} \odot \boldsymbol{\tilde{v}_a}: target-aware representation (feature interaction)
  • takeaway: not all actions are equal
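a minimal pytorch sketch of the idea, assuming bucketized time deltas and an MLP for \alpha; module and argument names are mine, not the paper's implementation:

```python
import torch
import torch.nn as nn

class TemporalInterestModule(nn.Module):
    """TIM-style target attention with temporal encoding (illustrative sketch)."""

    def __init__(self, emb_dim: int, num_time_buckets: int):
        super().__init__()
        # p_f: temporal encoding, an embedding of the bucketized time delta
        self.time_emb = nn.Embedding(num_time_buckets, emb_dim)
        # alpha: scores each (history item, target) pair; MLP form is an assumption
        self.att_mlp = nn.Sequential(
            nn.Linear(4 * emb_dim, emb_dim), nn.ReLU(), nn.Linear(emb_dim, 1)
        )

    def forward(self, hist_emb, time_bucket, target_emb, target_bucket):
        # hist_emb:      (B, T, D) behavior embeddings e_t
        # time_bucket:   (B, T)    bucketized time delta per action
        # target_emb:    (B, D)    target item embedding v_a
        # target_bucket: (B,)      time bucket for the target (e.g., always 0)
        e_tilde = torch.cat([hist_emb, self.time_emb(time_bucket)], dim=-1)      # e_t ⊕ p_f(X_t)
        v_tilde = torch.cat([target_emb, self.time_emb(target_bucket)], dim=-1)  # v_a ⊕ p_f(X_a)
        v_tilde = v_tilde.unsqueeze(1).expand_as(e_tilde)                        # (B, T, 2D)
        alpha = self.att_mlp(torch.cat([e_tilde, v_tilde], dim=-1))              # (B, T, 1)
        # attention-weighted sum of target-aware representations e~ ⊙ v~
        return (alpha * (e_tilde * v_tilde)).sum(dim=1)                          # (B, 2D)
```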

feature encoding

numeric features (ordinal)

multiple numeral systems encoding (mnse)

  • motivation: preserve ordinality 👉 age 51 is closer to 50 than to 60
  • encoding: e.g., "51" rewritten in multiple numeral systems, as position_digit pairs padded to a fixed length
    • binary (110011): 6_1, 5_1, 4_0, 3_0, 2_1, 1_1
    • ternary (001220): 6_0, 5_0, 4_1, 3_2, 2_2, 1_0
  • embedding: project each position_digit pair into a learnable embedding 👉 within each system, sum-pool the digit embs 👉 for each number, concat the pooled emb from each system and pass to ranking (see the sketch below)
    • "learning": e.g., 1_1 and 1_0 are closer than, say, 5_0 and 5_1


dimensional collapse

phenomenon & root cause (read more)

  • discovery: increasing emb size (64 👉 192) doesn't always improve model performance (and sometimes hurts...)
  • phenomenon: embs only span a small subspace of the available dimensions!!
    • super wasteful: 99.9% of tencent ads model params are dedicated to emb features
  • root cause: explicit feature interaction
    • feature 1: gender 👉 low cardinality
    • feature 2: item taxonomy 👉 high cardinality
    • after interaction: only spans min(rank_feature1, rank_feature2) dimensions 👉 the low-cardinality feature caps the high-cardinality one

diagnose: svd 👉 watch out for vanishing singular values
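a quick way to run this diagnosis (the threshold and function name are my own choices):

```python
import torch

def effective_rank(emb_table: torch.Tensor, tol: float = 1e-2) -> int:
    """Count non-vanishing singular values of a (num_ids, dim) embedding table."""
    s = torch.linalg.svdvals(emb_table)  # singular values in descending order
    s = s / s.max()                      # normalize so the threshold is scale-free
    return int((s > tol).sum())          # tol is an arbitrary cutoff for "vanishing"

# if effective_rank(model.emb.weight.detach()) << emb dim, the table has collapsed
```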

dimensional collapse

solution: multi-embedding paradigm

  • multi-embedding: each feature has multiple embedding tables
  • heterogeneous mixture-of-experts: use different embeddings in different interaction modules (e.g., GwPFM, FlatDNN, DCN V2)

different emb tables collapse differently, preserving more information than a single table...

biggest gmv gain for tencent: 3.9% rel lift
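a minimal sketch of the multi-embedding + heterogeneous-experts idea; the two stand-in experts below are illustrative, not the paper's GwPFM / FlatDNN / DCN V2 stack:

```python
import torch
import torch.nn as nn

class MultiEmbeddingModel(nn.Module):
    """Each feature gets one embedding per table set; each expert reads its own set."""

    def __init__(self, num_feats: int, cardinality: int, dim: int):
        super().__init__()
        # two embedding spaces per feature (the "multi" in multi-embedding)
        self.tables = nn.ModuleList(
            nn.ModuleList(nn.Embedding(cardinality, dim) for _ in range(num_feats))
            for _ in range(2)
        )
        self.mlp_expert = nn.Sequential(nn.Linear(num_feats * dim, dim), nn.ReLU())
        self.out = nn.Linear(dim + num_feats * num_feats, 1)

    def forward(self, x):  # x: (B, num_feats) integer feature ids
        # table set 0 feeds a pairwise dot-product expert (FM-flavored stand-in)
        e0 = torch.stack([t(x[:, i]) for i, t in enumerate(self.tables[0])], dim=1)
        pairwise = torch.bmm(e0, e0.transpose(1, 2)).flatten(1)       # (B, F*F)
        # table set 1 feeds an MLP expert; it is free to collapse differently
        e1 = torch.cat([t(x[:, i]) for i, t in enumerate(self.tables[1])], dim=1)
        return self.out(torch.cat([self.mlp_expert(e1), pairwise], dim=-1))
```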

interest entanglement

beyond mmoe and ple

  • observation: users have different interests in different scenarios and tasks
    • ads surfaces: moments (social feed), official accounts (subscription), tencent video (long video), channels (short video), tencent news...
    • tasks: even "conversion", which sounds like one task, is really one task per conversion type
  • problem: the same user-item pair may be close in one scenario/task and far apart in another... 👉 may result in negative transfer
  • solution: asymmetric multi-embedding (ame)
    • each task group (e.g., each conversion type) gets a fixed number of emb tables of diff sizes
    • small tasks are routed to smaller embs via gating (see the sketch below)
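a minimal sketch of the AME idea under stated assumptions: two tables of different sizes projected to a shared width and mixed by a learned per-task softmax gate (the paper's exact routing may differ):

```python
import torch
import torch.nn as nn

class AsymmetricMultiEmbedding(nn.Module):
    """Per-task gated mixture over embedding tables of different sizes (sketch)."""

    def __init__(self, cardinality: int, dims=(64, 16), num_tasks: int = 4):
        super().__init__()
        # asymmetric: one large and one small table
        self.tables = nn.ModuleList(nn.Embedding(cardinality, d) for d in dims)
        # project every table to a shared width so the gate can mix them
        shared = max(dims)
        self.proj = nn.ModuleList(nn.Linear(d, shared) for d in dims)
        # per-task softmax gate; small tasks can learn to lean on the small table
        self.gate = nn.Parameter(torch.zeros(num_tasks, len(dims)))

    def forward(self, ids: torch.Tensor, task_id: int) -> torch.Tensor:
        embs = [p(t(ids)) for t, p in zip(self.tables, self.proj)]  # each (B, shared)
        w = torch.softmax(self.gate[task_id], dim=-1)               # (num_tables,)
        return sum(wi * e for wi, e in zip(w, embs))                # gated mixture
```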

read more