LDA Topic Model

Outline

  • Latent Dirichlet Allocation
  • LDA Topic Model
  • Implementation -- Gibbs Sampling
  • Limitation

Latent (hidden) Dirichlet Allocation

Dirichlet Allocation

A distribution over distributions
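
To make "a distribution over distributions" concrete, here is a minimal sketch using NumPy: every draw from a Dirichlet is itself a probability vector. The three-topic setup and the concentration values in `alpha` are assumptions chosen for illustration, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concentration parameter for 3 topics (illustrative assumption).
alpha = np.array([0.5, 0.5, 0.5])

# Each row is one draw: a probability vector over the 3 topics.
samples = rng.dirichlet(alpha, size=4)

print(samples)
print(samples.sum(axis=1))  # every row sums to 1 (up to floating-point error)
```

Smaller values in `alpha` tend to give sparser vectors (most mass on one topic); larger values tend to give more even mixtures.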

LDA Topic Model

  • A document does not belong 100% to one topic; it is a mixture of topics.

  • Every topic has its own bag of words, i.e. its own distribution over the vocabulary.
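
The two ideas above (documents mix topics, topics are distributions over words) can be sketched as a small generative simulation. The vocabulary, the number of topics, and the hyperparameter values are all hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = ["gene", "dna", "cell", "ball", "team", "score"]  # hypothetical vocabulary
n_topics = 2

# One word distribution per topic (each row sums to 1).
topic_word = rng.dirichlet(np.full(len(vocab), 0.5), size=n_topics)

def generate_document(n_words, alpha=0.5):
    """Generate one document following the LDA generative story."""
    # Per-document topic mixture.
    theta = rng.dirichlet(np.full(n_topics, alpha))
    words, topics = [], []
    for _ in range(n_words):
        z = rng.choice(n_topics, p=theta)            # pick a topic for this word
        w = rng.choice(len(vocab), p=topic_word[z])  # pick a word from that topic
        topics.append(int(z))
        words.append(vocab[w])
    return theta, topics, words

theta, topics, words = generate_document(8)
print(theta)   # how much of the document belongs to each topic
print(list(zip(words, topics)))
```

Inference runs this story in reverse: only `words` is observed, and `theta`, `topics`, and `topic_word` have to be recovered.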

What to Solve

  • documents → observed
  • topic structure → hidden structure
    • the topics
    • per-document topic distributions
    • the per-document per-word topic assignments

What to Solve

  • observed documents → hidden topic structure
  • maximizing the posterior probability directly → too complex
  • focus on the per-document, per-word topic assignments
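
For reference, the quantity behind "too complex" can be written out as follows. The notation is an assumption borrowed from the common LDA presentation and does not appear in the slides: $\beta_{1:K}$ are the topics, $\theta_{1:D}$ the per-document topic distributions, $z_{1:D}$ the per-word topic assignments, and $w_{1:D}$ the observed words.

```latex
p(\beta_{1:K}, \theta_{1:D}, z_{1:D} \mid w_{1:D})
  = \frac{p(\beta_{1:K}, \theta_{1:D}, z_{1:D}, w_{1:D})}{p(w_{1:D})}
```

The numerator is easy to evaluate for any given setting of the hidden variables. The denominator $p(w_{1:D})$ requires summing over every possible topic assignment of every word, which is why sampling the assignments $z$ is used instead.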

Gibbs Sampling

Focus on the per-document, per-word topic assignments

2 Properties

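  • Articles are as monochromatic as possible: each article uses as few topics (colors) as possible.
  • Words are as monochromatic as possible: each word is assigned to as few topics (colors) as possible.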
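
The two properties above map onto the two factors of a collapsed Gibbs update. The following is a minimal sketch, not the presenter's implementation: the function name `gibbs_lda`, the hyperparameters `alpha` and `beta`, and the toy corpus are all assumptions made for illustration.

```python
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs is a list of lists of word ids."""
    rng = np.random.default_rng(seed)
    doc_topic = np.zeros((len(docs), n_topics))    # words in doc d colored with topic k
    topic_word = np.zeros((n_topics, vocab_size))  # times word w is colored with topic k
    topic_total = np.zeros(n_topics)               # total words colored with topic k

    # Start from a random coloring of every word.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = rng.integers(n_topics, size=len(doc))
        assignments.append(z_doc)
        for w, z in zip(doc, z_doc):
            doc_topic[d, z] += 1
            topic_word[z, w] += 1
            topic_total[z] += 1

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                z = assignments[d][i]
                # Remove this word's current color from the counts.
                doc_topic[d, z] -= 1
                topic_word[z, w] -= 1
                topic_total[z] -= 1
                # (how much doc d uses each topic) * (how much each topic uses word w)
                p = (doc_topic[d] + alpha) * (topic_word[:, w] + beta) \
                    / (topic_total + vocab_size * beta)
                z = rng.choice(n_topics, p=p / p.sum())
                # Record the newly sampled color.
                assignments[d][i] = z
                doc_topic[d, z] += 1
                topic_word[z, w] += 1
                topic_total[z] += 1
    return assignments, doc_topic, topic_word

# Toy corpus of word ids (hypothetical): 3 documents over a 6-word vocabulary.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 1, 3, 4, 2]]
assignments, doc_topic, topic_word = gibbs_lda(docs, n_topics=2, vocab_size=6)
print(doc_topic)
print(topic_word)
```

The first factor favors the topic already common in the article, and the second favors the topic that already owns the word, so repeated resampling pushes both articles and words toward a single color.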

Coloring Problem

Gibbs Sampling

Topic id → human-readable topic
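
The sampler only produces numbered topics. One common way to turn an id into a readable topic is to list its most frequent words and let a person name it. The sketch below assumes a hypothetical vocabulary and a hypothetical topic-word count matrix; the labels at the end are supplied by hand, not computed.

```python
import numpy as np

vocab = np.array(["gene", "dna", "cell", "ball", "team", "score"])  # hypothetical
# Hypothetical topic-word counts, e.g. from a Gibbs sampler (rows are topic ids).
topic_word = np.array([
    [9, 7, 5, 0, 1, 0],
    [0, 1, 0, 8, 6, 4],
])

def top_words(topic_word, vocab, n=3):
    """Return the n most frequent words for each topic id."""
    order = np.argsort(-topic_word, axis=1)[:, :n]
    return {k: vocab[idx].tolist() for k, idx in enumerate(order)}

tops = top_words(topic_word, vocab)
print(tops)

# A human reads the top words and assigns names (hypothetical labels).
labels = {0: "genetics", 1: "sports"}
```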

Limitation

  • the number of topics is fixed in advance
  • “bag of words” assumption
    • the order of the words in the document does not matter
  • requires a human to name each topic based on its words
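
The "bag of words" limitation can be seen in a minimal sketch: two sentences with opposite meanings look identical once word order is discarded. The example sentences are made up for illustration.

```python
from collections import Counter

doc_a = "the dog bit the man".split()
doc_b = "the man bit the dog".split()

# A bag of words keeps only the counts, so both documents look the same.
same_bag = Counter(doc_a) == Counter(doc_b)
print(same_bag)
```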

Recap
