Unsupervised Joke Generation from Big Data

 

Saša Petrović

School of Informatics, University of Edinburgh

sasa.petrovic@ed.ac.uk

David Matthews

School of Informatics, University of Edinburgh

dave.matthews@ed.ac.uk

Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 228–232, Sofia, Bulgaria, August 4–9, 2013.

© 2013 Association for Computational Linguistics

Introduction

Generating Jokes

Joke generation is typically considered a difficult NLP problem.

Particular Type of Joke

I like my men like I like my tea, hot and British.

I like my X like I like my Y, Z

This is

The first fully unsupervised humor generation system.

Challenge

  • Joke generation is a content selection problem.
    • The selected content needs to be funny.

Approach

  • Large quantities of unlabeled data.
  • Unsupervised machine learning model.

Related Work

  • Humor recognition
  • Humor generation

Kiddon and Brun, 2011

Double entendre identification using an SVM classifier.

Davidov et al., 2010

Sarcastic sentence identification

Mihalcea and Strapparava, 2005

One-liner joke recognition

 

  • All of these approaches use only a small amount of labeled data.
  • They use bootstrapping to gather more.

Humor Recognition

Humor Generation

Sjöbergh and Araki, 2008

Dirty joke telling robots

Labutov and Lipson, 2012

Two-liner jokes

Binsted and Ritchie, 1994

Punning riddles model

 

  • Again, all of these are supervised approaches.

Generating Jokes

Pattern

  • I like my \(X\) like I like my \(Y\), \(Z\)
    • \(X\), \(Y\) are nouns.
    • \(Z\) is an adjective.

Assumptions

  • A joke is funnier if:
    1. The attribute is often used to describe both nouns.
    2. The attribute is less common.
    3. The attribute is ambiguous.
    4. The two nouns are more dissimilar.

Assumption 1

  • The attribute is often used to describe both nouns.
  • \(f(x,z)\) is a function that measures the co-occurrence between \(x\) and \(z\).
    • Simply use frequency of co-occurrence.
    • Other functions could also be used, e.g., TF-IDF.

Equation 1

\(\phi(x,z)=p(x,z)=\frac{f(x,z)}{\sum_{x,z}f(x,z)}\)
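
A minimal Python sketch of Equation 1; the counts below are made-up toy numbers standing in for the Google Ngram statistics:

```python
from collections import Counter

# Toy co-occurrence counts f(x, z) between nouns x and adjectives z;
# these numbers are assumptions, not the actual Ngram data.
f = Counter({
    ("tea", "hot"): 120, ("men", "hot"): 45,
    ("war", "cold"): 200, ("coffee", "cold"): 80,
})
total = sum(f.values())  # sum over all (x, z) pairs

def phi_xz(x, z):
    """Equation 1: phi(x, z) = f(x, z) / sum_{x,z} f(x, z)."""
    return f[(x, z)] / total

print(phi_xz("tea", "hot"))  # 120 / 445 ~ 0.27
```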

Assumption 2

  • The attribute is less common.
  • \(f(z)\) is the number of times attribute \(z\) appears.

Equation 2

\(\phi_1(z)=1/f(z)\)

Assumption 3

  • Ambiguous attributes lead to funnier jokes.
  • \(senses(z)\) is the number of different senses that \(z\) has.

Equation 3

\(\phi_2(z)=1/senses(z)\)
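
The two attribute factors are one-liners; a sketch with assumed toy values:

```python
def phi_1(f_z):
    """Equation 2: phi_1(z) = 1 / f(z), so rarer attributes score higher."""
    return 1.0 / f_z

def phi_2(n_senses):
    """Equation 3 as given: phi_2(z) = 1 / senses(z), with senses(z) from WordNet."""
    return 1.0 / n_senses

# Toy values (assumptions): "cold" appears 50,000 times and has 9 senses.
print(phi_1(50_000), phi_2(9))
```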

Assumption 4

  • Dissimilar nouns lead to funnier jokes.
  • \(sim\) is a similarity function that measures how similar nouns \(x\) and \(y\) are.

Equation 4

\(\phi(x,y)=1/sim(x,y)\)

Equation 5

\(sim(x,y)=\frac{\sum_z p(z|x)\,p(z|y)}{\sqrt{\sum_z p(z|x)^2\sum_z p(z|y)^2}}\)
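
A sketch of Equations 4 and 5; the distributions \(p(z|x)\) below are assumed toy values, in practice estimated from the noun-adjective co-occurrence counts:

```python
from math import sqrt

# Assumed toy attribute distributions p(z | x) for two nouns.
p_z_given = {
    "tea": {"hot": 0.6, "sweet": 0.3, "cold": 0.1},
    "war": {"cold": 0.7, "long": 0.3},
}

def sim(x, y):
    """Equation 5: cosine similarity between p(z|x) and p(z|y)."""
    px, py = p_z_given[x], p_z_given[y]
    dot = sum(px[z] * py.get(z, 0.0) for z in px)
    norm = sqrt(sum(v * v for v in px.values())) * sqrt(sum(v * v for v in py.values()))
    return dot / norm

def phi_xy(x, y):
    """Equation 4: phi(x, y) = 1 / sim(x, y), so dissimilar nouns score higher."""
    return 1.0 / sim(x, y)

print(sim("tea", "war"), phi_xy("tea", "war"))
```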

Joint Probability

  • Simply multiply all the factors and normalize.
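
Written out (a reconstruction from Equations 1–5, not copied verbatim from the paper), with \(C\) the normalizing constant:

\(P(x,y,z)=\frac{1}{C}\,\phi(x,z)\,\phi(y,z)\,\phi_1(z)\,\phi_2(z)\,\phi(x,y)\)

where \(C=\sum_{x,y,z}\phi(x,z)\,\phi(y,z)\,\phi_1(z)\,\phi_2(z)\,\phi(x,y)\).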

Data

Google Ngram Data

  • Tag each word in the bigrams with its POS.
  • Discard bigrams whose count is less than 1000.
  • This leaves about 2 million (noun, adjective) pairs; a preprocessing sketch follows below.
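
A preprocessing sketch under assumptions: the file name and format are hypothetical, and WordNet (via NLTK) supplies a crude POS check. Since adjectives usually precede nouns in English bigrams ("hot tea"), pairs are stored as (noun, adjective):

```python
from collections import Counter
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data

def is_pos(word, pos):
    """Crude check: does WordNet list any synset of `word` with this POS?"""
    return len(wn.synsets(word, pos=pos)) > 0

pairs = Counter()
with open("bigram-counts.tsv", encoding="utf-8") as fh:  # hypothetical file
    for line in fh:
        w1, w2, count = line.rstrip("\n").split("\t")
        if int(count) < 1000:
            continue  # discard rare bigrams
        # Adjective first, noun second; store as (noun, adjective).
        if is_pos(w1, wn.ADJ) and is_pos(w2, wn.NOUN):
            pairs[(w2, w1)] += int(count)
```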

WordNet

  • Only used to obtain very shallow information:
    1. The number of senses, \(senses(z)\).
    2. POS tags for the bigrams (see the sketch below).
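
For example, using NLTK's WordNet interface (an assumed toolchain; the paper does not mandate a specific library):

```python
from nltk.corpus import wordnet as wn  # requires the NLTK WordNet data

def senses(z):
    """Number of adjective senses WordNet lists for attribute z."""
    return len(wn.synsets(z, pos=wn.ADJ))

print(senses("cold"))  # several senses: temperature, unfriendly, ...
```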

Experiments

Inference

  • Estimating the full joint distribution is too expensive.
    • Fix \(x\).
    • Compute \(P(Y, Z \mid X=x)\).
  • Fixing \(x\) is the only limitation of the inference procedure (sketch below).
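
A sketch of the fixed-\(x\) inference step; `score` is a stand-in for the product of factors \(\phi(x,z)\,\phi(y,z)\,\phi_1(z)\,\phi_2(z)\,\phi(x,y)\):

```python
def conditional(x, nouns, adjectives, score):
    """P(Y, Z | X=x): score every (y, z) pair for the fixed x, then
    normalise over that slice only (no sum over all x is needed)."""
    weights = {(y, z): score(x, y, z) for y in nouns for z in adjectives}
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}

# Usage with a dummy scorer; plug in the real factor product instead.
dist = conditional("tea", ["war", "men"], ["cold", "hot"],
                   score=lambda x, y, z: 1.0)
print(max(dist, key=dist.get))  # most probable (y, z) for the fixed x
```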

Automatic Evaluation

  • LOL-Likelihood: LOcal Log-Likelihood
  • ROFL: Rank OF Likelihood

Human Evaluation

  • Compare the baseline, Model 1, and Model 2.
  • Include 32 human jokes collected from Twitter.
  • Five native English speakers rated the jokes; each rater scored each joke on a 3-point Likert scale:
    1. Funny
    2. Somewhat funny
    3. Not funny

Finally

  • The funny jokes generated by the system are not simply repeats of the human jokes, but entirely new ones that could not be found anywhere online.
I like my relationships like I like my source, open.
I like my coffee like I like my war, cold.
I like my boys like I like my sectors, bad.

Table 3: Example jokes generated by Model 2.

Conclusion

Unsupervised Joke Generation from Big Data

By Penut Chen (PenutChen)