Gistr

A web experiment for cultural evolution models

MINT, MPI für Menschheitsgeschichte

Sébastien Lerique / 29th May 2018

IN VIVO

Claidière et al. (2014)

Moussaïd et al. (2015)

IN VITRO

Adamic et al. (2016)

Online data

Transmission chains

Short-term cultural evolution

Mesoudi and Whiten (2004)

Leskovec et al. (2009)

Empirical bind

Realistic content

Control over data-generation

Computational analysis

Already coded

Do-it-by-hand

Simple setting

Danescu-Niculescu-Mizil et al. (2012)

Moussaïd et al. (2015)

Lauf et al. (2013)

Cornish et al. (2013)

Claidière et al. (2014)

Web experiments

Sequence alignments

Lerique & Roth (2017)

Requirements fulfilled

Control over experimental setting

Fast iterations

Scale

Exp. 1

MemeTracker, WikiSource, 12 Angry Men, Tales, News stories

Exp. 2

Memorable/non-memorable quote pairs

(Danescu-Niculescu-Mizil et al., 2012)

Exp. 3

Nouvelles en trois lignes

(Fénéon, 1906)

Web-based experiments

Experiment setup

reading and writing time \(\propto\) number of words

Transformation model

At Dover, the finale of the bailiffs' convention. Their duties, said a speaker, are "delicate, dangerous, and insufficiently compensated."

depth in branch

At Dover, the finale of the bailiffs convention,their duty said a speaker are delicate, dangerous and detailed

At Dover, at a Bailiffs convention. a speaker said that their duty was to patience, and determination

In Dover, at a Bailiffs convention, the speaker said that their duty was to patience.

In Dover, at a Bailiffs Convention, the speak said their duty was to patience


Sequence alignments

Needleman and Wunsch (1970)

AGAACT-
 | ||
-G-AC-G

AGAACT

GACG
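The global alignment idea of Needleman and Wunsch can be sketched in a few lines. This is a minimal illustration, not the alignment code used for the experiments: the scoring values (match +1, mismatch -1, linear gap -1) and the function name are assumptions chosen for the example.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align sequences a and b (strings or lists of tokens).

    Returns (score, aligned_a, aligned_b), where gaps are marked with "-".
    Scoring values are illustrative assumptions, not tuned parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + sub,  # pair a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # a[i-1] faces a gap
                score[i][j - 1] + gap,      # b[j-1] faces a gap
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]


score, top, bottom = needleman_wunsch("AGAACT", "GACG")
print(score)
print("".join(top))
print("".join(bottom))
```

Because the function only compares items for equality, the same sketch accepts lists of words, e.g. `needleman_wunsch(s1.split(), s2.split())`. Several alignments can share the optimal score, so the one returned need not match the one drawn on the slide.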

Finding her son, Alvin, 69, hanged, Mrs Hunt, of Brighton, was so depressed she could not cut him down
Finding her son Arthur 69 hanged Mrs Brown from Brighton was so upset she could not cut him down
Finding her son Alvin  69 hanged Mrs Hunt of -     -    Brighton, was so depressed she could not cut him down
Finding her son Arthur 69 hanged Mrs -    -  Brown from Brighton was so upset she could not cut him down

Apply to utterances using NLP

At Dover, the finale of the bailiffs convention, their duty said a speaker are delicate, dangerous and detailed
At Dover, at a Bailiffs convention. a speaker said that their duty was to patience, and determination
At Dover the finale of the -  - bailiffs convention - -       -    -    their duty
At Dover -   -      -  -   at a Bailiffs convention a speaker said that their duty 


said a speaker are delicate dangerous -   -  -        and detailed -
-    - -       -   -        -         was to patience and -        determination
At Dover the finale of the -  - bailiffs convention |-Exchange-1------| their duty
At Dover -   -      -  -   at a Bailiffs convention a speaker said that their duty 


said a speaker are delicate dangerous -   -  -        and detailed -
|-Exchange-1------------------------| was to patience and -        determination
said a speaker are delicate dangerous |-E2----|
|E2| a speaker -   -        -         said that
said -
said that

\(\hookrightarrow E_1\)

\(\hookrightarrow E_2\)

Extend to build recursive deep alignments

What does this afford us?

\[ B = \frac{\sigma_{intervals} - \mu_{intervals}}{\sigma_{intervals} + \mu_{intervals}} \]
\[ 0.22 \leq B \leq 0.33 \]
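As a rough illustration of how \(B\) is computed from a list of intervals, here is a minimal sketch. The function name is hypothetical, and using the population standard deviation (rather than the sample one) is an assumption; the slide does not say which is used.

```python
from statistics import mean, pstdev


def burstiness(intervals):
    """B = (sigma - mu) / (sigma + mu) for a list of positive intervals.

    Assumption: sigma is the population standard deviation.
    """
    mu = mean(intervals)
    sigma = pstdev(intervals)
    return (sigma - mu) / (sigma + mu)


# Perfectly regular intervals have sigma = 0, which gives the lower bound of B.
print(burstiness([2, 2, 2, 2]))
# Irregular intervals push B upwards.
print(burstiness([1, 1, 1, 9]))
```

By construction \(B\) lies between -1 (perfectly regular intervals) and 1 (extremely irregular ones), so values between 0.22 and 0.33 sit on the bursty side of that range.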

Detailed behaviours

Figure panels: Deletion, Insertion, Replacement. Axes: Frequency; \(|chunk|\); position in \(u\); \(|u|_w\).

Number of operations vs. utterance length

Susceptibility vs. position in utterance

Deletions tend to gate other operations

Insertions relate to preceding deletions

Stubbersfield et al. (2015)

Bebbington et al. (2017)

Links the low-level with contrasted outcomes

Lexical evolution (1)

\[ \sigma_g^- = \frac{s_g^-}{s_g^0} \]
\[ \sigma_g^+ = \frac{s_g^+}{s_g^0} \]

Step-wise

Susceptibility

Feature variation

Lexical evolution (2)

Along the branches

Realistic content

Control over data-generation

Computational analysis

Already coded

Do-it-by-hand

Simple setting

Empirical (un)bind

Quantitative analysis of changes

Structural changes from exchanges

Relating insertion and deletion chunks

Inner structure of transformations

Sequence alignments of semantic parses

Further work

In vivo applications to more complete data sets (social networks)

Sentence processing \(\leftrightarrow\) Higher level evolution

Feedback loops: utterance distribution \(\leftrightarrow\) detailed transformations

Long-lived chains with recurring changes

Semantic parsing and NLP methods on the inner structure

Connect to the constitution of meaning in interaction and context

Openings

Semantics

Thank you

Supervision

Organising

Jean-Pierre Nadal & Camille Roth

Olivier Morin

Questions

You!

Challenges with meaning

Can you think of anything else, Barbara, they might have told me about that party?

I've spoken to the other children who were there that day.

S

B

Abuser

The Devil's Advocate (1997)

?

Strong pragmatics (Scott-Phillips, 2017)

Access to context

Theory of the constitution of meaning

Challenges

Live experiment

Data quality

                       Exp. 1          Exp. 2         Exp. 3
#participants          53              49             2 x 70
#root utterances       54              50             25/batch
Tree size              48              49             70
Duration               64min           43min          37min/batch
Spam rate              22.4% + 3.5%    0.8% + 0.6%    1% + 0.1%
Usable reformulations  1980            2411           3506
  • First large-scale launch
    • Bugs and customer service
    • Mistaken UI affordances
  • Extensive rewrite
    • Automated tests
    • Pilots evaluating the UI
    • Pilots sampling root utterances

– “There is no hope for peace, it is a lost cause”

→ “There is no lose hope that rara ra to op”

– “My Government's overriding priority is to ensure the stability of the British economy”

→ “My governments overall liability is to sort out the... not sure.”

Alignment optimisation

\(\theta_{open}\)

\(\theta_{extend}\)

\(\theta_{mismatch}\)

\(\theta_{exchange}\) by hand

All transformations

Hand-coded training set size?

Train the \(\theta_*\) on hand-coded alignments

Simulate the training process: imagine we know the optimal \(\theta\)

1. Sample \(\theta^0 \in [-1, 0]^3\) to generate artificial alignments for all transformations

2. From those, sample \(n\) training alignments

3. Brute-force \(\hat{\theta}_1, \ldots, \hat{\theta}_m\) estimators of \(\theta^0\)

4. Evaluate the number of errors per transformation on the test set

Test set

10x

10x

\(\Longrightarrow\) 100-200 hand-coded alignments yield \(\leq\) 1 error/transformation

Gap open cost \(\rightarrow \theta_{open}\)

Gap extend cost \(\rightarrow \theta_{extend}\)

Item match-mismatch

\[ similarity(w, w') = \begin{cases} S_C \left( w, w' \right) & \text{if we have word vectors for both } w \text{ and } w' \\ \delta_{lemma(w), lemma(w')} & \text{otherwise} \end{cases} \]
\[ score(w, w') = similarity(w, w') + \theta_{mismatch} \]
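The item match-mismatch score can be sketched as follows. This is an illustration under stated assumptions: \(S_C\) is taken to be cosine similarity between word vectors, and the `vectors` dictionary and `lemma` function are hypothetical stand-ins for whatever NLP toolkit provides them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def similarity(w1, w2, vectors, lemma):
    """Vector similarity when both words have vectors, else lemma equality (0 or 1)."""
    if w1 in vectors and w2 in vectors:
        return cosine_similarity(vectors[w1], vectors[w2])
    return 1.0 if lemma(w1) == lemma(w2) else 0.0


def score(w1, w2, vectors, lemma, theta_mismatch):
    """Alignment score for pairing w1 with w2; theta_mismatch shifts the similarity."""
    return similarity(w1, w2, vectors, lemma) + theta_mismatch


# Toy data, invented for the example only.
toy_vectors = {"upset": [1.0, 0.0], "depressed": [1.0, 0.0], "hanged": [0.0, 1.0]}
toy_lemma = str.lower  # crude stand-in for a real lemmatizer

print(score("upset", "depressed", toy_vectors, toy_lemma, theta_mismatch=-0.5))
print(score("Brighton", "brighton", toy_vectors, toy_lemma, theta_mismatch=-0.5))
```

With a negative \(\theta_{mismatch}\), pairs of dissimilar words end up with a negative score, so the aligner may prefer opening a gap to pairing them.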

Example data

Immediately after I become president I will confront this economic challenge head-on by taking all necessary steps

immediately after I become a president I will confront this economic challenge

Immediately after I become president, I will tackle this economic challenge head-on by taking all the necessary steps

This crisis did not develop overnight and it will not be solved overnight

the crisis did not developed overnight, and it will be not solved overnight

original

This, crisis, did, not, develop, overnight, and, it, will, not, be, solved, overnight

this, crisis, did, not, develop, overnight, and, it, will, not, be, solved, overnight

this, crisis, did, not, develop, overnight, and, it, will, not, be, solved, overnight

crisi, develop, overnight, solv, overnight

tokenize

lowercase & length > 2

stopwords

stem

The crisis didn't happen today won't be solved by midnight.

crisi, happen, today, solv, midnight

d = 0.6
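The four preprocessing steps (tokenize, lowercase and length filter, stopwords, stem) can be sketched as a small pipeline. Everything specific here is an assumption made for the example: the regular-expression tokenizer, the tiny stopword list, and the two-rule suffix stripper that stands in for a real stemmer. The distance \(d\) itself is not reproduced, since its definition is not given on the slide.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real run would use a full one.
STOPWORDS = {"the", "this", "did", "not", "and", "will", "didn't", "won't"}


def crude_stem(word):
    """Toy suffix stripper standing in for a real stemmer (assumption)."""
    if word.endswith("ed") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def preprocess(utterance):
    """Turn an utterance into the list of stems used for comparison."""
    tokens = re.findall(r"[A-Za-z']+", utterance)        # tokenize
    tokens = [t.lower() for t in tokens if len(t) > 2]   # lowercase & length > 2
    tokens = [t for t in tokens if t not in STOPWORDS]   # stopwords
    return [crude_stem(t) for t in tokens]               # stem


print(preprocess("This crisis did not develop overnight and it will not be solved overnight"))
print(preprocess("The crisis didn't happen today won't be solved by midnight."))
```

On these two sentences the toy version yields the same stems as shown above, but only because the stopword list and suffix rules were picked to fit them; other inputs would need a proper stopword list and stemmer.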

Utterance-to-utterance distance

Aggregate trends

Size reduction

Transmissibility

Variability

Lexical evolution - POS

\[ \sigma_g^- = \frac{s_g^-}{s_g^0} \]
\[ \sigma_g^+ = \frac{s_g^+}{s_g^0} \]

Step-wise

Susceptibility
