Suman Banerjee, Mitesh M. Khapra
Department of Computer Science and Engineering,
Indian Institute of Technology Madras
Outline
Dialogue Systems
Siri
Cortana
Bixby
Google Assistant
Alexa
Apple HomePod
Amazon Echo
Google Home
Two Paradigms
Challenges
Chit-Chat
Goal-Oriented
Outline
Modular Architecture
[Figure: pipeline in which Language Understanding, the Dialogue Manager (Dialogue State Tracking + Policy Optimizer, with access to a Knowledge Base) and Language Generation map the user utterance to the system response]
User utterance: I need a cheap chinese restaurant in the north of town.
Semantic Frame: request_rest(cuisine=chinese, price=cheap, area=north)
System Action: request_people( )
System response: Sure, for how many people?
Probabilistic methods in spoken-dialogue systems, Steve J. Young, Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 2000
Drawbacks
Outline
End-to-End Architecture
User utterance: I need a cheap chinese restaurant in the north of town.
System response: Sure, for how many people?
[Figure: a single End-to-End Dialogue System maps the user utterance, together with the Knowledge Base, directly to the system response]
Outline
Recurrent Neural Networks
[Figure: an RNN unrolled over five timesteps, with inputs \( x_1, \dots, x_5 \), hidden states \( s_0, s_1, \dots, s_5 \), outputs \( y_1, \dots, y_5 \), and shared input, recurrent and output weight matrices \( U \), \( W \) and \( V \)]
Find a cheap Chinese restaurant
\( s_i = RNN (s_{i-1},x_i)\)
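As a NumPy sketch of the recurrence above (toy dimensions; the tanh nonlinearity and softmax output are illustrative choices, not the lecture's exact parameterization):

```python
import numpy as np

def rnn_step(s_prev, x, U, W, V):
    """One RNN step: s_i = tanh(U x_i + W s_{i-1}); output y_i = softmax(V s_i)."""
    s = np.tanh(U @ x + W @ s_prev)
    logits = V @ s
    y = np.exp(logits - logits.max())
    return s, y / y.sum()

# Unroll over a toy 5-token input (x_1, ..., x_5), starting from s_0 = 0.
rng = np.random.default_rng(0)
d_in, d_hid, d_out = 4, 3, 6
U = rng.normal(size=(d_hid, d_in))   # input weights (shared across timesteps)
W = rng.normal(size=(d_hid, d_hid))  # recurrent weights
V = rng.normal(size=(d_out, d_hid))  # output weights
s = np.zeros(d_hid)                  # s_0
for x in rng.normal(size=(5, d_in)):
    s, y = rnn_step(s, x, U, W, V)
```

The same three matrices are reused at every timestep, which is what lets the network process sequences of arbitrary length.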
Sequence-to-Sequence Models
Sequence to Sequence Learning with Neural Networks, Sutskever et al., NeurIPS, 2014
Encoder
Decoder
Sequence-to-Sequence Models
Attention:
Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR, 2015
[Figure: Encoder-Decoder with attention; weights \( \alpha_{it} \) over the encoder states \( \mathbf{h}_i \) form the context vector \( \mathbf{c}_t = \sum_i \alpha_{it} \mathbf{h}_i \), which conditions the decoder state \( \mathbf{d}_t \)]
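A minimal NumPy sketch of additive (Bahdanau-style) attention; the scoring form \( v^\top \tanh(W_1 \mathbf{h}_i + W_2 \mathbf{d}_{t-1}) \) and all dimensions here are illustrative assumptions:

```python
import numpy as np

def bahdanau_attention(H, d_prev, W1, W2, v):
    """Additive attention: e_i = v^T tanh(W1 h_i + W2 d_{t-1}),
    alpha_{it} = softmax(e)_i, and c_t = sum_i alpha_{it} h_i."""
    scores = np.array([v @ np.tanh(W1 @ h + W2 @ d_prev) for h in H])
    e = np.exp(scores - scores.max())
    alpha = e / e.sum()          # attention weights alpha_{it}
    c_t = alpha @ H              # context vector c_t
    return alpha, c_t

rng = np.random.default_rng(1)
d = 4
H = rng.normal(size=(6, d))          # encoder states h_1..h_6
d_prev = rng.normal(size=d)          # previous decoder state d_{t-1}
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v = rng.normal(size=d)
alpha, c_t = bahdanau_attention(H, d_prev, W1, W2, v)
```

At each decoding step the weights are recomputed, so the decoder can focus on different encoder states for different output tokens.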
Hierarchical Recurrent Encoder Decoder
Building End-to-End Dialogue Systems Using Generative Hierarchical Neural Network Models, Serban et al., AAAI, 2016
Memory Networks
[Figure: an end-to-end memory network. At hop \( b \), attention \( \mathbf{p}^b \) over the memory produces the output \( \mathbf{o}^b \), and the query is updated as \( \mathbf{q}^{b+1} = \mathbf{q}^b + \mathbf{o}^b \). After \( B \) hops, \( \mathbf{q}^{B+1} \) is matched through \( W \) against the candidate responses \( BOW(y_i) \) to produce the prediction \( \mathbf{\hat{z}} \)]
End-to-End Memory Networks, Sukhbaatar et al., NeurIPS, 2015
User \((u_1)\): Hello!
System \((s_1)\): How can I help you today?
...
System \((s_{t-1})\): How about Fancy_Pub?
\(\mathbf{m}_2 = A \cdot BOW(s_1)\)
User \((u_t)\): I don't like it
\(\mathbf{q} = C \cdot BOW(u_t)\)
\(y_1\): Let me find another one
\(y_2\): How about The_Place
...
\(y_{c}\): Sorry, there are no other pubs
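One hop of the read step above can be sketched as follows; the toy dimensions and random embeddings stand in for the BOW embeddings of the dialogue history:

```python
import numpy as np

def memory_hop(q, mem_A, mem_C):
    """One memory-network hop: p = softmax(q . m_i) over the A-embedded
    memories, read-out o = sum_i p_i c_i over the C-embedded memories,
    and query update q^{b+1} = q^b + o^b."""
    scores = mem_A @ q
    e = np.exp(scores - scores.max())
    p = e / e.sum()              # attention p^b over memory slots
    o = mem_C.T @ p              # read-out o^b
    return q + o                 # q^{b+1}

rng = np.random.default_rng(2)
d, n_slots, hops = 5, 8, 3
mem_A = rng.normal(size=(n_slots, d))   # memories embedded with A
mem_C = rng.normal(size=(n_slots, d))   # memories embedded with C
q = rng.normal(size=d)                  # q^1 (embedded user utterance)
for _ in range(hops):                   # B hops of reasoning over memory
    q = memory_hop(q, mem_A, mem_C)
```

Multiple hops let the network chain evidence: the updated query after one hop attends differently on the next.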
Outline
Goal-oriented Dialogue phases
Pre-KB:
System: Hello, what can I help you with today?
User: Hi!
User: I'd like to book a table for six people in an expensive restaurant with British food.
System: where should it be ?
User: Somewhere in the north of the town.
System: ok I am on it!
System: API_CALL (British, north, six, expensive)
KB:
The_Place R_cuisine British
The_Place R_location north
The_Place R_address The_Place_Address
The_Place R_price expensive
The_Place R_rating 10
Fancy_pub R_cuisine British
Fancy_pub R_location north
Fancy_pub R_address Fancy_pub_Address
Fancy_pub R_price expensive
Fancy_pub R_rating 8
Post-KB:
System: How about The_Place?
User: No I don't like it.
System: Ok, then, what do you think about Fancy_Pub?
User: Yes! lets do it.
System: Great! I'll do the reservation.
User: What is the address?
System: Here it is: Fancy_Pub_Address
User: Thank You
System: You're welcome
Structural Information
Dependency Parse of sentences
Knowledge Graph
Code-Mixing
Speaker 1: Hi, can you help me with booking a table at a restaurant?
Speaker 2: Sure, would you like something in cheap, moderate or expensive?
Hindi-English:
Speaker 1: Hi, kya tum ek restaurant mein table book karne mein meri help karoge?
Speaker 2: Sure, aap ko kaunsi price range mein chahiye, cheap, moderate ya expensive?
Bengali-English:
Speaker 1: Hi, tumi ki ekta restaurant ey table book korte amar help korbe?
Speaker 2: Sure, aapni kon price range ey chaan, cheap, moderate na expensive?
Problem
Outline
Single Attention Distribution
Sequential Attention
Pre-KB
Post-KB
KB
Sequential Attention
[Figure: the dialogue history is split into Pre-KB, KB and Post-KB parts. A Query RNN encodes the current utterance; attention weights \( \boldsymbol\alpha_t \) over a Post-KB RNN give \( \mathbf{h}_{post} \); attention weights \( \boldsymbol\beta_t \) over a Pre-KB RNN give \( \mathbf{h}_{pre} \); the KB is encoded with a KB Memory Network]
End-to-End Network
Outline
Graph Convolutional Network (GCN)
Semi-supervised Classification with Graph Convolutional Networks, Kipf and Welling, ICLR, 2017
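A one-layer NumPy sketch of the Kipf and Welling propagation rule \( H' = \sigma(\hat{D}^{-1/2}\hat{A}\hat{D}^{-1/2} H W) \); the toy graph and feature sizes are assumptions:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN layer: H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W)."""
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))   # D^{-1/2} as a vector
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(0.0, A_norm @ H @ W)          # ReLU

# Toy 4-node graph, e.g. words linked by dependency edges.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 1],
              [0, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(3)
H = rng.normal(size=(4, 8))      # node (word) features
W = rng.normal(size=(8, 8))      # layer weights
H1 = gcn_layer(A, H, W)          # structure-aware node representations
```

Each node's new representation mixes in its neighbours' features, so stacking k layers propagates information over k-hop neighbourhoods.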
Problem
Syntactic GCNs with RNN
[Figure: the utterance is first encoded with an RNN encoder; GCN layers over the dependency parse are then applied on top of the RNN hidden states, giving the RNN-GCN encoder]
GCN with Sequential Attention
Query Attention
\[ \alpha_{jt} = f_1(\mathbf{c}^f_j, \mathbf{d}_{t-1}) \]
\[ \mathbf{h}^Q_t =\sum_{j'=1}^{|Q|} \alpha_{j't}\mathbf{c}_{j'}^f \]
History Attention
\[\beta_{jt} = f_2(\mathbf{a}^f_j, \mathbf{d}_{t-1}, \mathbf{h}^Q_t)\]
\[ \mathbf{h}^H_t = \sum_{j'=1}^{|H|} \beta_{j't}\mathbf{a}_{j'}^f \]
KB Attention
\[ \gamma_{jt} = f_3(\mathbf{r}^f_j,\mathbf{d}_{t-1}, \mathbf{h}^Q_t,\mathbf{h}^H_t)\]
\[ \mathbf{h}^K_t = \sum_{j'=1}^m \gamma_{j't}\mathbf{r}_{j'}^f \]
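The three chained attentions above can be sketched in NumPy, with an additive scoring function standing in for \( f_1, f_2, f_3 \) (the concrete parameterization here is an assumption):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend(mem, queries, W, v):
    """Additive score over memory rows, conditioned on concatenated queries."""
    q = np.concatenate(queries)
    scores = np.array([v @ np.tanh(W @ np.concatenate([m, q])) for m in mem])
    return softmax(scores) @ mem          # attended summary

def sequential_attention(C, A, R, d_prev, P):
    """Chain f1 -> f2 -> f3: the query summary h_Q conditions the history
    attention, and both condition the KB attention."""
    h_Q = attend(C, [d_prev], P["W1"], P["v1"])            # f1
    h_H = attend(A, [d_prev, h_Q], P["W2"], P["v2"])       # f2
    h_K = attend(R, [d_prev, h_Q, h_H], P["W3"], P["v3"])  # f3
    return h_Q, h_H, h_K

rng = np.random.default_rng(4)
d, h = 4, 6
C = rng.normal(size=(5, d))   # query (current utterance) representations c_j^f
A = rng.normal(size=(7, d))   # history representations a_j^f
R = rng.normal(size=(3, d))   # KB triple representations r_j^f
d_prev = rng.normal(size=d)   # decoder state d_{t-1}
P = {"W1": rng.normal(size=(h, 2 * d)), "v1": rng.normal(size=h),
     "W2": rng.normal(size=(h, 3 * d)), "v2": rng.normal(size=h),
     "W3": rng.normal(size=(h, 4 * d)), "v3": rng.normal(size=h)}
h_Q, h_H, h_K = sequential_attention(C, A, R, d_prev, P)
```

Splitting one big attention into three conditioned stages is exactly what relieves the single distribution from having to cover query, history and KB at once.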
GCNs for code-mixed utterances
\(^1\)Word association norms, mutual information, and lexicography, Church and Hanks, Computational Linguistics, 1990
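A sketch of building contextual-graph edges from co-occurrence counts and positive PMI; the window size and thresholding here are assumptions:

```python
import math
from collections import Counter

def ppmi_edges(sentences, window=2, threshold=0.0):
    """Connect word pairs whose PMI, log[ p(x, y) / (p(x) p(y)) ],
    is positive (above the threshold), estimated from co-occurrence
    counts within a fixed window."""
    word_counts, pair_counts = Counter(), Counter()
    total = 0
    for toks in sentences:
        word_counts.update(toks)
        total += len(toks)
        for i, w in enumerate(toks):
            for u in toks[i + 1:i + 1 + window]:
                pair_counts[tuple(sorted((w, u)))] += 1
    n_pairs = sum(pair_counts.values())
    edges = {}
    for (w, u), c in pair_counts.items():
        pmi = math.log((c / n_pairs) /
                       ((word_counts[w] / total) * (word_counts[u] / total)))
        if pmi > threshold:
            edges[(w, u)] = pmi
    return edges

edges = ppmi_edges([["book", "a", "cheap", "chinese", "restaurant"],
                    ["cheap", "chinese", "food"]])
```

Such a graph can replace the dependency parse as the input to the GCN when no parser exists for the code-mixed language.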
y: | The_Place | serves | British | food | and | the | prices | are | expensive |
label: | 7 | 9 | 1 | 9 | 9 | 9 | 9 | 9 | 4 |
Memory = | Chinese | British | Italian | cheap | expensive | moderate | Fancy_Pub | The_Place | Prezzo | # |
index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Copy Mechanism
Memory Network
\( P^k = softmax(r^fC^kq^k)\)
\( q^{k+1} = q^k + \sum_{j=0}^{m}P^k_jr^f_j\)
\( q^{1} = d_{t}\)
'#' (sentinel: when the pointer selects it, the decoder generates from the vocabulary)
\( P_{vocab} = softmax(V'd_{t} + b')\)
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems, Madotto et al., ACL, 2018
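A sketch of the copy-versus-generate decision at one decoder step; greedy argmax decoding and the toy parameter shapes are simplifying assumptions, not the exact Mem2Seq formulation:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def copy_or_generate(d_t, mem_words, mem_emb, C, V, b, vocab):
    """Pointer distribution over memory positions; the sentinel '#'
    routes the step to P_vocab = softmax(V d_t + b) instead."""
    p_copy = softmax(mem_emb @ (C @ d_t))    # distribution over memory slots
    j = int(np.argmax(p_copy))
    if mem_words[j] != "#":
        return mem_words[j]                  # copy an entity from memory
    p_vocab = softmax(V @ d_t + b)
    return vocab[int(np.argmax(p_vocab))]    # generate a vocabulary word

# Hand-crafted toy example so the decision is deterministic.
mem_words = ["Chinese", "British", "#"]
mem_emb = np.eye(3)                          # one embedding per memory slot
C = np.eye(3)
V = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 5.0]])
b = np.zeros(2)
vocab = ["serves", "food"]
print(copy_or_generate(np.array([0.0, 1.0, 0.0]),
                       mem_words, mem_emb, C, V, b, vocab))  # → British
```

Pointing directly at memory entries lets the model emit KB entities (restaurant names, addresses) it could never generate reliably from a closed vocabulary.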
Copy Mechanism
[Figure: heat map of \( P_{copy}\) over the memory entries across all decoder time steps of the generated response]
Outline
DSTC2 dataset
\(^1\)The Second Dialog State Tracking Challenge, Henderson et al., SIGDIAL, 2014
Utterance | Annotations |
---|---|
Where is this restaurant located? | request{area} |
Prezzo is located in the north part of town | inform{area=north} |
Find me a cheap Chinese restaurant | search(cuisine=Chinese, price=cheap) |
Do you have anything else? | search(by_alternative) |
Modified DSTC2
System: Hello, what can I help you with today?
User: Hi!
User: I'd like to book a table for six people in an expensive restaurant with British food.
System: where should it be ?
User: Somewhere in the north of the town.
System: ok I am on it!
System: API_CALL (British, north, six, expensive)
The_Place R_cuisine British
The_Place R_location north
The_Place R_address The_Place_Address
The_Place R_price expensive
The_Place R_rating 10
Fancy_pub R_cuisine British
Fancy_pub R_location north
Fancy_pub R_address Fancy_pub_Address
Fancy_pub R_price expensive
Fancy_pub R_rating 8
System: How about The_Place?
User: No I don't like it.
System: Ok, then, what do you think about Fancy_Pub?
User: Yes! lets do it.
System: Great! I'll do the reservation.
User: What is the address?
System: Here it is: Fancy_Pub_Address
User: Thank You
System: You're welcome
Learning End-to-End Goal-Oriented Dialog, Bordes et al., ICLR, 2017
Outline
Code-mixed Data Collection
1. Extract the unique utterances from the English dialogue data.
2. Replace entities with placeholders to obtain unique utterance templates.
3. Crowdsource code-mixed versions of the templates, giving code-mixed templates.
4. Replace the placeholders back with entities to obtain code-mixed utterances.
5. Substitute the utterances back into the dialogues to obtain the code-mixed dialogue data.
Sorry there is no Chinese restaurant in the west part of town
Sorry there is no Italian restaurant in the north part of town
Sorry there is no <CUISINE> restaurant in the <AREA> part of town
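The entity-to-placeholder step above can be sketched as below; the `ENTITIES` lexicon is hypothetical (in the pipeline the entity values come from the KB annotations):

```python
import re

# Hypothetical entity lexicon; the real values come from the dataset's KB.
ENTITIES = {"CUISINE": ["Chinese", "Italian", "British"],
            "AREA": ["west", "north", "south", "east"]}

def delexicalize(utterance, entities=ENTITIES):
    """Replace known entity values with typed placeholders so that
    distinct surface forms collapse to a single template."""
    for etype, values in entities.items():
        for v in values:
            # \b keeps 'west' from matching inside e.g. 'western'
            utterance = re.sub(rf"\b{re.escape(v)}\b", f"<{etype}>", utterance)
    return utterance
```

Both example utterances above map to the same template, so annotators translate each pattern only once.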
Data Evaluation
Quantification of Code-mixing
Comparing the level of code-switching in corpora, Björn Gambäck and Amitava Das, LREC, 2016
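A sketch of the per-utterance Code-Mixing Index in the spirit of Gambäck and Das, under the simplifying assumption that gold token-level language tags are available:

```python
def code_mixing_index(tokens):
    """CMI sketch: Cu = 100 * (1 - max_i w_i / (n - u)), where w_i is the
    token count of language i, n the utterance length, and u the number of
    language-independent tokens. `tokens` is a list of (word, lang) pairs,
    with lang=None for language-independent tokens (names, numbers)."""
    n = len(tokens)
    langs = [lang for _, lang in tokens if lang is not None]
    u = n - len(langs)
    if n == u:                       # only language-independent tokens
        return 0.0
    counts = {}
    for lang in langs:
        counts[lang] = counts.get(lang, 0) + 1
    return 100.0 * (1.0 - max(counts.values()) / (n - u))
```

A monolingual utterance scores 0; the score grows as the token mass spreads more evenly across languages.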
Outline
Baseline Results
Results on En-DSTC2
RNN+CROSS-GCN-SeA
Results on code-mixed data
Effect of using more GCN hops
PPMI vs Raw Frequencies
Is dependency or co-occurrence structure really needed?
Dependency edges
Random edges
Ablations
Encoder: RNN, GCN, RNN-GCN
Attention: Bahdanau, Sequential
Ablations
GCNs do not outperform RNNs independently:
performance of GCN-Bahdanau attention < RNN-Bahdanau attention
Our Sequential attention outperforms Bahdanau attention:
GCN-Bahdanau attention < GCN-Sequential attention
RNN-Bahdanau attention < RNN-Sequential attention (BLEU & ROUGE)
RNN+GCN-Bahdanau attention < RNN+GCN-Sequential attention
Combining GCNs with RNNs helps:
RNN-Sequential attention < RNN+GCN-Sequential attention
The best results are always obtained by the final model, which combines the RNN, GCN and Sequential attention
Conclusion
A single attention distribution overburdens the attention mechanism
Separated the history into Pre-KB, KB and Post-KB parts and attended sequentially over them
Showed that structure-aware representations are useful in goal-oriented dialogue
Used GCNs to infuse structural information of dependency graphs into the learned representations
Quantified the amount of code-mixing present in the dataset
Used word co-occurrence frequencies and PPMI values to extract a contextual graph when dependency parsers are not available
Obtained state-of-the-art performance on the modified DSTC2 dataset and its code-mixed versions
Future Work
Extend the model to multi-domain goal-oriented dialogue (restaurants, hotels, taxis)
Conditional code-mixed response generation
Better copy mechanism
Use the whole Knowledge-Graph instead of dialogue specific KB triples
Use semantic graphs along with dependency parse trees
Publications
Questions?
Language Understanding
User utterance: I need a cheap chinese restaurant in the north of town.
A classifier maps the utterance to the predicted intent, i.e. the semantic frame: request_rest(cuisine=chinese, price=cheap, area=north)
Intent Classification
Language Understanding
Slot Filling
User utterance: I need a cheap chinese restaurant in the north of town.
Predicted tags:
Token | Tag |
I | <null> |
need | <null> |
a | <null> |
cheap | <price> |
chinese | <cuisine> |
restaurant | <null> |
in | <null> |
the | <null> |
north | <area> |
of | <null> |
town | <null> |
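A sketch of turning the predicted tags into a semantic frame, assuming one token per slot (no BIO segmentation):

```python
def frame_from_tags(tokens, tags):
    """Collect slot values from per-token tags, ignoring <null>."""
    return {tag.strip("<>"): tok
            for tok, tag in zip(tokens, tags) if tag != "<null>"}

tokens = "I need a cheap chinese restaurant in the north of town".split()
tags = ["<null>", "<null>", "<null>", "<price>", "<cuisine>",
        "<null>", "<null>", "<null>", "<area>", "<null>", "<null>"]
frame = frame_from_tags(tokens, tags)
# → {'price': 'cheap', 'cuisine': 'chinese', 'area': 'north'}
```

Combined with the predicted intent, this yields the full semantic frame request_rest(cuisine=chinese, price=cheap, area=north).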
Dialogue Management
User: Book a table at Prezzo for 5.
System: How many people?
User: For 3.
Tracked dialogue state slots: #People, Time
Language Generation
System action: inform(rest=Prezzo, cuisine=italian)
System response: Prezzo is a nice restaurant which serves italian.
Future Plans