Suman Banerjee
Department of Computer Science and Engineering,
Indian Institute of Technology Madras
Outline
Dialogue Systems
Siri
Cortana
Bixby
Google Assistant
Alexa
Apple HomePod
Amazon Echo
Google Home
Two Paradigms
Challenges
Chit-Chat
Goal-Oriented
Outline
Modular Architecture
Language Understanding
Dialogue State Tracking
Policy Optimizer
Language Generation
User utterance
System response
Semantic Frame
System Action
Dialogue State
I need a cheap Chinese restaurant in the north of town.
request_rest(cuisine=chinese, price=cheap, area=north)
Knowledge Base
request_people( )
Sure, for how many people?
Dialogue Manager
Probabilistic methods in spoken-dialogue systems, Steve J. Young, Philosophical Transactions: Mathematical, Physical and Engineering Sciences, 2000
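The dialogue manager in the modular pipeline above can be sketched roughly as follows. This is only an illustrative, rule-based stand-in: the function names, the `REQUIRED_SLOTS` list and the fallback template are assumptions made for the example, and a real system would learn the tracker and the policy rather than hard-code them.

```python
from typing import Dict

# Slots the toy policy wants before querying the knowledge base.
# The list itself is an assumption; "people" is suggested by request_people().
REQUIRED_SLOTS = ["cuisine", "price", "area", "people"]

# System utterances taken from the example dialogues in these slides.
TEMPLATES = {
    "request_people": "Sure, for how many people?",
    "request_area": "Where should it be?",
    "api_call": "Ok, I am on it!",
}


def track_state(state: Dict[str, str], frame: dict) -> Dict[str, str]:
    """Dialogue state tracking: merge the slots of the new semantic frame."""
    new_state = dict(state)
    new_state.update(frame["slots"])
    return new_state


def choose_action(state: Dict[str, str]) -> str:
    """Policy: request the first missing slot, otherwise call the KB API."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return f"request_{slot}"
    return "api_call"


def generate(action: str) -> str:
    """Language generation: look up a template (hypothetical fallback text)."""
    return TEMPLATES.get(action, f"[no template for {action}]")


# Semantic frame produced by language understanding for the example utterance.
frame = {
    "intent": "request_rest",
    "slots": {"cuisine": "chinese", "price": "cheap", "area": "north"},
}
state = track_state({}, frame)
action = choose_action(state)
print(action, "->", generate(action))
```

Keeping the four stages as separate functions mirrors the diagram: each one can be replaced by a learned component without touching the others.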
Drawbacks
Outline
End-to-End Architecture
User utterance
I need a cheap Chinese restaurant in the north of town.
System response
Sure, for how many people?
Knowledge Base
End-to-End Dialogue System
Outline
Sequence-to-Sequence Models
Sequence-to-Sequence Learning with Neural Networks, Sutskever et al., NeurIPS, 2014
Encoder
Decoder
Attention:
Neural Machine Translation by Jointly Learning to Align and Translate, Bahdanau et al., ICLR, 2015
\( \alpha_{it}\)
\( \mathbf{c}_{t}\)
\( \mathbf{h}_{i}\)
\( \mathbf{d}_{t}\)
Encoder
Decoder
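The attention symbols above (\( \alpha_{it}\), \( \mathbf{c}_{t}\), \( \mathbf{h}_{i}\), \( \mathbf{d}_{t}\)) can be illustrated with a small NumPy sketch of additive attention. The parameter names `W_h`, `W_d` and `v`, the dimensions and the random values are assumptions for the example; in a trained model they would be learned.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def additive_attention(H, d, W_h, W_d, v):
    """Additive attention over encoder states.

    H: (n, k) encoder states h_i; d: (k,) decoder state.
    Returns the weights alpha (n,) and the context vector c (k,).
    """
    scores = np.tanh(H @ W_h.T + d @ W_d.T) @ v  # one score per encoder state
    alpha = softmax(scores)                      # attention weights alpha_it
    c = alpha @ H                                # context c_t = sum_i alpha_it h_i
    return alpha, c


rng = np.random.default_rng(0)
n, k, a = 5, 4, 3                 # source length, state size, attention size
H = rng.normal(size=(n, k))
d = rng.normal(size=k)
W_h = rng.normal(size=(a, k))
W_d = rng.normal(size=(a, k))
v = rng.normal(size=a)

alpha, c = additive_attention(H, d, W_h, W_d, v)
print(alpha.round(3), c.round(3))
```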
Hierarchical Recurrent Encoder Decoder
Building End-to-End Dialogue Systems Using Generative Hierarchical Neural Network Models, Serban et al., AAAI, 2016
Memory Networks
\( \mathbf{q}^1\)
\( \mathbf{q}^b\)
\( \mathbf{q}^{b+1}\)
\( \mathbf{o}^b\)
\( \mathbf{p}^b\)
Candidates
\( \mathbf{q}^b\)
\( \mathbf{q}^{B+1}\)
\( W\)
\( \mathbf{\hat{z}}\)
\( BOW(y_i)\)
End-to-End Memory Networks, Sukhbaatar et al., NeurIPS, 2015
User \((u_1)\): Hello!
System \((s_1)\): How can I help you today?
...
System \((s_{t-1})\): How about Fancy_Pub?
\(\mathbf{m}_2 = A \cdot BOW(s_1)\)
User \((u_t)\): I don't like it
\(\mathbf{q} = C \cdot BOW(u_t)\)
\(y_1\): Let me find another one
\(y_2\): How about The_Place
...
\(y_{c}\): Sorry, there are no other pubs
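The memory network example above can be sketched in NumPy as follows. It embeds the history as memories with \(A\), the query with \(C\), performs attention hops, and scores bag-of-words candidates with \(W\). The helper names, the embedding size and the random (untrained) matrices are assumptions for illustration, and the memories are reused as output vectors to keep the sketch short.

```python
import numpy as np


def build_vocab(texts):
    """Map every whitespace token (lower-cased) to an index."""
    tokens = sorted({tok for t in texts for tok in t.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow(text, vocab):
    """Bag-of-words count vector of a sentence."""
    vec = np.zeros(len(vocab))
    for tok in text.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def score_candidates(history, query, candidates, vocab, A, C, W, hops=2):
    M = np.stack([A @ bow(h, vocab) for h in history])  # memories m_i
    q = C @ bow(query, vocab)                           # query q^1
    for _ in range(hops):
        p = softmax(M @ q)   # attention over memories, p^b
        o = p @ M            # weighted memory summary, o^b
        q = q + o            # q^{b+1} = q^b + o^b
    Y = np.stack([bow(y, vocab) for y in candidates])
    return softmax(Y @ W.T @ q)  # z_hat: distribution over candidates


history = ["Hello!", "How can I help you today?", "How about Fancy_Pub?"]
query = "I don't like it"
candidates = ["Let me find another one", "How about The_Place",
              "Sorry, there are no other pubs"]

vocab = build_vocab(history + [query] + candidates)
rng = np.random.default_rng(0)
dim = 8
A, C, W = (rng.normal(scale=0.1, size=(dim, len(vocab))) for _ in range(3))

z_hat = score_candidates(history, query, candidates, vocab, A, C, W)
print(z_hat)
```

With untrained weights the scores are arbitrary; training would push probability mass towards the correct candidate.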
Outline
Goal-oriented Dialogue phases
Pre-KB
System: Hello, what can I help you with today?
User: Hi!
User: I'd like to book a table for six people in an expensive restaurant with British food.
System: Where should it be?
User: Somewhere in the north of the town.
System: Ok, I am on it!
System: API_CALL (British, north, six, expensive)
KB
The_Place R_cuisine British
The_Place R_location north
The_Place R_address The_Place_Address
The_Place R_price expensive
The_Place R_rating 10
Fancy_Pub R_cuisine British
Fancy_Pub R_location north
Fancy_Pub R_address Fancy_Pub_Address
Fancy_Pub R_price expensive
Fancy_Pub R_rating 8
Post-KB
System: How about The_Place?
User: No I don't like it.
System: Ok, then, what do you think about Fancy_Pub?
User: Yes! Let's do it.
System: Great! I'll do the reservation.
User: What is the address?
System: Here it is: Fancy_Pub_Address
User: Thank you
System: You're welcome
Structural Information
Dependency Parse of sentences
Knowledge Graph
Code Mixing
Speaker 1: Hi, can you help me with booking a table at a restaurant?
Speaker 2: Sure, would you like something in cheap, moderate or expensive?
Speaker 1: Hi, kya tum ek restaurant mein table book karne mein meri help karoge?
Speaker 2: Sure, aap ko kaunsi price range mein chahiye, cheap, moderate ya expensive?
Speaker 1: Hi, tumi ki ekta restaurant ey table book korte amar help korbe?
Speaker 2: Sure, aapni kon price range ey chaan, cheap, moderate na expensive?
Problem
Outline
Single Attention Distribution
Sequential Attention
[Figure: sequential attention over the dialogue history split into Pre-KB, KB and Post-KB parts. Labelled components: Query RNN; Post-KB RNN with attention weights \(\boldsymbol \alpha_t\) and summary \(\mathbf{h}_{post}\); KB Memory Network; Pre-KB with attention weights \(\boldsymbol \beta_t\) and summary \(\mathbf{h}_{pre}\)]
End-to-End Network
Outline
Graph Convolutional Network (GCN)
Semi-supervised classification with graph convolutional networks. Kipf and Welling, ICLR, 2017.
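As a reminder of what one GCN layer computes, here is a minimal NumPy sketch of the propagation rule from the cited paper, \(H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)\). The toy graph, the feature matrix and the weights are assumptions for the example; it does not model edge directions or labels.

```python
import numpy as np


def gcn_layer(A, H, W):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: (n, n) adjacency matrix, H: (n, f_in) node features, W: (f_in, f_out).
    """
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    deg = A_hat.sum(axis=1)                 # degrees including self-loops
    D_inv_sqrt = np.diag(deg ** -0.5)
    return np.maximum(0.0, D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W)


# Toy chain graph 0 - 1 - 2 with one-hot node features.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
H = np.eye(3)
W = np.ones((3, 2))

H1 = gcn_layer(A, H, W)
print(H1)
```

Stacking such layers lets each node see neighbours that are several hops away, one extra hop per layer.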
Problem
Syntactic GCNs with RNN
RNN-Encoder
GCN
RNN - GCN
GCN with Sequential Attention
Query Attention
\[ \alpha_{jt} = f_1(\mathbf{c}^f_j, \mathbf{d}_{t-1}) \]
\[ \mathbf{h}^Q_t =\sum_{j'=1}^{|Q|} \alpha_{j't}\mathbf{c}_{j'}^f \]
History Attention
\[\beta_{jt} = f_2(\mathbf{a}^f_j, \mathbf{d}_{t-1}, \mathbf{h}^Q_t)\]
\[ \mathbf{h}^H_t = \sum_{j'=1}^{|H|} \beta_{j't}\mathbf{a}_{j'}^f \]
KB Attention
\[ \gamma_{jt} = f_3(\mathbf{r}^f_j,\mathbf{d}_{t-1}, \mathbf{h}^Q_t,\mathbf{h}^H_t)\]
\[ \mathbf{h}^K_t = \sum_{j'=1}^m \gamma_{j't}\mathbf{r}_{j'}^f \]
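The three attention steps above can be sketched in NumPy as below. The slides leave \(f_1\), \(f_2\) and \(f_3\) unspecified, so this sketch assumes additive scoring followed by a softmax; the parameter shapes and the random values are likewise assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def make_params(state_dim, context_dim, att_dim=6):
    """Random parameters for one additive attention module (untrained)."""
    return (rng.normal(size=(att_dim, state_dim)),
            rng.normal(size=(att_dim, context_dim)),
            rng.normal(size=att_dim))


def attend(states, context, params):
    """Attend over states (n, k) given a conditioning vector context."""
    W_s, W_c, v = params
    scores = np.tanh(states @ W_s.T + W_c @ context) @ v
    weights = softmax(scores)
    return weights, weights @ states


k = 4
C_f = rng.normal(size=(5, k))   # query token states c^f_j
A_f = rng.normal(size=(7, k))   # history token states a^f_j
R_f = rng.normal(size=(3, k))   # KB entry states r^f_j
d_prev = rng.normal(size=k)     # previous decoder state d_{t-1}

# 1. Query attention depends only on the decoder state.
alpha, h_Q = attend(C_f, d_prev, make_params(k, k))
# 2. History attention is also conditioned on the query summary h^Q_t.
beta, h_H = attend(A_f, np.concatenate([d_prev, h_Q]), make_params(k, 2 * k))
# 3. KB attention is conditioned on both h^Q_t and h^H_t.
gamma, h_K = attend(R_f, np.concatenate([d_prev, h_Q, h_H]),
                    make_params(k, 3 * k))
print(alpha.round(2), beta.round(2), gamma.round(2))
```

The point of the ordering is that each later distribution sees the summaries computed before it, instead of one distribution covering query, history and KB at once.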
GCNs for code-mixed utterances
\(^1\)Word association norms, mutual information, and lexicography, Church and Hanks, Computational Linguistics, 1990
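One way to build a word graph from co-occurrence statistics with PPMI (positive pointwise mutual information, following the footnoted reference) is sketched below. It assumes sentence-level co-occurrence and keeps an edge only when the PMI is positive; the window choice, the punctuation stripping and the function name are assumptions, not details confirmed by the slides.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_graph(sentences):
    """Return {frozenset({x, y}): PPMI} for word pairs with positive PMI."""
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        # Each word counts once per sentence; strip simple punctuation.
        tokens = {tok.strip(",.?!") for tok in sent.lower().split()}
        tokens.discard("")
        word_counts.update(tokens)
        pair_counts.update(frozenset(p)
                           for p in combinations(sorted(tokens), 2))
    n = len(sentences)
    edges = {}
    for pair, c_xy in pair_counts.items():
        x, y = tuple(pair)
        pmi = math.log((c_xy / n) /
                       ((word_counts[x] / n) * (word_counts[y] / n)))
        if pmi > 0:          # PPMI = max(0, PMI); zero-weight edges dropped
            edges[pair] = pmi
    return edges


corpus = [
    "Hi, kya tum ek restaurant mein table book karne mein meri help karoge?",
    "Sure, aap ko kaunsi price range mein chahiye, cheap, moderate ya expensive?",
    "Hi, tumi ki ekta restaurant ey table book korte amar help korbe?",
    "Sure, aapni kon price range ey chaan, cheap, moderate na expensive?",
]
graph = ppmi_graph(corpus)
print(len(graph), "edges")
```

Because it needs only token counts, this construction works for code-mixed text where no dependency parser is available.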
Copy Mechanism
\( Memory = \)
index: | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
entry: | Chinese | British | Italian | cheap | expensive | moderate | Fancy_Pub | The_Place | Prezzo | # |
y: | The_Place | serves | British | food | and | the | prices | are | expensive |
\( labelc \): | 7 | 9 | 1 | 9 | 9 | 9 | 9 | 9 | 4 |
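The copy labels in the example above can be derived mechanically: each response word that appears in the memory gets its memory index, and every other word gets the index of the sentinel '#'. A small sketch, with a hypothetical helper name `copy_labels`:

```python
memory = ["Chinese", "British", "Italian", "cheap", "expensive",
          "moderate", "Fancy_Pub", "The_Place", "Prezzo", "#"]
response = "The_Place serves British food and the prices are expensive".split()


def copy_labels(response_tokens, memory_entries, sentinel="#"):
    """Memory index for copyable words, sentinel index for all others."""
    index = {entry: i for i, entry in enumerate(memory_entries)}
    return [index.get(tok, index[sentinel]) for tok in response_tokens]


labels = copy_labels(response, memory)
print(labels)  # [7, 9, 1, 9, 9, 9, 9, 9, 4]
```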
Memory Network
\( P^k = softmax(r^fC^kq^k)\)
\( q^{k+1} = q^k + \sum_{j=0}^{m}P^k_jr^f_j\)
\( q^{1} = d_{t}\)
'#'
\( P_{vocab} = softmax(V'd_{t} + b')\)
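The memory-hop equations above can be sketched in NumPy as follows. The final decoding rule (generate from \(P_{vocab}\) when the pointer selects '#', otherwise copy the memory entry) follows the description in the Mem2Seq paper cited on this slide; the function names, dimensions and random parameters are assumptions for the example.

```python
import numpy as np


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def copy_distribution(R, d_t, C_hops):
    """Multi-hop pointer over memory rows r^f_j (the last row stands for '#')."""
    if not C_hops:
        raise ValueError("need at least one hop matrix")
    q = d_t                              # q^1 = d_t
    for C_k in C_hops:
        P = softmax(R @ C_k @ q)         # P^k = softmax(r^f C^k q^k)
        q = q + P @ R                    # q^{k+1} = q^k + sum_j P^k_j r^f_j
    return P


def vocab_distribution(d_t, V, b):
    """P_vocab = softmax(V' d_t + b')."""
    return softmax(V @ d_t + b)


def decode_step(mem_tokens, R, d_t, C_hops, vocab, V, b):
    """Copy the pointed-at memory entry unless the pointer picks '#'."""
    j = int(np.argmax(copy_distribution(R, d_t, C_hops)))
    if mem_tokens[j] == "#":
        return vocab[int(np.argmax(vocab_distribution(d_t, V, b)))]
    return mem_tokens[j]


rng = np.random.default_rng(0)
k = 4
mem_tokens = ["British", "expensive", "The_Place", "#"]
vocab = ["serves", "food", "and", "the", "prices", "are"]
R = rng.normal(size=(len(mem_tokens), k))
d_t = rng.normal(size=k)
C_hops = [rng.normal(size=(k, k)) for _ in range(3)]
V = rng.normal(size=(len(vocab), k))
b = np.zeros(len(vocab))

P_copy = copy_distribution(R, d_t, C_hops)
word = decode_step(mem_tokens, R, d_t, C_hops, vocab, V, b)
print(P_copy.round(3), word)
```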
Mem2Seq: Effectively incorporating knowledge bases into end-to-end task-oriented dialog systems, Madotto et al., ACL, 2018
Copy Mechanism
[Figure: heat map of \( P_{copy}\) across all decoder time steps; axes: Memory and Generated Response]
Outline
Modified DSTC2
System: Hello, what can I help you with today?
User: Hi!
User: I'd like to book a table for six people in an expensive restaurant with British food.
System: Where should it be?
User: Somewhere in the north of the town.
System: Ok, I am on it!
System: API_CALL (British, north, six, expensive)
The_Place R_cuisine British
The_Place R_location north
The_Place R_address The_Place_Address
The_Place R_price expensive
The_Place R_rating 10
Fancy_Pub R_cuisine British
Fancy_Pub R_location north
Fancy_Pub R_address Fancy_Pub_Address
Fancy_Pub R_price expensive
Fancy_Pub R_rating 8
System: How about The_Place?
User: No I don't like it.
System: Ok, then, what do you think about Fancy_Pub?
User: Yes! Let's do it.
System: Great! I'll do the reservation.
User: What is the address?
System: Here it is: Fancy_Pub_Address
User: Thank you
System: You're welcome
Learning end-to-end goal-oriented dialog, Bordes et al., ICLR, 2017.
Code-mixed Data Collection
English dialogue data -> Extract unique utterances -> Unique utterances
Unique utterances -> Replace entities with placeholders -> Utterance templates
Utterance templates -> Crowdsource code-mixing -> Code-mixed templates
Code-mixed templates -> Replace placeholders with entities -> Code-mixed utterances
Code-mixed utterances -> Replace utterances back into dialogue -> Code-mixed dialogue data
Sorry there is no Chinese restaurant in the west part of town
Sorry there is no Italian restaurant in the north part of town
Sorry there is no <CUISINE> restaurant in the <AREA> part of town
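The placeholder step illustrated above can be sketched as follows. The entity lists, the function names and the one-entity-per-type simplification are assumptions for the example; the actual pipeline would take its entity inventory from the dataset.

```python
import re

# Hypothetical entity inventory; placeholder names follow the example above.
ENTITIES = {
    "CUISINE": ["Chinese", "Italian", "British"],
    "AREA": ["north", "south", "east", "west"],
}


def to_template(utterance, entities=ENTITIES):
    """Replace known entities with <TYPE> placeholders.

    Returns the template and the bindings needed to restore the entities.
    Assumes at most one entity of each type per utterance.
    """
    template, bindings = utterance, {}
    for slot, values in entities.items():
        for value in values:
            pattern = r"\b" + re.escape(value) + r"\b"
            if re.search(pattern, template):
                template = re.sub(pattern, f"<{slot}>", template)
                bindings[slot] = value
    return template, bindings


def fill_template(template, bindings):
    """Put the original entities back into a (possibly code-mixed) template."""
    for slot, value in bindings.items():
        template = template.replace(f"<{slot}>", value)
    return template


u1 = "Sorry there is no Chinese restaurant in the west part of town"
u2 = "Sorry there is no Italian restaurant in the north part of town"
t1, b1 = to_template(u1)
t2, b2 = to_template(u2)
print(t1)
print(len({t1, t2}), "unique template")
```

Collapsing utterances into templates this way means each template only has to be code-mixed once by the crowd workers, after which `fill_template` restores the entities.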
Quantification of Code-mixing
Comparing the level of code-switching in corpora, Björn Gambäck and Amitava Das, LREC, 2016
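Assuming the measure referred to here is the utterance-level Code-Mixing Index (CMI) used in the cited work, \(C_u = 100 \cdot \big(1 - \max_i w_i / (n - u)\big)\) for \(n > u\) and \(0\) otherwise, where \(n\) is the number of tokens, \(u\) the number of language-independent tokens and \(w_i\) the number of tokens in language \(i\), a minimal sketch looks like this. The tag names and the manual token tags in the example are illustrative assumptions.

```python
from collections import Counter


def code_mixing_index(lang_tags, neutral="other"):
    """Utterance-level CMI from per-token language tags.

    lang_tags: one tag per token, e.g. "en" or "hi"; tokens tagged
    `neutral` are treated as language independent.
    """
    n = len(lang_tags)
    counts = Counter(tag for tag in lang_tags if tag != neutral)
    u = n - sum(counts.values())
    if n == u:                      # no language-specific tokens at all
        return 0.0
    return 100.0 * (1.0 - max(counts.values()) / (n - u))


# "Sure, aap ko kaunsi price range mein chahiye, cheap, moderate ya expensive?"
# tagged by hand for this example (punctuation ignored).
tags = ["en", "hi", "hi", "hi", "en", "en", "hi", "hi", "en", "en", "hi", "en"]
print(code_mixing_index(tags))  # 50.0
```

A monolingual utterance scores 0, and the score grows as the dominant language accounts for a smaller share of the tokens.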
Outline
Self-chat for Movie Dialogue
PLOT
The British navy requisitions civilian vessels that can get close to the beach. In Weymouth, Mr. Dawson and his son Peter set out on his boat Moonstone rather than let the navy take it. Impulsively, their teenage friend George joins them. At sea, they rescue a shell-shocked officer from a wrecked ship.
REVIEW
Dunkirk is edge of your seat filmmaking. They're all great but Dunkirk is not about any one soldier. Also 'Dunkirk' is another brilliant collaboration between Nolan & Hans Zimmer. The way he mixes in a ticking clock with score is nail biting. Dunkirk relies on very little dialogue.
COMMENTS
This is a very important movie, because it doesn't glamorize or glorify war.
Just awesome! Simply awesome!
Hans Zimmer did really great with the score
The movie was brilliant
DIALOGUE
Speaker1 (N): What do you think about the movie?
Speaker2 (C): I think the movie was brilliant.
Speaker1 (N): Agreed! One of the finest in this genre.
Speaker2 (C): I believe the best part about the movie is that it doesn't glamorize or glorify war.
Speaker1 (N): Totally! Oh by the way, do you remember the name of the ship headed by Mr. Dawson?
Speaker2 (P): Yes. It was Moonstone
Speaker1 (N): Right. I am always impressed by the Nolan - Hans Zimmer collaboration.
Speaker2 (R): The way he mixes in a ticking clock with score is nail biting.
Speaker1 (N): That's an interesting way to put it. Would you recommend any other movies?
Speaker2 (F): I think you will enjoy Saving Private Ryan
FACT TABLE
Box Office | $520,102,431 |
Similar Movies | Saving Private Ryan, Interstellar |
Self-chat for Products
DESCRIPTION
. . . Dual Selfie Camera. One for Selfie, One for Groupfie The OPPO F3 Plus sports dual front cameras for best selfies - A 16 MP camera and a 8 MP 120 wide angle camera. Meanwhile, beautify 4.0 gives even more options to capture head-turning selfies. The camera uses a tailor-made 6P lens module, which adjusts the path of light to minimize image distortion and maximize clarity. . . .
POSITIVE REVIEWS
It has Dual Selfie mode with Wide viewing angle which is best in its class.
Speaker has quite clean and powerful sound.
NEGATIVE REVIEWS
It feels heavy in hands as well as in pocket with 185 grams weight.
No support for FM radio and NFC
DIALOGUE
Buyer(N): Hi, Can you recommend me some good phone to buy?
Seller(N): Hi, Sure I would like to recommend you oppo-f3-plus.
Buyer(RP): Tell me more about the Dual Selfie with Wide viewing angle.
Seller(RP+D): It has Dual Selfie Camera. One for Selfie, One for Groupfie The OPPO F3 Plus sports dual front cameras for best selfies - A 16 MP camera and a 8 MP 120 wide angle camera.
Buyer(D): Ok. How is the performance of the camera?
Seller(D): The camera uses a tailor-made 6P lens module, which adjusts the path of light to minimize image distortion and maximize clarity.
Buyer(RN): As good as the product looks but it has No support for FM radio and NFC.
Seller(RP): No it does not but the Speaker has quite clean and powerful sound.
Buyer(F): Well how much does it cost?
Seller(F): Just 30,990
Buyer(N): Well, ok I will give it a try.
Seller(N): You will surely enjoy the product.
Buyer(N): Buy
Seller(N): Sell
FEATURE TABLE
Battery | 4000mAh |
RAM | 4GB |
Price | 30,990 |
Internal Memory | 64GB |
Screen | 6 inches |
Outline
Baseline Results
Results on En-DSTC2
RNN + CROSS-GCN-SeA
Results on code-mixed data
Effect of using more GCN hops
PPMI vs Raw Frequencies
Is dependency or co-occurrence structure really needed?
Dependency edges
Random edges
Ablations
Attention | Encoder |
Bahdanau | RNN |
Bahdanau | GCN |
Bahdanau | RNN-GCN |
Sequential | RNN |
Sequential | GCN |
Sequential | RNN-GCN |
GCNs do not outperform RNNs independently:
performance of GCN-Bahdanau attention < RNN-Bahdanau attention
Our Sequential attention outperforms Bahdanau attention:
GCN-Bahdanau attention < GCN-Sequential attention
RNN-Bahdanau attention < RNN-Sequential attention (BLEU & ROUGE)
RNN+GCN-Bahdanau attention < RNN+GCN-Sequential attention
Combining GCNs with RNNs helps:
RNN-Sequential attention < RNN+GCN-Sequential attention
Best results are always obtained by the final model which combines RNN, GCN and Sequential attention
Conclusion
A single attention distribution overburdens the attention mechanism
Separated the history into Pre-KB, KB and Post-KB parts and attended sequentially over them
Showed that structure-aware representations are useful in goal-oriented dialogue
Used GCNs to infuse structural information of dependency graphs into the learned representations
Where dependency parsers were not available, used word co-occurrence frequencies and PPMI values to extract a contextual graph
Obtained state-of-the-art performance on the modified DSTC2 dataset and its code-mixed versions
Future Work
Extend the model to multidomain goal-oriented dialogue (restaurants, hotels, taxi)
Conditional code-mixed response generation
Better copy mechanism
Use the whole knowledge graph instead of dialogue-specific KB triples
Use semantic graphs along with dependency parse trees
Publications
Questions?
Data Evaluation
Language Understanding
User utterance
Predicted Intent
I need a cheap Chinese restaurant in the north of town.
Classifier
Semantic Frame
request_rest(cuisine=chinese, price=cheap, area=north)
Intent Classification
Language Understanding
User utterance
Predicted Tags
I need a cheap Chinese restaurant in the north of town.
Slot Filling
I | <null> |
need | <null> |
a | <null> |
cheap | <price> |
Chinese | <cuisine> |
restaurant | <null> |
in | <null> |
the | <null> |
north | <area> |
of | <null> |
town | <null> |
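Given the predicted intent and the per-token slot tags above, the semantic frame can be assembled as in the following sketch. The function names and the fixed `SLOT_ORDER` (chosen to match the frame shown on the intent slide) are assumptions; the tags themselves would come from a trained tagger rather than being written by hand.

```python
# Slot order used when printing the frame; matches the frame in the slides.
SLOT_ORDER = ["cuisine", "price", "area"]

tokens = "I need a cheap Chinese restaurant in the north of town".split()
tags = ["<null>", "<null>", "<null>", "<price>", "<cuisine>", "<null>",
        "<null>", "<null>", "<area>", "<null>", "<null>"]


def build_frame(intent, tokens, tags):
    """Collect every non-null tagged token as a (lower-cased) slot value."""
    slots = {}
    for tok, tag in zip(tokens, tags):
        if tag != "<null>":
            slots[tag.strip("<>")] = tok.lower()
    return {"intent": intent, "slots": slots}


def format_frame(frame):
    """Render the frame as intent(slot=value, ...) in SLOT_ORDER."""
    args = ", ".join(f"{s}={frame['slots'][s]}"
                     for s in SLOT_ORDER if s in frame["slots"])
    return f"{frame['intent']}({args})"


frame = build_frame("request_rest", tokens, tags)
print(format_frame(frame))
# request_rest(cuisine=chinese, price=cheap, area=north)
```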
Dialogue Management
User: Book a table at Prezzo for 5.
System: How many people?
User: For 3.
#People
Time
Language Generation
inform(rest=Prezzo, cuisine=italian)
System action
System response
Prezzo is a nice restaurant which serves Italian.
Future Plans