Deep Learning for Natural Language Processing

Mitesh M. Khapra

Assistant Professor, Department of Computer Science & Engineering

Indian Institute of Technology Madras

ASP Interview (01-Mar-2021)

Professional Background

B.E.

M.Tech

Researcher

PhD Thesis: Reusing Resources for Multilingual Computation

Machine Translation

Debater

Multimodal Embeddings

Knowledge Base QA

Multimodal Chatbots

Ph.D.

2002

2008

2012

Assistant Professor

2012 - 2016

2016 - 2021

ગુ

हि

ಕ

म

ਪੰ

த

తె

Ph.D.

M.S.

Current

Make India AI ready

[Figure: publications by venue rank, in the order A* (conference), A (conference), A* (journal): 16, 21, 3]

[Figure: h-index, Sep'16 to Feb'21: 13 to 24]

NLP Research@RBC (Three Main Themes/Contributions)

Indic NLP

हि

Code-mixed Chatbots

Indic NLU Benchmark

NLP: Natural Language Processing

NLU: Natural Language Understanding

NLG: Natural Language Generation

Multilingual Embeddings

Tools for under-represented languages

বা

ગુ

हि

ಕ

म

ਪੰ

த

తె

ଓ

മ

অ

வீடு

घर

Suman Banerjee, Mitesh M. Khapra. Graph Convolutional Network with Sequential Attention for Goal-oriented Dialogue Systems. Transactions of the Association for Computational Linguistics (TACL), 2019

Interpretable NLP

input

output

मैं यहां हूं ("I am here")

Interpretable Attention Networks

Post-hoc Statistical Inference of Attention

Reject $H_0$

[Figure: attention between the Hindi input "मैं यहां हूं" and the English output "I am here"]

Wow!?

Preksha Nema, Mitesh M. Khapra, Anirban Laha, B. Ravindran. Diversity Driven Attention Model for Query Based Abstractive Summarisation. The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017

NLP Research@RBC (Three Main Themes/Contributions)

Evaluating NLG

Taxonomy of evaluation metrics

How are you?

I am solid

I am liquid

Robust evaluation metrics for dialogs

[Figure: #metrics by year, '02, '14, '21: 3, 14, 57]

Ref: the boy went home

Predicted: the boy went to his house

?

Task-aware evaluation metrics

director of Titanic?

Who is the actor of Leonardo?

Ananya B. Sai, M Akash Kumar, Siddhartha Arora, Mitesh M. Khapra. Improving Dialog Evaluation with a Multi-reference Adversarial Dataset and Large Scale Pretraining. Transactions of the Association for Computational Linguistics (TACL), 2020.

Teaching@IITM

Fundamentals of Deep Learning

Topics in Deep Learning

Introduction to Programming

Introduction to Machine Learning

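/* Recursion with no base case: every call to fun() calls fun() again. */
void fun(void)
{
	int i = 0;	/* a fresh local i is created on each call */
	i++;
	fun();
}


int main(void)
{
	fun();
	return 0;
}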

Linear Algebra & Random Processes

[Figure: a matrix with rank = m < n, its columns grouped into Pivots and Free]

Object Oriented Algorithms Implementation and Analysis

[Figure: Course TCF and Instructor TCF scores: 0.94, 0.91, 0.92, 0.95, 0.97, 0.86, 0.89, 0.91, 0.96, 0.81, 0.75]

Projects@IITM

Knowledge Graph Driven Multimodal Conversation Systems

In which years was the per capita income in Delhi greater than that in Chennai?

Impact: Largest publicly available dataset for reasoning over scientific plots, state of the art model for extracting visual objects in scientific plots

USD 50K

USD 10K

AI for All

AAAI 2021, WACV 2020

USD 35K

RL for NLG

USD 15K

Indic QA

हि

Awards

Young Faculty Recognition Award (2019)

This award was instituted by an alumnus, Dr. P. Balasubramanian (1971/BT/AE, 1973/MT/IM), to recognize young faculty who have done well in research and have been good teachers in their courses.

Prof. B. Yegnanarayana Award for Excellence in Research and Teaching (2020)

The purpose of the award is to recognize regular CSE faculty members of IIT Madras who excel in research and teaching.

Google Faculty Research Award (2018)

This award was instituted by Google to recognize and support world-class faculty pursuing cutting-edge research in areas of mutual interest.

Technical Talk

Preksha Nema, Mitesh M. Khapra, Anirban Laha, B. Ravindran. Diversity Driven Attention Model for Query Based Abstractive Summarisation. The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017

Task: Query Based Abstractive Summarisation

Document: Roger Federer wins a record eighth men’s singles title at Wimbledon on Sunday. He defeated Marin Cilic in straight sets with 6-3, 6-1, 6-4. Cilic appeared to struggle with a foot injury but the Swiss was in imperious form on Centre Court, winning the final in one hour and 41 minutes. It is Federer’s 19th grand slam title and his second of 2017 following victory at the Australian Open in January.

Query: What happened in the finals at Wimbledon?

Summary: Federer won record eighth Wimbledon title beating Marin Cilic in straight sets.

Existing Models

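[Figure: encode-attend-decode architecture. A Query Encoder reads the query ("What happened ... Wimbledon"), a Document Encoder reads the document ("Roger Federer ... January"), and an Attention Network feeds a Decoder that starts from <Start> and emits the summary one word at a time ("Federer", "won", ..., "straight", "sets").]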

The problem

[Figure: the same Query Encoder, Document Encoder, Attention Network and Decoder, with the Decoder emitting "Federer won in straight sets won sets".]

Repeating phrases in the output

Our hypothesis

[Figure: Query Encoder, Document Encoder, Attention Network and Decoder at time steps t=1 and t=2, with context vectors $c_1$ and $c_2$.]

Maybe the context vectors at two successive time steps ($c_t$, $c_{t+1}$) are very similar.
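One way to check this hypothesis is to measure how close successive context vectors are. The sketch below uses cosine similarity, which is an assumption here: the slide does not say which similarity measure was used, and the vectors `c_1` and `c_2` are made-up values for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical context vectors at two successive decoder steps.
c_1 = [0.9, 0.1, 0.4]
c_2 = [0.8, 0.2, 0.5]

similarity = cosine_similarity(c_1, c_2)
print(f"cos(c_1, c_2) = {similarity:.3f}")
```

A value close to 1 means the decoder sees almost the same context at both steps, which would be consistent with it producing the same word or phrase again.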

Our solution(s)

D1: Orthogonalize the context vectors

$c_{t-1}$

$c_t$

$c'_t$

Issue: What about the past history?
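The D1 step can be sketched as a single Gram-Schmidt style projection, $c'_t = c_t - \frac{c_t^\top c_{t-1}}{c_{t-1}^\top c_{t-1}} c_{t-1}$. The slide only names the operation, so this formula, the function names, and the example vectors below are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(c_t, c_prev):
    """Return c_t minus its projection onto c_prev (c_prev must be non-zero)."""
    scale = dot(c_t, c_prev) / dot(c_prev, c_prev)
    return [a - scale * b for a, b in zip(c_t, c_prev)]


# Hypothetical context vectors for steps t-1 and t.
c_prev = [1.0, 0.0]
c_t = [3.0, 4.0]

c_t_prime = orthogonalize(c_t, c_prev)
print(c_t_prime)               # the component along c_prev has been removed
print(dot(c_t_prime, c_prev))  # inner product with c_prev is zero
```

Because only $c_{t-1}$ is used, nothing stops $c'_t$ from resembling a context vector from further back, which is the issue raised above.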

D2: Maintain History

$c_t$

$d_{t-1}$

$d'_t$

Issue: What if we want phrases to repeat?

He kept talking and talking and talking!

Federer won in straight sets in sets

Our solution(s)

D3: Soft orthogonalisation

$d_{t-1}$

$d_t$

$d'_t$

Introduce a parameter $\gamma$ which controls what fraction of the component along $c_{t-1}$ should be subtracted from $c_t$
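A minimal sketch of this soft variant, following the sentence above: subtract only a fraction $\gamma$ of the component of $c_t$ along $c_{t-1}$. Treating $\gamma$ as a fixed scalar in $[0, 1]$ is an assumption made for the example (the slide does not say how $\gamma$ is set), and the function names and vectors are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def soft_orthogonalize(c_t, c_prev, gamma):
    """Subtract a fraction gamma of c_t's component along c_prev."""
    scale = dot(c_t, c_prev) / dot(c_prev, c_prev)
    return [a - gamma * scale * b for a, b in zip(c_t, c_prev)]


# Hypothetical context vectors for steps t-1 and t.
c_prev = [1.0, 0.0]
c_t = [3.0, 4.0]

unchanged = soft_orthogonalize(c_t, c_prev, gamma=0.0)  # keeps c_t as it is
halfway = soft_orthogonalize(c_t, c_prev, gamma=0.5)    # removes half the overlap
hard = soft_orthogonalize(c_t, c_prev, gamma=1.0)       # full orthogonalization
print(unchanged, halfway, hard)
```

With $\gamma = 0$ the context vector is left alone, so legitimate repetition ("talking and talking and talking") stays possible; with $\gamma = 1$ the step reduces to full orthogonalization.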

He kept talking and talking and talking!

Results: (Quantitative)

What percentage of the reference summary is included in the predicted summary
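As a rough illustration of this idea, the sketch below computes a unigram recall: the share of reference words that also appear in the prediction. It is a simplification and not the scorer behind the reported numbers; lowercasing and whitespace tokenization are assumptions, and `rouge1_recall` is a hypothetical helper name. The example sentences are taken from the qualitative results in this deck.

```python
from collections import Counter


def rouge1_recall(reference, predicted):
    """Fraction of reference unigrams that also occur in the prediction."""
    ref_counts = Counter(reference.lower().split())
    pred_counts = Counter(predicted.lower().split())
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Clip each word's credit by how often it occurs in the prediction.
    overlap = sum(min(count, pred_counts[word]) for word, count in ref_counts.items())
    return overlap / total


reference = "Hydrogen in cars is less dangerous than gasoline"
baseline = "Hydrogen is hydrogen hydrogen hydrogen fuel energy"
d1_output = "Hydrogen in cars is reduce risk than fuel"

print(rouge1_recall(reference, baseline))   # 2 of 8 reference words matched
print(rouge1_recall(reference, d1_output))  # 5 of 8 reference words matched
```

Clipping the counts means that repeating "hydrogen" four times earns no extra credit, so a repetitive output scores low on this measure.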

| Group | Models | Rouge-1 | Rouge-2 | Rouge-L |
| --- | --- | --- | --- | --- |
| Baselines | Encode-Attend-Decode | 13.73 | 2.06 | 12.84 |
| Baselines | SOTA [Chen et al. 2016] | 33.06 | 13.25 | 32.17 |
| Proposed Solutions | D1 (orthogonalize) | 33.85 | 13.65 | 32.99 |
| Proposed Solutions | D2 (history) | 38.12 | 16.76 | 37.31 |
| Proposed Solutions | D3 (soft) | 41.26 | 18.75 | 40.43 |

Results: (Qualitative)

Document: Fuel cell critics point out that hydrogen is flammable, but so is gasoline. Unlike gasoline, which can pool up and burn for a long time, hydrogen dissipates rapidly. Gas tanks tend to be easily punctured, thin-walled containers, while the latest hydrogen tanks are made from Kevlar. Also, gaseous hydrogen isn’t the only method of storage under consideration – BMW is looking at liquid storage while other researchers are looking at chemical compound storage, such as boron pellets.

Query: Are hydrogen fuel cell vehicles safe?

Reference: Hydrogen in cars is less dangerous than gasoline

Baseline: Hydrogen is hydrogen hydrogen hydrogen fuel energy

D1: Hydrogen in cars is reduce risk than fuel

D3: Hydrogen in cars is less dangerous than gasoline

Results: (Qualitative)

Document: The basis of all animal rights should be the Golden Rule: we should treat them as we would wish them to treat us, were any other species in our dominant position.

Query: Do animals have rights that makes eating them inappropriate?

Reference: Animals should be treated as we would want to be treated

Baseline: Animals should be treated as we would protect to be treated

D1: Animals should be treated as we most individual to be treated

D2: Animals should be treated as those want to be treated

Summary

Encode

Attend

Decode

...

Refine

Multiple follow-up works

Genesis of a Ph.D. thesis

Future Research Plan

Indic NLP

Interpretable NLP

input

output

Evaluating NLG

Full NLP stack for Indian languages

Input tools

Basic Building Blocks

Generation

Reasoning

Interpreting Multilingual Multimodal Models

வீடு

घर

Adversarial Evaluation of Evaluation models

How are you?

I am liquid

Thank You!

Research@IITM (Three Main Themes/Contributions)

Indic NLP

Interpretable NLP

Evaluating NLG

input

output

हि

Code-mixed Chatbots

Indic NLU Benchmark

Multilingual Embeddings

Tools for less represented languages

मैं यहां हूं ("I am here")

Interpretable Attention Networks

Post-hoc Inference of Attention

Reject $H_0$

বা

ગુ

हि

ಕ

म

ਪੰ

த

తె

ଓ

മ

অ

வீடு

घर

[Figure: attention between the Hindi input "मैं यहां हूं" and the English output "I am here"]

Wow!?

Taxonomy of evaluation metrics

How are you?

I am solid

I am liquid

Robust evaluation metrics

[Figure: #metrics by year, '02, '14, '21: 3, 14, 57]

Why this Work?

Preksha Nema, Mitesh M. Khapra, Anirban Laha, B. Ravindran. Diversity Driven Attention Model for Query Based Abstractive Summarisation. The 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017), 2017

Emotional: First paper at IITM with my first PhD student

Impactful: IJCAI 2018, NAACL 2018, ACL 2020

Teaching@IITM

Fundamentals of Deep Learning

Topics in Deep Learning

Introduction to Programming

Introduction to Machine Learning

Linear Algebra & Random Processes

Object Oriented Algorithms Implementation and Analysis

Young Faculty Recognition Award

Prof. B. Yegnanarayana Award for Excellence in Research and Teaching

Core

Co-taught

en-fr

en-de

4M

# sentences

existing sources

MR || sources

non-MR || sources

monolingual sources

en-hi

en-bn

10M

en-ta

MR || sources

PIB $_{Anuvaad}$ - PIB $_{IIITH}$

Wikipedia

non-MR || sources

Constitution

TN Assembly

AP Assembly

Monolingual sources

IndicCorp

4 new crawls
