Visual Question Answering (VQA)

Laksh Arora

@techedlaksh

Anthill Inside 2017

About Me

  • Recent Graduate from IPU.
  • Pythonista at heart, interested in applications of Machine Learning and Computer Vision.
  • Organiser of PyDataDelhi meetups.
  • Previous talks at the PyDataDelhi community, CSI and various other small meetups.
  • Independent study on Computer Vision, Deep Learning and RL

Agenda

  • Introduction to VQA
  • Motivation
  • Dataset
  • Methodology
  • Results
  • Code
  • Future Work

What is VQA?

Visual

Question

Answering

Predict the answer to a given question about an image

Dataset

  • > 0.25 million images
  • > 0.75 million questions
  • ~ 10 million answers

COCO Dataset

VQA: Common Approach

  • Question: Is there a bridge?
  • Image → CNN → Visual Representation
  • Question → Word/Sentence Embedding, LSTM → Textual Representation
  • Visual Representation + Textual Representation → Merge → Predict Answer
  • Answer: No
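
The merge and predict steps of the common approach can be sketched roughly as follows. This is a minimal illustration, not the code from the talk: the feature vectors are random stand-ins for real CNN and LSTM outputs, the weights are untrained, the tiny answer vocabulary is hypothetical, and element-wise multiplication is assumed as the merge (concatenation is another common choice; the slides do not say which is used).

```python
import math
import random

# Hypothetical, tiny answer vocabulary; real systems use the most frequent answers.
ANSWERS = ["yes", "no", "2", "black"]


def merge(visual, textual):
    """Fuse the two representations by element-wise multiplication (an assumption)."""
    if len(visual) != len(textual):
        raise ValueError("visual and textual features must have the same length")
    return [v * t for v, t in zip(visual, textual)]


def softmax(scores):
    """Turn raw scores into probabilities, shifting by the max for numerical stability."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_answer(visual, textual, weights, answers=ANSWERS):
    """Score each candidate answer with a linear layer over the merged features."""
    fused = merge(visual, textual)
    scores = [sum(w * f for w, f in zip(row, fused)) for row in weights]
    probs = softmax(scores)
    best = max(range(len(answers)), key=lambda i: probs[i])
    return answers[best], probs


rng = random.Random(0)
DIM = 8  # illustrative feature size
visual = [rng.uniform(-1, 1) for _ in range(DIM)]   # stand-in for CNN image features
textual = [rng.uniform(-1, 1) for _ in range(DIM)]  # stand-in for LSTM question encoding
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in ANSWERS]  # untrained

answer, probs = predict_answer(visual, textual, weights)
print(answer, [round(p, 3) for p in probs])
```

With untrained weights the predicted answer is meaningless; the point is only the data flow: two fixed-length representations, one merge, one classifier over a closed set of answers.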

Results

Code on Github

VQA models will answer any question

Visual Dialog

A man and woman are holding umbrellas.

What color is his umbrella?

Black

What about hers?

Multi-Colored

How many people are there in the image?

3

Thanks!

@techedlaksh
contactme@laksharora.com

Any Questions?