Enabling SQL-ML Explanation to Debug Training Data


Weiyuan Wu

youngw@sfu.ca

Committee:

Dr. Jiannan Wang - Senior Supervisor

Dr. Jian Pei - Supervisor

Dr. Oliver Schulte - Examiner

Dr. Steven Bergner - Chair

Thesis Defense

Aug. 22, 2019

SQL-ML Query

 SELECT COUNT(*)
  FROM INBOX 
 WHERE predict(INBOX.text) = 'spam' 
   AND INBOX.date = 'Aug. 22, 2019'
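A minimal sketch of how such a query could be evaluated, using SQLite's user-defined functions to expose a model as `predict`. The keyword-based `predict` below is a hypothetical stand-in for a trained classifier, and the table schema and the dates assigned to the three example emails are assumptions made only for illustration.

```python
import sqlite3


def predict(text: str) -> str:
    """Hypothetical stand-in for a trained spam classifier (assumption)."""
    lowered = text.lower()
    return "spam" if ("http://" in lowered or "free" in lowered) else "ham"


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call predict(text).
conn.create_function("predict", 1, predict)

# Assumed schema: the slides only show the ID and Text columns.
conn.execute("CREATE TABLE INBOX (id INTEGER, text TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO INBOX VALUES (?, ?, ?)",
    [
        (1, "WANTED: Rick, for crime against interdimensional space", "Aug. 22, 2019"),
        (2, "Hi Rick, lets reschedule the meeting to next Monday -Rick", "Aug. 22, 2019"),
        (3, "http://test-spam.com...", "Aug. 22, 2019"),
    ],
)

# The SQL-ML query: the model's prediction is used inside the WHERE clause.
spam_count = conn.execute(
    """
    SELECT COUNT(*)
      FROM INBOX
     WHERE predict(INBOX.text) = 'spam'
       AND INBOX.date = 'Aug. 22, 2019'
    """
).fetchone()[0]

print(spam_count)
```

Because the count depends on every prediction that passes the filter, a model trained on mislabeled data can shift the query result, which is what makes the result worth explaining.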

ID	Text	Label
1	CLICK AND GET FREE PICKLE AT http://spam.com/clickme...	Spam
2	Hi Rick, lets have a meeting tmr 8pm -Rick	Spam
3	Grandpa, the light in the garage is broken - Morty	Ham

Count(*)
1000

ID	Text	Label
1	CLICK AND GET FREE PICKLE AT http://spam.com/clickme...	Spam
2	Hi Rick, lets have a meeting tmr 8pm -Rick
3	Grandpa, the light in the garage is broken -Morty	Ham

ID	Text	Predicted
1	WANTED: Rick, for crime against interdimensional space
2	Hi Rick, lets reschedule the meeting to next Monday -Rick
3	http://test-spam.com...

ID	Text	Predicted
1	WANTED: Rick, for crime against interdimensional space	P1
2	Hi Rick, lets reschedule the meeting to next Monday -Rick	P2
3	http://test-spam.com...	P3

A	B	P
a	b	1
d	b	1
f	g	1

A	B	P
a	b	1
d	b	0
f	g	0

A	B	P
a	b	1

ID	Text	Label
1	CLICK AND GET FREE PICKLE AT http://spam.com/clickme...	Spam

3	Grandpa, the light in the garage is broken - Morty	Ham


SQL-ML Query

SQL-ML System

Why SQL-ML System

Training Data is Often Corrupted

The Need for SQL-ML Explanation

A Debugging Workflow

Existing Approaches

SQL & ML Explanation

Simple Combination

Ambiguity

Agenda

The SQL-ML Explanation Problem

Challenges

Agenda

InfComp: Influence & Complaint

Differentiable Query Result

SQL Provenance

Connect ML and SQL

Training Set Debugging

Influence Function

InfComp Algorithm

Agenda

Experiment: Compared Methods

Experiment: Tasks

Corruptions

Experiment Results: MNIST

Experiment Results: DBLP & ENRON

Experiment Results: Time Overhead

Conclusions

Future Work

Thank you!
