Car for commute, which to choose?
Active Learning
Select a point
Interested/Not interested
Too many samples required!
Key idea
Simple case: convex query.
2D space, 3 positive points.
Green area: all positive!
2D space, 2 positive points, 1 negative.
Red area: all negative!
Red: negatives, green: positives, white: uncertain.
[Figure labels: "Obvious" for points in the red or green regions, "Active learning" for points in the white region.]
1. Select a point \(x\) by uncertainty sampling.
2. If \(x\) is in the red or green region, its label is obvious.
3. If \(x\) is in the white region, fall back to normal AL.
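The three steps above can be sketched in a few lines of Python. This is a minimal 2D illustration, not the system's implementation: the function names (`convex_hull`, `in_hull`, `classify`) and the sample points are hypothetical. It assumes the target query is convex, so the green region is the convex hull of the known positives, and a point is treated as red when adding it to the positives would force a known negative inside the hull.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive means a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def in_hull(hull, q):
    """True if q lies inside or on the boundary of the hull."""
    n = len(hull)
    if n == 0:
        return False
    if n == 1:
        return hull[0] == q
    if n == 2:
        a, b = hull
        return (cross(a, b, q) == 0
                and min(a[0], b[0]) <= q[0] <= max(a[0], b[0])
                and min(a[1], b[1]) <= q[1] <= max(a[1], b[1]))
    return all(cross(hull[i], hull[(i + 1) % n], q) >= 0 for i in range(n))


def classify(x, positives, negatives):
    """Return 1 (green), -1 (red), or 0 (white, ask the user via normal AL)."""
    if in_hull(convex_hull(positives), x):
        return 1
    # If x were positive, convexity would make every point of
    # hull(positives + [x]) positive; a known negative inside it rules that out.
    extended = convex_hull(list(positives) + [x])
    if any(in_hull(extended, neg) for neg in negatives):
        return -1
    return 0


# Hypothetical labeled points for illustration only.
positives = [(0, 0), (4, 0), (0, 4)]
negatives = [(5, 5)]
```

With these sample points, `classify((1, 1), positives, negatives)` falls in the green region, `classify((6, 6), positives, negatives)` in the red region, and `classify((5, 0), positives, negatives)` stays white, so only the last one would be shown to the user.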
What if a query is non-convex?
\(rowv^2 + colv^2 > 0.2^2 \text{ AND } 480 < rowc < 885 \)
Observation: queries are usually conjunctions.
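To see why this example query is non-convex yet still a conjunction, the sketch below writes each conjunct as its own predicate. The function names and the sample values are hypothetical; only the attribute names and thresholds come from the query above.

```python
def velocity_pred(rowv, colv):
    """First conjunct: outside a circle of radius 0.2, a non-convex region."""
    return rowv ** 2 + colv ** 2 > 0.2 ** 2


def position_pred(rowc):
    """Second conjunct: an open interval, a convex region."""
    return 480 < rowc < 885


def query(rowv, colv, rowc):
    """The full query is the conjunction of the two predicates."""
    return velocity_pred(rowv, colv) and position_pred(rowc)


# Two points that satisfy the first conjunct, and their midpoint.
a = (0.3, 0.0)
b = (-0.3, 0.0)
midpoint = ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)

# Non-convexity: both endpoints are positive but the midpoint is not.
non_convex_witness = (velocity_pred(*a), velocity_pred(*b), velocity_pred(*midpoint))
```

Here `non_convex_witness` shows two positive points whose midpoint is negative, so a convex hull built over the first conjunct's positives could wrongly include negatives. The second conjunct has no such problem.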
Given a conjunction:
factorize
Given a point \(x\), consider all its subspaces [ , ]:
Case:
Non-convex => output 0,
Positive region => output 1,
Negative region => output -1,
Uncertain region => output 0,
Finally, take the minimum of the outputs over all subspaces:
\(+ \wedge ? \Leftrightarrow \min(1,0) = 0 \Leftrightarrow ?\),
\(+ \wedge - \Leftrightarrow \min(1,-1) = -1 \Leftrightarrow -\).
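The case analysis and the final minimum can be written as a small function. This is a sketch under the encoding given above (1, 0, -1); the names `subspace_output` and `combine` and the region labels passed as strings are assumptions made for illustration.

```python
POSITIVE, UNCERTAIN, NEGATIVE = 1, 0, -1


def subspace_output(region):
    """Map the region a point falls into within one subspace to its output."""
    outputs = {
        "non-convex": UNCERTAIN,  # no polytope can be trusted in this subspace
        "positive": POSITIVE,
        "negative": NEGATIVE,
        "uncertain": UNCERTAIN,
    }
    return outputs[region]


def combine(regions):
    """Conjunction over subspaces: take the minimum of the per-subspace outputs."""
    return min(subspace_output(r) for r in regions)
```

For instance, `combine(["positive", "uncertain"])` gives 0 (uncertain) and `combine(["positive", "negative"])` gives -1 (negative), matching the two examples above. A point is reported positive only when every subspace reports positive.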
A positive region built from 4 positive points: {A, B, C, D}
Larger coverage
\(D^+\): points in the unlabeled test set that the polytope flags as positive
\(D^u\): points in the unlabeled test set that the polytope leaves uncertain
Dataset: 190M tuples, 510 attributes.
Baselines: active learning (AL), active search (AS)
\(\gamma\) controls the probability of sampling outside the polytope. A small \(\gamma\) leads to many useless samples.
Dataset: 5622 vehicles with 27 attributes such as the model, year, length, height, engine and retail price.
Users: 11 CS graduates, 7 non-technical people.
1. Each user is given a scenario: buying a car for an everyday city commute, outdoor sports, an elderly family member, or a small business.
2. Each user is asked to find all the cars that meet the requirements.
3. The user issues a query, checks the result, and refines the query until satisfied. This simulates the scenario without an explore-by-example system.
The resulting queries have the following properties:
1. The 18 queries have selectivity in the range [0.2%, 0.9%].
2. 117 predicates in total: 70 numerical and convex, 47 categorical and non-convex.
3. All the queries are conjunctive; disjunctions appear only on categorical attributes.
DSM > AL > Human in user study