Julián Bayardo
Notebook: bit.ly/pydata-cordoba-2018-ipy
Slides & Qs: bit.ly/pydata-cordoba-2018-slides
Product ID for PS4
Primary
Secondary
Purchase!
Bug Report
Where's the soda?
Trigger Item
f :: Item -> [Item]
f trigger = ...
f :: Item -> User -> [Item]
f trigger user = ...
f :: Item -> Navigation -> [Item]
f trigger history = ...
f :: Item -> User -> Navigation -> [Item]
f trigger user history = ...
You are here
Coverage@20: 84%
Item to Purchases
Item to Views
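A minimal sketch of the simplest signature, `f :: Item -> [Item]`, read as a co-purchase lookup. Reading "Item to Purchases" as co-purchase counting is an assumption, and the baskets, item names, and helper names below are made up for illustration.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_item_to_items(baskets):
    """Count how often each ordered pair of items shares a basket."""
    co_counts = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co_counts[a][b] += 1
    return co_counts


def recommend(co_counts, trigger, k=20):
    """f trigger: the k items most often bought together with the trigger."""
    neighbours = co_counts.get(trigger, {})
    # Sort by descending count, then by name so ties are deterministic.
    ranked = sorted(neighbours.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical purchase baskets.
baskets = [
    ["fernet", "cola", "ice"],
    ["fernet", "cola"],
    ["mate_set", "yerba"],
]
co_counts = build_item_to_items(baskets)
print(recommend(co_counts, "fernet"))
```

A trigger that never appeared in a basket gets an empty list, which is the gap that a coverage metric measures.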
f :: Item -> [Item]
f trigger = ...
f :: Item -> User -> [Item]
f trigger user = ...
f :: Item -> Navigation -> [Item]
f trigger history = ...
f :: Item -> User -> Navigation -> [Item]
f trigger user history = ...
Still here
With some of
Trigger
...
Trigger
Recommendation
Fernet
Hello Kitty Mate Set
Trigger Item
...people have already dealt with this problem in NLP...
The dog is under the big table
Center
Sliding Window
Center | Context |
---|---|
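The sliding window can be sketched as a generator of (center, context) pairs. The window size of 2 and the function name are assumptions chosen to keep the output short.

```python
def window_pairs(tokens, window):
    """Yield (center, context) pairs for every position in the sentence."""
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                yield center, tokens[j]


sentence = "the dog is under the big table".split()
pairs = list(window_pairs(sentence, window=2))

# Context words seen when "under" is the center word.
under_context = [context for center, context in pairs if center == "under"]
print(under_context)
```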
Negative Log Likelihood
Over Corpus
Over Window
Cross Entropy
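A rough sketch of the per-window term, assuming the usual skip-gram formulation: the probability of a context word given the center word is a softmax over the whole vocabulary, and the loss is the negative log of that probability. The 2-d vectors and the tiny vocabulary are made up.

```python
import math


def softmax_nll(center_vec, context_vecs, target):
    """-log p(target | center), with a softmax over the whole vocabulary."""
    scores = {
        word: sum(a * b for a, b in zip(center_vec, vec))
        for word, vec in context_vecs.items()
    }
    # Log-sum-exp with the maximum subtracted for numerical stability.
    top = max(scores.values())
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    return log_norm - scores[target]


# Hypothetical vectors: one center vector and one context vector per word.
under = [1.0, 0.0]
context_vecs = {
    "dog": [2.0, 0.0],
    "table": [1.0, 1.0],
    "ring": [-1.0, 0.5],
    "magic": [0.0, -1.0],
}

# Loss over one window: sum the terms for each context word around "under".
window_loss = sum(softmax_nll(under, context_vecs, w) for w in ["dog", "table"])
print(window_loss)
```

Every call touches every word in the vocabulary to compute the normalizer, which is the cost that negative sampling avoids.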
under
dog
table
under
dog
table
Expensive!
Center: under
Context: dog (label 1)
Noise: ring, magic (label 0)
Error on Positive Example
Error on Negative Examples
Over Corpus
Over Window
Cross Entropy Estimate
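A sketch of the two error terms for a single (center, context) pair, assuming the standard negative sampling objective: a logistic loss that pushes the true pair towards label 1 and each noise pair towards label 0. The vectors are made up.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def negative_sampling_loss(center_vec, context_vec, noise_vecs):
    """Error on the positive example plus error on the negative examples."""
    positive = -log_sigmoid(dot(center_vec, context_vec))
    negative = -sum(log_sigmoid(-dot(center_vec, n)) for n in noise_vecs)
    return positive + negative


# Hypothetical vectors for center "under", context "dog", noise "ring" and "magic".
under = [1.0, 0.0]
dog = [2.0, 0.0]
noise = [[-1.0, 0.5], [0.0, -1.0]]
loss = negative_sampling_loss(under, dog, noise)
print(loss)
```

The cost now depends on the number of noise words rather than on the vocabulary size.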
Discard a word from the dataset with probability:
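A sketch of the discard probability, assuming the slide refers to the subsampling formula from the original word2vec paper, 1 - sqrt(t / f(w)), where f(w) is the word's relative frequency and t is the subsampling threshold. Implementations may use a slightly different expression. The corpus and the threshold are made up.

```python
import math
from collections import Counter


def discard_probability(frequency, threshold):
    """1 - sqrt(threshold / frequency), clipped at 0 for rare words."""
    return max(0.0, 1.0 - math.sqrt(threshold / frequency))


# Hypothetical corpus: one very frequent word, one common word, one rare word.
corpus = ["the"] * 90 + ["dog"] * 9 + ["table"]
counts = Counter(corpus)
total = sum(counts.values())

probs = {
    word: discard_probability(count / total, threshold=0.05)
    for word, count in counts.items()
}
for word, p in probs.items():
    print(word, round(p, 3))
```

Words whose frequency is below the threshold are never discarded, while very frequent words are dropped most of the time.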
from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens
sentences = [
    ["hello", "pydata"],
    ["how", "are", "you", "doing"],
    ["well", "this", "is", "embarrassing"]]

model = Word2Vec(sentences,
                 vector_size=300,   # Vector dimensions (named `size` in gensim < 4.0)
                 window=5,          # Sliding window size
                 sg=1,              # Use the skip-gram model, as presented here
                 hs=0,              # Disable hierarchical softmax...
                 negative=5,        # ...and use negative sampling with 5 noise words
                 ns_exponent=0.75,  # Exponent of the unigram noise distribution
                 sample=1e-4,       # Subsampling threshold
                 min_count=1)       # Keep rare words; the default (5) would empty this toy vocabulary
AWS Budget
item_id: MLB938974539
cat_l7_id: MLB7264
title: carcaca
title: dodge_dart
domain_id: MLB-CLASSIC_CARS
item_id: MLB972105778
cat_l7_id: MLB3168
title: plymouth
title: sedan
domain_id: MLB-CLASSIC_CARS
item_id: MLB993188183
cat_l7_id: MLB7264
title: dodge
title: van_furgao
item_id: MLB938974539
cat_l7_id: MLB3168
title: dodge
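One way to get tokens like the ones above is to flatten each item record into `field:value` strings, one per value, so that items and their metadata share a single vocabulary. This is an assumption about the preprocessing, not something the slides spell out. The function name and the dictionary layout are hypothetical.

```python
def item_to_tokens(item):
    """Flatten one item record into 'field:value' tokens, one per value."""
    tokens = []
    for field, values in item.items():
        if isinstance(values, str):
            values = [values]  # treat a single value as a one-element list
        tokens.extend(f"{field}:{value}" for value in values)
    return tokens


# Record built from the second item listed above.
item = {
    "item_id": "MLB972105778",
    "cat_l7_id": "MLB3168",
    "title": ["plymouth", "sedan"],
    "domain_id": "MLB-CLASSIC_CARS",
}
tokens = item_to_tokens(item)
print(tokens)
```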
Most Similar
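A sketch of a "most similar" lookup, assuming similarity means cosine similarity between embedding vectors, which is what gensim's `most_similar` computes. The item names and vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms


def most_similar(vectors, query, topn=3):
    """Rank every other token by cosine similarity to the query token."""
    q = vectors[query]
    scored = [(name, cosine(q, vec)) for name, vec in vectors.items() if name != query]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Hypothetical 2-d embeddings.
vectors = {
    "item_a": [1.0, 0.1],
    "item_b": [0.9, 0.2],
    "item_c": [-1.0, 0.0],
}
ranking = most_similar(vectors, "item_a")
print(ranking)
```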
Most Similar (180-200)
It's the same product all the way down. And then there are turtles.
Most Similar
Herschel - Bag
Herschel Wallets!
This is actually a pretty bad example, as it is an uncommon product
item_id: ...
prod_id: ...
cat_id: ...
...
Unavailable!
Vector
Vector
...
Avg
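A sketch of the fallback above, assuming that an item without its own vector is represented by the average of the vectors of its metadata tokens. The token format, the function names, and the embeddings are hypothetical.

```python
def mean_vector(vectors):
    """Component-wise average of equally sized vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def metadata_tokens(item):
    """Tokens for every field except the item's own id."""
    return [f"{field}:{value}" for field, value in item.items() if field != "item_id"]


def item_vector(item, embeddings):
    """Return the item's own vector, or the average of its metadata vectors."""
    own = embeddings.get(f"item_id:{item['item_id']}")
    if own is not None:
        return own
    known = [embeddings[t] for t in metadata_tokens(item) if t in embeddings]
    return mean_vector(known) if known else None


# Hypothetical embeddings: the new item has no vector, but its metadata does.
embeddings = {
    "prod_id:P1": [1.0, 0.0],
    "cat_id:C1": [0.0, 1.0],
}
new_item = {"item_id": "I9", "prod_id": "P1", "cat_id": "C1"}
print(item_vector(new_item, embeddings))
```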
In our case...
Parameter | Value |
---|---|
Method | Skip-Gram |
Embedding Size | 128 |
Window Size | 40 |
Negative Examples | 20 |
Negative Sampling Exponent | -0.5 |
Subsampling Rate | 10e-4 |
Minimum Count | 10 |
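The table can be written as keyword arguments for gensim's `Word2Vec`. The mapping from table rows to keyword names is an assumption based on the earlier gensim example, and `vector_size` is the gensim 4.x name for what older versions call `size`.

```python
import math

# Table rows mapped to gensim keyword names (assumed mapping).
word2vec_params = {
    "sg": 1,              # Method: Skip-Gram
    "vector_size": 128,   # Embedding Size
    "window": 40,         # Window Size
    "negative": 20,       # Negative Examples
    "ns_exponent": -0.5,  # Negative Sampling Exponent
    "sample": 10e-4,      # Subsampling Rate, as written in the table
    "min_count": 10,      # Minimum Count
}

# The literal 10e-4 evaluates to 0.001.
print(word2vec_params["sample"], math.isclose(word2vec_params["sample"], 1e-3))

# With gensim installed and a corpus of token lists in `sentences`:
# model = Word2Vec(sentences, **word2vec_params)
```

A negative exponent makes the noise distribution favor infrequent tokens, which matches the "Sample infrequent items" note.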
Lots of metadata!
Sample infrequent items
Disclaimer: most of these were found through experimentation
The dog is under the table
cat
bird
beside
over
bed
sofa
The dog is under the big table
Center
Sliding Window
Center | Context |
---|---|
Regularization (Implicit)
There is one of M for every kind of metadata added