ThisPlusThat.me
Christopher Moody
Standard natural language isn't great for searching,
but the state of the art can do amazing things!
King - Man + Woman = Queen
With these new techniques, you can search for a movie that's more thoughtful,
or find a movie that is less depressing.
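
The King - Man + Woman = Queen arithmetic can be sketched with a few hand-made vectors. This is only an illustration: the three dimensions (royalty, gender, fruit) and all the values are invented for the example, whereas real word vectors are learned and have many more dimensions. The `analogy` helper is a hypothetical name, not part of any library.

```python
import numpy as np

# Toy, hand-made vectors (assumption): dimensions are [royalty, gender, fruit].
vectors = {
    "king":  np.array([1.0,  1.0, 0.0]),
    "queen": np.array([1.0, -1.0, 0.0]),
    "man":   np.array([0.0,  1.0, 0.0]),
    "woman": np.array([0.0, -1.0, 0.0]),
    "apple": np.array([0.0,  0.0, 1.0]),
}

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(positive, negative, vectors):
    """Add the positive words, subtract the negative ones, return the closest other word."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    exclude = set(positive) | set(negative)  # do not return one of the query words
    scores = {w: cosine(target, v) for w, v in vectors.items() if w not in exclude}
    return max(scores, key=scores.get)

result = analogy(positive=["king", "woman"], negative=["man"], vectors=vectors)
print(result)
```

The same pattern would cover a query such as a movie plus "thoughtful" or minus "depressing", provided vectors for those words exist.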
- Wikipedia
- IMDB
- NYTimes
- Usenet
Map-reduce to preprocess & transform the text into n-grams
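
A minimal sketch of the map and reduce steps for counting n-grams, written in plain Python rather than against a real cluster framework. The function names, the whitespace tokenizer, and the sample lines are assumptions made for the example; the slides do not specify the actual preprocessing.

```python
from collections import defaultdict

def map_ngrams(line, n=2):
    """Map step: emit (n-gram, 1) for every run of n consecutive tokens in a line."""
    tokens = line.lower().split()  # naive whitespace tokenizer (assumption)
    for i in range(len(tokens) - n + 1):
        yield " ".join(tokens[i:i + n]), 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each n-gram."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

lines = ["the quick brown fox", "the quick red fox"]
pairs = [pair for line in lines for pair in map_ngrams(line)]
counts = reduce_counts(pairs)
print(counts)
```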
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/92911/elephant_rgb_sq.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/94620/download.jpeg)
word2vec trains a neural network, assigning a vector to every word
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/92913/Screen_Shot_2013-09-17_at_8.20.33_AM.png)
NumPy, Cython, Numba & Numexpr for high-performance vector math
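
To illustrate why vectorized math matters here, the sketch below computes cosine similarity between one query vector and every row of a matrix twice: once with a Python loop and once with a single NumPy matrix-vector product. Only NumPy is used; the matrix size and the random data are placeholders, and no timing claims are made.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.normal(size=(1000, 50))  # placeholder: 1000 words, 50 dimensions
query = rng.normal(size=50)

def cosine_loop(matrix, query):
    """One Python-level iteration per row."""
    q_norm = np.linalg.norm(query)
    return np.array([
        np.dot(row, query) / (np.linalg.norm(row) * q_norm) for row in matrix
    ])

def cosine_vectorized(matrix, query):
    """All rows at once: one matrix-vector product and one norm per row."""
    row_norms = np.linalg.norm(matrix, axis=1)
    return (matrix @ query) / (row_norms * np.linalg.norm(query))

slow = cosine_loop(matrix, query)
fast = cosine_vectorized(matrix, query)
print(np.allclose(slow, fast))
```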
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/92920/images.jpeg)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/92921/Screen_Shot_2013-09-17_at_8.24.26_AM.png)
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/94609/Screen_Shot_2013-09-18_at_7.16.51_PM.png)
word2vec uses a neural network to assign a vector to every word
Each row is a single word
Each column is a 'similarity dimension'
Nearby words have similar vectors.
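
A small sketch of this layout, assuming a tiny made-up vocabulary with two-dimensional vectors: each row of `embeddings` holds one word, and the nearest neighbors of a word are the rows with the highest cosine similarity. The values and the `most_similar` helper are invented for illustration.

```python
import numpy as np

vocab = ["happy", "joyful", "sad", "gloomy"]
# One row per word, one column per dimension (toy values, not learned).
embeddings = np.array([
    [ 0.9, 0.1],
    [ 0.8, 0.2],
    [-0.9, 0.1],
    [-0.8, 0.3],
])
word_to_row = {word: i for i, word in enumerate(vocab)}

def most_similar(word, k=1):
    """Return the k words whose rows have the highest cosine similarity to `word`."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = unit @ unit[word_to_row[word]]
    sims[word_to_row[word]] = -np.inf  # never return the query word itself
    top = np.argsort(sims)[::-1][:k]
    return [vocab[i] for i in top]

print(most_similar("happy"))
print(most_similar("sad"))
```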
![](https://s3.amazonaws.com/media-p.slid.es/uploads/christophermoody/images/94615/Screen_Shot_2013-09-18_at_7.21.43_PM.png)
For each training word, the neural network estimates the probability that each word in the vocabulary appears nearby.
Check the sentence to see whether each word actually appears nearby, then backpropagate the error.
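
The training step described above can be sketched in NumPy as a skip-gram model with a full softmax over the vocabulary. This is a simplification under stated assumptions: the real word2vec tool relies on faster approximations (hierarchical softmax or negative sampling), and the vocabulary, window size, learning rate, and helper names here are all made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "king", "rules", "land"]
sentence = [0, 1, 2, 0, 3]              # "the king rules the land" as word ids
V, D = len(vocab), 3                    # vocabulary size, vector dimensions
W_in = rng.normal(scale=0.1, size=(V, D))   # one vector per training (center) word
W_out = rng.normal(scale=0.1, size=(V, D))  # one vector per predicted (context) word

def softmax(x):
    e = np.exp(x - x.max())             # subtract the max for numerical stability
    return e / e.sum()

def context_pairs(token_ids, window=1):
    """Yield (center, context) for every word that appears within the window."""
    for i, center in enumerate(token_ids):
        lo, hi = max(0, i - window), min(len(token_ids), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, token_ids[j]

def train_pair(W_in, W_out, center, context, lr=0.1):
    """One gradient step on a single pair; returns the loss before the update."""
    h = W_in[center].copy()
    probs = softmax(W_out @ h)          # estimated probability for every vocabulary word
    loss = -np.log(probs[context])
    error = probs.copy()
    error[context] -= 1.0               # prediction minus the observed context word
    grad_in = W_out.T @ error           # gradient with respect to the center vector
    W_out -= lr * np.outer(error, h)    # backpropagate into the output vectors (in place)
    W_in[center] -= lr * grad_in        # backpropagate into the center word's vector
    return loss

pairs = list(context_pairs(sentence))
epoch_losses = []
for _ in range(200):
    epoch_losses.append(np.mean([train_pair(W_in, W_out, c, o) for c, o in pairs]))
print(epoch_losses[0], epoch_losses[-1])
```

After training, the rows of `W_in` play the role of the word vectors.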