Christopher Moody

Standard natural language 
 isn't great for searching

But the state of the art 

can do amazing things!

King - Man + Woman = Queen
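
A minimal sketch of that arithmetic, assuming tiny hand-made vectors. The `vectors` table, its two dimensions (royalty, gender) and the `analogy` helper are hypothetical and only for illustration; real word2vec vectors are learned from text.

```python
import numpy as np

# Hypothetical hand-made 2-D vectors (royalty, gender), for illustration only.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(positive, negative):
    """Add the positive words, subtract the negative ones, return the closest other word."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    candidates = [w for w in vectors if w not in positive and w not in negative]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

result = analogy(positive=["king", "woman"], negative=["man"])
print(result)
```

The input words are excluded from the candidates, so the answer is the nearest remaining word rather than one of the query terms.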

With these new techniques,

you can search for a movie that's more


Or find a movie that is less


  • Wikipedia
  • IMDB
  • NYTimes
  • Usenet

Map-reduce to preprocess & transform the text into n-grams
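
A rough single-machine sketch of the n-gram step, assuming whitespace tokenization and bigram counting. The function names and the two sample documents are hypothetical; the real pipeline would run on a cluster and may differ in its details.

```python
from collections import Counter
from functools import reduce

def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def map_document(text, n=2):
    """Map step: turn one document into its n-gram counts."""
    tokens = text.lower().split()
    return Counter(ngrams(tokens, n))

def reduce_counts(left, right):
    """Reduce step: merge the counts from two partial results."""
    return left + right

# Hypothetical sample documents.
docs = ["the new york times", "new york is big"]
bigram_counts = reduce(reduce_counts, map(map_document, docs), Counter())
print(bigram_counts.most_common(1))
```

Each document is mapped independently, which is what lets the work spread across many machines before the counts are merged.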

word2vec trains a neural network, assigning a vector to every word

NumPy, Cython, Numba & Numexpr

for high-performance vector math



word2vec uses a neural network to assign a vector to every word

Each row is a single word
Each column is a 'similarity dimension'
Nearby words have similar vectors.
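
The row-per-word layout can be sketched as follows, assuming cosine similarity as the measure of "nearby". The four-word vocabulary, the matrix values and the `most_similar` helper are made up for illustration.

```python
import numpy as np

vocab = ["cat", "dog", "car", "truck"]
# Hypothetical matrix: one row per word, one column per 'similarity dimension'.
matrix = np.array([
    [0.9, 0.1, 0.0],   # cat
    [0.8, 0.2, 0.1],   # dog
    [0.1, 0.9, 0.2],   # car
    [0.0, 0.8, 0.3],   # truck
])

def most_similar(word, topn=1):
    """Rank the other words by cosine similarity to `word`."""
    unit = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)  # normalize each row
    sims = unit @ unit[vocab.index(word)]   # one dot product per row
    order = np.argsort(-sims)               # most similar first
    return [vocab[i] for i in order if vocab[i] != word][:topn]

print(most_similar("cat"), most_similar("truck"))
```

Normalizing the rows once turns every similarity lookup into a single matrix-vector product.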

The neural network estimates the probability that every word in the vocabulary appears around the training word.

Check the training sentence to see whether each word actually appears nearby, then backpropagate the error.
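
A toy sketch of one such training loop, assuming a plain softmax over a four-word vocabulary and a window of one word on each side. The sizes, learning rate and function names are assumptions for illustration; word2vec itself avoids the full softmax for speed, using hierarchical softmax or negative sampling instead.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "quick", "brown", "fox"]
V, D = len(vocab), 5                        # toy vocabulary size and vector size
W_in = rng.normal(scale=0.1, size=(V, D))   # row i is the vector for word i
W_out = rng.normal(scale=0.1, size=(D, V))  # output layer, one column per word

def context_probabilities(W_in, W_out, center):
    """Softmax over the vocabulary: P(word appears near `center`)."""
    scores = W_in[center] @ W_out
    exp = np.exp(scores - scores.max())     # subtract max for numerical stability
    return exp / exp.sum()

def train_pair(W_in, W_out, center, context, lr=0.1):
    """One gradient step on a (center, nearby word) pair; returns the loss."""
    probs = context_probabilities(W_in, W_out, center)
    error = probs.copy()
    error[context] -= 1.0                   # predicted minus observed (one-hot)
    grad_in = W_out @ error                 # gradient for the center word's vector
    W_out -= lr * np.outer(W_in[center], error)  # in-place update of the output layer
    W_in[center] -= lr * grad_in            # in-place update of the word vector
    return float(-np.log(probs[context]))

# Sentence "the quick brown fox": the words next to "quick" are "the" and "brown".
center = vocab.index("quick")
nearby = [vocab.index("the"), vocab.index("brown")]
epoch_losses = []
for _ in range(200):
    epoch_losses.append(sum(train_pair(W_in, W_out, center, c) for c in nearby))
final_probs = context_probabilities(W_in, W_out, center)
print(epoch_losses[0], epoch_losses[-1])
```

The `error` vector is the "check the sentence" step: it is the gap between the predicted probabilities and what was actually observed next to the training word.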
