# SAT Analogies

## with word vectors

DC Hack and Tell

2017-07-18

Aaron Schumacher

@planarrowspace

http://planspace.org/20170705-word_vectors_and_sat_analogies/

# Pop Quiz!

PALTRY : SIGNIFICANCE ::

A. redundant : discussion

B. austere : landscape

C. opulent : wealth

D. oblique : familiarity

E. banal : originality

RUNNER : MARATHON ::

A. envoy : embassy

B. martyr : massacre

C. oarsman : regatta

D. referee : tournament

E. horse : stable

KING : QUEEN ::

A. lion : cat

B. goose : flock

C. ewe : sheep

D. cub : bear

E. man : woman

# ... word vectors?

## Word vectors!

• So much word vector hype
• Every word represented by D numbers
• "D-dimensional word vectors"
• Subtract one word's vector from another's to get a relationship vector
• Compare relationship vectors
• "this relationship is most like that other one"
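Here is a minimal sketch of the bullets above. It assumes numpy and uses made-up 3-dimensional vectors; real pretrained vectors have hundreds of dimensions. Subtracting two words gives a relationship vector, and cosine similarity measures how alike two relationship vectors are.

```python
import numpy as np

# Hypothetical toy vectors (D = 3), not real embeddings.
vectors = {
    'king':  np.array([0.8, 0.6, 0.1]),
    'queen': np.array([0.8, 0.1, 0.6]),
    'man':   np.array([0.3, 0.7, 0.1]),
    'woman': np.array([0.3, 0.2, 0.6]),
    'cub':   np.array([0.1, 0.3, 0.9]),
    'bear':  np.array([0.9, 0.3, 0.2]),
}

def relationship(a, b):
    """Relationship vector for the pair a : b (vector of b minus vector of a)."""
    return vectors[b] - vectors[a]

def cosine_similarity(u, v):
    """1.0 means same direction, 0.0 orthogonal, -1.0 opposite."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

stem = relationship('king', 'queen')
for a, b in [('man', 'woman'), ('cub', 'bear')]:
    print(a, b, round(cosine_similarity(stem, relationship(a, b)), 3))
```

With these toy numbers, man : woman points the same way as king : queen, while cub : bear does not.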

[Figure: word vectors for cat, lion, goose, flock, ewe, sheep, cub, and bear]

# a bunch of questions?

Random guessing: 1/5 = 20%

Average college applicant: 57%

Human voting: 82%

https://aclweb.org/aclwiki/SAT_Analogy_Questions_(State_of_the_art)

• 2005 paper (not by AltaVista, just using it)
• for each word pair, do 128 AltaVista searches, like:
  • "KING and QUEEN"
  • "QUEEN but not KING"
  • etc.
• log(number of search results) for each search is one vector value
• it worked! 47% accuracy
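A rough sketch of that search-count approach. It assumes the hit counts for each word pair are already collected under the same list of query patterns. The counts below are invented, and 4 patterns stand in for the real 128. It uses log(1 + count) to avoid log(0); that smoothing is this sketch's assumption. Each pair becomes a vector, and the answer is the choice whose vector is most cosine-similar to the stem's.

```python
import math

def pattern_vector(hit_counts):
    """Turn search hit counts into vector values; log(1 + n) avoids log(0)."""
    return [math.log1p(n) for n in hit_counts]

def cosine(u, v):
    """Cosine similarity of two equal-length, non-zero lists."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def best_choice(stem_counts, choice_counts):
    """Return the choice label whose pattern vector is closest to the stem's."""
    stem_vec = pattern_vector(stem_counts)
    return max(choice_counts,
               key=lambda label: cosine(stem_vec, pattern_vector(choice_counts[label])))

# Invented hit counts for 4 hypothetical query patterns.
stem = [9000, 40, 1200, 3]
choices = {
    'A': [10, 5000, 2, 800],
    'E': [8000, 60, 900, 5],
}
print(best_choice(stem, choices))
```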

# word vectors though?

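```python
import gensim

# word2vec_file: path to pretrained word2vec vectors in binary format
word2vec = gensim.models.KeyedVectors.load_word2vec_format(
    word2vec_file, binary=True)

vec = word2vec['cat']
```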
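```python
import numpy as np

def load_vectors(filename):  # hypothetical name; the original function header was lost
    """Read a text file with one word per line, followed by its space-separated values."""
    word_to_vec = {}
    with open(filename, encoding='utf-8') as f:
        for line in f:
            first_space_index = line.index(' ')
            word = line[:first_space_index]
            values = line[first_space_index + 1:]
            # float16 takes half the memory of float32, at some cost in precision
            vector = np.fromstring(values, sep=' ', dtype=np.float16)
            word_to_vec[word] = vector
    return word_to_vec

word2vec = load_vectors(filename)  # filename: path to a text-format vector file
vec = word2vec['cat']
```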
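Once vectors are loaded, one simple way to answer a question is to pick the choice whose relationship vector is most cosine-similar to the stem's. This is a sketch, not necessarily the scoring the talk used. It assumes `word2vec` is any mapping from word to numpy array that can be indexed with `word2vec[word]`. The names `relation_vector` and `answer_question` are hypothetical. The toy vectors exist only so the example runs.

```python
import numpy as np

def relation_vector(word2vec, a, b):
    """Relationship vector for a : b, cast to float64 so float16 inputs keep precision."""
    return (np.asarray(word2vec[b], dtype=np.float64)
            - np.asarray(word2vec[a], dtype=np.float64))

def cosine_similarity(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def answer_question(word2vec, stem, choices):
    """Return the letter of the choice whose relationship best matches the stem.

    stem is a (word, word) pair; choices maps letters to (word, word) pairs.
    Every word must be in word2vec, otherwise a KeyError is raised.
    """
    target = relation_vector(word2vec, *stem)
    scores = {letter: cosine_similarity(target, relation_vector(word2vec, *pair))
              for letter, pair in choices.items()}
    return max(scores, key=scores.get)

# Hypothetical 3-D toy vectors, not real embeddings.
toy = {
    'king': np.array([0.9, 0.8, 0.1]), 'queen': np.array([0.9, 0.1, 0.8]),
    'man': np.array([0.2, 0.8, 0.1]), 'woman': np.array([0.2, 0.1, 0.8]),
    'lion': np.array([0.7, 0.5, 0.5]), 'cat': np.array([0.1, 0.5, 0.5]),
    'cub': np.array([0.3, 0.4, 0.6]), 'bear': np.array([0.8, 0.4, 0.6]),
}
question = {'A': ('lion', 'cat'), 'D': ('cub', 'bear'), 'E': ('man', 'woman')}
print(answer_question(toy, ('king', 'queen'), question))
```

With real vocabularies, some SAT words may be missing, or may be stored with different capitalization, so a practical version needs a rule for skipping or guessing on those.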

# in 300 dimensions?

• Almost all of the 3M words are alone in their own orthant (a 300-D "quadrant")!
• Euclidean distance?
• Cosine distance?
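A small sketch of these questions. It assumes a 2-D numpy array with one word vector per row, and random Gaussian vectors stand in for real ones. Counting distinct sign patterns shows how many words share an orthant. The two distances differ in one key way: cosine distance ignores vector length, so two vectors pointing the same way have cosine distance 0 even when their Euclidean distance is large.

```python
import numpy as np

def count_orthants(matrix):
    """Number of distinct orthants (sign patterns) occupied by the rows."""
    signs = matrix > 0  # True/False per coordinate; exact zeros are grouped with negatives
    return len(np.unique(signs, axis=0))

def euclidean_distance(u, v):
    return float(np.linalg.norm(u - v))

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for the same direction, 2 for opposite."""
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
# Stand-in for a real (n_words, 300) embedding matrix.
fake_vectors = rng.standard_normal((2000, 300))
print(count_orthants(fake_vectors))  # 2**300 orthants, so random rows almost never share one

u = np.array([1.0, 2.0, 3.0])
v = 10 * u                           # same direction, ten times longer
print(euclidean_distance(u, v))      # 9 * sqrt(14), about 33.67
print(cosine_distance(u, v))         # about 0.0
```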
