DC Hack and Tell
2017-07-18
Aaron Schumacher
@planarrowspace
http://planspace.org/20170705-word_vectors_and_sat_analogies/
PALTRY : SIGNIFICANCE ::
A. redundant : discussion
B. austere : landscape
C. opulent : wealth
D. oblique : familiarity
E. banal : originality
RUNNER : MARATHON ::
A. envoy : embassy
B. martyr : massacre
C. oarsman : regatta
D. referee : tournament
E. horse : stable
KING : QUEEN ::
A. lion : cat
B. goose : flock
C. ewe : sheep
D. cub : bear
E. man : woman
cat
lion
goose
flock
ewe
sheep
cub
bear
Random guessing: 1/5=20%
Average college applicant: 57%
Human voting: 82%
https://aclweb.org/aclwiki/SAT_Analogy_Questions_(State_of_the_art)
import gensim
word2vec_file = 'data/GoogleNews-vectors-negative300.bin'
# Load the pretrained Google News vectors (binary word2vec format).
word2vec = gensim.models.KeyedVectors.load_word2vec_format(
    word2vec_file, binary=True)
vec = word2vec['cat']
import numpy as np
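def read(filename):
    """Read GloVe-style text vectors: one word per line, then its values."""
    word_to_vec = {}
    # GloVe Twitter tokens include non-ASCII text, so read as UTF-8.
    with open(filename, encoding='utf-8') as f:
        for line in f:
            # The word is everything before the first space.
            first_space_index = line.index(' ')
            word = line[:first_space_index]
            values = line[first_space_index + 1:]
            vector = np.fromstring(values, sep=' ', dtype=np.float16)
            word_to_vec[word] = vector
    return word_to_vec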
word2vec = read('data/glove.twitter.27B.25d.txt')
vec = word2vec['cat']
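With vectors loaded, one common way to attempt an analogy question is to compare the difference vector of the stem pair with the difference vector of each choice and pick the closest by cosine similarity. The following is a minimal sketch under that assumption, not necessarily the method used in the talk. The function names `cosine` and `answer_analogy` and the two-dimensional toy vectors are made up for illustration; real use would pass the `word2vec` mapping loaded above.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity, computed in float64 to avoid float16 precision issues."""
    u = np.asarray(u, dtype=np.float64)
    v = np.asarray(v, dtype=np.float64)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def answer_analogy(word_to_vec, stem, choices):
    """Score each choice pair against the stem pair by difference vectors.

    Returns the index of the best-scoring choice and the list of scores.
    """
    a, b = stem
    stem_diff = word_to_vec[b] - word_to_vec[a]
    scores = []
    for c, d in choices:
        choice_diff = word_to_vec[d] - word_to_vec[c]
        scores.append(cosine(stem_diff, choice_diff))
    return int(np.argmax(scores)), scores


# Hypothetical toy vectors, chosen only so the example runs without data files.
toy = {
    'king': np.array([1.0, 1.0]),
    'queen': np.array([1.0, -1.0]),
    'man': np.array([0.5, 1.0]),
    'woman': np.array([0.5, -1.0]),
    'cub': np.array([0.2, 0.1]),
    'bear': np.array([0.9, 0.1]),
}

best, scores = answer_analogy(
    toy, ('king', 'queen'), [('cub', 'bear'), ('man', 'woman')])
print(best, scores)
```

In this toy setup the second choice (man : woman) scores highest because its difference vector points the same way as king : queen. Words missing from the vocabulary would raise a `KeyError`, so a real run needs a fallback for those.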