Sorting Hat

How to build a Sorting Hat with Word2Vec

Presented by T毛可以吃泡芙嗎

Outline

 

  • Goal

  • Traditional classification vs. the Sorting Hat

  • Implementation

  • Demo

  • Conclusion

  • Future outlook

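Goal

Implement the Sorting Hat from Harry Potter

The Sorting Hat in the films

  1. Chats with the young wizards
  2. Tells each of them which house they belong to

Roughly like this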

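Classification

  • Filling out a questionnaire

Existing classification methods

Conversational classification

  • Let the user say a few sentences (a self-introduction, for example)
  • Extract the useful information (keywords) from what was said
  • Use that information to decide which group the user belongs to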

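Advantages

  • Fast                <------>  pressing 0~9
  • Concise             <------>  a pile of forms
  • Accurate            <------>  fixed-format forms
  • Interactive         <------>  recorded voice prompts
  • Richer information  <------>  pressing 0~9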

Voice input

  • Lets classification happen through conversation
  • webkitSpeechRecognition()

Word2Vec

  • Unsupervised learning
  • Learns from a text corpus and maps each word to a vector
  • Makes it easy to manipulate words as vectors

Example

Berlin - Germany ≈ Paris - France
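The analogy can be checked with plain vector arithmetic. The sketch below uses hand-picked 2-D toy vectors (hypothetical values, not taken from any trained model) purely to show the operation. With real embeddings the two offsets are only approximately equal.

```python
import math

# Hypothetical 2-D toy embeddings, chosen by hand for illustration only.
toy = {
    "Germany": (1.0, 0.0),
    "Berlin": (1.0, 1.0),
    "France": (3.0, 0.5),
    "Paris": (3.0, 1.5),
}

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

# The "capital of" offset should be roughly the same for both countries.
offset_de = sub(toy["Berlin"], toy["Germany"])
offset_fr = sub(toy["Paris"], toy["France"])

# Analogy query: Berlin - Germany + France should land nearest to Paris.
query = add(offset_de, toy["France"])
candidates = [w for w in toy if w != "France"]
nearest = min(candidates, key=lambda w: math.dist(toy[w], query))

print(offset_de, offset_fr, nearest)
```

In this toy setup both offsets are identical, so the query lands exactly on "Paris".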

Implementation approach

Word2Vec Training

  • Corpus
    • All seven Harry Potter books
    • wikia
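Before training, the raw text has to be turned into tokenized sentences. The slides do not show this step, so the following is only a minimal sketch with a hypothetical helper `to_sentences`; the splitting rules are an assumption. Its output, a list of token lists, is the shape that gensim's Word2Vec accepts as training sentences.

```python
import re

def to_sentences(text):
    """Split raw text into sentences, each a list of lowercase word tokens."""
    sentences = []
    # Naive sentence split on ., ! and ? (an assumption; real corpora need more care).
    for raw in re.split(r"[.!?]+", text):
        tokens = re.findall(r"[a-z']+", raw.lower())
        if tokens:
            sentences.append(tokens)
    return sentences

sample = "Hermione was brave. She was also extremely intelligent!"
corpus = to_sentences(sample)
print(corpus)
```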

Decision method: relating a word to a house

Find the house whose vector is closest to the word

import numpy

# model: the trained gensim Word2Vec model (on gensim >= 4.0, index it as model.wv[...])
for house in ('Gryffindor', 'Hufflepuff', 'Ravenclaw', 'Slytherin'):
    # Euclidean distance between the word vector and the house-name vector
    print(house, numpy.linalg.norm(model['brave'] - model[house]))

The results do not match what we expected!

Why it failed

print(model.most_similar('Gryffindor', topn=10))
print(model.most_similar('brave', topn=10))

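The corpus has too little text describing how each house relates to its traits

Decision method: relating a word to a house

Use the words describing each house's traits as the basis instead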

  • Gryffindor: brave, nervous, ...
  • Hufflepuff: dedicated, patient, ...
  • Ravenclaw: intelligent, wise, ...
  • Slytherin: cunning, ambitious, ...

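Compute the average distance between the word and each house's trait words

and pick the house with the shortest average distance

Taking "courageous" as an example

The shortest average distance is to Gryffindor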
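The average-distance rule can be sketched as follows. The 2-D vectors are hypothetical stand-ins chosen by hand, and only the first two trait words per house from the list above are used. A real run would look the vectors up in the trained model instead.

```python
import math

# Hypothetical 2-D stand-ins for word vectors (not from the trained model).
vec = {
    "brave": (0.9, 0.8), "nervous": (0.7, 0.9),
    "dedicated": (-0.8, 0.7), "patient": (-0.9, 0.9),
    "intelligent": (-0.8, -0.8), "wise": (-0.9, -0.7),
    "cunning": (0.8, -0.9), "ambitious": (0.9, -0.7),
    "courageous": (0.8, 0.7),
}
traits = {
    "Gryffindor": ["brave", "nervous"],
    "Hufflepuff": ["dedicated", "patient"],
    "Ravenclaw": ["intelligent", "wise"],
    "Slytherin": ["cunning", "ambitious"],
}

def mean_distance(word, house):
    """Average Euclidean distance from a word to one house's trait words."""
    words = traits[house]
    return sum(math.dist(vec[word], vec[t]) for t in words) / len(words)

def closest_house(word):
    """House whose trait words are, on average, nearest to the word."""
    return min(traits, key=lambda house: mean_distance(word, house))

print(closest_house("courageous"))
```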

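Relating a passage of text to each house

The basic idea: the shorter the total distance, the closer the match

Distribution of each house's representative words

Different colors represent different houses

Keywords

  • Human first, then labor, then intelligence (a pun on 人工智慧, "artificial intelligence")
  • Objective & intuitive
  • trial & error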

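Implementation flow

  • Algorithm 1: for each word, average its distances to all of one house's keywords, then sum these averages; assign the house with the smallest total distance
  • Algorithm 2: for each word, average its distances to the keywords of each of the four houses; the nearest house gets 1 point; assign the house with the most points
  • Algorithm 3: for each word, take the minimum distance to the keywords of each of the four houses and sum these minima; assign the house with the smallest total distance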

Algorithm 1

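  • Some junk words strongly skew the result
  • A house's keywords do not necessarily lie in the same region
  • Personality is multi-faceted
  • Arrogant vs. clever

Sum of distances to the keywords of one house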
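A minimal sketch of Algorithm 1, using hand-picked 2-D toy vectors (hypothetical, not from the trained model). The toy word "the" is deliberately placed far from every keyword to illustrate how a single junk word can flip the result.

```python
import math

# Hypothetical 2-D stand-ins for word vectors (hand-picked for illustration).
vec = {
    "brave": (0.9, 0.8), "nervous": (0.7, 0.9),
    "dedicated": (-0.8, 0.7), "patient": (-0.9, 0.9),
    "intelligent": (-0.8, -0.8), "wise": (-0.9, -0.7),
    "cunning": (0.8, -0.9), "ambitious": (0.9, -0.7),
    "courageous": (0.8, 0.7),  # a meaningful word
    "the": (-3.0, -3.0),       # a junk word far away from every keyword
}
keywords = {
    "Gryffindor": ["brave", "nervous"],
    "Hufflepuff": ["dedicated", "patient"],
    "Ravenclaw": ["intelligent", "wise"],
    "Slytherin": ["cunning", "ambitious"],
}

def mean_distance(word, house):
    ks = keywords[house]
    return sum(math.dist(vec[word], vec[k]) for k in ks) / len(ks)

def algorithm1(words):
    """Sum each word's average keyword distance per house; smallest total wins."""
    totals = {h: sum(mean_distance(w, h) for w in words) for h in keywords}
    return min(totals, key=totals.get)

clean = algorithm1(["courageous"])
noisy = algorithm1(["courageous", "the"])
print(clean, noisy)
```

With these toy values the clean input goes to Gryffindor, while adding the junk word moves the total in favor of Ravenclaw.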

Algorithm 2

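  • Assumes that a house's keywords cluster together
  • Improves on Algorithm 1 by reducing the influence of junk words
  • But it also reduces the influence of the key words
  • -> instead of a flat point, add the standard deviation as the score
  • The higher the score, the better

Each one represents the mean vector of one of the four houses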
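A minimal sketch of Algorithm 2 as a voting scheme, again on hypothetical 2-D toy vectors. The slides do not say which standard deviation is added in the weighted variant. This sketch assumes it is the population standard deviation of the word's four average house distances, so a word that separates the houses clearly counts for more.

```python
import math
import statistics

# Hypothetical 2-D stand-ins for word vectors (hand-picked for illustration).
vec = {
    "brave": (0.9, 0.8), "nervous": (0.7, 0.9),
    "dedicated": (-0.8, 0.7), "patient": (-0.9, 0.9),
    "intelligent": (-0.8, -0.8), "wise": (-0.9, -0.7),
    "cunning": (0.8, -0.9), "ambitious": (0.9, -0.7),
    "courageous": (0.8, 0.7), "daring": (0.6, 0.9),
    "the": (-3.0, -3.0),  # a junk word
}
keywords = {
    "Gryffindor": ["brave", "nervous"],
    "Hufflepuff": ["dedicated", "patient"],
    "Ravenclaw": ["intelligent", "wise"],
    "Slytherin": ["cunning", "ambitious"],
}

def mean_distance(word, house):
    ks = keywords[house]
    return sum(math.dist(vec[word], vec[k]) for k in ks) / len(ks)

def algorithm2(words, weighted=False):
    """Each word votes for its nearest house; the highest score wins."""
    scores = dict.fromkeys(keywords, 0.0)
    for w in words:
        d = {h: mean_distance(w, h) for h in keywords}
        nearest = min(d, key=d.get)
        # Assumed weighting: spread of the four distances instead of a flat point.
        scores[nearest] += statistics.pstdev(list(d.values())) if weighted else 1.0
    return max(scores, key=scores.get), scores

text = ["courageous", "daring", "the"]
winner, scores = algorithm2(text)
weighted_winner, _ = algorithm2(text, weighted=True)
print(winner, scores, weighted_winner)
```

Here the junk word contributes only one vote, so it no longer outweighs the two meaningful words.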

Algorithm 3

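  • Picture personality traits scattered inside a sphere
  • There are different facets, e.g. R (Ravenclaw): arrogant vs. clever
  • Averaging seems odd
  • So just take the closest one!
  • Not limited to adjectives

Pick the shortest distance

Shortest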
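A minimal sketch contrasting the nearest-keyword rule of Algorithm 3 with averaging. The 2-D vectors are hypothetical, and "arrogant" is placed far from "intelligent" on purpose, as an assumed second facet of Ravenclaw, to show why averaging over distant facets can mislead.

```python
import math

# Hypothetical 2-D stand-ins for word vectors (hand-picked for illustration).
vec = {
    "brave": (0.9, 0.8), "nervous": (0.7, 0.9),
    "dedicated": (-1.0, 0.5), "patient": (-1.2, 0.8),
    "intelligent": (-1.0, -1.0), "arrogant": (3.0, 0.0),
    "cunning": (0.8, -0.9), "ambitious": (0.9, -0.7),
    "clever": (-0.9, -0.8), "logical": (-0.7, -0.9),
}
keywords = {
    "Gryffindor": ["brave", "nervous"],
    "Hufflepuff": ["dedicated", "patient"],
    "Ravenclaw": ["intelligent", "arrogant"],  # two facets far apart
    "Slytherin": ["cunning", "ambitious"],
}

def mean(values):
    return sum(values) / len(values)

def classify(words, reduce):
    """Reduce each word's keyword distances per house, sum, pick the smallest total."""
    totals = {
        house: sum(reduce([math.dist(vec[w], vec[k]) for k in ks]) for w in words)
        for house, ks in keywords.items()
    }
    return min(totals, key=totals.get)

text = ["clever", "logical"]
by_min = classify(text, min)    # Algorithm 3: nearest keyword per house
by_mean = classify(text, mean)  # averaging, as in Algorithm 1
print(by_min, by_mean)
```

With these toy values the nearest-keyword rule picks Ravenclaw, while averaging is dragged away by the distant facet and picks Hufflepuff.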

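Test results

Sample

Accuracy

Expected house:  Gryffindor  Hufflepuff  Ravenclaw  Slytherin
A1               H           H           H          S
A2               H           H           H          S
A3               G           H           R          S

(A1 to A3 are Algorithms 1 to 3; G, H, R, S are the predicted houses: Gryffindor, Hufflepuff, Ravenclaw, Slytherin.)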

Algorithm 3 wins
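The comparison can be quantified as a per-algorithm accuracy. This sketch assumes that the column headers of the table above are the expected houses of the four samples and that each row lists one algorithm's predictions.

```python
# Expected house of each sample, taken from the table's column headers (assumed reading).
expected = ["G", "H", "R", "S"]

# Predictions per algorithm, copied from the table rows.
predictions = {
    "A1": ["H", "H", "H", "S"],
    "A2": ["H", "H", "H", "S"],
    "A3": ["G", "H", "R", "S"],
}

def accuracy(predicted, truth):
    """Fraction of samples whose predicted house matches the expected one."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

results = {name: accuracy(p, expected) for name, p in predictions.items()}
print(results)
```

Under that reading, Algorithms 1 and 2 get 2 of 4 samples right and Algorithm 3 gets all 4.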

Hermione was noted for being extremely intelligent and hard-working, coming out on top in most of her classes and continuously aiding Harry and Ron in their adventures. 
She was so studious that the school gave her a Time-Turner in her third year, a device that rewinds time so that she could take extra courses. 
Because of her efficiency, she often had time to do hobby work on the side, such as preparing a defence for Buckbeak to save him from execution, and creating S.P.E.W., an organisation promoting the freedom of house-elves. 
Such acts demonstrate Hermione's social conscience, tenacity, and compassion. 
Unlike most wizards who depended solely on their magical ability, Hermione readily relied on logic. 
Although this often helped her cleverly deduce information that many others missed, such as Remus Lupin's lycanthropy, Hermione's emphasis on logic also made her sceptical about accepting anything without proof, as opposed to Harry who would come to intuitive conclusions. 
For example, she completely dismissed the idea of the Deathly Hallows, refusing to believe in them without physical evidence. 
Hermione was quite responsible, perfectionistic, and well put-together, which led to her being made a prefect during her fifth year. 
Throughout her entire school career, Hermione was insistent on order and steadfastly devoted to the rules, at the expense of her popularity. 
Her sense of humour was limited; she frequently expressed disapproval over Fred and George's practical jokes and threatened to put them in detention for selling prank items in the common room. 
She often attempted to act as the voice of reason among her more impulsive friends, to varying levels of success. 
However, in spite of her strait-laced disposition, Hermione was not above using coercion and threats to get what she wanted, as is evidenced by her blackmailing Rita Skeeter into writing a good article about Harry. 
Hermione was not afraid to stand up to her friends when she thought it was in their best interests, or when she felt they were wrong; she risked angering Harry by getting his Firebolt confiscated because she feared it might be jinxed, and argued with both him and Ginny over his use of the shady Half Blood Prince's textbook. 
Hermione was very determined and focused, in that she "always [kept] her attention focused on the job that must be done." 
Her refusal to break under torture shows her strength of willpower. 
Hermione demonstrated her bravery many times when facing danger, though she initially showed a tendency toward mild panic in the sudden situations. She was extremely loyal to her friends, risking her life frequently to help them and standing by Harry even when no one else did. 
She also gave them advice rather often, such as in trying to make Harry understand Cho Chang's behaviour on their date, and in helping Ginny deal with her crush on Harry; this once prompted Ron to advise her to write a book to translate all the "mad things" 
Hermione was quite blunt with her opinions, sometimes to the point of being tactless; for example, her attempt to comfort Lavender Brown about the death of her rabbit did not go over well, and her honesty when dealing with centaurs in 1996 nearly landed her and Harry in serious trouble. 
Despite this, Hermione was generally sensitive to others' emotions, and would lie when she had to, though she was not a skilled liar. 
Hermione also tended to be rather argumentative, a trait most evident in her interactions with Ron. 
Although she was generally not as short-tempered as her friends, she displayed a formidable one on several occasions, such as slapping Draco Malfoy in defence of Hagrid, sabotaging Cormac McLaggen's tryout as a Keeper for the Gryffindor Quidditch team after he insulted Ron and Ginny, conjuring a flock of canaries to attack Ron in their sixth year and physically attacking Ron when he briefly abandoned her and Harry during their hunt for Horcruxes. 

Hermione Granger

Conclusion

What really matters

  • Precise
  • Fast
  • Clear

To be honest

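  • Our understanding of w2v is not deep enough
  • Not enough training data
  • Based on word occurrence frequency, so it cannot handle the meaning of a passage
  • Haven't read or watched Harry Potter
  • No wizard friends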

Possible Improvement

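  • Read papers
  • Crawl film reviews and book reviews
  • Find a way to collect a large number of self-introductions and feed them into tensorflow
  • Read or watch Harry Potter
  • Wait for our friends to become wizards (o)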

Future outlook

Traditional customer service

  • For ..., press 1
  • For ..., press 2
  • For ..., press 3
  • .......

References

  • gensim

https://radimrehurek.com/gensim/models/word2vec.html

  • Harry Potter Wiki

http://harrypotter.wikia.com/wiki/Main_Page

Thanks for listening~
