Introduction to Natural Language Processing

Speaker: 王譽錚

Date: 2019/10/19

Outline

  • Natural language
  • NLTK in practice
  • References

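Natural Language

Definition

  • A language people use for everyday communication, e.g. English or Chinese
  • It evolves over a long period of time, so it is hard to pin down with explicit rules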

NLTK 3

Design goals

  • Simplicity
  • Consistency
  • Extensibility
  • Modularity

NLTK in Practice

Installing NLTK

pip install nltk
C:\Users\ASUS>python
Python 3.7.4 (tags/v3.7.4:e09359112e, Jul  8 2019, 19:29:22) [MSC v.1916 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import nltk
>>> nltk.download()  # opens the NLTK downloader; the "book" collection is needed for nltk.book
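
The interactive downloader is awkward in scripts or on machines without a display. As a hedged sketch, the helpers below check whether a package can be imported and fetch the book data without the dialog. `is_installed` and `download_book_data` are hypothetical names introduced here, and the sketch assumes `"book"` is the collection identifier for the data used by `nltk.book`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be imported."""
    return importlib.util.find_spec(module_name) is not None


def download_book_data():
    """Fetch the data used by nltk.book without opening the downloader UI.

    Assumption: "book" is the collection identifier for the NLTK book data.
    Needs network access, so it is only run when called explicitly.
    """
    import nltk  # imported lazily so this file loads even without NLTK

    return nltk.download("book")


if __name__ == "__main__":
    print("nltk installed:", is_installed("nltk"))
```

Call `download_book_data()` once after `pip install nltk`; later sessions can go straight to `from nltk.book import *`.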

nltk.book

>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

>>> text1  # name of a loaded text
<Text: Moby Dick by Herman Melville 1851>

Searching for a word

>>> text1.concordance("monstrous")  # text.concordance("word to search for")

Displaying 11 of 11 matches:
ong the former , one was of a most monstrous size . ... This came towards us ,
ON OF THE PSALMS .  "Touching that monstrous bulk of the whale or ork we have r
ll over with a heathenish array of monstrous clubs and spears . Some were thick
d as you gazed , and wondered what monstrous cannibal and savage could ever hav
that has survived the flood ; most monstrous and most mountainous ! That Himmal
they might scout at Moby Dick as a monstrous fable , or still worse and more de
th of Radney .'" CHAPTER 55 Of the Monstrous Pictures of Whales . I shall ere l
ing Scenes . In connexion with the monstrous pictures of whales , I am strongly
ere to enter upon those still more monstrous stories of them which are to be fo
ght have been rummaged out of this monstrous cabinet there is no telling . But
of Whale - Bones ; for Whales of a monstrous size are oftentimes cast up dead u
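
To make the idea behind the concordance view concrete, here is a minimal plain-Python sketch. It is not NLTK's implementation: `concordance` is a hypothetical stand-alone function, the token list is made up for illustration, and context is measured in tokens rather than characters. Like the output above, it matches case-insensitively.

```python
def concordance(tokens, word, window=4):
    """Return each occurrence of `word` with `window` tokens of context per side."""
    target = word.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            # Brackets mark the matched word in the middle of its context.
            lines.append(f"{left} [{token}] {right}".strip())
    return lines


# Illustrative tokens only, not the real Moby Dick corpus.
tokens = "a most monstrous size . that Monstrous bulk of the whale".split()
for line in concordance(tokens, "monstrous", window=2):
    print(line)
```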

Finding words used in similar contexts

>>> text1.similar("monstrous")  # text.similar("word")

true contemptible christian abundant few part mean careful puzzled
mystifying passing curious loving wise doleful gamesome singular
delightfully perilous fearless
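
One way to read this result: two words count as similar when they appear between the same neighbouring words. The sketch below illustrates that idea in plain Python under simplifying assumptions. `similar_words` is a hypothetical function, the scoring (number of shared contexts) is a stand-in rather than NLTK's own ranking, and the tokens are invented.

```python
from collections import defaultdict


def similar_words(tokens, word, top_n=5):
    """Rank other words by how many (previous, next) contexts they share with `word`."""
    tokens = [t.lower() for t in tokens]
    target = word.lower()
    contexts = defaultdict(set)  # word -> set of (previous word, next word)
    for prev, current, nxt in zip(tokens, tokens[1:], tokens[2:]):
        contexts[current].add((prev, nxt))

    target_contexts = contexts.get(target, set())
    scores = {
        other: len(ctx & target_contexts)
        for other, ctx in contexts.items()
        if other != target and ctx & target_contexts
    }
    # Highest overlap first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Illustrative tokens only.
tokens = "a monstrous size and a curious size and a wise man".split()
print(similar_words(tokens, "monstrous"))
```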

Finding the contexts (previous word, next word) shared by two words

>>> text2.common_contexts(["monstrous", "very"])  # text.common_contexts(["word1", "word2"])

a_pretty am_glad a_lucky is_pretty be_glad
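
Each item in the output has the form previous_next, i.e. the word before and the word after. A rough plain-Python sketch of the same idea follows. `common_contexts` here is a hypothetical stand-alone function, not the NLTK method, and the sentence is made up.

```python
def common_contexts(tokens, words):
    """Return the sorted previous_next contexts in which every word in `words` occurs."""
    tokens = [t.lower() for t in tokens]
    seen = {w.lower(): set() for w in words}
    for prev, current, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if current in seen:
            seen[current].add(f"{prev}_{nxt}")
    # Keep only the contexts that all requested words have in common.
    shared = set.intersection(*seen.values()) if seen else set()
    return sorted(shared)


# Illustrative tokens only.
tokens = "a monstrous pretty girl and a very pretty house , am very glad".split()
print(common_contexts(tokens, ["monstrous", "very"]))
```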

Generating new random text in the style of the source

>>> text3.generate()  # text.generate()

Building ngram index...
laid by her , and said unto Cain , Where art thou , and said , Go to ,
I will not do it for ten ' s sons ; we dreamed each man according to
their generatio the firstborn said unto Laban , Because I said , Nay ,
but Sarah shall her name be . , duke Elah , duke Shobal , and Akan .
and looked upon my affliction . Bashemath Ishmael ' s blood , but Isra
for as a prince hast thou found of all the cattle in the valley , and
the wo
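
The "Building ngram index..." line suggests the text is produced from an n-gram model. As a hedged illustration of that family of techniques, the sketch below uses the simplest case, a bigram chain, in plain Python. It is not NLTK's implementation; `build_bigram_model` and `generate` are hypothetical names, and the input sentence is only an example.

```python
import random
from collections import defaultdict


def build_bigram_model(tokens):
    """Map each token to the list of tokens that follow it in the text."""
    followers = defaultdict(list)
    for current, nxt in zip(tokens, tokens[1:]):
        followers[current].append(nxt)
    return followers


def generate(tokens, length=10, seed=0):
    """Produce up to `length` tokens by repeatedly sampling an observed follower.

    `tokens` must be non-empty. Generation stops early at a token with no follower.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    followers = build_bigram_model(tokens)
    word = rng.choice(tokens)
    output = [word]
    while len(output) < length and followers.get(word):
        word = rng.choice(followers[word])
        output.append(word)
    return output


# Illustrative tokens only.
tokens = "in the beginning god created the heaven and the earth".split()
print(" ".join(generate(tokens, length=8, seed=1)))
```

Every adjacent pair in the result was seen in the input, which is why the output looks locally plausible but drifts as a whole.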

Dispersion plot

pip install Numpy
pip install Matplotlib
# text.dispersion_plot(["word1", "word2", "word3", ...])
>>> text4.dispersion_plot(["citizens", "democracy", "freedom", "duties", "America"])

[Figure: dispersion plot]
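
A dispersion plot marks where in the text each word occurs, so the data behind it is just a list of token positions per word. The sketch below computes those positions in plain Python. `dispersion_offsets` is a hypothetical helper, it matches case exactly as an assumption of this sketch, and the tokens are invented rather than taken from text4.

```python
def dispersion_offsets(tokens, words):
    """Return {word: [token positions where it occurs]}, matching case exactly."""
    offsets = {w: [] for w in words}
    for position, token in enumerate(tokens):
        if token in offsets:
            offsets[token].append(position)
    return offsets


# Illustrative tokens only.
tokens = "fellow citizens , freedom and democracy ask much of citizens".split()
for word, positions in dispersion_offsets(tokens, ["citizens", "freedom"]).items():
    print(word, positions)
```

Plotting each position as a tick on a horizontal line, one row per word, gives the kind of picture the plot shows.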

References

Retrieved 10/19 from Natural Language Processing with Python: http://www.nltk.org/book/
Retrieved 10/19 from Wikipedia - Matplotlib: https://zh.wikipedia.org/wiki/Matplotlib
Retrieved 10/19 from Wikipedia - NumPy: https://zh.wikipedia.org/wiki/NumPy

THE END

Introduction to NLP

By arashi
