SIRLA
Speaker: 楊子右
Date: 2019/12/18
[Diagram: the browser sends a request to the server; the server sends data back; the data is parsed to extract the information.]
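Aspect | GET | POST |
---|---|---|
URL | The URL carries the HTML form's parameters and data. | The URL does not change when data is sent. |
Data size | Limited in length, because the data travels in the URL. | Not subject to the URL length limit, because the parameters are not sent in the URL. |
Security | Form parameters and entered values are visible in the URL. | The data is sent in the HTTP request body, so parameters and entered values do not appear in the URL. |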
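To make the comparison concrete, here is a minimal sketch, using only Python's standard library, of where form data ends up for each method. The endpoint `https://example.com/search` and the field names are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and form fields, for illustration only.
base_url = 'https://example.com/search'
form = {'q': 'python', 'page': '2'}

# Encode the form as key=value pairs joined by '&'.
encoded = urlencode(form)

# GET: the encoded form data is appended to the URL as a query string.
get_url = base_url + '?' + encoded

# POST: the URL is unchanged; the same encoded data travels in the request body.
post_url = base_url
post_body = encoded.encode('ascii')

print('GET  url :', get_url)
print('POST url :', post_url)
print('POST body:', post_body)
```

The GET URL grows with every field, which is where the length limit and the visibility concern in the table come from.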
pip install requests
import requests
response = requests.get('https://www.railway.gov.tw/tra-tip-web/tip/tip001/tip112/querybytime')
print(response.text)
with open('request.html', 'w', encoding='utf-8') as f:
    f.write(response.text)
# Form fields copied from the query page. The _csrf token is typically
# issued per session, so this copied value may no longer be accepted.
data1 = {'_csrf': '56538d1c-2a43-41cf-a65c-d0ed7cec7c8f',
         'trainTypeList': 'ALL',
         'transfer': 'ONE',
         'startStation': '3360-彰化',
         'endStation': '3470-斗六',
         'rideDate': '2019/12/19',
         'startOrEndTime': 'true',
         'startTime': '00:00',
         'endTime': '23:59'}
# Send the form data in the body of a POST request
response1 = requests.post('https://www.railway.gov.tw/tra-tip-web/tip/tip001/tip112/querybytime', data=data1)
with open('requests.html', 'w', encoding='utf-8') as f:
    f.write(response1.text)
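The `_csrf` value in the form data is hard-coded, and such tokens are usually issued per session, so a copied value may be rejected later. One possible approach, sketched below with the standard library only, is to read the token from the form page before posting. The helper name `extract_csrf` and the assumption that the token sits in an `<input name="_csrf" value="...">` element are hypothetical; check the actual page source.

```python
from html.parser import HTMLParser


class CsrfFinder(HTMLParser):
    """Collects the value of the first <input name="_csrf"> it sees."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == 'input' and attributes.get('name') == '_csrf' and self.token is None:
            self.token = attributes.get('value')


def extract_csrf(html):
    """Return the _csrf token found in html, or None if absent (hypothetical helper)."""
    finder = CsrfFinder()
    finder.feed(html)
    return finder.token


# Stand-in for the form page; the real markup is an assumption.
sample_page = '<form><input type="hidden" name="_csrf" value="abc-123"><input name="rideDate"></form>'
print(extract_csrf(sample_page))
```

With requests, the form page and the POST would normally go through one `requests.Session()` so that any cookies tied to the token are sent back with the form.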
import requests
res = requests.get('https://www.stockdog.com.tw/stockdog/index.php?m=overview&sid=1101')
print(res.text)
user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) ...'}
res = requests.get('https://www.stockdog.com.tw/stockdog/index.php?m=overview&sid=1101',headers = user_agent)
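Some sites respond differently to clients that do not look like a browser, which is presumably why the second request sets a `user-agent` header. The sketch below inspects, without any network access, the default value requests identifies itself with and confirms that a custom header is attached to a prepared request. The URL `https://example.com/` and the shortened header value are placeholders.

```python
import requests

# Default User-Agent string that requests identifies itself with.
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Placeholder browser-like value; substitute the full string from your browser.
custom_headers = {'user-agent': 'Mozilla/5.0 (placeholder)'}

# Build the request without sending it, to inspect what would be sent.
prepared = requests.Request('GET', 'https://example.com/', headers=custom_headers).prepare()
print(prepared.headers['User-Agent'])  # header lookup is case-insensitive
```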
import requests
res = requests.get('http://isbn.ncl.edu.tw/NEW_ISBNNet/main_DisplayRecord_Popup.php?Pact=view&Pkey=1080117*0046')
print(res.text)
res.encoding = 'utf-8'
with open('encoding.html', 'w', encoding='utf-8') as f:
    f.write(res.text)
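If the first `print(res.text)` shows garbled characters, the likely cause is that the response bytes were decoded with the wrong character set; setting `res.encoding` tells requests which one to use. The sketch below reproduces the effect offline with plain byte strings. The sample text and the choice of ISO-8859-1 as the wrong guess are assumptions for illustration.

```python
# Bytes as a server might send them: UTF-8 encoded Chinese text (sample value).
raw = '書名'.encode('utf-8')

# Decoding with the wrong charset yields mojibake instead of raising an error.
garbled = raw.decode('iso-8859-1')

# Decoding with the correct charset recovers the original text.
fixed = raw.decode('utf-8')

# Each wrongly decoded byte becomes its own character.
print(len(raw), len(garbled), len(fixed))
print(garbled == fixed)
```

requests also exposes `res.apparent_encoding`, a guess based on the content itself, which can be assigned to `res.encoding` when the correct charset is not known in advance.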
pip install BeautifulSoup4
from bs4 import BeautifulSoup
# Raw HTML source
html_doc = '<html> \
<body> \
<h1 id="title">Hello World</h1> \
<a href="#" class="link">This is link1</a> \
<a href="#link2" class="link">This is link2</a> \
</body> \
</html> '
# Parse the HTML source with Beautiful Soup
soup = BeautifulSoup(html_doc, 'html.parser')
<h1 id="title">Hello World</h1>
Tag: h1
Attribute: id="title"
Text: Hello World
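The three parts of an element (tag, attribute, text) map onto properties of a Beautiful Soup element. A minimal sketch, reusing the same `<h1>` snippet; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<h1 id="title">Hello World</h1>', 'html.parser')
h1 = soup.select('h1')[0]

print(h1.name)    # tag name
print(h1['id'])   # one attribute value, accessed like a dictionary key
print(h1.attrs)   # all attributes as a dict
print(h1.text)    # text content
```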
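print(soup.text)              # all text in the document
print(soup.contents)          # list of top-level nodes
print(soup.select('html'))    # select by tag
print(soup.select('h1'))      # select by tag
print(soup.select('a'))       # select by tag
print(soup.select('#title'))  # select by id
print(soup.select('.link'))   # select by class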
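`select` takes a CSS selector and always returns a list, which is why the later examples index the result with `[0]`. The sketch below, using a trimmed copy of the sample document, shows looping over matches and reading an attribute. `select_one` is Beautiful Soup's shortcut for the first match and returns `None` when nothing matches.

```python
from bs4 import BeautifulSoup

html_doc = ('<html><body>'
            '<h1 id="title">Hello World</h1>'
            '<a href="#" class="link">This is link1</a>'
            '<a href="#link2" class="link">This is link2</a>'
            '</body></html>')
soup = BeautifulSoup(html_doc, 'html.parser')

links = soup.select('.link')            # list of matching elements
link_texts = [a.text for a in links]    # text of every match
hrefs = [a.get('href') for a in links]  # attribute value of every match

first = soup.select_one('#title')       # first match, or None
missing = soup.select_one('.nope')      # no such class in this document

print(link_texts)
print(hrefs)
print(first.text, missing)
```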
Extracting book information
資訊組織
Authors: 張慧銖、陳淑燕、邱子恒、陳
Publisher: 華藝數位
Publication date: 2017/6/21
ISBN: 9789864371310
Reading age: all ages
import requests
from bs4 import BeautifulSoup
user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
res = requests.get('https://www.kingstone.com.tw/new/basic/2010230006305?zone=book&lid=search&actid=WISE', headers=user_agent)
soup = BeautifulSoup(res.text, 'html.parser')
print(soup)  # inspect the parsed page to find the relevant class names
# Book title
a = soup.select('.pdname_basic')[0]
print(a.text)
# Block that holds the book details
b = soup.select('.basiccol')[0]
result = ''
for i in range(1, 6):
    c = b.select('.basicunit')[i]
    print(c.text)  # raw text of this unit, before cleanup
    if i < 3:
        # Units 1 and 2: a label followed by a linked value
        d = c.select('.title_basic')[0]
        result += d.text
        e = c.select('a')[0]
        result += e.text
    else:
        # Remaining units: collapse runs of whitespace into single spaces
        result += ' '.join(c.text.split())
    if i < 5:
        result += '\n'
import requests
from bs4 import BeautifulSoup
user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'}
res = requests.get('https://www.kingstone.com.tw/new/basic/2010230006305?zone=book&lid=search&actid=WISE', headers=user_agent)
soup = BeautifulSoup(res.text, 'html.parser')
a = soup.select('.pdname_basic')[0]
print(a.text)
b = soup.select('.basiccol')[0]
result = ''
for i in range(1, 6):
    c = b.select('.basicunit')[i]
    if i < 3:
        d = c.select('.title_basic')[0]
        result += d.text
        e = c.select('a')[0]
        result += e.text
    else:
        result += ' '.join(c.text.split())
    if i < 5:
        result += '\n'
print(result)
Complete reference code
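Indexing with `[i]` and `[0]` raises `IndexError` as soon as the page layout changes. One possible variation, sketched here against a small hand-written HTML fragment rather than the live page, loops over whatever `.basicunit` elements exist and normalizes whitespace with the same `' '.join(text.split())` idiom. The fragment's structure and the helper name `unit_lines` are assumptions; only the class names come from the code above.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the book detail block; the real markup may differ.
sample = '''
<div class="basiccol">
  <div class="basicunit">  Publisher:
      <a href="#">Example Press</a> </div>
  <div class="basicunit">ISBN:   1234567890123</div>
</div>
'''


def unit_lines(html):
    """Return one whitespace-normalized line per .basicunit element."""
    soup = BeautifulSoup(html, 'html.parser')
    lines = []
    for unit in soup.select('.basiccol .basicunit'):
        # split() with no arguments drops newlines, tabs, and repeated spaces.
        lines.append(' '.join(unit.text.split()))
    return lines


print('\n'.join(unit_lines(sample)))
```

Because the loop is driven by what the page contains, a missing block yields an empty list instead of an exception.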
Extracting movie information
小小夜曲
Release date: 2019-11-22
Running time: 02 hr 00 min
Distributor: 天馬行空
import requests
from bs4 import BeautifulSoup
res = requests.get('https://movies.yahoo.com.tw/movieinfo_main/%E5%B0%8F%E5%B0%8F%E5%A4%9C%E6%9B%B2-little-nights-little-love-10213')
soup = BeautifulSoup(res.text,'html.parser')
a = soup.select('.movie_intro_info_r')[0]
b = a.select('h1')[0]
print(b.text)
for i in range(0, 3):
    c = a.select('span')[i]
    print(c.text)
Complete reference code