Python Asyncio
Speaker: 土豆
Date: 2021/05/30
Outline
- What is Asynchronous?
- Introduction to asyncio
- asyncio in practice: an asynchronous web crawler
What is Asynchronous?
You have probably all seen one of these at a restaurant.
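Synchronous restaurant
Assumptions: ordering takes 1 min, preparation takes 2 min, and there is only one counter.
- A: orders in minute 1, prepared in minutes 2-3
- B: orders in minute 4, prepared in minutes 5-6
- C: orders in minute 7, prepared in minutes 8-9
Total time: 9 min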
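Asynchronous restaurant
Assumptions: ordering takes 1 min, preparation takes 2 min, and there is only one counter.
- A: orders in minute 1, prepared in minutes 2-3
- B: orders in minute 2, prepared in minutes 3-4
- C: orders in minute 3, prepared in minutes 4-5
Total time: 5 min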
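The two totals can be checked with a little arithmetic. The sketch below is only a model of the restaurant analogy: the function names are made up for illustration, and it assumes the kitchen can prepare several orders at the same time, which is what the 5-minute figure implies.

```python
ORDER_MIN = 1  # minutes the counter needs to take one order
MAKE_MIN = 2   # minutes needed to prepare one order


def synchronous_total(customers: int) -> int:
    """The counter waits for each order to be prepared before serving the next customer."""
    return customers * (ORDER_MIN + MAKE_MIN)


def asynchronous_total(customers: int) -> int:
    """The counter takes orders back to back; preparation happens in the background."""
    # The last order is placed after `customers * ORDER_MIN` minutes
    # and is ready MAKE_MIN minutes later.
    return customers * ORDER_MIN + MAKE_MIN


print(synchronous_total(3))   # 9
print(asynchronous_total(3))  # 5
```

The gap widens as the queue grows: under the same assumptions, 10 customers take 30 minutes synchronously and 12 minutes asynchronously.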
Introduction to asyncio
Example in synchronous way
import time

def fetch_data(data_name):
    print('start fetching data:', data_name)
    time.sleep(3)  # blocks the whole program for 3 seconds
    print('stop fetching data:', data_name)

start_time = time.time()
for data_name in ['A', 'B', 'C']:
    fetch_data(data_name)  # each call must finish before the next one starts
print('Elapsed time:', time.time() - start_time, 'seconds')
Example in asynchronous way
import asyncio
import time

async def fetch_data(data_name):
    print('start fetching data:', data_name)
    await asyncio.sleep(3)  # pauses this coroutine only; others keep running
    print('stop fetching data:', data_name)

async def main():
    tasks = []
    for data_name in ['A', 'B', 'C']:
        # create_task schedules the coroutine on the event loop right away
        tasks.append(asyncio.create_task(fetch_data(data_name)))
    await asyncio.gather(*tasks)  # wait until every task has finished

start_time = time.time()
asyncio.run(main())
print('Elapsed time:', time.time() - start_time, 'seconds')
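To see the difference in one run, the two versions can be timed side by side. This is a minimal sketch rather than the speaker's code: the helper names (`fetch_sync`, `fetch_async`, `fetch_all`, `timed`) are hypothetical, and the 3-second wait is shortened to 0.1 seconds so the comparison finishes quickly.

```python
import asyncio
import time

DELAY = 0.1  # seconds; shortened stand-in for the 3-second wait


def fetch_sync(name):
    time.sleep(DELAY)           # blocks the whole program
    return name


async def fetch_async(name):
    await asyncio.sleep(DELAY)  # pauses only this coroutine
    return name


async def fetch_all(names):
    # gather runs the coroutines concurrently and collects their results
    return await asyncio.gather(*(fetch_async(n) for n in names))


def timed(func):
    """Call func() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func()
    return result, time.perf_counter() - start


names = ['A', 'B', 'C']
sync_result, sync_elapsed = timed(lambda: [fetch_sync(n) for n in names])
async_result, async_elapsed = timed(lambda: asyncio.run(fetch_all(names)))

print('sync :', round(sync_elapsed, 2), 'seconds')
print('async:', round(async_elapsed, 2), 'seconds')
```

The synchronous run should take roughly three times `DELAY`, while the asynchronous run should take roughly one `DELAY`, because the three waits overlap.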
Terminology
Coroutine
A function whose execution you can pause and resume. When it has to wait, it can yield control so that other coroutines get to run.
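A small sketch can make the "pause and resume" idea concrete. The function `greet` is a hypothetical example; the point is that calling a coroutine function only creates a coroutine object, and nothing runs until an event loop drives it.

```python
import asyncio
import inspect


async def greet(name):
    await asyncio.sleep(0)  # a pause point: control returns to the event loop
    return 'hello ' + name


coro = greet('A')                    # no code inside greet has run yet
is_coro = inspect.iscoroutine(coro)  # True: we only hold a coroutine object
result = asyncio.run(coro)           # the event loop runs it to completion

print(is_coro, result)  # True hello A
```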
Event loop
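Responsible for running and monitoring all coroutines. Whenever the running coroutine pauses, the event loop picks the next one that is ready.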
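The following sketch shows the event loop switching between two coroutines. The `worker` function and the `events` list are illustrative names, and `asyncio.sleep(0)` is used only as a way to hand control back to the loop.

```python
import asyncio

events = []  # records the order in which the steps actually run


async def worker(name):
    events.append(name + ' start')
    await asyncio.sleep(0)  # hand control back to the event loop
    events.append(name + ' end')


async def main():
    await asyncio.gather(worker('A'), worker('B'))


asyncio.run(main())
print(events)  # ['A start', 'B start', 'A end', 'B end']
```

B starts before A ends because A paused at its `await`, and the loop used that pause to run B.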
asyncio in practice: an asynchronous web crawler
Install packages
pip install requests beautifulsoup4  # for the synchronous crawler
pip install aiohttp  # for the asynchronous crawler
Synchronous crawler
import os
import time

import requests
from bs4 import BeautifulSoup

urls = [
    "https://zh.wikipedia.org/wiki/%E7%8F%8D%E7%8F%A0%E5%A5%B6%E8%8C%B6",
    "https://www.cosmopolitan.com/tw/lifestyle/food-and-drink/g34501236/milktea-20201104/",
    "https://www.cna.com.tw/news/firstnews/202104170147.aspx",
    "https://www.elle.com/tw/beauty/health/g33947289/drinking-bubble-milk-tea/",
    "https://www.womenshealthmag.com/tw/food-nutrition/restaurant/g35087801/pearl-milk-tea-drinks-top10/",
    "https://www.oktea.com.tw/product.php?pid_for_show=3340",
    "https://www.chingshin.tw/product/pearl-milk-tea",
    "https://www.books.com.tw/products/N001116299"
]

def fetch_data(url):
    print('start fetching data:', url)
    r = requests.get(url)  # blocks until the whole response has arrived
    soup = BeautifulSoup(r.text, 'html.parser')
    result = ''.join(soup.stripped_strings)  # keep only the visible text
    print('stop fetching data:', url)
    return result

def main():
    results = []
    for url in urls:
        results.append(fetch_data(url))  # pages are fetched one after another
    os.makedirs('results', exist_ok=True)  # open() fails if the folder is missing
    for i, result in enumerate(results):
        with open('results/result' + str(i) + '.txt', 'w', encoding='utf8') as f:
            f.write(result)

start_time = time.time()
main()
print('Elapsed time:', time.time() - start_time, 'seconds')
Asynchronous crawler
import asyncio
import os
import time

from aiohttp import ClientSession
from bs4 import BeautifulSoup

urls = [
    "https://zh.wikipedia.org/wiki/%E7%8F%8D%E7%8F%A0%E5%A5%B6%E8%8C%B6",
    "https://www.cosmopolitan.com/tw/lifestyle/food-and-drink/g34501236/milktea-20201104/",
    "https://www.cna.com.tw/news/firstnews/202104170147.aspx",
    "https://www.elle.com/tw/beauty/health/g33947289/drinking-bubble-milk-tea/",
    "https://www.womenshealthmag.com/tw/food-nutrition/restaurant/g35087801/pearl-milk-tea-drinks-top10/",
    "https://www.oktea.com.tw/product.php?pid_for_show=3340",
    "https://www.chingshin.tw/product/pearl-milk-tea",
    "https://www.books.com.tw/products/N001116299"
]

async def fetch_data(url, session):
    print('start fetching data:', url)
    # While this request waits on the network, other tasks can run.
    async with session.get(url) as r:
        html = await r.text()
    soup = BeautifulSoup(html, 'html.parser')
    result = ''.join(soup.stripped_strings)  # keep only the visible text
    print('stop fetching data:', url)
    return result

async def main():
    tasks = []
    async with ClientSession() as session:
        for url in urls:
            tasks.append(asyncio.create_task(fetch_data(url, session)))
        # Wait inside the block so the session stays open until all tasks finish.
        results = await asyncio.gather(*tasks)
    os.makedirs('results', exist_ok=True)  # open() fails if the folder is missing
    for i, result in enumerate(results):
        with open('results/result' + str(i) + '.txt', 'w', encoding='utf8') as f:
            f.write(result)

start_time = time.time()
asyncio.run(main())
print('Elapsed time:', time.time() - start_time, 'seconds')
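The crawler writes `results[i]` to `result<i>.txt`, so it relies on `asyncio.gather` returning results in the same order as the tasks it was given, regardless of which page finishes first. The sketch below illustrates that without touching the network: `fake_fetch`, `crawl`, and the delays are hypothetical stand-ins for real requests.

```python
import asyncio

# Hypothetical per-URL delays standing in for network latency.
fake_delays = {'url-0': 0.06, 'url-1': 0.02, 'url-2': 0.04}
finished = []  # records the order in which the fetches complete


async def fake_fetch(url):
    await asyncio.sleep(fake_delays[url])  # simulated download time
    finished.append(url)
    return 'content of ' + url


async def crawl(urls):
    tasks = [asyncio.create_task(fake_fetch(u)) for u in urls]
    return await asyncio.gather(*tasks)


results = asyncio.run(crawl(list(fake_delays)))
print(finished)  # completion order: ['url-1', 'url-2', 'url-0']
print(results)   # input order: content of url-0, url-1, url-2
```

Because the results come back in input order, the index `i` used for the file name always matches the position of the URL in `urls`.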
By Sam Yang